Fraud & AML - features for machine learning

Hi Dan,

I am really interested in extracting features from graph data for a machine learning model. Could you please explain more about the 118 features extracted (e.g. PageRank, betweenness, closeness, etc.) and slide 12 of the "Detecting Fraud and Money Laundering in Real-Time with a Graph Database Part 2" presentation?

Thank you very much in advance!

Karla

Hi KT,

Slide 15 of “Graph Gurus 3 Anti-Fraud and AML Part 1” gives some examples of how these features are composed. Basically, we extract topology information from the relationships among phones and use it as our basic features, which are then used to build the 118 features for phone fraud. However, as you mentioned, features can also come from other algorithms, such as the results of PageRank or community detection. The collected features for a phone can be stored as attributes on the phone vertex, and GQuery supports updating vertex/edge attributes inside a query. You therefore have multiple ways to build your own features: from graph relationships, or from the results of previously run queries. If you are interested in an example of how this feature collection is done in GQuery, see the following sample code, which collects three features from the graph: https://github.com/tigergraph/ecosys/blob/master/guru_scripts/fraud_detection_demo/FeatureCollection.gsql
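To make the idea concrete outside of GQuery, here is a minimal Python sketch of per-phone feature collection. It is not the code from the webinar or from FeatureCollection.gsql: the call records, the feature names (out_degree, in_degree, reciprocal_contacts, pagerank) and the pagerank helper are all made up for illustration. It combines simple topology counts with the result of a separate algorithm (a basic power-iteration PageRank) and stores everything in one feature dictionary per phone, much as attributes would be stored on a phone vertex.

```python
from collections import defaultdict

# Hypothetical call records (caller, callee); not data from the webinar.
calls = [
    ("p1", "p2"), ("p1", "p3"), ("p1", "p4"),
    ("p2", "p3"), ("p4", "p1"), ("p5", "p1"),
]

# Build directed adjacency sets in both directions.
out_nbrs = defaultdict(set)
in_nbrs = defaultdict(set)
for caller, callee in calls:
    out_nbrs[caller].add(callee)
    in_nbrs[callee].add(caller)

phones = sorted(set(out_nbrs) | set(in_nbrs))


def pagerank(nodes, out_nbrs, damping=0.85, iterations=50):
    """Basic power-iteration PageRank (illustrative helper, assumed parameters)."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_nbrs.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_nbrs.get(v, ())
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


ranks = pagerank(phones, out_nbrs)

# Collect topology features plus the algorithm result per phone,
# analogous to storing them as attributes on the phone vertex.
features = {}
for p in phones:
    callees = out_nbrs.get(p, set())
    callers = in_nbrs.get(p, set())
    features[p] = {
        "out_degree": len(callees),                   # distinct phones called
        "in_degree": len(callers),                    # distinct phones calling in
        "reciprocal_contacts": len(callees & callers),
        "pagerank": ranks[p],
    }

for p in phones:
    print(p, features[p])
```

Each row of features can then be fed to a machine learning model as one training example per phone. In a real graph database you would compute these inside the query and write them back to the vertex rather than exporting the edges first.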

Slide 12 of “Detecting Fraud and Money Laundering in Real-Time with a Graph Database Part 2” is about how you can use GQuery to write queries that search for specific graph patterns in your data. For example, “Detection of accounts with loop money transactions” is one such complex query that you can build with GQuery. It is not a simple loop detection query: it has special filter conditions on the path of the loop. Later in that webinar, I go through the GQuery code to show how this is done. You can refer to the code at the following link: https://github.com/tigergraph/ecosys/blob/master/guru_scripts/loop_detection_demo/queries/circleDetection.gsql

The data and graph schema are also available at the above link.
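As a rough illustration of what "loop detection with filter conditions on the path" can mean, here is a small Python sketch. It does not reproduce the logic of circleDetection.gsql: the transfer records, the find_loops function and the two filters (each hop must happen strictly later than the previous one, and must forward at least a given share of the previous amount) are assumptions chosen only to show the shape of such a query.

```python
from collections import defaultdict

# Hypothetical transfers (sender, receiver, amount, timestamp).
transfers = [
    ("A", "B", 1000.0, 1),
    ("B", "C", 950.0, 2),
    ("C", "A", 900.0, 3),
    ("A", "D", 500.0, 1),
    ("D", "A", 100.0, 2),  # forwards too little of the money, filtered out
    ("C", "B", 940.0, 1),  # earlier than the hop B -> C, filtered out
]

outgoing = defaultdict(list)
for sender, receiver, amount, ts in transfers:
    outgoing[sender].append((receiver, amount, ts))


def find_loops(start, max_hops=4, min_keep_ratio=0.8):
    """Return loops start -> ... -> start that pass the assumed path filters."""
    loops = []

    def extend(path, last_amount, last_ts):
        current = path[-1]
        for receiver, amount, ts in outgoing.get(current, ()):
            if last_ts is not None:
                # Assumed filter 1: transfers along the path move forward in time.
                if ts <= last_ts:
                    continue
                # Assumed filter 2: most of the money is passed on at each hop.
                if amount < min_keep_ratio * last_amount:
                    continue
            if receiver == start:
                if len(path) >= 2:  # ignore direct self-transfers
                    loops.append(path + [start])
                continue
            # Do not revisit accounts, and cap the loop length.
            if receiver in path or len(path) >= max_hops:
                continue
            extend(path + [receiver], amount, ts)

    extend([start], None, None)
    return loops


print(find_loops("A"))
```

With this sample data, the only loop reported from account A is A, B, C, A. The path through D closes a cycle in the plain graph sense but is rejected because the returning amount is too small, which is the difference between a simple loop search and one with conditions on the path.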

If you have any further questions, please let me know.

Best Wishes,

Dan