Pairwise Comparison via SELECT ... ACCUM rather than FOREACH

luyilun32661 · August 31, 2021, 7:54am

Hello TG Team,

One of my use case is determining which insurance claims belong to the same accident.

A car accident will likely correspond to multiple insurance claims since each party has his/her own insurance policy. In my data, there only exists claims but not accidents. Hence, I want to generate car accident data from insurance claim data, via GSQL.

My graph has the following schema:
Car - (ASSOCIATE_WITH) -> Claim
Claim vertex has attribute accident_time.

So initially I have claims associated with cars as such:

claim1: {car1, car2, car3}
claim2: {car2, car3}
claim3: {car1, car3}
claim4: {car4, car5}
claim5: {car5, car6}

I used the following query to perform pairwise comparison between claims, and store the set of claims which should belong to the same accident in a vertex attached set accumulator. The query checked the time gap between two claims and whether two claims have common associated cars.

SetAccum<VERTEX<Claim>> @claims;

 _t1 = SELECT c1 FROM Claim:c1 - (<ASSOCIATE_WITH.ASSOCIATE_WITH>) - Claim:c2
            WHERE datetime_diff(c1.accident_time, c2.accident_time) <= n_days*24*60*60
               AND count(c1.neighbors("REV_ASSOCIATE_WITH") INTERSECT c2.neighbors("REV_ASSOCIATE_WITH")) > 0
               AND c1 != c2
           ACCUM c1.@claims += c2
           POST-ACCUM c1.@claims += c1;

Printing out @claims for each claim, you would have:

claim1: {claim1, claim2, claim3}
claim2: {claim1, claim2, claim3}
**claim3: {claim1, claim2, claim3} **
claim4: {claim4, claim5}
claim5: {claim4, claim5}

Next I need to perform pairwise comparison between claims again. I will compare the @claims of the two claims. If the @claims are the same, then the two claims belong to the same accident with a generated unique accident id.

The expected output is:
accident_id 1: {claim1, claim2, claim3}
accident_id 2: {claim4, claim5}

I first tried to implement it using SELECT…ACCUM… , but failed since the value of the accumulator cannot accessed during accumulation. Hence, I had to stick to for loop, but it is too slow for my use case, since I have several millions of insurance claims to compare. So is there a way to solve my problem using the SELECT … ACCUM clause with which I can take advantage of its parallelism?

Thanks!

Dan_Barkus · August 31, 2021, 2:05pm

Hi Luyilun,

You’re on the right track with the SELECT … ACCUM. You can use a POST-ACCUM statement to do the comparisons that you’re looking for.

For example:

  highCostProviders = SELECT p FROM actions - () - Encounter - (ENCOUNTER_HAS_PROVIDER) - Providers:p
  ACCUM
    p.@highCost += 1
  POST-ACCUM
    p.@edges += p.outdegree("ENCOUNTER_HAS_PROVIDER"),
    p.@pctHigh += p.@highCost / p.@edges;

Because POST-ACCUM is performed after ACCUM, you an access the values set in the ACCUM statement.

You can read more about POST-ACCUM here: https://docs.tigergraph.com/start/accumulators-tutorial#accum-vs-post-accum
https://docs.tigergraph.com/dev/gsql-ref/querying/select-statement#accum-and-post-accum-clauses

luyilun32661 · August 31, 2021, 3:45pm

Hi Dan,
The issue is that POST_ACCUM cannot access the accumulator value during accumulation in the ACCUM clause.
See this example:

SumAccum<INT> @@accident_id = 0;
MapAccum<VERTEX<Car>, INT> @@map;
out = SELECT s FROM Car:s 
              ACCUM @@accident_id += 1
              POST-ACCUM @@map += (s -> @@accident_id);

In @@map, @@accident_id for all vertices will be the same and equal to the final accumulation value in the ACCUM clause. But what I need is incremental integers starting from 0.

Dan_Barkus · August 31, 2021, 6:53pm

Hi Luyilun,

That output is correct for how your query is written. @@ symbolizes a global accumulator, so only one instance of @@accident_id exists in the scope of your query, each vertex in the accum statement simply adds 1 to the @@accident_id accum and the POST-ACCUM takes the end result of that.

If you’re looking to keep track of the value per vertex, you’ll need to use a vertex attached accumulator signified by a single @. That may also prevent your need for a MapAccum as you’ll have a unique value per vertex, and can just return the value of each vertex’s attached accumulator.