How to get the Top value without a nested HeapAccum

I have a small project, but one part of it has me stumped. It is for healthcare, but I will use a simpler example with movies that illustrates the scenario. Let’s say I can total up the number of times each person has seen a movie, and I have the type of each movie (comedy, drama, action, etc.).

I want to find the most viewed movie, for each person, by type, and I want to do it all in GSQL.

My first thought was to have a tuple of movie name (STRING) and times viewed (INT), then a HeapAccum of size 1 sorted by times viewed DESC, then have a MapAccum from STRING of the movie type to the HeapAccum. But a nested HeapAccum is not supported, so that will not work.

So if I have a MASSIVE data set with things like

personx --> Comedy : Blazing Saddles : 55
        --> Comedy : Meet the Parents : 21
        --> Drama : The Godfather : 28
        --> Drama : Fracture : 5

etc

How can I organize my query and my accumulators to just give the top 1 viewed movie for each person, for each type?

You can define a local HeapAccum for each person. E.g.

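TYPEDEF TUPLE<STRING movei_name, INT times_viewed> movei_tuple; // tuple type stored in the heap
HeapAccum<movei_tuple>(1, times_viewed DESC) @top_1_heap; // size 1: keeps only the most-viewed movie per person

start = {person.*};

start = SELECT s FROM start:s-(watched:e)-movei:t
        ACCUM s.@top_1_heap += movei_tuple(t.movei_name, e.times_viewed);

PRINT start.@top_1_heap;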

Thanks, but that doesn't solve the problem. The specific issue here is that I want a top value for each category, but you cannot declare a HeapAccum nested inside a MapAccum.

Using my example above, the results should be

Comedy : Blazing Saddles : 55
Drama : The Godfather : 28

Your example just gives me the top viewed movie overall, not the top one per type.

TYPEDEF TUPLE<INT times_viewed, STRING movei_name> movei_tuple; // times_viewed must be the first field so that tuples are compared by it first
MapAccum<STRING, MaxAccum<movei_tuple>> @map; // use a MaxAccum here instead of a HeapAccum of size 1

start = {person.*};

start = SELECT s FROM start:s-(watched:e)-movei:t
        ACCUM s.@map += (t.movei_type -> movei_tuple(e.times_viewed, t.movei_name));

PRINT start.@map;

How about this?


Yes! Verified and this works. So in the specific case of a HeapAccum of size 1, MaxAccum fits the bill and can be nested. It didn’t occur to me to use the order of the fields in the tuple to influence the default sort. Thanks!!
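
For anyone who wants to see why the field order matters, here is a rough Python analogy (not GSQL, and not how TigerGraph implements it). It assumes the same sample rows as above and leans on the fact that Python tuples, like the tuples in the MaxAccum here, are compared field by field from left to right. The function and variable names are made up for illustration.

```python
# Python analogy only, not GSQL: a dict of per-type maxima stands in for
# MapAccum<STRING, MaxAccum<tuple>>. All names here are hypothetical.
views = [
    ("personx", "Comedy", "Blazing Saddles", 55),
    ("personx", "Comedy", "Meet the Parents", 21),
    ("personx", "Drama", "The Godfather", 28),
    ("personx", "Drama", "Fracture", 5),
]


def top_by_type(rows):
    """Keep the largest (times_viewed, movie_name) tuple per person and type."""
    best = {}
    for person, movie_type, movie_name, times_viewed in rows:
        # The count goes first so it decides the comparison; the name only
        # matters when two counts are equal.
        candidate = (times_viewed, movie_name)
        per_person = best.setdefault(person, {})
        if movie_type not in per_person or candidate > per_person[movie_type]:
            per_person[movie_type] = candidate
    return best


result = top_by_type(views)
for movie_type, (times_viewed, movie_name) in result["personx"].items():
    print(f"{movie_type} : {movie_name} : {times_viewed}")
```

If the name were the first field instead, the comparison would pick the alphabetically last title rather than the most viewed one. In this sketch, a tie on the count falls through to the movie name.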
