I have a small project, but one part of it has me stumped. It is for healthcare, but I will use a simpler movie example that illustrates the scenario. Let’s say I can total up the number of times each person has seen each movie, and I know the type of each movie (comedy, drama, action, etc.).
I want to find the most viewed movie, for each person, by type, and I want to do it all in GSQL.
My first thought was to have a tuple of movie name (STRING) and times viewed (INT), then a HeapAccum of size 1 sorted by times viewed DESC, then have a MapAccum from STRING of the movie type to the HeapAccum. But a nested HeapAccum is not supported, so that will not work.
So if I have a MASSIVE data set with entries like:
personx --> Comedy : Blazing Saddles : 55
        --> Comedy : Meet the Parents : 21
        --> Drama : The Godfather : 28
        --> Drama : Fracture : 5
How can I organize my query and my accumulators to just give the top 1 viewed movie for each person, for each type?
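To make the expected result concrete, here is a minimal Python sketch of the logic I am after. It is not GSQL and is only meant to pin down the desired output. The function name `top_movie_per_type` and the flat row layout are hypothetical, chosen for illustration, and keeping the first movie seen on a tie is an assumption rather than a requirement.

```python
from collections import defaultdict

# Sample rows taken from the example above:
# (person, movie_type, movie_name, times_viewed)
views = [
    ("personx", "Comedy", "Blazing Saddles", 55),
    ("personx", "Comedy", "Meet the Parents", 21),
    ("personx", "Drama", "The Godfather", 28),
    ("personx", "Drama", "Fracture", 5),
]


def top_movie_per_type(rows):
    """Return {person: {movie_type: (movie_name, times_viewed)}},
    keeping only the most viewed movie for each person and type."""
    best = defaultdict(dict)
    for person, movie_type, name, count in rows:
        current = best[person].get(movie_type)
        # Replace the stored entry only when this movie has strictly more
        # views, so on a tie the first movie seen is kept (an assumption).
        if current is None or count > current[1]:
            best[person][movie_type] = (name, count)
    # Convert to plain dicts for a clean result.
    return {person: dict(by_type) for person, by_type in best.items()}


result = top_movie_per_type(views)
print(result)
```

For the sample rows this keeps Blazing Saddles (55) for Comedy and The Godfather (28) for Drama under personx. What I need is the accumulator setup that produces the same shape of result inside a GSQL query.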