If a query has a FOREACH loop inside it, can the compiler automatically parallelize the loop when its iterations are independent of one another? If not, is there a way to achieve this behavior? I currently have a SELECT statement inside the loop and would like to apply it in an accumulator-like fashion. Thanks!
What an interesting question!
So, I don’t believe FOREACH loops are parallelised.
But we do have the ACCUM clause over edges, which is implicitly parallel. So a possible approach is to construct a vertex and edges that will have the same effect.
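To make the idea concrete, here is a rough, language-neutral sketch in Python (not GSQL). It only models the shape of the rewrite: a per-seed loop that runs one "select" per iteration, versus a single pass over the edges that adds into a per-seed accumulator. The toy graph, the weights, and the function names `per_seed_loop` and `edge_accumulate` are all made up for illustration, and I am assuming the per-iteration work can be expressed as an order-insensitive accumulation.

```python
from collections import defaultdict

# Hypothetical toy graph: (source, target, weight) edges.
EDGES = [
    ("a", "x", 1),
    ("a", "y", 2),
    ("b", "x", 3),
    ("c", "z", 4),
]


def per_seed_loop(seeds, edges):
    """FOREACH-style: one full scan of the edges per seed, run in sequence."""
    totals = {}
    for seed in sorted(seeds):
        totals[seed] = sum(w for s, _, w in edges if s == seed)
    return totals


def edge_accumulate(seeds, edges):
    """ACCUM-style: one pass over the edges, adding into a per-seed accumulator.

    Each edge contributes independently and addition does not depend on
    order, which is the property that allows edges to be processed in
    parallel. This Python loop itself is sequential; it only shows the shape.
    """
    totals = defaultdict(int)
    for seed in seeds:
        totals[seed] += 0  # seeds without edges still appear, with total 0
    for s, _, w in edges:
        if s in seeds:
            totals[s] += w
    return dict(totals)


seeds = {"a", "b", "d"}
print(per_seed_loop(seeds, EDGES) == edge_accumulate(seeds, EDGES))
```

If the work inside your loop cannot be written as independent, order-insensitive contributions like this, the rewrite does not apply as is.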
How many iterations does your FOREACH run?
I iterate through a global SetAccum of vertices. I would usually use an ACCUM statement, but the operation requires a SELECT statement applied to each vertex in the SetAccum, and SELECT statements are not supported within ACCUMs.
OK. How complex is this select statement?
I ask because the functional neighbor operators can do quite a lot if it is only one hop.
Also, the POST-ACCUM statement is a parallel accumulation on vertices, so it might still be better than a simple FOREACH. Worth trying anyway if it isn’t too painful.
The select statement is across a large set of vertices, and calls a sub-query in the ACCUM statement that uses the vertex from the FOREACH loop as an argument.
I still have the issue of not being able to run a SELECT in a POST-ACCUM, but would be ok with that option as well.
Is the subquery also complicated? Without seeing the code it is difficult for me to give specific solutions, apart from calling the sub-select in a separate function with the vertex as a parameter, which is slow at best and probably defeats most of the advantage of parallelism.
Generally, vertex set manipulation and standard statements are sufficient, but obviously you wouldn’t be asking if that was a working solution. If it is reasonable, please provide a representative query and schema, and then we can get into the details.
The subquery is not too complicated. I have tried putting the sub-select in a different function previously, and for some reason the query result was never returned to me, even though I saw the activity monitor spike and then drop back to the “resting” state. That approach was 3 queries deep (the current FOREACH being replaced by an ACCUM calling the sub-select statement in another query, which in turn called the lowest-level query). Because this did not work, I combined the two queries with the FOREACH, so that only the lowest-level subquery is called currently. I do think that there is some strange and unexpected behavior when calling subqueries, especially in a distributed environment.