I recently moved a prototype I have been working on from my local Docker instance to a cluster of machines. For one of my queries, adding the “DISTRIBUTED” keyword causes a ~100x slowdown, while for another query it gives a ~30x speedup. Are there rules of thumb for why distributed mode makes one query more efficient but not another? I suspect factors such as the size of the initial seed vertex set or the use of global versus local accumulators might play a role.
Edit: I came across the guidelines for selecting distributed query mode in the documentation (https://docs.tigergraph.com/dev/gsql-ref/querying/distributed-query-mode#guidelines-for-selecting-distributed-query-mode). The query that slowed down does not start from a large number of vertices, but what counts as “many hops”?