Propagating the smaller internal id in weakly connected components?

luyilun32661 · September 26, 2021, 10:34am

Hi TG Team,

When reading the GSQL source code snippet for weakly connected component:

MinAccum<INT> @cc_id = 0;       //each vertex's tentative component id
MapAccum<INT, INT> @@compSizes;
MapAccum<INT, ListAccum<INT>> @@compGroupBySize;
FILE f(file_path); 

Start = {v_type};

# Initialize: Label each vertex with its own internal ID
S = SELECT x 
	FROM Start:x
	POST-ACCUM x.@cc_id = getvid(x)
;

# Propagate smaller internal IDs until no more ID changes can be Done
WHILE (S.size()>0) DO
		S = SELECT t
			FROM S:s -(e_type:e)- v_type:t
			ACCUM t.@cc_id += s.@cc_id // If s has smaller id than t, copy the id to t
			HAVING t.@cc_id != t.@cc_id'
		;
END;

I don’t understand the trick of propagating smaller internal IDs until no more id changes can be done. I thought WCC stops when all vertices are marked as visited after BFS.

Can anyone enlighten me ? Thanks.

Muenchen · September 26, 2021, 11:41am

Hello Luyilun bro:

I am an equally new data scientist engineer attracted to TigerGraph. I would like to tentatively answer your question since it is not related to TigerGraph. Instead, it is a basic trick in BFS (which you could see relatively frequently in coding interview problems such as LeetCode).

A WCC is defined to be a group of vertices in which any pair could be linked by intermediate vertices within the same group. Suppose a1-a2-…-an, and ai has the smallest ID in the WCC. Then after one iteration, a(i-1)'s and a(i+1)‘s IDs are changed to be ai’s ID. Hence, after some iterations, all vertices in the chain are changed to ai’s ID. Since all vertices in the WCC are connected in a chain to ai, then all vertices’ IDs are changed to ai’s ID after some iterations. To be accurate, after k iterations, all vertices within k hops to ai are changed to ai’s ID. (I suppose the graph to be undirected. For a directed graph, things might turn out to be more complicated but the essence should be the same.)

It is straightforward that in the last iteration, since all vertices in a WCC have the same IDs, then no more changes in ID need to be performed. At this time, each WCC is labeled by the smallest ID in it. and it is easy to count the number of WCCs. Actually, you could specify all vertices in the WCC to be marked as any ID after WCC stops, but the minimum ID is a natural and easy choice.

It is not a BFS since each vertex could be processed more than once. You could have a look at a similar BFS problem which is interesting and in some way related to WCC:

number of islands problem LeetCode

Hope this would clarify it.