We are using TigerGraph Enterprise and have a few questions about migrating one of our NLP-based use cases onto TigerGraph.
We have entity resolution algorithms that rely on NLP constructs such as lemmatization and edit distance. From what I understand, there are two ways to do data science on TigerGraph:
In Graph ML: I believe this is the ideal approach, even for our use case. The problem I am running into is how to plug the NLP constructs into GSQL. I think we can write UDFs, but I am not very sure about the C++ libraries available for NLP, and stitching all the steps together into a GSQL query might still be challenging. Has anyone here tried this and can provide any pointers?
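To make it concrete, this is the kind of logic I would need to express as a UDF. A minimal Python sketch of edit distance (the piece I would have to port to C++ for a GSQL UDF; the function name and signature here are just for illustration):

```python
def edit_distance(a: str, b: str) -> int:
    # Wagner-Fischer dynamic programming:
    # O(len(a) * len(b)) time, O(len(b)) space.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

The DP itself is small and dependency-free, so porting it to C++ seems feasible; my real concern is the heavier constructs like lemmatization, which need language models rather than a self-contained algorithm.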
Using pyTigerGraph: This is something I discovered recently. It is very easy for me to use this option because my entire code base is in Python; I would just need to tweak the relevant bits to pull data from TigerGraph using the provided functions, and the rest would work as is. I have seen a few Graph Gurus videos where people do data science this way. I have a couple of questions in this regard, though:
2.1) Can it be used for medium/large graph workloads? I ask because our graph can grow to the order of billions of edges and hundreds of millions of nodes. Even though the data footprint on TigerGraph is on the order of GBs, I wanted to understand how this would pan out for larger graphs. There is an issue raised in the GitHub repo on this topic: https://github.com/pyTigerGraph/pyTigerGraph/issues/7.
2.2) Is there a way to do some kind of pagination for cases where the results from the graph cannot be accommodated in memory? I believe pyTigerGraph hits REST++ underneath, so does REST++ provide such functionality?
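To illustrate what I mean by pagination in 2.2, here is the client-side batching pattern I have in mind. The installed query name `get_batch` and its `batch_num`/`batch_size` parameters are hypothetical (they would have to be a GSQL query we write ourselves that partitions the result set), and the shape of the `runInstalledQuery` response is an assumption:

```python
def fetch_in_batches(run_batch, num_batches):
    """Stream results batch by batch so no single response has to hold
    the whole result set; run_batch(i) returns the rows of batch i."""
    for i in range(num_batches):
        for row in run_batch(i):
            yield row

# Hypothetical pyTigerGraph usage -- "get_batch" and its parameters are
# assumptions, not a built-in endpoint:
#
#   import pyTigerGraph as tg
#   conn = tg.TigerGraphConnection(host="https://my-host", graphname="MyGraph")
#   rows = fetch_in_batches(
#       lambda i: conn.runInstalledQuery(
#           "get_batch", {"batch_num": i, "batch_size": 100000}),
#       num_batches=50)
#   for row in rows:
#       process(row)
```

This works if we roll the batching ourselves, but I would like to know whether REST++ offers something like this natively (e.g. a cursor or offset mechanism) rather than each query having to implement its own partitioning.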