Split Function and splitting strings inside a query

I want to use a split function inside a query to parse strings into a list of elements.

In the reference guide, the SPLIT function is listed under “Loading a LIST or SET Attribute” and “Loading a MAP Attribute”, which is a different usage from parsing a string inside a query.

I would like to do something like this:

```
ListAccum<STRING> @@words;

ACCUM @@words += SPLIT("tom,dick,harry", ",");
```

I was pleased to see some string functions in the documentation, but it’s a very limited list and doesn’t include split. How do people typically handle this, and can we create our own custom functions?

Thank you.

Hi again George,

Yes, you can write your own custom functions in this file: <tigergraph.root.dir>/dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp.

You can return a ListAccum from a function in that file and add it to your @@words accumulator as you did above. I’ll include an example below.

Here is a user-defined function that does what you’re looking for:

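```cpp
// Split str on every occurrence of delimiter and return the pieces.
inline ListAccum<string> string_split(string str, string delimiter) {
  ListAccum<string> newList;
  if (delimiter.empty()) {  // an empty delimiter would make the loop below spin forever
    newList += str;
    return newList;
  }
  size_t pos = 0;
  std::string token;
  while ((pos = str.find(delimiter)) != std::string::npos) {
    token = str.substr(0, pos);              // piece before the delimiter
    newList += token;
    str.erase(0, pos + delimiter.length());  // drop the piece and the delimiter
  }
  newList += str;                            // trailing piece after the last delimiter
  return newList;
}
```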

To use this in your query, you can simply write:

```
@@words += string_split("tom,dick,harry", ",");
```

Thanks,

Kevin
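Because string_split depends on ListAccum, which only exists inside TigerGraph, its splitting logic can’t be compiled on its own. A minimal sketch for checking that logic in isolation, assuming a std::vector<std::string> is an acceptable stand-in for ListAccum<string> (the name string_split_std is hypothetical, not part of TigerGraph):

```cpp
#include <string>
#include <vector>

// Standalone copy of the string_split loop for testing outside TigerGraph.
// Assumption: std::vector<std::string> stands in for ListAccum<string>;
// only the "append each piece" behaviour matters here.
inline std::vector<std::string> string_split_std(std::string str,
                                                 const std::string& delimiter) {
  std::vector<std::string> pieces;
  if (delimiter.empty()) {  // find("") matches at 0 forever, so bail out
    pieces.push_back(str);
    return pieces;
  }
  std::string::size_type pos = 0;
  while ((pos = str.find(delimiter)) != std::string::npos) {
    pieces.push_back(str.substr(0, pos));    // text before the delimiter
    str.erase(0, pos + delimiter.length());  // drop it plus the delimiter
  }
  pieces.push_back(str);                     // whatever is left at the end
  return pieces;
}
```

With this loop, consecutive delimiters produce empty elements and a trailing delimiter produces an empty last element, so "a,,b," splits into four pieces. Multi-character delimiters such as "::" work the same way.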

Thank you, Kevin. It worked perfectly, and I’m glad to see we can create custom functions like that. I have two questions:

  1. Is it possible to deploy functions like this to the cloud account and if so, how?

  2. I would expect there to be a large repository of custom functions like this, built from internal and user contributions. Is there such a resource?

Thank you.

Hi,

  1. This is currently not possible. A workaround could be to export a solution using a non-cloud version, and then import the solution to the cloud.

  2. There is no such resource… yet.

Kevin

OK, thank you for clearing this up. By implementing your sample split function, I learned that custom functions are not written in GSQL but in what looks like C++. Can you confirm this is C++, and are there guidelines on what can and cannot be done when writing these custom functions? I searched for “custom function” on the TigerGraph website and didn’t find anything. For example, in SQL Server we can write CLR functions in C#, and there is a guideline explaining what can be done, all the constraints, and how to deploy them.

Yes, user-defined functions are written in C++.

You can read about them here: https://docs.tigergraph.com/dev/gsql-ref/querying/operators-functions-and-expressions#user-defined-functions

Thanks,

Kevin
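The documentation restricts UDF argument and return types to bool, int, float, double, and string, while allowing any C++ type inside the function body. A hypothetical sketch of a UDF that stays within those rules (has_field is an illustrative name, and the fixed "," delimiter is an assumption): string arguments in, bool out, with std::unordered_set used only internally. It is written with std::string so it compiles standalone; inside ExprFunctions.hpp the documentation asks for the `string` spelling in the signature instead.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical UDF: returns true if `token` appears as a whole
// comma-separated field of `csv`. Only documented types appear in the
// signature; the body freely uses std::istringstream and std::unordered_set.
inline bool has_field(std::string csv, std::string token) {
  std::unordered_set<std::string> fields;
  std::istringstream in(csv);
  std::string field;
  while (std::getline(in, field, ',')) {  // split on a fixed ',' delimiter
    fields.insert(field);
  }
  return fields.count(token) > 0;
}
```

Matching whole fields rather than substrings is the design choice here: "di" is not reported as present in "tom,dick,harry".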

Hello Kevin,

This is very useful information that I could not get from the TigerGraph documentation. The documentation says:

Users can define their own expression functions in C++ in <tigergraph.root.dir>/dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp. Only bool, int, float, double, and string (NOT std::string) are allowed as the return value type and the function argument type. However, any C++ type is allowed inside a function body. Once defined, the new functions will be added into GSQL automatically next time GSQL is executed.

But it seems I can return a list such as ListAccum. That is great! Can I pass any of the GSQL container types, such as set and map? Do I declare them on the C++ side just as they are declared on the GSQL side?

Will the following UDF declaration be acceptable?

```cpp
inline ListAccum string_split (ListAccum input) {
  ListAccum<INT> newList;
  // code ...
  return newList;
}
```

Kumar

```cpp
inline bool cache_vector(ListAccum p) {
  //… … code
  return true;
}
```

The declaration above gives me a compilation error when I run ./compile in the UDF directory. Please suggest how a container type can be passed to and returned from a UDF.

Thanks

Kumar

Hi Kumar,

You can pass a string as a parameter to your UDF, as I did in the earlier example.

Use a FOREACH statement in the GSQL code to iterate through the ListAccum and call the function for each string.

Thanks,

Kevin
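This approach keeps the UDF signature scalar and moves the iteration into GSQL. A minimal sketch of such a per-element UDF, assuming ASCII lowercasing as the per-element work (to_lower_ascii is a hypothetical name); GSQL would call it once for each string while a FOREACH loop walks the ListAccum.

```cpp
#include <cctype>
#include <string>

// Hypothetical per-element UDF: one string in, one string out, so it stays
// within the documented argument and return types. The GSQL side iterates
// over the ListAccum and calls this once per element.
inline std::string to_lower_ascii(std::string word) {
  for (char& c : word) {
    // Cast to unsigned char first: std::tolower is undefined for negative values.
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return word;
}
```

Since the function takes its argument by value, the caller’s string is left unchanged and the lowercased copy is returned.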

Okay, thanks, will do. So the return value of a UDF can be a container, but the arguments cannot?