Various Data Type Questions

Hi,

I am working through some issues with data types. Here are some questions that have come up:

  1. Is there a known issue with using the LIST data type?
  • In my online data loading I seem to get values getting jumbled up between fields that aren’t even on the same vertex type.
  • Basic STRING COMPRESS types seem to work fine, only using it inside a LIST seems to be affected.
  1. Is there a better / more efficient way to represent a UUID than as a STRING , e.g. using FIXED_BINARY(16) ?
  • If using FIXED_BINARY , is ORDER, WHERE , etc. supported?
  1. When using SET.size() I get the error “this feature is not yet available in batch mode queries”. When will it be available?
  2. Is there a way to test the content of a LIST / LIST for certain values?
  • I need to use a LIST because I need the output to retain order characteristics, but I need to use predicates that test for value overlap.
  • The way I’d expect this to work is something like ((list_type_attribute INTERSECT list_type_query_parameter).size() > 0)
  1. Is there planned support for using BOOL types with the ORDER clause?
  • The workaround of using an INT to represent all BOOL values seems wasteful.
  1. When is support planned for PRINT results from interpreted queries maintaining the ORDER of the SELECT they represent?
  2. Are there any plans for a percentile accumulator that supports arbitrary percentiles?
  • e.g. PercentileAccum(1, 5, 50, 95, 99)
  1. Are there any plans to support DATETIME in the existing accumulators?
  • e.g. MinAccum
  1. Are there any plans to support formatting DATETIME output results (specifically with ISO8601 formats)?
  2. Are there any plans to add escaping/quoting identifiers to the language generally or the PRINT statement specifically?
  • It’s pretty limiting that all GSQL reserved keywords block using those strings as keys in PRINT results.
  • In a generic query generator we have to namespace everything (vertex names, edge names, AS names, etc.) to avoid accidentally hitting keywords).
  1. The grand-daddy of them all: is there any plan to introduce an OPTION type?
  • Explicit NULL handling is a great idea, and it makes some sense that TigerGraph would ban NULLs to avoid it.
  • However, the workaround of including a secondary shadow attribute to indicate presence is very painful with large numbers of attributes.
  • Being able to wrap any other data type in an option type that could be handled as a monad would provide the flexibility of natively representing “missing” values while ensuring that GSQL written against those types properly handles the missing cases.
  • e.g.
    • definition: OPTION some_attribute DEFAULT NONE (or DEFAULT SOME(999) )
    • predicate: WHERE tgt.some_attribute.contains(999)
    • aggregate: AvgAccum<OPTION> // does nothing when value is NONE
    • etc.

Thanks in advance!

  1. Currently, we do not support primary key using fixed_binary(n) type. You are right we can save more space if we support fixed_binary on primary key field. You can use string for now. Since UUID is usually 128-bit (16 bytes), string is good for most use case.

Please be aware we have two way to specify primary key now.

Prior 2.3,

CREATE VERTEX movie (primary_id UUID string, name STRING, year UINT)

Note: In default mode, a PRIMARY_ID is not an attribute, but the WITH primary_id_as_attribute=“true” clause can be used to make it an attribute.

Post 2.3,

CREATE VERTEX movie (UUID string PRIMARY KEY, name STRING, year UINT)

Note: the PRIMARY KEY is always an attribute; the WITH option is unneeded.

More document on primary key is here https://docs.tigergraph.com/dev/gsql-ref/ddl-and-loading/defining-a-graph-schema#create-vertex

  1. Is there a way to test the content of a LIST / LIST for certain values?

  2. I need to use a LIST because I need the output to retain order characteristics, but I need to use predicates that test for value overlap.

  3. The way I’d expect this to work is something like ((list_type_attribute INTERSECT list_type_query_parameter).size() > 0)

USE GRAPH social

DROP QUERY hello2

CREATE QUERY hello2( Set <string> names) FOR GRAPH social{

  ListAccum<string> @nbrs;

  SetAccum<string> @@nameList;

  OrAccum @overlapped = false;

  @@nameList = names; //pass Set<sting> to SetAccum, which has an contains function.

  Start = {person.*};

  # for each person, put their neighbors name in its @nbrs list.

  Result = SELECT tgt

           FROM Start:s-(friendship:e) - person:tgt

           ACCUM s.@nbrs += [tgt.name](http://tgt.name/);

  #find those whose neighbors have overlap with nameList

  Result = SELECT s

             FROM Start:s

             ACCUM FOREACH i IN range [0, s.@nbrs.size()-1] DO

                       IF @@nameList.contains(s.@nbrs.get(i)) THEN

                             s.@overlapped = TRUE,

                             BREAK

                       END

                   END

              HAVING s.@overlapped == TRUE;

  PRINT Result;

}

INTERPRET QUERY hello2(["Dan", "Jenny"])
  1. Are there any plans to support DATETIME in the existing accumulators?

  2. e.g. MinAccum

They are already supported in 2.4 release. E.g.

create query test2 () for graph poc_graph {

  MaxAccum<datetime> @@t = to_datetime("2011-05-11 12:23:34");

  SumAccum<string> @@s = "2012-06-12 21:22:23";

  L0 = {members.*};

  L1 = SELECT src

    FROM L0:src-(member_member)->members:tgt

    WHERE year(src.dob) > year(to_datetime("2003-04-02 00:00:00"));

  PRINT @@t;

  PRINT year(@@t);

  PRINT year(to_datetime(@@s));

  PRINT year(to_datetime("2013-07-13 11:22:33"));

  PRINT month(@@t);

  PRINT month(to_datetime(@@s));

  PRINT month(to_datetime("2013-07-13 11:22:33"));

  PRINT day(@@t);

  PRINT day(to_datetime(@@s));

  PRINT day(to_datetime("2013-07-13 11:22:33"));

  PRINT hour(@@t);

  PRINT hour(to_datetime(@@s));

  PRINT hour(to_datetime("2013-07-13 11:22:33"));

  PRINT minute(@@t);

  PRINT minute(to_datetime(@@s));

  PRINT minute(to_datetime("2013-07-13 11:22:33"));

  PRINT second(@@t);

  PRINT second(to_datetime(@@s));

  PRINT second(to_datetime("2013-07-13 11:22:33"));

}

//install query test2

interpret query test2()
  1. Are there any plans to support formatting DATETIME output results (specifically with ISO8601 formats)?

It’s available in compiled mode, not in interpret mode yet. See example below.

USE GRAPH social

DROP QUERY hello2

CREATE QUERY hello2( Set <string> names) FOR GRAPH social{

  ListAccum<string> @nbrs;

  SetAccum<string> @@nameList;

  OrAccum @overlapped = false;

  MinAccum<datetime> @@t;

  MinAccum<datetime> @tt;

  @@nameList = names;

  Start = {person.*};

  # for each person, put their neighbors name in its @nbrs list.

  Result = SELECT tgt

           FROM Start:s-(friendship:e) - person:tgt

           ACCUM s.@nbrs += [tgt.name](http://tgt.name/),

                 @@t += to_datetime("2011-05-11 12:23:34");

  #find those who is neighbor with "Dan"

  Result = SELECT s

             FROM Start:s

             ACCUM FOREACH i IN range [0, s.@nbrs.size()-1] DO

                       IF @@nameList.contains(s.@nbrs.get(i)) THEN

                             s.@overlapped = TRUE,

                             BREAK

                       END

                   END,

                 @@t += to_datetime("2012-05-11 12:23:34"),

                 s.@tt += to_datetime("2012-05-11 12:23:34")

              HAVING s.@overlapped == TRUE;

  PRINT Datetime_Format(@@t, "%Y/%m/%d %H:%M:%S");

  PRINT Datetime_Format(@@t, "%Y/%m/%d");

  PRINT Result[ Datetime_Format(Result.@tt, "%Y/%m/%d")];

}

install query hello2

run query hello2(["Dan", "Jenny"])

{

  "error": false,

  "message": "",

  "version": {

    "schema": 0,

    "edition": "developer",

    "api": "v2"

  },

  "results": [

    {"Datetime_Format(@@t,%Y/%m/%d %H:%M:%S)": "2011/5/11 12:23:34"},

    {"Datetime_Format(@@t,%Y/%m/%d)": "2011/5/11"},

    {"Result": [

      {

        "v_id": "Amily",

        "attributes": {"Datetime_Format(Result.@tt,%Y/%m/%d)": "2012/5/11"},

        "v_type": "person"

      },

      {

        "v_id": "Tom",

        "attributes": {"Datetime_Format(Result.@tt,%Y/%m/%d)": "2012/5/11"},

        "v_type": "person"

      },

      {

        "v_id": "Dan",

        "attributes": {"Datetime_Format(Result.@tt,%Y/%m/%d)": "2012/5/11"},

        "v_type": "person"

      },

      {

        "v_id": "Kevin",

        "attributes": {"Datetime_Format(Result.@tt,%Y/%m/%d)": "2012/5/11"},

        "v_type": "person"

      },

      {

        "v_id": "Jenny",

        "attributes": {"Datetime_Format(Result.@tt,%Y/%m/%d)": "2012/5/11"},

        "v_type": "person"

      },

      {

        "v_id": "Nancy",

        "attributes": {"Datetime_Format(Result.@tt,%Y/%m/%d)": "2012/5/11"},

        "v_type": "person"

      }

    ]}

  ]

}
  1. Are there any plans to add escaping/quoting identifiers to the language generally or the PRINT statement specifically?

  2. It’s pretty limiting that all GSQL reserved keywords block using those strings as keys in PRINT results.

  3. In a generic query generator we have to namespace everything (vertex names, edge names, AS names, etc.) to avoid accidentally hitting keywords).

Very good suggestion. We are considering to lift the reserved key word for attribute/vertex/edge type identifiers as much as it makes sense.

Hi Robert,

Thanks for all your questions.

For 1. I have run a simple test locally. It worked fine. Could you please provide an example of the jumbled values.
If possible would you please provide an available dataset so that we can take a look via an online session?

For 3. I have tried query below on my local 2.4 version and it worked. What is your current version right now? Is the Start.size() same as the SET.size() you mentioned?

CREATE BATCH QUERY test1(/* Parameters here */) FOR GRAPH MyGraph {
Start = {ANY};
while Start.size() > 0 do
continue;
end;
PRINT Start.size();
}

2, 4, 8, 9, 10 are answered by Mingxi.

5, 6, 7, 11 are feature requests, I will report this to the product team and discuss internally.

Thanks.