VERTEX_MUST_EXIST problem

VERTEX_MUST_EXIST disables vertex creation on the fly (as the docs explain it: https://docs.tigergraph.com/gsql-ref/3.3/ddl-and-loading/creating-a-loading-job#_vertex_must_exist_parameter) while loading data.

I see the strange behavior of TG (I am using version 3.1.6).

When I try to create the following loading job, I get some errors:

CREATE LOADING JOB test  {

    DEFINE FILENAME file1;
    DEFINE FILENAME file2;

    LOAD file1
        TO VERTEX Level1 VALUES ($0, $1) WHERE $2 == "1",
        TO VERTEX Level2 VALUES ($0, $1) WHERE $2 == "2",
        TO VERTEX Level3 VALUES ($0, $1) WHERE $2 == "3"
    USING SEPARATOR="|", HEADER="false", EOL="\n";

    LOAD file2
        TO EDGE PART_OF VALUES($0 Level2, $1 Level1),
        TO EDGE PART_OF VALUES($0 Level3, $1 Level2)
    USING SEPARATOR="|", HEADER="false", EOL="\n", VERTEX_MUST_EXIST="true";

}

Here is the error message:

tigergraph@m1:~/kb/loading-test$ gsql --graph LoadingTest test.gsql
Semantic Check Fails: The USING clause for the same file path "null" should be the same. However in Job 'test' one block has USING clause as "{EOL=\n, SEPARATOR=|, HEADER=false, VERTEX_MUST_EXIST=true}", while another block has USING clause as "{EOL=\n, SEPARATOR=|, HEADER=false}".
Semantic Check Fails: The file file2 has different configs!
If you need different configs for one file, please use symbolic link.
The job test could not be created!

If I take each of the LOAD statements to a dedicated file/job definition, I can create both the loading jobs…

Please, let me know if it is a bug. I couldn’t find info about this restriction in the docs.

Maybe the GSQL interpreter is confused… (please, note the following fragment: the same file path “null”)

Regards,
Karol

If someone wants to reproduce the problem, here is a simple schema I am using:

CREATE VERTEX Level1(PRIMARY_ID id STRING, name STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE"
CREATE VERTEX Level2(PRIMARY_ID id STRING, name STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE"
CREATE VERTEX Level3(PRIMARY_ID id STRING, name STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE"
CREATE DIRECTED EDGE PART_OF(FROM Level2, TO Level1|FROM Level3, TO Level2)

CREATE GRAPH LoadingTest(Level1, Level2, Level3, PART_OF)

USE GRAPH LoadingTest

@gruby_karol , I replicated your graph and reproduced the Same error

and I fixed that by defining the paths .


CREATE LOADING JOB test for GRAPH LoadingTest {

    DEFINE FILENAME file1 = "path2";
    DEFINE FILENAME file2 = *path1";

    LOAD file1
        TO VERTEX Level1 VALUES ($0, $1) WHERE $2 == "1",
        TO VERTEX Level2 VALUES ($0, $1) WHERE $2 == "2",
        TO VERTEX Level3 VALUES ($0, $1) WHERE $2 == "3"
    USING SEPARATOR="|", HEADER="false", EOL="\n";

    LOAD file2
        TO EDGE PART_OF VALUES($0 Level2, $1 Level1),
        TO EDGE PART_OF VALUES($0 Level3, $1 Level2)
    USING SEPARATOR="|", HEADER="false", EOL="\n", VERTEX_MUST_EXIST="true";

}

@Mohamed_Zrouga Thank you for that workaround.
BTW, will I be able to override the file paths with run loading job xxx using file1="yyy"?

And, it looks like a bug, right? Is there a place to report it?

The paths need to exists, it seems. When I use a placeholder like “path1”, “path2”, I get the following error:

Semantic Check Fails: File or directory 'path2' does not exist!
Semantic Check Fails: The FILENAME = 'm1:path2' is not in a valid path format.

I have ran that snippet on 3.3 could you please share some information regarding the version.
And while running the loading job yes you can specify the paths.

I have ran that snippet on 3.3 could you please share some information regarding the version.

Here you go:

$ gadmin version
TigerGraph version: 3.3.0
product          release_3.3.0_11-01-2021         8eeb8abb15d53f7a242cffbada7c0051fd1c8905  2021-09-13 12:11:04 -0700
olgp             release_3.3.0_11-01-2021         edc9ca3e245eadafab29841100659717062af90e  2021-10-05 09:13:06 -0700
cqrs             release_3.3.0_11-01-2021         11181ce8eaf54e9a6f8be338880925a9ec9f4dcd  2021-10-20 03:55:27 -0700
topology         release_3.3.0_11-01-2021         51f55046d3973c82d2c56790c2c69e9b73ec85f9  2021-10-05 09:13:08 -0700
gpe              release_3.3.0_11-01-2021         49b85a24207068edbfc388a1d35ea5dac338bbdc  2021-10-01 05:14:39 -0700
gse              release_3.3.0_11-01-2021         c8039cbb5263f3fdc1b1952bafce86082a83e8cb  2021-09-30 10:22:45 -0700
third_party      release_3.3.0_11-01-2021         317031418629dc08ef6a1a6ffeacf5d3f90852a3  2021-08-22 06:30:36 -0700
utility          release_3.3.0_11-01-2021         f2ff8c81b25367f25ff662513512151d61dc0914  2021-10-28 10:39:58 -0700
realtime         release_3.3.0_11-01-2021         7030f1bb0bcacd55047a204370c2bff83ca7a590  2021-10-05 09:13:14 -0700
er               release_3.3.0_11-01-2021         9eca4f2b7d05a2d2946e01363e7220258c781c26  2021-10-20 03:55:30 -0700
gle              release_3.3.0_11-01-2021         26b783caa9c75aaebf7908cd4c196599cc7d2444  2021-10-26 03:38:07 -0700
document         release_3.3.0_11-01-2021         9d65a7395d8db226a56ac0563a935f1998a8e191  2021-07-26 19:00:16 -0700
glive            release_3.3.0_11-01-2021         dfb297a52db7becc88717fbd2295ba78adef881d  2021-09-02 00:09:01 -0700
gap              release_3.3.0_11-01-2021         65a775f1d51b7b934b9e017a79cd796d6a5284dd  2021-10-28 11:46:32 -0700
gst              release_3.3.0_11-01-2021         a5e0bd18f89ff5ec027cca6a2cdc1c70ac990e37  2021-10-22 13:04:35 -0700
gus              release_3.3.0_11-01-2021         85587b103f68ad64884a1a759358f2547335be03  2021-10-28 00:41:12 -0700
blue_features    release_3.3.0_11-01-2021         a7068bb7a45506695adf25879f1363f0b08e0b73  2021-09-17 17:19:35 -0700
blue_commons     release_3.3.0_11-01-2021         4ca38ccd387e9cc0043e23e4a17b000a8982696a  2020-11-16 08:21:03 -0800

Have you solved this problem? I also met the same problem today, and my version is 3.4

@kwonder On my side, I walked around the problem by splitting loading jobs… But it’s just a walkaround.

1 Like