Cannot read the newest data written inside the same transaction?

Hello,

I’m testing transactions in TigerGraph (v3.1.6), but I found that I can’t read the newest data written in the same transaction. Is this a bug? Or why is it designed this way? It differs from other databases, such as MySQL.

Here is my query:

create vertex MyNode (primary_id id UINT, id1 UINT) WITH primary_id_as_attribute="true"
create directed edge MyEdge(from MyNode, to MyNode)
create graph twitter(MyNode, MyEdge)

USE GRAPH twitter

CREATE OR REPLACE QUERY read_and_update(UINT id, UINT id1, INT maxLoop) for graph twitter{
    Nodes = {MyNode.*};
    ori = SELECT v FROM Nodes:v where v.id == id;
    UPDATE v FROM Nodes:v SET v.id1 = id1 where v.id == id;
    final = SELECT v FROM Nodes:v where v.id == id;
    
    PRINT ori, final;
}

install query read_and_update

Here is the original data:

$ gsql -g twitter "select * from MyNode where id == 12"
[{
  "v_id": "12",
  "attributes": {
    "id1": 12,
    "id": 12
  },
  "v_type": "MyNode"
}]

Call query read_and_update:

$ gsql -g twitter "run query read_and_update(12,13,1)"
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{
    "ori": [{
      "v_id": "12",
      "attributes": {
        "id1": 12,
        "id": 12
      },
      "v_type": "MyNode"
    }],
    "final": [{
      "v_id": "12",
      "attributes": {
        "id1": 12,
        "id": 12
      },
      "v_type": "MyNode"
    }]
  }]
}

When we call the query read_and_update, the output shows that we did not get the updated data during the transaction. (We expected id1 to be 13 in final.)

Get the vertex again:

$ gsql -g twitter "select * from MyNode where id == 12"
[{
  "v_id": "12",
  "attributes": {
    "id1": 13,
    "id": 12
  },
  "v_type": "MyNode"
}]

Now we can get the updated data.

Can anyone explain why we can’t read the newest data written in the same transaction, and why TigerGraph designs transactions this way?

Thanks.

This is done on purpose, to allow consistent and reliable processing of GSQL code against the graph back-end (in scenarios where parallel-like execution might arise).

Essentially, any modifications you make to vertices or edges in the graph are only fully applied (and visible to queries) after that query has completely finished running.

The best way to work around this is to use accumulators (to hold the updated data that you cannot yet read back from the back-end), or to use subqueries that can access the updated back-end if they are called at the end of the main query.
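For example, here is a minimal sketch of the accumulator approach, using the MyNode schema above (the query name update_and_read_acc and the accumulator @newId1 are just illustrative):

CREATE OR REPLACE QUERY update_and_read_acc(UINT id, UINT id1) for graph twitter {
    SumAccum<INT> @newId1;

    Nodes = {MyNode.*};
    // The attribute write only becomes visible after the query (transaction) commits,
    // so keep a copy of the new value in a vertex-attached accumulator.
    updated = SELECT v FROM Nodes:v where v.id == id
              POST-ACCUM v.id1 = id1,     // persisted when the transaction commits
                         v.@newId1 = id1; // readable immediately within this query

    // Reading v.id1 here would still return the old value; print the accumulator instead.
    PRINT updated[updated.@newId1 AS new_id1];
}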

This is described more rigorously in the Transaction Processing documentation page.

@Leo_Shestakov Thanks for your reply.

I want to know what you mean by calling subqueries at the end of the main query:

  • If we call a read subquery at the end of the main query (like read_and_update below, which calls sub_read at the end), we cannot access the updated data.
  • If we update and then read the data in two separate subqueries (like RW), we still cannot access the updated data.
  • If we call another query after the main query, they are in fact two queries in two different transactions.

So is using accumulators the only way to work around this?

drop query read_and_update
drop query RW
CREATE OR REPLACE QUERY sub_read(UINT id) for graph twitter RETURNS(ListAccum<int>) {
    ListAccum<int> @@res;
    
    Nodes = {MyNode.*};
    Nodes = SELECT v FROM Nodes:v where v.id == id POST-ACCUM @@res += v.id1;
    
    PRINT Nodes;
    RETURN(@@res);
}

CREATE OR REPLACE QUERY sub_write(UINT id, UINT id1) for graph twitter RETURNS(ListAccum<int>) {
    ListAccum<int> @@res;
    
    Nodes = {MyNode.*};
    Nodes = SELECT v FROM Nodes:v where v.id == id POST-ACCUM v.id1 = id1;
    Nodes = SELECT v FROM Nodes:v where v.id == id POST-ACCUM @@res += v.id1;
    
    PRINT Nodes;
    RETURN(@@res);
} 

CREATE OR REPLACE QUERY read_and_update(UINT id, UINT id1, INT maxLoop) for graph twitter{
    SumAccum<int> @t = 0;
    int i = 0;
    
    Nodes = {MyNode.*};
    ori = SELECT v FROM Nodes:v where v.id == id POST-ACCUM v.@t = id1;

    WHILE (i < maxLoop) DO
        Nodes = SELECT v From Nodes:v where v.id == id POST-ACCUM v.@t += 1;
        i = i + 1;
    END;
    
    modified = SELECT v From Nodes:v where v.id == id POST-ACCUM v.id1 = v.@t;
    final = SELECT v FROM Nodes:v where v.id == id;
    
    PRINT ori, modified, final;
    PRINT sub_read(id) as sub_result;
}

CREATE OR REPLACE QUERY RW(UINT id, UINT id1) for graph twitter {
    PRINT sub_write(id, id1);
    PRINT sub_read(id);
}

install query sub_read
install query sub_write
install query read_and_update
install query RW

Call query read_and_update:

$ gsql -g twitter "select * from MyNode where id == 12"
[{
  "v_id": "12",
  "attributes": {
    "id1": 12,
    "id": 12
  },
  "v_type": "MyNode"
}]
$ gsql -g twitter "run query read_and_update(12,13,5)"
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [
    {
      "ori": [{
        "v_id": "12",
        "attributes": {
          "@t": 18,
          "id1": 12,
          "id": 12
        },
        "v_type": "MyNode"
      }],
      "final": [{
        "v_id": "12",
        "attributes": {
          "@t": 18,
          "id1": 12,
          "id": 12
        },
        "v_type": "MyNode"
      }],
      "modified": [{
        "v_id": "12",
        "attributes": {
          "@t": 18,
          "id1": 12,
          "id": 12
        },
        "v_type": "MyNode"
      }]
    },
    {"sub_result": [12]}
  ]
}

Call query RW:

$ gsql -g twitter "run query RW(12,12)"
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [
    {"sub_write(id,id1)": [18]},
    {"sub_read(id)": [18]}
  ]
}

Thanks.

Yes, I agree that accumulators are the best way to go about this.

I have encountered inconsistent behavior when calling subqueries (or even subsequent queries one after another) and having them access updated back-end data. Purely from observation, it may have something to do with the size of your graph or the number of vertices of the target type you are trying to access.
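If you do stick with subqueries, one pattern that should avoid reading back the not-yet-committed attribute is to have the subquery return the value it wrote through its RETURNS accumulator, rather than re-reading v.id1. A rough sketch (the query names sub_write_return and rw_with_acc are hypothetical):

CREATE OR REPLACE QUERY sub_write_return(UINT id, UINT newVal) for graph twitter RETURNS(ListAccum<INT>) {
    ListAccum<INT> @@res;

    Nodes = {MyNode.*};
    // Record the value we are writing instead of reading the attribute back,
    // since the write is not visible until the transaction commits.
    Nodes = SELECT v FROM Nodes:v where v.id == id
            POST-ACCUM v.id1 = newVal, @@res += newVal;

    RETURN(@@res);
}

CREATE OR REPLACE QUERY rw_with_acc(UINT id, UINT newVal) for graph twitter {
    ListAccum<INT> @@written;

    // The subquery hands back the value it wrote, so the caller never
    // needs to read the not-yet-committed attribute.
    @@written += sub_write_return(id, newVal);
    PRINT @@written;
}

As in your RW example, the subquery has to be created and installed before (or together with) the query that calls it.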


@Leo_Shestakov Thanks for your help!