Multi-tenant graph

Jon_Herke · April 8, 2020, 5:00pm

Hi!

Out app currently is backed by an RDBMS using a column on each table to represent the Tenant-Id.

So each of our customers accesses the same App, which uses WHERE clauses to filter the tenant data.

Can you give me some advice on how this architecture could be implemented in Tigergraph?

For instance: the TenantId should be a property of each Vertex? Or there is a way to use the tenantId to create partitions and subsequently more performant the access.

Thanks,
Dan

Jon_Herke · April 8, 2020, 5:01pm

Hi Daniel,

In your case, I think you can simply build an index on the column Tenant-Id in RDBMS to speed up.

Here’s my solution with Tigergraph:

create a vertex [tenant_id] with Tenant-Id as id
create another vertex [entity] with the rest columns.
create an undirected edge [entity_tenant] between tenant_id and entity
then write a query by activating the vertex id and traverse through the edge type [entity_tenant] (following code is not tested):

CREATE QUERY get_entity_by_tenant_id(VERTEX<tenant_id> v) FOR GRAPH MyGraph {

  init_set = {v};

  result = SELECT tgt

    FROM init_set:start-(:e)->entity:tgt;

  print result;

}

Jon_Herke · April 8, 2020, 5:03pm

Thanks,

Do you think that approach will scale on transactional (temporal) data? (I mean: we have tenants with hundreds of millions of transactions)

I understand the solution for master data, but when we get into business transactions or IoT data, the TenantId vertex would become a super-node with so many edges.

Each transaction is a vertex having several kinds of children and grandchildren, so putting the TenantId as a property on each vertex also does not seem optimal.

dan

Jon_Herke · April 8, 2020, 5:03pm

Hi Dan,

Transaction data need to be carefully designed and handled. Or else it will created a lot of useless edges and vertices which need to be deleted periodically. Can you provide more details on how are you going to use the transaction data and how long do you prefer to store? What kind query that you are going to write on these transaction data. Then we can provide some practical suggestions on the schema design. Usually aggregation node with aggregated data on transactions is more efficient than just using transaction node.

Best Wishes,
Dan

Jon_Herke · April 8, 2020, 5:04pm

It really depends on your query and the data.

RDBMS computes the relations on the fly by filtering and joining. Instead, graph databases store the relations in advance to speed up.

With proper schema design and query, the performance and be amazingly good.

According to my understanding, the hub-nodes can be eliminated by adding an intermediate node between [entity] and [tenant_id], for example, date. so the schema would be [entity] - [tenant_id_date_tuple] - [tenant_id]

This is a disaster for RDBMS because it introduces 1 more join, making the computational cost unacceptable. However, for TigerGraph, edge traversal is cheap and extremely fast!

Feel free to talk to our sales team to discuss your use case.

Jon_Herke · April 8, 2020, 5:04pm

Another modeling approach can be your initial intuition, store TenantId as a property of a transaction. Simple and fast. When you need to lookup tenantId related info, just use a global variable to store them by a vertex SELECT block.

You schema looks like this

CREATE VERTEX transaction (primary_id transactionID string, TenantId int)
CREATE VERTEX tenant (primary_id TenantId int, TenantId int, attribute1 string)

CREATE GRAPH myApp(transaction, tenant)

CREATE Query test (Vertex< transaction> txn) FOR GRAPH demo {

 SumAccum<int> @@tid;
 SumAccum<String> @@att1;

 Start = {txn};
//get tenantId, store it in global accumulator @@tid.
 Result = SELECT s
        FROM Start:s
        POST-ACCUM @@tid += s.TenantId;
  
 //lookup tenant info 
  T = {tenant.*};

  T2 = SELECT t
                FROM T:t
                WHERE t.TenantId == @@tid
                POST-ACCUM @@att1 += t.attribute1 ;

 PRINT @@att1;
  PRINT Result;
        

}

As you can see, we may have billions transaction vertices, and a couple million tenant vertices. The idea is to lookup tenant info by global variable when we need.