Load snappy.parquet files

Hi there,

I’d like to load some data from snappy.parquet files on HDFS into TG. Are there any existing tools for this?

Thanks,

Loading directly from HDFS is not currently supported (I believe it is being considered), but you can load Parquet files from S3 buckets (look for file.reader.type in the documentation).

You might be able to mount HDFS on Linux and load data from the mount, but file loads only support CSV-like and JSON data sources.

Thanks, I’ll write some spark code then.
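
For anyone finding this later, here is a minimal PySpark sketch of that approach, assuming the goal is to convert the Parquet data to headered CSV that a file load can then pick up. The helper name parquet_to_csv and both HDFS paths are hypothetical placeholders, not anything from TG. Spark reads the compression codec from the Parquet file metadata, so snappy needs no extra option on the read side.

```python
def parquet_to_csv(spark, src, dst):
    """Read Parquet from src and write it back out as headered CSV under dst.

    spark is an existing SparkSession; src and dst are placeholder paths.
    """
    # The codec (snappy here) is recorded in the Parquet metadata,
    # so no compression option is needed when reading.
    df = spark.read.parquet(src)
    # Spark writes a directory of part files under dst, not a single file.
    df.write.mode("overwrite").option("header", True).csv(dst)


def main():
    # Imported here so the helper above can be exercised without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-to-csv").getOrCreate()
    # Hypothetical paths; replace with your own locations.
    parquet_to_csv(spark, "hdfs:///data/events", "hdfs:///staging/events_csv")
    spark.stop()
```

Calling main() from a script submitted with spark-submit would run the conversion. The resulting CSV part files could then be loaded from a mounted path as suggested above.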

:+1: Good point, I should have mentioned Spark as an alternative.

How hard is it to extend GSQL to use C++ code? Could this be done using a UDF?

There are parquet C++ libraries:

https://arrow.apache.org/docs/cpp/parquet.html

So, it should be straightforward but “some programming required”!