Big Data Ecosystem
Andre Pany
andre at s-e-a-p.de
Thu Jul 11 20:12:38 UTC 2019
On Thursday, 11 July 2019 at 20:00:19 UTC, jmh530 wrote:
> On Thursday, 11 July 2019 at 18:12:15 UTC, bachmeier wrote:
>> On Tuesday, 9 July 2019 at 21:16:03 UTC, Andre Pany wrote:
>>
>>> What I currently really miss is the possibility to read/write
>>> Parquet files.
>>
>> For the record, this *is* something that can be done because
>> there are R packages (like sparklyr) that do it, and that
>> means you can do it from D as well. Now maybe you mean you
>> want an interface written in D, but the functionality is
>> nonetheless easily available to D programs. I've never worked
>> with Parquet files so I can't comment on the details.
>
> In something like two minutes of googling, I found that Apache
> Arrow [1] has C bindings [2] for parquet's C++ read/write
> utilities. I know nothing about Parquet files, but I imagine
> this would be faster than calling the R packages.
>
> [1] https://github.com/apache/arrow
> [2] https://github.com/apache/arrow/tree/master/c_glib/parquet-glib
Thanks. The benefit of Parquet compared to e.g. HDF5 is the
file size. A 500 MB CSV file takes 300 MB as HDF5 and 180 MB
as Parquet.
The file size matters when you need to read from and write to
remote storage such as AWS S3.
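
To put the numbers above in proportion, here is a minimal sketch that
computes the relative sizes and a rough transfer time. The sizes are the
ones quoted in this message; the throughput of 50 MB/s is purely an
assumption for illustration, not a measured S3 figure.

```python
# Sizes in MB, as quoted in the message above.
SIZES_MB = {"csv": 500, "hdf5": 300, "parquet": 180}


def relative_size(fmt, baseline="csv"):
    """Return the size of `fmt` as a fraction of the `baseline` format."""
    return SIZES_MB[fmt] / SIZES_MB[baseline]


def transfer_seconds(fmt, mb_per_second):
    """Estimate the transfer time for `fmt` at a given throughput in MB/s."""
    return SIZES_MB[fmt] / mb_per_second


if __name__ == "__main__":
    assumed_throughput = 50.0  # MB/s, hypothetical value for illustration
    for fmt in SIZES_MB:
        print(
            f"{fmt}: {relative_size(fmt):.0%} of CSV, "
            f"{transfer_seconds(fmt, assumed_throughput):.1f} s "
            f"at {assumed_throughput} MB/s"
        )
```

With these figures Parquet comes out at 36% of the CSV size and 60% of
the HDF5 size, so any transfer cost that scales with bytes shrinks by
the same factor.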
Kind regards
Andre