Big Data Ecosystem

Andre Pany andre at s-e-a-p.de
Thu Jul 11 20:12:38 UTC 2019


On Thursday, 11 July 2019 at 20:00:19 UTC, jmh530 wrote:
> On Thursday, 11 July 2019 at 18:12:15 UTC, bachmeier wrote:
>> On Tuesday, 9 July 2019 at 21:16:03 UTC, Andre Pany wrote:
>>
>>> What I currently really miss is the possibility to read/write 
>>> Parquet files.
>>
>> For the record, this *is* something that can be done because 
>> there are R packages (like sparklyr) that do it, and that 
>> means you can do it from D as well. Now maybe you mean you 
>> want an interface written in D, but the functionality is 
>> nonetheless easily available to D programs. I've never worked 
>> with Parquet files so I can't comment on the details.
>
> In something like two minutes of googling, I found that Apache 
> Arrow [1] has C bindings [2] for Parquet's C++ read/write 
> utilities. I know nothing about Parquet files, but I imagine 
> this would be faster than calling the R packages.
>
> [1] https://github.com/apache/arrow
> [2] https://github.com/apache/arrow/tree/master/c_glib/parquet-glib

Thanks. The benefit of Parquet over e.g. HDF5 is the file size: 
a 500 MB CSV file has a size of 300 MB as HDF5 and 180 MB as 
Parquet.
File size matters when you need to read from and write to e.g. 
AWS S3.
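
To put those numbers in proportion, here is a rough back-of-the-envelope sketch in Python. The three sizes are the ones quoted above; the 100 Mbit/s link speed is purely an assumption for illustration, not a measured S3 throughput, and the helper names are made up for this example.

```python
# File sizes in MB, taken from the figures quoted above.
SIZES_MB = {"csv": 500, "hdf5": 300, "parquet": 180}

# Hypothetical link speed, chosen only for illustration.
ASSUMED_MBIT_PER_S = 100


def relative_size(fmt: str, baseline: str = "csv") -> float:
    """Size of `fmt` as a fraction of `baseline`."""
    return SIZES_MB[fmt] / SIZES_MB[baseline]


def transfer_seconds(fmt: str, mbit_per_s: float = ASSUMED_MBIT_PER_S) -> float:
    """Idealised transfer time: megabytes * 8 bits / link speed."""
    return SIZES_MB[fmt] * 8 / mbit_per_s


for fmt in SIZES_MB:
    print(f"{fmt}: {relative_size(fmt):.0%} of CSV, "
          f"{transfer_seconds(fmt):.1f} s at {ASSUMED_MBIT_PER_S} Mbit/s")
```

So with these figures Parquet is 36% of the CSV size and 60% of the HDF5 size, and any transfer time shrinks by the same ratio regardless of the actual bandwidth.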

Kind regards
Andre


More information about the Digitalmars-d mailing list