Big Data Ecosystem

Laeeth Isharc laeeth at kaleidic.io
Fri Jul 12 18:02:59 UTC 2019


On Tuesday, 9 July 2019 at 16:58:56 UTC, Eduard Staniloiu wrote:
> Cheers, everybody!
>
> I was wondering what is the current state of affairs of the D 
> ecosystem with respect to Big Data: are there any libraries out 
> there? If so, which?
>
> Thank you,
> Edi

Weka.io of course have the world's fastest file system and I 
understand ML at scale is one hot market for them.  It's simple 
to get going from what I saw and it's not expensive in the scheme 
of things.  I don't really understand myself why you would use 
cloud in many cases, but it does work on the cloud if you want.

I guess you know mir and Lubeck.  There's LDA tucked away there 
in case you need it.

James Thompson's lightning talk was quite interesting - sometimes 
doing things efficiently can reduce the need for all the 
complexity of some of the standard approaches.

I don't know if you consider postgres part of big data solutions, 
but with TimescaleDB maybe.  You can quite easily write Foreign 
Data Wrappers in D to integrate with other data sources and you 
can also write server side functions.  I have done maybe half the 
work for that but haven't had time to finish it yet.  DPP more or 
less works for postgres headers.

Joyent have an interesting approach to working on big data the 
UNIX way.  They have an object store called Manta that allows you 
to run code on the same node as the data (stored using zfs).   
One could do something similar in D.  I wanted to get comfortable 
with SmartOS but I don't think it's ready for us today.  However 
one could do something similar home-rolled with zfs and Linux 
containers.  I wrapped libzfscore and lxd - alpha quality right 
now.  Not sure if I pushed the latest versions to GitHub yet.

For syncing stuff across a WAN between regions, TCP doesn't have 
great throughput.  You can either strap together a bunch of 
connections or use something on top of UDP to make it reliable.  
We found UDT-D gave us 300x faster file transfers between London 
and HK.  It's up on GitHub, though the code is not very polished.
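
To put rough numbers on the TCP point: a sender can have at most 
one window of unacknowledged data in flight per round trip, so a 
single connection is bounded by window / RTT however fat the pipe 
is.  Below is a minimal sketch of that arithmetic in Python.  The 
200 ms round-trip time and the 64 KiB window (no window scaling) 
are assumptions picked for illustration, not measurements from 
our links, and the function names are made up for this example.  
It ignores packet loss and congestion control, which usually make 
things worse.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def aggregate_throughput_bps(window_bytes: int, rtt_seconds: float,
                             connections: int) -> float:
    """Idealised bound when several connections are used in parallel."""
    return connections * max_tcp_throughput_bps(window_bytes, rtt_seconds)


def window_needed_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.2            # assumed long-haul round-trip time, in seconds
    window = 64 * 1024   # assumed 64 KiB window, no window scaling

    single = max_tcp_throughput_bps(window, rtt)
    print(f"one connection:  {single / 1e6:.2f} Mbit/s")

    pooled = aggregate_throughput_bps(window, rtt, 16)
    print(f"16 connections:  {pooled / 1e6:.2f} Mbit/s")

    needed = window_needed_bytes(1e9, rtt)
    print(f"window to fill 1 Gbit/s: {needed / 1e6:.1f} MB")
```

Under those assumptions one connection tops out at roughly 
2.6 Mbit/s, which is why people either bundle many connections 
or move to a UDP-based protocol that manages its own window.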

