Weka.IO in the news... but not mentioning Dlang... why?
Shachar Shemesh
shachar at weka.io
Sat Sep 23 16:09:30 UTC 2017
On 23/09/17 11:57, Suliman wrote:
>> One is a linear database and the other is a filesystem?
>>
>> If that doesn't satisfy you, please describe to me the difference
>> between D and Microsoft Word, so I know what kind of answer you're
>> expecting.
>>
>
> But Hadoop is more look like file system that DataBase...
Hadoop Distributed File System is, sort of, a file system. I don't know
much about it (just read the Wikipedia page), so I'll try to answer as
best I understand. Corrections welcome:
Performance:
I have no idea what HDFS's per-node performance numbers are, but there
are several indications that make me suspect they are not as good as Weka's.
First of all, I don't think a tool written in Java, designed to run over
another file system and the kernel's networking, has any chance of
out-performing a tool written in D that directly uses the NVMe and the
network interface.
Second, the file system seems oriented toward large read-only blobs. As
a file system, I don't think it has any chance against any dedicated
POSIX-compliant file system, but I'm guessing you're mostly interested
in using HDFS as a basis for running Hadoop itself, so that might not
matter.
Cost:
Here I don't think there is any way for HDFS to compete. That might
sound strange to some, as HDFS is open source while Weka charges
licensing fees. The reason I'm saying this is because HDFS uses
mirroring in order to achieve fault tolerance, while Weka uses RAID (I
should know - I wrote it).
In short, to get 1PB of usable capacity while tolerating 2 faults you'll
need 3PB of raw capacity with Hadoop (200% overhead). At 16+2 (16 data
blocks plus 2 parity blocks per stripe), you'll
only need around 1.3PB with Weka. Whatever you're paying for the
licenses is, in all likelihood, going to be less than the cost of the
hardware.
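
To make the arithmetic concrete, here is a minimal sketch in Python. The
function names are made up for illustration and come from neither Weka nor
Hadoop. It counts only replica and parity overhead: 16+2 on its own works
out to 1.125PB raw per 1PB usable, so the "around 1.3PB" figure presumably
also leaves room for things like spare capacity and metadata (that part is
an assumption, not something stated above).

```python
def raw_capacity_replication(usable_pb, faults_tolerated):
    """Raw capacity needed when every block is stored as full copies.

    Tolerating N faults requires N + 1 copies of the data.
    """
    return usable_pb * (faults_tolerated + 1)


def raw_capacity_parity(usable_pb, data_blocks, parity_blocks):
    """Raw capacity needed for a data+parity stripe (e.g. 16+2).

    Each stripe stores data_blocks of payload in
    data_blocks + parity_blocks blocks of raw space.
    """
    return usable_pb * (data_blocks + parity_blocks) / data_blocks


usable = 1.0  # PB of usable capacity wanted

mirrored = raw_capacity_replication(usable, faults_tolerated=2)
striped = raw_capacity_parity(usable, data_blocks=16, parity_blocks=2)

print(f"3-way mirroring: {mirrored:.3f} PB raw "
      f"({(mirrored / usable - 1) * 100:.1f}% overhead)")
print(f"16+2 striping:   {striped:.3f} PB raw "
      f"({(striped / usable - 1) * 100:.1f}% overhead)")
```

Both schemes survive two failures; the difference is purely in how much
raw hardware has to be bought to get there.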
Like I said, corrections are welcome, as I'm not familiar with HDFS or
Hadoop.
Shachar
More information about the Digitalmars-d mailing list