They wrote the fastest parallelized BAM parser in D

Chris via Digitalmars-d digitalmars-d at puremagic.com
Tue Mar 31 06:31:32 PDT 2015


On Tuesday, 31 March 2015 at 11:04:50 UTC, Laeeth Isharc wrote:
>
>> As Andrew Brown pointed out, visualization is not behind 
>> Python's success. Its success lies in the fact that it's a 
>> language you can hack away in easily.
>
> Sounds right.  I am not in the camp that says it is a killer 
> for D.  It would just be nice to have both at least a passable 
> solution for visualization, and some way of making it 
> interactive.  (The REPL might be one route).  The problem with 
> separating the processes completely and just piping the output 
> from D code that does the heavy lifting to a python or julia 
> front end is it may make it more painful to play with and 
> explore the data.  My interests are finance more than science, 
> so that may lead to a different set of needs.  Finishing mathgl 
> and writing D bindings for bokeh (take a look - it is pretty 
> cool, particularly to be able to use the browser as client, 
> acknowledging that it is a tradeoff) is not so much work.  But 
> some help on bokeh particularly would be nice, as I fear 
> picking one way of implementing the object structure and later 
> finding it is a mistake.
>
>> the initial euphoria of being able to automatically rename 
>> files and extract value X from file Y soon gives way to 
>> frustration when it comes to performance.
>
> Yep.
>
>> The paper shows well that in a world where data processing is 
>> of utmost importance, and we're talking about huge sets of 
>> data, languages like Python don't cut it anymore.
>
> I could not agree more, and I do think the intersection of two 
> trends creates tremendous opportunity for D.  It's also 
> commonsensical to look at notable successes - and I hope it is 
> not just my biases that lead me to think many of these are in 
> just this kind of application.  Data sets keep getting larger 
> (but not necessarily more information rich in dollar terms), 
> and Moore's Law/memory speed+latency is not keeping pace.  This 
> is exactly the kind of change that creeps up on you because not 
> much changes in a few months (which is the kind of horizon many 
> of us tend to think in).
>
> People say "what is D's edge", but my personal perception is 
> "where is the competition for D" in this area.  It has to be 
> native code/JIT, and I refuse to learn Java; it also should be 
> plastic and lend itself to rapid iteration.
>
>> at the same time there's growing discontent among researchers, 
>> scientists and engineers as regards performance, simply 
>> because the data sets are becoming bigger and bigger every day 
>> and the algorithms are getting more and more refined. Sooner 
>> or later people will have to find new ways, out of sheer 
>> necessity.
>
> Upvote.  I would love to see any references you have on this - 
> not because it's not rather obvious to me, but because it is 
> helpful when talking to other people.

The article that gave rise to this thread is a good reference.

I came from a slightly different angle, I looked for alternatives 
to Python, because I needed:

1. fast native execution (real time)
2. easy interfacing to C
3. cross-platform development

(Modern conveniences such as templates and ranges were bonuses I 
discovered bit by bit.)

As regards algorithms and data processing, most people in 
research use Matlab (proprietary) and Python. However, in my 
field they're useless when it comes to building data-driven 
systems (fast analysis, retraining of machines after (slight) 
modifications) and putting computationally heavy algorithms into 
real-world applications. Usually, what they produce amounts to 
no more than a proof of concept.

So D has a real chance here, because of

1. native code
2. modern convenience
3. templates, structs, mixins, ranges, std.algorithm, etc.
4. interfacing to C libs

>> Don't forget that "the state of the art" can change very 
>> quickly in IT and the name of the game is anticipating new 
>> developments rather than taking snapshots of the current state 
>> of the art and framing them. D really has a lot to offer for 
>> data processing and I wouldn't rule out that more and more 
>> programmers will turn to it for this task.
>
> I fully agree.  If we started a section on use cases, would you 
> be able to write a page or two on D's advantages in data 
> processing?

I think that Dicebot et al would have good examples.

