They wrote the fastest parallelized BAM parser in D

Tue Mar 31 04:04:49 PDT 2015

> As Andrew Brown pointed out, visualization is not behind 
> Python's success. Its success lies in the fact that it's a 
> language you can hack away in easily.

Sounds right.  I am not in the camp that says the lack of it is a 
killer for D.  It would just be nice to have at least a passable 
solution for visualization, and some way of making it 
interactive.  (The REPL might be one route.)  The problem with 
separating the processes completely, and just piping the output 
from the D code that does the heavy lifting to a Python or Julia 
front end, is that it may make it more painful to play with and 
explore the data.  My interests are finance more than science, so 
that may lead to a different set of needs.  Finishing mathgl and 
writing D bindings for bokeh is not so much work.  (Take a look at 
bokeh: it is pretty cool, particularly being able to use the 
browser as the client, although I acknowledge that is a 
tradeoff.)  But some help on bokeh in particular would be nice, 
as I fear picking one way of implementing the object structure 
and later finding out it was a mistake.

> the initial euphoria of being able to automatically rename 
> files and extract value X from file Y soon gives way to 
> frustration when it comes to performance.

Yep.

> The paper shows well that in a world where data processing is 
> of utmost importance, and we're talking about huge sets of 
> data, languages like Python don't cut it anymore.

I could not agree more, and I do think the intersection of two 
trends creates a tremendous opportunity for D.  It also makes 
sense to look at D's notable successes, and I hope it is not just 
my biases that lead me to think many of them are in exactly this 
kind of application.  Data sets keep getting larger (though not 
necessarily richer in information, measured in dollar terms), and 
Moore's Law, along with memory speed and latency, is not keeping 
pace.  This is exactly the kind of change that creeps up on you, 
because little changes over a few months, and that is the horizon 
many of us tend to think in.

People ask "what is D's edge?", but in this area my own question 
is "where is the competition for D?".  The language has to compile 
to native code or use a JIT, and I refuse to learn Java; it should 
also be plastic and lend itself to rapid iteration.

> at the same time there's growing discontent among researchers, 
> scientists and engineers as regards performance, simply because 
> the data sets are becoming bigger and bigger every day and the 
> algorithms are getting more and more refined. Sooner or later 
> people will have to find new ways, out of sheer necessity.

Upvote.  I would love to see any references you have on this, not 
because it isn't fairly obvious to me, but because references 
help when talking to other people.

> Don't forget that "the state of the art" can change very 
> quickly in IT and the name of the game is anticipating new 
> developments rather than taking snapshots of the current state 
> of the art and framing them. D really has a lot to offer for 
> data processing and I wouldn't rule out that more and more 
> programmers will turn to it for this task.

I fully agree.  If we started a section on use cases, would you 
be able to write a page or two on D's advantages in data 
processing?