They wrote the fastest parallelized BAM parser in D

Tue Mar 31 06:43:56 PDT 2015

On Tuesday, 31 March 2015 at 13:31:33 UTC, Chris wrote:
> On Tuesday, 31 March 2015 at 11:04:50 UTC, Laeeth Isharc wrote:
>>
>>> As Andrew Brown pointed out, visualization is not behind 
>>> Python's success. Its success lies in the fact that it's a 
>>> language you can hack away in easily.
>>
>> Sounds right.  I am not in the camp that says it is a killer 
>> for D.  It would just be nice to have both at least a passable 
>> solution for visualization, and some way of making it 
>> interactive.  (The REPL might be one route).  The problem with 
>> separating the processes completely and just piping the output 
>> from D code that does the heavy lifting to a Python or Julia 
>> front end is it may make it more painful to play with and 
>> explore the data.  My interests are finance more than science, 
>> so that may lead to a different set of needs.  Finishing 
>> mathgl and writing D bindings for bokeh (take a look - it is 
>> pretty cool, particularly to be able to use the browser as 
>> client, acknowledging that it is a tradeoff) is not so much 
>> work.  But some help on bokeh particularly would be nice, as I 
>> fear picking one way of implementing the object structure and 
>> later finding it is a mistake.
>>
>>> the initial euphoria of being able to automatically rename 
>>> files and extract value X from file Y soon gives way to 
>>> frustration when it comes to performance.
>>
>> Yep.
>>
>>> The paper shows well that in a world where data processing is 
>>> of utmost importance, and we're talking about huge sets of 
>>> data, languages like Python don't cut it anymore.
>>
>> I could not agree more, and I do think the intersection of two 
>> trends creates tremendous opportunity for D.  It's also 
>> commonsensical to look at notable successes - and I hope it is 
>> not just my biases that lead me to think many of these are in 
>> just this kind of application.  Data sets keep getting larger 
>> (but not necessarily more information rich in dollar terms), 
>> and Moore's Law/memory speed+latency is not keeping pace.  
>> This is exactly the kind of change that creeps up on you 
>> because not much changes in a few months (which is the kind of 
>> horizon many of us tend to think in).
>>
>> People say "what is D's edge", but my personal perception is 
>> "where is the competition for D" in this area.  It has to be 
>> native code/JIT, and I refuse to learn Java; it also should be 
>> plastic and lend itself to rapid iteration.
>>
>>> at the same time there's growing discontent among 
>>> researchers, scientists and engineers as regards performance, 
>>> simply because the data sets are becoming bigger and bigger 
>>> every day and the algorithms are getting more and more 
>>> refined. Sooner or later people will have to find new ways, 
>>> out of sheer necessity.
>>
>> upvote.  I would love to see any references you have on this - 
>> not because it's not rather obvious to me, but because it is 
>> helpful when talking to other people.
>
> The article that gave rise to this thread is a good reference.
>
> I came from a slightly different angle: I looked for 
> alternatives to Python, because I needed:
>
> 1. fast native execution (real time)
> 2. easy interfacing to C
> 3. cross-platform development
>
> (Modern convenience, templates, ranges etc. were bonuses I 
> discovered bit by bit)
>
> As regards algorithms and data processing, most people in 
> research use Matlab (proprietary) and Python. However, in my 
> field they're useless when it comes to building data-driven 
> systems (fast analysis, retraining of machine based on (slight) 
> modifications), and putting computationally heavy algorithms 
> into real world applications. Proof of concept is all it 
> amounts to, usually.
>
> So D has a real chance here, because of
>
> 1. native code
> 2. modern convenience
> 3. templates, structs, mixins, ranges, std.algorithm etc.
> 4. interfacing to C libs
>
>>> Don't forget that "the state of the art" can change very 
>>> quickly in IT and the name of the game is anticipating new 
>>> developments rather than taking snapshots of the current 
>>> state of the art and framing them. D really has a lot to offer 
>>> for data processing and I wouldn't rule it out that more and 
>>> more programmers will turn to it for this task.
>>
>> I fully agree.  If we started a section on use cases, would 
>> you be able to write a page or two on D's advantages in data 
>> processing?
>
> I think that Dicebot et al. would have good examples.

It'd be nice if we had a dedicated data-analysis section and/or 
library. I'm almost sure that people working with massive amounts 
of data would find it by googling "efficient data analysis" or 
something like that.

Facebook probably has a wealth of data analysis examples / 
techniques, too.