They wrote the fastest parallelized BAM parser in D

Chris via Digitalmars-d digitalmars-d at puremagic.com
Tue Mar 31 02:21:11 PDT 2015


On Monday, 30 March 2015 at 18:23:31 UTC, Russel Winder wrote:
> On Mon, 2015-03-30 at 18:04 +0000, george via Digitalmars-d 
> wrote:
>> > .NET actually already has a foothold in bioinformatics, 
>> > specially in user facing software and steering of reading 
>> > equipments and robots.
>> > 
>> > So D needs a story over C# and F# (alongside WPF for data 
>> > visualization) use cases.
>> > 
>> > --
>> > Paulo
>
> Paulo,
>
> Can you send me some pointers to this stuff?
>
>> 
>> Though when it comes to open source bioinformatics projects, 
>> Perl and Python have a large foothold
>> among most bioinformaticians. Most utilities that require 
>> speed are often written in C and C++ (BLAST, HMMER, SAMTOOLS 
>> etc).
>> 
>> I think D stands a good chance as a language of choice for 
>> bioinformatics projects.
>> 
>> George
>
> My "prejudice", based on training people in Python and C++ over 
> the
> last few years, is that Python and C++ have a very strong 
> position in
> the bioinformatics community, with the use of IPython (now 
> becoming
> Jupyter) increasing and solidifying the Python position.
>
> D's position is quite weak here because one of the important 
> things is
> visualising data, something SciPy/Matplotlib are very good at. 
> D has
> no real play in this arena and so there is no way (currently) of
> creating a foothold. Sad, but…

As Andrew Brown pointed out, visualization is not behind Python's 
success. Its success lies in the fact that it's a language you 
can hack away in easily. Almost everybody who has to do some data 
processing (most researchers do these days) and has limited or no 
experience with programming will opt for Python: easy (at 
first!), well-documented and everyone else uses it. However, the 
initial euphoria of being able to automatically rename files and 
extract value X from file Y soon gives way to frustration when it 
comes to performance.

The paper shows well that in a world where data processing is of 
utmost importance, and we're talking about huge sets of data, 
languages like Python don't cut it anymore. Two things are 
happening at the moment: on the one hand, people still use Python 
for various reasons (see above and hundreds of posts on this 
forum); on the other hand, there's growing discontent among 
researchers, scientists and engineers as regards performance, 
simply because the data sets are becoming bigger and bigger every 
day and the algorithms are getting more and more refined. Sooner 
or later people will have to find new ways, out of sheer 
necessity.

Don't forget that "the state of the art" can change very quickly 
in IT and the name of the game is anticipating new developments 
rather than taking snapshots of the current state of the art and 
framing them. D really has a lot to offer for data processing and I 
wouldn't rule out that more and more programmers will turn to 
it for this task.

