They wrote the fastest parallelized BAM parser in D
Chris via Digitalmars-d
digitalmars-d at puremagic.com
Tue Mar 31 02:21:11 PDT 2015
On Monday, 30 March 2015 at 18:23:31 UTC, Russel Winder wrote:
> On Mon, 2015-03-30 at 18:04 +0000, george via Digitalmars-d wrote:
>> > .NET actually already has a foothold in bioinformatics,
>> > especially in user-facing software and steering of reading
>> > equipment and robots.
>> >
>> > So D needs a story over C# and F# (alongside WPF for data
>> > visualization) use cases.
>> >
>> > --
>> > Paulo
>
> Paulo,
>
> Can you send me some pointers to this stuff?
>
>>
>> Though when it comes to open source bioinformatics projects,
>> Perl and Python have a large foothold
>> among most bioinformaticians. Most utilities that require
>> speed are often written in C and C++ (BLAST, HMMER, SAMTOOLS
>> etc).
>>
>> I think D stands a good chance as a language of choice for
>> bioinformatics projects.
>>
>> George
>
> My "prejudice", based on training people in Python and C++ over the
> last few years, is that Python and C++ have a very strong position in
> the bioinformatics community, with the use of IPython (now becoming
> Jupyter) increasing and solidifying the Python position.
>
> D's position is quite weak here because one of the important things is
> visualising data, something SciPy/Matplotlib are very good at. D has
> no real play in this arena and so there is no way (currently) of
> creating a foothold. Sad, but…
As Andrew Brown pointed out, visualization is not behind Python's
success. Its success lies in the fact that it's a language you
can hack away in easily. Almost everybody who has to do some data
processing (most researchers do these days) and has limited or no
experience with programming will opt for Python: it is easy (at
first!), well-documented, and everyone else uses it. However, the
initial euphoria of being able to automatically rename files and
extract value X from file Y soon gives way to frustration when it
comes to performance.
The paper shows well that in a world where data processing is of
utmost importance, and we're talking about huge sets of data,
languages like Python don't cut it anymore. Two things are
happening at the moment. On the one hand, people still use Python
for various reasons (see above and hundreds of posts on this
forum). On the other hand, there's growing discontent among
researchers, scientists and engineers as regards performance,
simply because the data sets are becoming bigger and bigger every
day and the algorithms are getting more and more refined. Sooner
or later people will have to find new ways, out of sheer
necessity.
Don't forget that "the state of the art" can change very quickly
in IT and the name of the game is anticipating new developments
rather than taking snapshots of the current state of the art and
framing them. D really has a lot to offer for data processing, and I
wouldn't rule out that more and more programmers will turn to
it for this task.