They wrote the fastest parallelized BAM parser in D

John Colvin via Digitalmars-d digitalmars-d at puremagic.com
Tue Mar 31 01:31:02 PDT 2015


On Tuesday, 31 March 2015 at 08:09:00 UTC, Andrew Brown wrote:
> Visualisation is certainly not behind Python's success in 
> bioinformatics, which predates IPython. If you look through 
> journals, very few of the figures are done in Python (and none 
> at all in Julia). It succeeded because it allows you to hack 
> your way through massive text files and it's not Perl.
>
> One problem with using D instead of C or C++ for projects like 
> this is that they are developed by a few people for many 
> users, who frequently work on very old clusters where they 
> don't have admin rights. Getting an executable file to work 
> for them is not trivial. Programs like samtools solve this by 
> expecting people to compile the code themselves, knowing they 
> can rely on gcc being installed. But none of these clusters 
> have a D compiler handy.
>
> At my university, the out-of-the-box executables for ldc don't 
> run, gdc executable files don't link with libc, and dmd 
> sometimes shouts that it can't find dmd.conf. And this is a 
> fairly up-to-date and well-administered cluster; I know quite 
> a few institutions still on CentOS 5. Now, I can work to fix 
> these problems for myself, but I can't expect a user to spend 
> 3 hours compiling LLVM, then ldc and various libraries to use 
> my software, rather than just look for the C/C++ equivalent.
>
> Yesterday I was asked if I'd rewrite my code in C++ to solve 
> this problem, which is not really an option as I don't know 
> C++. I guess this is a fairly niche issue. D Learn kindly 
> pointed me in the direction of VMs, which I think will solve 
> most of my problems. The sambamba authors seem to be sharing 
> Docker images (congrats on the paper, by the way!). But I 
> think it is a factor to be considered when using D: 
> disseminating software is trickier than with C/C++.

Building LDC and its dependencies isn't that difficult, but it was 
still a pain to have to do that just to compile my code for the 
cluster.

There needs to be some sort of bootstrap script, downloads 
included, available to go from a bare-bones C++ toolchain to a 
working D compiler. Or even just some executables online, compiled 
against an ancient glibc so that they run on old clusters.
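
To illustrate why the glibc version matters, here is a minimal sketch 
(not part of any existing D tooling) that reports the host's glibc and 
checks it against the version a prebuilt binary was linked against. It 
assumes the usual rule of thumb that a binary built against glibc X 
runs on hosts with glibc X or newer. The function names and the example 
version "2.17" are hypothetical; only platform.libc_ver() is a real 
standard-library call.

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.17' into a comparable tuple (2, 17)."""
    return tuple(int(part) for part in text.split("."))


def can_run(host_glibc, built_against):
    """Rule of thumb (an assumption, not a guarantee): a binary linked
    against glibc X generally runs on hosts with glibc >= X."""
    return parse_version(host_glibc) >= parse_version(built_against)


if __name__ == "__main__":
    # libc_ver() returns e.g. ('glibc', '2.17'); both fields are empty
    # strings when the C library cannot be identified.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        # "2.17" is a made-up example of what a prebuilt compiler might need.
        print("host glibc:", version, "ok:", can_run(version, "2.17"))
    else:
        print("could not detect glibc on this host")
```

Versions are compared as integer tuples rather than strings, since a 
plain string comparison would wrongly rank "2.5" (what CentOS 5 ships) 
above "2.17".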

