They wrote the fastest parallelized BAM parser in D

Andrew Brown via Digitalmars-d digitalmars-d at puremagic.com
Tue Mar 31 01:08:58 PDT 2015


Visualisation is certainly not behind Python's success in 
bioinformatics, which predates IPython. If you look through 
journals, very few of the figures are done in Python (and none at 
all in Julia). It succeeded because it allows you to hack your 
way through massive text files and it's not Perl.

One problem with using D instead of C or C++ for projects like 
this is that a few people are developing software for many users, 
who often work on very old clusters where they don't have admin 
rights. Getting an executable file to work for them is not 
trivial. Programs like samtools solve this by expecting people to 
compile it themselves, knowing they can rely on gcc being 
installed. But none of these clusters have a D compiler handy.

At my university, out of the box executables for ldc don't run, 
gdc executable files don't link with libc, and dmd sometimes 
complains that it can't find dmd.conf. And this is a fairly up to 
date and well administered cluster; I know quite a few 
institutions still on CentOS 5. Now, I can work to fix these 
problems for myself, but I can't expect a user to spend 3 hours 
compiling LLVM, then ldc and various libraries to use my 
software, rather than just look for the C/C++ equivalent.

Yesterday I was asked if I'd rewrite my code in C++ to solve this 
problem, which is not really an option as I don't know C++. I 
guess this is a fairly niche issue, and D Learn kindly pointed me 
in the direction of VMs, which I think will solve most of my 
problems. The sambamba authors seem to be sharing Docker images 
(congrats on the paper, by the way!). But I think it is a factor 
to be considered when using D: disseminating software is trickier 
than with C/C++.

On Tuesday, 31 March 2015 at 03:30:09 UTC, Laeeth Isharc wrote:
> On Tuesday, 31 March 2015 at 02:31:58 UTC, Craig Dillabaugh 
> wrote:
>> On Monday, 30 March 2015 at 22:55:37 UTC, lobo wrote:
>>> On Monday, 30 March 2015 at 20:25:33 UTC, CraigDillabaugh 
>>> wrote:
>>>> On Monday, 30 March 2015 at 20:09:35 UTC, Laeeth Isharc 
>>>> wrote:
>>>>>
>>>> clip
>>>>>
>>>>> You're right about the lack of visualization being a shame. 
>>>>> I have been thinking about porting Bokeh bindings to D.  
>>>>> There isn't much to it on the server side - all you need 
>>>>> to do is build up the object model and translate it to JSON 
>>>>> - but I don't have time right now to do it all myself.
>>>>>
>>>> clip
>>>>
>>>> A comment on the visualization thing. Is this really a big 
>>>> issue?
>>> [snip]
>>>
>>> Yes of course, why do you think Python + SciPy/NumPy has such 
>>> a foothold in the scientific community? Visualisation is an 
>>> important part of the data processing pipeline.
>>>
>>> It's also why Matlab is so useful for those lucky enough to 
>>> work for a company that can afford it.
>>>
>>> bye,
>>> lobo
>>
>> My point wasn't that visualization isn't important, it is that 
>> in most scientific computing it is very easy (and sensible) to 
>> separate the processing and visualization aspects.  So lack of 
>> D visualization tools should not hinder  its value as a data 
>> processing tool.
>>
>> For example, Hadoop is immensely popular for data processing, 
>> but it includes no visualization tools. That is a slightly 
>> different domain I understand, but there are similarities.
>>
>> So in short, if there were nice D visualization tools that 
>> would certainly be helpful, but I don't think it should be a 
>> show stopper.
>
> Yes, I tried to pick my words carefully.  It is not a disaster, 
> as someone seemed to imply, but it would be nice to have 
> visualization, particularly for interactive exploration of 
> data.  One is back to Walter's quote about the two language 
> combination being an indicator that something is lacking.


