They wrote the fastest parallelized BAM parser in D
Chris via Digitalmars-d
digitalmars-d at puremagic.com
Tue Mar 31 06:43:56 PDT 2015
On Tuesday, 31 March 2015 at 13:31:33 UTC, Chris wrote:
> On Tuesday, 31 March 2015 at 11:04:50 UTC, Laeeth Isharc wrote:
>>
>>> As Andrew Brown pointed out, visualization is not behind
>>> Python's success. Its success lies in the fact that it's a
>>> language you can hack away in easily.
>>
>> Sounds right. I am not in the camp that says it is a killer
>> for D. It would just be nice to have both at least a passable
>> solution for visualization, and some way of making it
>> interactive. (The REPL might be one route). The problem with
>> separating the processes completely and just piping the output
>> from D code that does the heavy lifting to a python or julia
>> front end is it may make it more painful to play with and
>> explore the data. My interests are finance more than science,
>> so that may lead to a different set of needs. Finishing
>> mathgl and writing D bindings for bokeh (take a look - it is
>> pretty cool, particularly to be able to use the browser as
>> client, acknowledging that it is a tradeoff) is not so much
>> work. But some help on bokeh particularly would be nice, as I
>> fear picking one way of implementing the object structure and
>> later finding it is a mistake.
>>
>>> the initial euphoria of being able to automatically rename
>>> files and extract value X from file Y soon gives way to
>>> frustration when it comes to performance.
>>
>> Yep.
>>
>>> The paper shows well that in a world where data processing is
>>> of utmost importance, and we're talking about huge sets of
>>> data, languages like Python don't cut it anymore.
>>
>> I could not agree more, and I do think the intersection of two
>> trends creates tremendous opportunity for D. It's also
>> commonsensical to look at notable successes - and I hope it is
>> not just my biases that lead me to think many of these are in
>> just this kind of application. Data sets keep getting larger
>> (but not necessarily more information rich in dollar terms),
>> and Moore's Law/memory speed+latency is not keeping pace.
>> This is exactly the kind of change that creeps up on you
>> because not much changes in a few months (which is the kind of
>> horizon many of us tend to think in).
>>
>> People say "what is D's edge", but my personal perception is
>> "where is the competition for D" in this area. It has to be
>> native code/JIT, and I refuse to learn Java; it also should be
>> plastic and lend itself to rapid iteration.
>>
>>> at the same time there's growing discontent among
>>> researchers, scientists and engineers as regards performance,
>>> simply because the data sets are becoming bigger and bigger
>>> every day and the algorithms are getting more and more
>>> refined. Sooner or later people will have to find new ways,
>>> out of sheer necessity.
>>
>> upvote. I would love to see any references you have on this -
>> not because it's not rather obvious to me, but because it is
>> helpful when talking to other people.
>
> The article that gave rise to this thread is a good reference.
>
> I came from a slightly different angle: I looked for
> alternatives to Python, because I needed:
>
> 1. fast native execution (real time)
> 2. easy interfacing to C
> 3. cross-platform development
>
> (Modern convenience, templates, ranges etc. were bonuses I
> discovered bit by bit)
>
> As regards algorithms and data processing, most people in
> research use Matlab (proprietary) and Python. However, in my
> field they're useless when it comes to building data-driven
> systems (fast analysis, retraining of machine based on (slight)
> modifications), and putting computationally heavy algorithms
> into real world applications. Proof of concept is all it
> amounts to, usually.
>
> So D has a real chance here, because of
>
> 1. native code
> 2. modern convenience
> 3. templates, structs, mixins, ranges, std.algorithm etc.
> 4. interfacing to C libs
>
>>> Don't forget that "the state of the art" can change very
>>> quickly in IT and the name of the game is anticipating new
>>> developments rather than taking snapshots of the current
>>> state of the art and framing them. D really has a lot to offer
>>> for data processing and I wouldn't rule out that more and
>>> more programmers will turn to it for this task.
>>
>> I fully agree. If we started a section on use cases, would
>> you be able to write a page or two on D's advantages in data
>> processing?
>
> I think that Dicebot et al. would have good examples.
It'd be nice if we had a dedicated data-analysis section and/or
library. I'm almost sure that people working with massive amounts
of data would find it by googling "efficient data analysis" or
something like that.
Facebook probably has a wealth of data analysis examples /
techniques, too.