[OT] Application case study comparing Java, Go, and C++

Jon Degenhardt jond at noreply.com
Thu Feb 28 23:50:44 UTC 2019


On Thursday, 28 February 2019 at 22:58:54 UTC, Seb wrote:
> On Thursday, 28 February 2019 at 20:48:01 UTC, Jon Degenhardt 
> wrote:
>> This paper may be of interest to people here:
>>
>> "A comparison of three programming languages for a 
>> full-fledged next-generation sequencing tool", P.Costanza, 
>> C.Herzeel, W.Verachtert
>> https://doi.org/10.1101/558056
>>
>> The paper compares implementations of a tool operating on 
>> SAM/BAM files (bioinformatics) from a performance perspective. 
>> Focus is on comparison of GC schemes used in Go and Java with 
>> reference counting in C++. The GC schemes were materially 
>> faster.
>>
>> I'm not familiar with the authors or the implementations, so 
>> cannot say how well the implementations were done. However, it 
>> appears to be a useful case study, and the authors do provide 
>> a fair bit of analysis in the paper.
>>
>> There's a reddit thread also: 
>> https://www.reddit.com/r/programming/comments/avsfc6/performance_comparison_of_go_c_and_java_for/
>
> I wouldn't give much value to this paper. It hasn't been peer 
> reviewed and I doubt it would pass review. A quick example:
>
> "It [their tool] can be used as a drop-in replacement for many 
> operations implemented by SAMtools [...]". Though no 
> performance comparison was done against samtools (nor any other 
> tools except their own implementations). I find this pretty 
> shocking, because their entire paper's purpose is about 
> performance...
>
> For reference, samtools is the de-facto standard for a reason 
> (yes it's old and written in C).
>
> Though, to be fair, sambamba (written in D) is faster than the C 
> "standard" implementation:
>
> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765878

They do have benchmark comparisons against GATK 4 in another 
paper:
"elPrep 4: A multithreaded framework for sequence analysis"
https://doi.org/10.1371/journal.pone.0209523

I'm not so familiar with these tool sets. How does GATK 4 stack 
up against other tools?

From the paper it looks like many of the performance gains over 
GATK 4 resulted from architecture and algorithm changes, so that 
benchmark may not be valid for comparing C++/Go/Java or GC vs 
reference counting.

