Announcement and Request: Typesafe Coordinate Systems for High-Throughput Sequencing Applications
Arne Ludwig
arne.ludwig at posteo.de
Wed Sep 1 09:01:27 UTC 2021
On Wednesday, 1 September 2021 at 05:36:53 UTC, James Blachly
wrote:
> In another post, I've just announced our D-based high
> throughput sequencing library, dhtslib.
>
> One feature that is, AFAIK, novel in the field is leveraging
> the compiler's type system to enforce correctness regarding
> different genome/reference sequence coordinate systems.
> Clearly, the encoding of domain specific knowledge in a
> language's type system is nothing new, but it is surprising
> that this has not been done before in bioinformatics, and it is
> an idea that IMO is long overdue given the trainwreck of
> different coordinate systems in our field.
>
> You can find dhtslib's develop branch, with Typesafe
> Coordinates merged and ready to use, here:
>
> https://github.com/blachlylab/dhtslib/
>
>
> **Now the request:**
> We've drafted a manuscript describing Typesafe Coordinates as a
> sort of low-key endorsement of the D language and our library
> package `dhtslib`. You can find the manuscript here:
>
> https://github.com/blachlylab/typesafe-coordinates/
>
> We would be very grateful to those of you who would take the
> time to read the manuscript and post comments (publicly or
> privately), _especially if we have made any incorrect
> statements_ or our language regarding type systems is awkward
> or nonstandard.
>
> We did praise D, and gently criticized Rust and OCaml* somewhat
> as it appeared to me that they lacked the features required to
> implement Typesafe Coordinate Systems in as ergonomic a way as
> we could in D. However, being a true novice at both of these
> other languages there is the possibility that I've missed
> something significant, and that the Rust and OCaml
> implementations could be retooled to match the D
> implementation. I'd still be glad to hear it if that's the case.
>
> I plan to make a few minor cleanups and submit this to a
> preprint server as well as a scientific journal in the next
> week or so.
>
> Kind regards
>
> James S Blachly, MD
> The Ohio State University
>
>
> * as a side note, I actually find the OCaml code quite
> attractive in its terseness: `let j = cl_interval_of_ho
> (ob_interval_of_zb i)`
Hi James and Charles,
I am happy to hear of your latest idea of creating type-safe
coordinate systems. It's a great idea!
After reading the code on GitHub, I have only one major remark:
IMHO, it would be great to separate the new coordinate systems
from any `htslib` dependencies ([see lines
47-50](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L47-L50)), since only auxiliary functions use both the coordinate systems and `htslib`. The larger goal I have in mind is to provide the coordinate systems in a separate DUB sub-package (e.g. `dhtslib:coordinates`) that requires only a D compiler. That would make integration into existing projects that do not need `htslib` much easier.
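To make the sub-package idea concrete, here is a rough sketch of what the `dub.json` could look like. The directory layout (`coordinates/source`) is purely an assumption on my side and not how the repository is organized today; only the `dhtslib:coordinates` name comes from the suggestion above.

```json
{
    "name": "dhtslib",
    "dependencies": {
        "dhtslib:coordinates": "*"
    },
    "subPackages": [
        {
            "name": "coordinates",
            "targetType": "library",
            "sourcePaths": ["coordinates/source"],
            "importPaths": ["coordinates/source"]
        }
    ]
}
```

A project that only needs the coordinate types would then depend on `dhtslib:coordinates` alone and never pull in `htslib`.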
Also, I have a short list of minor, technical remarks:
1. The return type in [line
114](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L114) has a typo: there is an extra 's'.
2. The array of identifiers `CoordSystemLabels` in [line
203](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L203) is a bit unsafe and not strictly required, for two reasons:
   1. It can be generated by the compiler using `enum
   CoordSystemLabels = __traits(allMembers, CoordSystem);`.
   2. As far as I can tell, its only use is in [line
   376](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L376). The same result can be achieved safely using `cs.stringof.split('.')[$ - 1]` or, without `std.array.split`, `cs.stringof[CoordSystem.stringof.length + 1 .. $]`.
3. The function `unionImpl` in [line
326](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L326) actually computes the convex hull of the two intervals, which should be noted in the doc comment for completeness' sake.
4. I noticed that you use operator overloading for union and
intersection of `Interval`s. You could also add overloads for the
`offset` function in both `Interval` and `Coordinate` with `auto
opBinary(string op, T)(T off) if ((op == "+" || op == "-") &&
isIntegral!T)` and `auto opBinaryRight(string op, T)(T off) if
((op == "+" || op == "-") && isIntegral!T)`.
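To illustrate remarks 2 to 4, here is a minimal, self-contained sketch. The `CoordSystem` members, the simplified `Interval` struct, and the method name `hull` are stand-ins I made up for this mail; they are not the actual dhtslib definitions. For the labels I wrap `__traits(allMembers, ...)` in an array literal so that the result is an ordinary `string[]`.

```d
import std.traits : isIntegral;

// Hypothetical stand-in for dhtslib's enum; member names are assumed.
enum CoordSystem { zbho, zbc, obho, obc }

// Labels generated by the compiler, so they cannot drift out of sync
// with the enum. The array literal turns the sequence into a string[].
enum CoordSystemLabels = [__traits(allMembers, CoordSystem)];
static assert(CoordSystemLabels == ["zbho", "zbc", "obho", "obc"]);

// Simplified, untyped interval (half-open: [start, end)); illustration only.
struct Interval
{
    long start;
    long end;

    /// Convex hull: the smallest interval covering both operands.
    /// For disjoint operands this also covers the gap between them,
    /// which is why it differs from a set-theoretic union.
    Interval hull(Interval other) const
    {
        import std.algorithm.comparison : max, min;
        return Interval(min(start, other.start), max(end, other.end));
    }

    /// `iv + n` and `iv - n` shift both endpoints by `n`.
    Interval opBinary(string op, T)(T off) const
        if ((op == "+" || op == "-") && isIntegral!T)
    {
        return mixin("Interval(start " ~ op ~ " off, end " ~ op ~ " off)");
    }

    /// `n + iv`; this sketch accepts only "+" on the right-hand form.
    Interval opBinaryRight(string op, T)(T off) const
        if (op == "+" && isIntegral!T)
    {
        return Interval(start + off, end + off);
    }
}

void main()
{
    auto a = Interval(0, 10);
    auto b = Interval(20, 30);

    assert(a.hull(b) == Interval(0, 30)); // includes the gap [10, 20)
    assert(a + 5 == Interval(5, 15));
    assert(b - 20 == Interval(0, 10));
    assert(5 + a == a + 5);
}
```

Restricting `opBinaryRight` to `"+"` is a judgment call in this sketch, because `n - iv` has no obvious meaning as an offset; you may decide differently.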
I enjoyed reading the manuscript. It highlights the issue clearly
and presents the solution without getting lost in details.
Ignoring typos at this stage, I have no remarks on it – keep
going!
Cheers!
-- Arne