fasta parser with iopipe?
biocyberman via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Aug 23 02:53:49 PDT 2017
I lost my momentum in learning D and want to regain it.
So I need some help with this seemingly simple task:
# Fasta sequence
> \>Entry1_ID header field1|header field2|...
> CAGATATCTTTGATGTCCTGATTGGAAGGACCGTTGGCCCCCCACCCTTAGGCAG
> TGTATACTCTTCCATAAACGAGCTATTAGTTATGAGGTCCGTAGATTGAAAAGGG
> TGACGGAATTCGGCCGAACGGGAAAGACGGACATCTAGGTATCCTGAGCACGGTT
> GCGCGTCCGTATCAAGCTCCTCTTTATAGGCCCCG
> \>Entry2_ID header field1|header field4|...
> GTTACTGTTGGTCGTAGAGCCCAGAACGGGTTGGGCAGATGTACGACAATATCGCT
> TAGTCACCCTTGGGCCACGGTCCGCTACCTTACAGGAATTGAGA
> \>Entry3_ID header field1|header field2|...
> GGCAGTACGATCGCACGCCCCACGTGAACGATTGGTAAACCCTGTGGCCTGTGAGC
> GACAAAAGCTTTAATGGGAAATACGCGCCCATAACTTGGTGCGA
# Some characteristics:
- Entry_ID matches >[[:alnum:]_]+, where '>' marks the start of an
entry. In this post I have to put an escape character (\) before
the '>' to make it visible.
- Headers may contain annotation information separated by some
delimiter (i.e. | in this case).
- The entry ID and header together form a single line, which does
not contain newline characters.
- Sequence under the header line is [ATCGN\n]* (Perl regex).
- A fasta file can be plain-text or gzip compressed.
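To make the characteristics above concrete, here is a minimal sketch using only Phobos. The names `Header`, `parseHeader` and `isSequenceLine` are made up for illustration, and the sketch assumes that the entry ID ends at the first space and that the annotation fields are separated by '|', as in the example entries. It does not handle gzip input.

```d
import std.algorithm.iteration : splitter;
import std.algorithm.searching : all, canFind, findSplit;
import std.string : strip;

/// Hypothetical container for one parsed header line.
struct Header
{
    string id;       // text between '>' and the first space
    string[] fields; // annotation fields split on the delimiter
}

/// Splits a header line such as ">Entry1_ID field1|field2".
/// Assumption: the ID ends at the first space.
Header parseHeader(string line, char delim = '|')
{
    assert(line.length > 0 && line[0] == '>', "header must start with '>'");
    auto parts = line[1 .. $].findSplit(" ");
    Header h;
    h.id = parts[0];
    // Everything after the first space is treated as annotation text.
    foreach (f; parts[2].splitter(delim))
        h.fields ~= f.strip;
    return h;
}

/// True if the line contains only the letters A, T, C, G, N.
bool isSequenceLine(const(char)[] line)
{
    return line.all!(c => "ATCGN".canFind(c));
}
```

With the first example entry, `parseHeader` would yield the ID `Entry1_ID` and one field per '|'-separated chunk of the rest of the line.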
# Goals:
Write a parser that uses D ranges with the iopipe library for
performance and ease of use. A big fasta file can be dozens of
gigabytes.
# Questions:
1. How do I model a fasta entry with a struct or class?
2. How do I implement a range of fasta entries with iopipe? The
range in this case can be a forward range, but preferably a
random access range.
3. I want to do this with ranges to explore their power and
elegance. But if performance is a big concern, what can I do
instead?
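As a starting point for questions 1 and 2, here is a rough sketch of what I have in mind, written against plain Phobos rather than iopipe because I do not know the iopipe API yet. `FastaEntry`, `FastaRange` and `fastaEntries` are names I made up. The sketch assumes the input is already a range of lines without terminators (for example `File.byLine`), and it is only an input range, not a forward or random access one.

```d
import std.array : appender, array;
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

/// One fasta record (hypothetical model for question 1).
struct FastaEntry
{
    string header;   // header line without the leading '>'
    string sequence; // sequence with line breaks removed
}

/// Input range of FastaEntry built on top of any range of lines.
struct FastaRange(R)
    if (isInputRange!R && is(ElementType!R : const(char)[]))
{
    private R lines;
    private FastaEntry current;
    private bool done;

    this(R lines)
    {
        this.lines = lines;
        // Skip anything that precedes the first header line.
        while (!this.lines.empty && !isHeader(this.lines.front))
            this.lines.popFront();
        popFront(); // load the first entry
    }

    private static bool isHeader(const(char)[] line)
    {
        return line.length > 0 && line[0] == '>';
    }

    @property bool empty() const { return done; }
    @property FastaEntry front() { return current; }

    void popFront()
    {
        if (lines.empty) { done = true; return; }
        // Copy the header, since byLine-style ranges reuse their buffer.
        current.header = lines.front[1 .. $].idup;
        lines.popFront();
        // Concatenate sequence lines until the next header or end of input.
        auto seq = appender!string();
        while (!lines.empty && !isHeader(lines.front))
        {
            seq.put(lines.front);
            lines.popFront();
        }
        current.sequence = seq.data;
    }
}

/// Convenience constructor.
auto fastaEntries(R)(R lines) { return FastaRange!R(lines); }

unittest
{
    auto lines = [">Entry1_ID f1|f2", "CAGA", "TTGG", ">Entry2_ID f1", "GTTA"];
    auto entries = fastaEntries(lines).array;
    assert(entries.length == 2);
    assert(entries[0].header == "Entry1_ID f1|f2");
    assert(entries[0].sequence == "CAGATTGG");
    assert(entries[1].sequence == "GTTA");
}
```

For a file this could be used as `fastaEntries(File("input.fa").byLine)`, where the file name is just a placeholder. Each entry's sequence is copied into a fresh string, which is simple but probably not what I want for files of dozens of gigabytes. As far as I can tell, random access would also need some index of where each entry starts, because entries have different lengths.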