Parsing a UTF-16LE file line by line, BUG?
Era Scarecrow via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Jan 26 20:26:31 PST 2017
On Tuesday, 17 January 2017 at 11:40:15 UTC, Nestor wrote:
> Thanks, but unfortunately this function does not produce proper
> UTF8 strings, as a matter of fact the output even starts with
> the BOM. Also it doesn't handle CRLF, and even for LF
> terminated lines it doesn't seem to work for lines other than
> the first.
I thought you wanted to get the contents line by line, in which
case each line would remain UTF-16. Translating between the two
encodings shouldn't be hard; to!string, or a foreach that appends
each character to a char array, would convert to UTF-8.
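As a minimal sketch, assuming the line has already been read into a wstring (std.conv.to re-encodes between D's string types):

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    // A UTF-16 line, as it might come from the file
    wstring line = "héllo"w;

    // to!string re-encodes the UTF-16 data as UTF-8
    string utf8 = to!string(line);
    writeln(utf8);
}
```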
Skipping the BOM is just a matter of dropping the first two
bytes that identify it...
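Something along these lines (the file name is made up for illustration, and the cast assumes a little-endian host and an even byte count):

```d
import std.file : read;
import std.stdio : writeln;

void main()
{
    // "input.txt" is a placeholder for your UTF-16LE file
    ubyte[] raw = cast(ubyte[]) read("input.txt");

    // The UTF-16LE BOM is the byte pair 0xFF 0xFE
    if (raw.length >= 2 && raw[0] == 0xFF && raw[1] == 0xFE)
        raw = raw[2 .. $];

    // Reinterpret as UTF-16 code units (little-endian host assumed)
    wchar[] text = cast(wchar[]) raw;
    writeln(text.length, " code units");
}
```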
> I guess I have to code encoding detection, buffered read, and
> transcoding by hand, the only problem is that the result could
> be sub-optimal, which is why I was looking for a built-in
> solution.
Maybe. Honestly I'm not nearly as familiar with the library and
its functions as I would love to be, so home-made solutions often
seem the natural choice until I learn the lingo. A disadvantage
of being self-taught.