Parsing a UTF-16LE file line by line, BUG?

Era Scarecrow via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Jan 26 20:26:31 PST 2017


On Tuesday, 17 January 2017 at 11:40:15 UTC, Nestor wrote:
> Thanks, but unfortunately this function does not produce proper 
> UTF8 strings, as a matter of fact the output even starts with 
> the BOM. Also it doesn't handle CRLF, and even for LF 
> terminated lines it doesn't seem to work for lines other than 
> the first.

  I thought you wanted to get the contents line by line, which 
would then remain UTF-16. Translating between the two types 
shouldn't be hard: to!string, or a foreach that appends each 
decoded character to a char array, will convert to UTF-8.
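A minimal sketch of what I mean, using std.conv.to to transcode a UTF-16 line (the sample text is just for illustration):

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    // A UTF-16 line as it might come out of the file.
    wstring line16 = "héllo wörld"w;

    // to!string transcodes UTF-16 code units into a UTF-8 string.
    string line8 = to!string(line16);

    writeln(line8); // prints "héllo wörld"
}
```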

  Skipping the BOM is just a matter of skipping the first two 
bytes that identify it (0xFF 0xFE for UTF-16LE)...
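Something along these lines, shown here on an in-memory byte array rather than a real file so it stands alone:

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    // Bytes as they might come from a UTF-16LE file:
    // the BOM (0xFF 0xFE) followed by "Hi" in UTF-16LE.
    ubyte[] bytes = [0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00];

    // Skip the two BOM bytes if present.
    if (bytes.length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        bytes = bytes[2 .. $];

    // Reinterpret the rest as UTF-16 code units and transcode to UTF-8.
    auto text = to!string(cast(wchar[]) bytes);
    writeln(text); // prints "Hi"
}
```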

> I guess I have to code encoding detection, buffered read, and 
> transcoding by hand, the only problem is that the result could 
> be sub-optimal, which is why I was looking for a built-in 
> solution.

  Maybe. Honestly, I'm not nearly as familiar with the library or 
its functions as I would love to be, so home-made solutions often 
seem more practical until I learn the lingo. A disadvantage of 
being self-taught.
