Reading text (I mean "real" text...)

Denis noreply at noserver.lan
Sat Jun 20 07:32:51 UTC 2020


Digging into this a bit further --

POSIX defines a "print" class, which I believe is an exact fit. 
The Unicode spec doesn't define this class, which I presume is 
why D's std.uni library also omits it. But there is an isprint() 
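function in libc, which I should be able to use (POSIX here). 
This function refers to the system locale, so it isn't limited to 
ASCII characters (unlike std.ascii:isPrintable). One caveat: 
isprint() only accepts single-byte values, so a decoded code 
point would need its wide-character counterpart, iswprint().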

So that's one down, two to go:

   Loop until newline or EOF
    (1) Read bytes or characters            } Possibly
    (2) Decode UTF-8, exception if invalid  } together
    (3) Call isprint(), exception if not printable
   Return line

(This simplified outline obviously doesn't show how to deal with 
the complications arising from using buffers, handling codepoints 
that straddle the end of the buffer, etc.)
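
As a rough illustration of the outline above, here is a minimal 
sketch in C using only the standard library. It is a sketch under 
stated assumptions, not a definitive implementation. The names 
read_printable_line, check_line, decode_utf8 and the line_status 
codes are made up for this example. It reads into a fixed, 
caller-supplied buffer to sidestep the buffering complications, and 
it returns status codes where the outline says "exception". It calls 
iswprint() rather than isprint(), because isprint() only accepts 
single-byte values, and it assumes wchar_t can hold any code point 
(typical on POSIX systems). iswprint() follows the current LC_CTYPE 
locale, so a program would need to call setlocale(LC_ALL, "") at 
startup for non-ASCII characters to be classified according to the 
system locale.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical result codes for this sketch. */
enum line_status {
    LINE_OK, LINE_EOF, LINE_TOO_LONG, LINE_BAD_UTF8, LINE_NOT_PRINTABLE
};

/* Step 2: decode one UTF-8 sequence starting at s[*i] and advance *i.
   Returns the code point, or -1 if the sequence is malformed or truncated. */
static long decode_utf8(const unsigned char *s, size_t n, size_t *i)
{
    unsigned char lead = s[(*i)++];
    long cp, min;
    int extra;

    if (lead < 0x80) return lead;   /* plain ASCII byte */
    if ((lead & 0xE0) == 0xC0)      { cp = lead & 0x1F; extra = 1; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min = 0x10000; }
    else return -1;                 /* stray continuation or invalid lead byte */

    while (extra-- > 0) {
        if (*i >= n || (s[*i] & 0xC0) != 0x80) return -1;
        cp = (cp << 6) | (s[(*i)++] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
    return cp;
}

/* Steps 2 and 3 over a complete line held in memory. */
enum line_status check_line(const char *line, size_t n)
{
    const unsigned char *s = (const unsigned char *)line;
    size_t i = 0;

    while (i < n) {
        long cp = decode_utf8(s, n, &i);
        if (cp < 0) return LINE_BAD_UTF8;
        /* Assumption: wchar_t is wide enough to hold a full code point. */
        if (!iswprint((wint_t)cp)) return LINE_NOT_PRINTABLE;
    }
    return LINE_OK;
}

/* Step 1 plus the checks: read bytes up to newline or EOF into buf
   (capacity cap), NUL-terminate, then validate the collected bytes. */
enum line_status read_printable_line(FILE *in, char *buf, size_t cap)
{
    size_t len = 0;
    int c;

    if (cap == 0) return LINE_TOO_LONG;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 >= cap) return LINE_TOO_LONG;  /* rest of line left unread */
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) return LINE_EOF;
    buf[len] = '\0';
    return check_line(buf, len);
}
```

A caller would loop on read_printable_line(stdin, buf, sizeof buf) 
until it returns LINE_EOF. Reading the whole line first and 
validating it afterwards keeps a multi-byte sequence from being 
split across reads, at the cost of a fixed maximum line length. 
Note that tab is not in the "print" class in the POSIX locale, so a 
line containing one is reported as LINE_NOT_PRINTABLE.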

Where I'm still stuck is the read or read-and-auto-decode: this 
is where the waters get really muddy for me. Three different 
techniques for reading characters are suggested in this thread 
(iopipe, ranges, rawRead): 
https://forum.dlang.org/thread/cgteipqqfxejngtpgbbt@forum.dlang.org

I'd like to stick with standard D or C libraries initially, so 
that rules out iopipe for now. What would really help is some 
details about what one read technique does particularly well vs. 
another. And is there a technique that seems more suited to this 
use case than the rest?

Thanks again


More information about the Digitalmars-d-learn mailing list