std.uni, std.ascii, std.encoding, std.utf ugh!

Wed May 6 10:57:59 UTC 2020

On Tuesday, 5 May 2020 at 19:24:41 UTC, WebFreak001 wrote:
> On Tuesday, 5 May 2020 at 18:41:50 UTC, learner wrote:
>> Good morning,
>>
>> Trying to do this:
>>
>> ```
>> bool foo(string s) nothrow { return s.all!isDigit; }
>> ```
>>
>> I realised that the conversion from char to dchar could throw.
>>
>> I need to validate and operate over ASCII strings and UTF-8 
>> strings, possibly in separate functions. What's the best way 
>> to transition between:
>>
>> ```
>> immutable(ubyte)[] -> validate utf8  -> string        -> nothrow usage -> isDigit etc
>> immutable(ubyte)[] -> validate ascii -> AsciiString? -> nothrow usage -> isDigit etc
>> string             -> validate ascii -> AsciiString? -> nothrow usage -> isDigit etc
>> ```
>>
>> Thank you
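
For reference, the exception does not come from a plain char-to-dchar conversion. When a `string` is iterated as a range, auto-decoding turns its UTF-8 code units into `dchar`s, and that decoder throws `UTFException` on malformed input, so `s.all!isDigit` cannot be nothrow. A minimal sketch of a nothrow variant, assuming only ASCII digits should count (which is all `std.ascii.isDigit` accepts anyway), iterates the raw code units with `std.utf.byCodeUnit`. `allAsciiDigits` is a hypothetical name:

```d
import std.algorithm.searching : all;
import std.ascii : isDigit;
import std.utf : byCodeUnit;

// Iterates the UTF-8 code units (char) instead of auto-decoded dchars,
// so no UTFException can be thrown and the function can be nothrow.
// Every byte of a multi-byte sequence is >= 0x80, so isDigit rejects it;
// only '0'..'9' pass.
bool allAsciiDigits(string s) nothrow
{
    return s.byCodeUnit.all!isDigit;
}
```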

Thank you, WebFreak.

>
> if you want nothrow operations on the sequence of characters 
> (bytes) of the strings, use `str.representation` to get 
> `immutable(ubyte)[]` and work on that. This is useful for 
> example for doing indexOf (countUntil), startsWith, endsWith, 
> etc. Make sure at least one of your inputs is validated though 
> to avoid potentially handling or cutting off unfinished code 
> points. I think this is the best way to go if you want to do 
> simple things.
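
A minimal sketch of the `representation` approach, assuming byte-level matching is all that is needed; `isGetRequest` and `colonOffset` are hypothetical helpers. The needles here are ASCII, and an ASCII byte never occurs inside a multi-byte UTF-8 sequence, so on valid UTF-8 input the offsets found this way fall on code-point boundaries:

```d
import std.algorithm.searching : countUntil, endsWith, startsWith;
import std.string : representation;

// representation() views a string as immutable(ubyte)[], so these
// algorithms compare bytes and never decode (no UTFException).

// Hypothetical helper: does the line look like "GET ... HTTP/1.1"?
bool isGetRequest(string line) nothrow
{
    auto bytes = line.representation;
    return bytes.startsWith("GET ".representation)
        && bytes.endsWith(" HTTP/1.1".representation);
}

// Hypothetical helper: byte offset of the first ':' or -1 if absent.
ptrdiff_t colonOffset(string line) nothrow
{
    return line.representation.countUntil(cast(ubyte) ':');
}
```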

What I really want is a way to validate an immutable(ubyte)[] 
sequence as UTF-8 or ASCII and, from that point on, call 
functions like isDigit from nothrow functions.
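
One way to make "already validated" part of the type is a small wrapper that is only handed out by a checking function. This is a sketch, not a Phobos type: `CheckedAscii`, `checkAscii`, `str` and `allDigits` are hypothetical names. Checking for ASCII is a plain byte comparison, so even the check itself can be nothrow:

```d
import std.algorithm.searching : all;
import std.ascii : isDigit;
import std.typecons : Nullable, nullable;
import std.utf : byCodeUnit;

// Hypothetical wrapper (not a Phobos type) for text whose bytes were
// checked to be 7-bit ASCII. It is meant to be created via checkAscii().
struct CheckedAscii
{
    private string text;

    string str() const nothrow { return text; }

    // Every char is a whole character here, so nothing is decoded.
    bool allDigits() const nothrow
    {
        return text.byCodeUnit.all!isDigit;
    }
}

// Returns an empty (null) Nullable if any byte is outside 0x00..0x7F.
Nullable!CheckedAscii checkAscii(immutable(ubyte)[] raw) nothrow
{
    if (!raw.all!(b => b < 0x80))
        return Nullable!CheckedAscii.init;
    // ASCII is a subset of UTF-8, so the same bytes are a valid string.
    return nullable(CheckedAscii(cast(string) raw));
}
```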

> If your algorithm is sufficiently complex that you would like 
> to still decode but not crash, you can also manually call 
> .decode with UseReplacementDchar.yes to make it emit \uFFFD for 
> invalid characters.
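
A sketch of the manual `decode` loop described above, assuming the goal is to test every code point with `std.uni.isNumber`; `allNumbers` is a hypothetical name. `UseReplacementDchar.yes` and `Yes.useReplacementDchar` are the same flag value. Malformed bytes decode to U+FFFD, which is not a number, so bad input makes the check fail instead of throwing:

```d
import std.typecons : Yes;
import std.uni : isNumber;
import std.utf : decode;

// Hypothetical helper: true if every code point in s is a Unicode number.
// With Yes.useReplacementDchar, decode returns U+FFFD for malformed
// sequences instead of throwing UTFException.
bool allNumbers(string s) nothrow
{
    size_t i = 0;
    while (i < s.length)
    {
        // decode advances i past the code point it just read
        const dchar c = decode!(Yes.useReplacementDchar)(s, i);
        if (!isNumber(c))
            return false;
    }
    return true;
}
```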

I will simply reject invalid UTF-8 input; it's coming from I/O.
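
A sketch of rejecting invalid UTF-8 once, at the I/O boundary, using `std.utf.validate`, which throws `UTFException` on malformed input. `tryAsUtf8` is a hypothetical name. The check itself cannot be nothrow, and the compiler does not remember that the string was validated, so code downstream still needs to avoid auto-decoding (byte or code-unit iteration, or a replacement-dchar decoder) to be nothrow:

```d
import std.utf : UTFException, validate;

// Hypothetical boundary helper: reinterprets raw bytes as a string and
// rejects them (returns false) unless they are well-formed UTF-8.
bool tryAsUtf8(immutable(ubyte)[] raw, out string result)
{
    auto s = cast(string) raw; // same bytes, no copy
    try
    {
        validate(s); // throws UTFException on malformed input
    }
    catch (UTFException)
    {
        return false;
    }
    result = s;
    return true;
}
```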

> To get the best of both worlds, use `.byUTF!dchar` which gives 
> you an input range to iterate over and defaults to using 
> replacement dchar. You can then call the various algorithm & 
> array functions on it.

Could you explain that in more detail?
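
A sketch of what the `byUTF!dchar` suggestion looks like in practice; `isAllNumeric` and `countNumbers` are hypothetical names. The range decodes lazily, yields `dchar`s that `std.uni` predicates accept, and by default substitutes U+FFFD for malformed sequences rather than throwing, so algorithms built on it can be called from nothrow code:

```d
import std.algorithm.searching : all, count;
import std.uni : isNumber;
import std.utf : byUTF;

// byUTF!dchar decodes UTF-8 into code points on the fly; malformed
// sequences become U+FFFD (replacementDchar) instead of an exception.
bool isAllNumeric(string s) nothrow
{
    return s.byUTF!dchar.all!isNumber;
}

// Hypothetical helper: how many code points are Unicode numbers.
size_t countNumbers(string s) nothrow
{
    return s.byUTF!dchar.count!isNumber;
}
```

On input that has already been validated the replacement never triggers, so the code points seen are the same ones auto-decoding would produce.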

> Unless you are working with different encodings than UTF-8 
> (like doing file or network operations) you shouldn't be 
> needing std.encoding.

I'm expecting UTF-8 and ASCII input from I/O.

Thank you!