std.uni, std.ascii, std.encoding, std.utf ugh!
learner
learner at nomail.com
Wed May 6 10:57:59 UTC 2020
On Tuesday, 5 May 2020 at 19:24:41 UTC, WebFreak001 wrote:
> On Tuesday, 5 May 2020 at 18:41:50 UTC, learner wrote:
>> Good morning,
>>
>> Trying to do this:
>>
>> ```
>> bool foo(string s) nothrow { return s.all!isDigit; }
>> ```
>>
>> I realised that the conversion from char to dchar could throw.
>>
>> I need to validate and operate over ascii strings and utf8
>> strings, possibly in separate functions, what's the best way
>> to transition between:
>>
>> ```
>> immutable(ubyte)[] -> validate utf8 -> string -> nothrow usage
>> -> isDigit etc
>> immutable(ubyte)[] -> validate ascii -> AsciiString? ->
>> nothrow usage -> isDigit etc
>> string -> validate ascii -> AsciiString? ->
>> nothrow usage -> isDigit etc
>> ```
>>
>> Thank you
Thank you WebFreak,
>
> if you want nothrow operations on the sequence of characters
> (bytes) of the strings, use `str.representation` to get
> `immutable(ubyte)[]` and work on that. This is useful for
> example for doing indexOf (countUntil), startsWith, endsWith,
> etc. Make sure at least one of your inputs is validated though
> to avoid potentially handling or cutting off unfinished code
> points. I think this is the best way to go if you want to do
> simple things.
What I really want is a way to validate an immutable(ubyte)[]
sequence as UTF8 or ASCII and, from that point forward, apply
functions like isDigit in nothrow functions.
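
To make that concrete, here is a minimal sketch of the pipeline I have in mind. The helper names `isValidUtf8`, `isAsciiOnly` and `allDigits` are hypothetical (they are not Phobos functions); only `std.utf.validate`, `std.utf.byCodeUnit`, `std.algorithm.searching.all` and `std.ascii.isDigit` come from the standard library. The idea is to contain the one throwing call inside the validation step, then iterate code units without auto-decoding afterwards.

```d
import std.algorithm.searching : all;
import std.ascii : isDigit;
import std.utf : byCodeUnit, validate;

// Hypothetical helper: true if `bytes` is well-formed UTF-8.
// std.utf.validate throws UTFException on bad input, so the throw is
// caught here and the function itself can be nothrow.
bool isValidUtf8(const(ubyte)[] bytes) nothrow
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}

// Hypothetical helper: true if every byte is in the 7-bit ASCII range.
bool isAsciiOnly(const(ubyte)[] bytes) nothrow
{
    return bytes.all!(b => b < 0x80);
}

// byCodeUnit iterates chars without auto-decoding, so no decoding
// exception can occur. Note: returns true for an empty string.
bool allDigits(string s) nothrow
{
    return s.byCodeUnit.all!isDigit;
}

unittest
{
    immutable(ubyte)[] input = [0x31, 0x32, 0x33]; // the bytes of "123"
    assert(isValidUtf8(input) && isAsciiOnly(input));
    assert(allDigits(cast(string) input));

    immutable(ubyte)[] broken = [0xC3, 0x28]; // lead byte without continuation
    assert(!isValidUtf8(broken));
}
```

Working on code units is only safe here because isDigit matches ASCII digits alone, and in UTF8 every byte of a multi-byte sequence is 0x80 or above, so it can never be mistaken for a digit.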
> If your algorithm is sufficiently complex that you would like
> to still decode but not crash, you can also manually call
> .decode with UseReplacementDchar.yes to make it emit \uFFFD for
> invalid characters.
I will simply reject invalid UTF8 input, since it's coming from I/O.
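
For comparison, the manual `decode` variant suggested above might look like the sketch below. The function name `countReplaced` is hypothetical; `decode`, `replacementDchar` and the `Yes.useReplacementDchar` flag are the Phobos pieces. It assumes that `decode` with the replacement flag is inferred nothrow, which is how I read the std.utf documentation.

```d
import std.typecons : Yes;
import std.utf : decode, replacementDchar;

// Hypothetical helper: counts how many decoded code points came out as
// U+FFFD. Caveat: a genuine U+FFFD in the input is counted as well.
size_t countReplaced(string s) nothrow
{
    size_t replaced = 0;
    size_t i = 0;
    while (i < s.length)
    {
        // decode advances `i` past the sequence it consumed; with the
        // replacement flag it returns U+FFFD instead of throwing.
        if (decode!(Yes.useReplacementDchar)(s, i) == replacementDchar)
            ++replaced;
    }
    return replaced;
}

unittest
{
    assert(countReplaced("123") == 0);
    assert(countReplaced("12\x803") > 0); // stray continuation byte
}
```

Rejecting the input outright when the count is non-zero would give the same effect as validating up front, at the cost of decoding everything.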
> To get the best of both worlds, use `.byUTF!dchar` which gives
> you an input range to iterate over and defaults to using
> replacement dchar. You can then call the various algorithm &
> array functions on it.
Can you explain better?
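
My current reading of the suggestion, as a sketch to be corrected if I got it wrong: `byUTF!dchar` wraps the string in a range that decodes lazily and yields U+FFFD for invalid sequences instead of throwing, so algorithms over it can be nothrow. The function name `allDigitsDecoded` is hypothetical.

```d
import std.algorithm.searching : all;
import std.ascii : isDigit;
import std.utf : byUTF;

// Decodes to dchar lazily; invalid sequences become U+FFFD, which is
// not a digit, so malformed input simply fails the check.
bool allDigitsDecoded(string s) nothrow
{
    return s.byUTF!dchar().all!isDigit;
}

unittest
{
    assert(allDigitsDecoded("2020"));
    assert(!allDigitsDecoded("20x20"));
    assert(!allDigitsDecoded("20\x80")); // stray continuation byte
}
```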
> Unless you are working with different encodings than UTF-8
> (like doing file or network operations) you shouldn't be
> needing std.encoding.
I'm expecting UTF8- and ASCII-encoded input from I/O.
Thank you!