dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

jfondren julian.fondren at gmail.com
Sun Nov 7 18:44:45 UTC 2021


On Sunday, 7 November 2021 at 02:12:36 UTC, zjh wrote:
> On Sunday, 7 November 2021 at 01:59:47 UTC, jfondren wrote:
>> On Sunday, 7 November 2021 at 01:12:19 UTC, zjh wrote:
>
> Rust has more than ten `kinds` of strings. Maybe we can add 
> `2/3` more.

Meanwhile, in Rust:

```rust
#[cfg(test)]
mod tests {
     fn type_of<T>(_: T) -> &'static str {
         core::any::type_name::<T>()
     }
     const INVALID: &'static str = unsafe {
         std::str::from_utf8_unchecked(&[
             0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xa7, 0x85, 0xaf, 0x74, 0x68, 0x65, 0x72, 0x65,
         ])
     };
     #[test]
     fn iter_invalid() {
         for c in INVALID.chars() {
             println!("{} {}, {}", type_of(c), c as u32, c);
         }
     }
}
```

If you smuggle invalid UTF into a type that Rust expects to be 
valid UTF (the same case as `string` in D, allegedly), then 
Rust's equivalent of `foreach (dchar c; str) { }` just emits 
invalid chars -- two of 'em, somehow.

104, 101, 108, 108, 111 - "hello"
453, 1012 - ???
104, 101, 114, 101 - "here" (the 't' is lost)
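
The two odd values are consistent with a decoder that trusts its input: treat any non-ASCII byte as a two-byte lead, mask off the top bits, and fold in the low six bits of whatever byte follows. The sketch below is only an illustration of that arithmetic. `naive_decode` is a hypothetical helper, not Rust's actual implementation, and it only handles the one- and two-byte cases that occur in this input.

```rust
/// Hypothetical helper: decode bytes the way a non-validating decoder might.
/// Assumption: every byte >= 0x80 is treated as a two-byte lead, and the
/// following byte is consumed as a continuation without being checked.
fn naive_decode(bytes: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b < 0x80 {
            // Plain ASCII: one byte, one code point.
            out.push(b as u32);
            i += 1;
        } else {
            // Keep the low 5 bits of the "lead" and the low 6 bits of
            // the next byte, whatever that byte happens to be.
            let next = bytes.get(i + 1).copied().unwrap_or(0);
            out.push((((b & 0x1F) as u32) << 6) | ((next & 0x3F) as u32));
            i += 2;
        }
    }
    out
}

fn main() {
    let bytes = [
        0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xa7, 0x85, 0xaf, 0x74, 0x68, 0x65, 0x72, 0x65,
    ];
    println!("{:?}", naive_decode(&bytes));
}
```

Under that assumption, 0xa7 0x85 becomes (0x07 << 6) | 0x05 = 453, and 0xaf 0x74 becomes (0x0f << 6) | 0x34 = 1012, which is how the 't' (0x74) gets swallowed.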

This is similar to `foreach (dchar c; 
std.encoding.codePoints(str)) { }` which emits three dchars 
between "hello" and "there", but which also has an assert failure 
in non-release builds.
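
For contrast, here is a sketch of the replacement-character approach on the Rust side, using the standard library's `String::from_utf8_lossy` on the same bytes. `lossy_code_points` and `BYTES` are names made up for this example. Each of the three stray bytes should come out as U+FFFD (65533), and the 't' survives.

```rust
/// The same bytes as INVALID above, kept as a plain byte array so that
/// nothing is smuggled into a `str`.
const BYTES: [u8; 13] = [
    0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xa7, 0x85, 0xaf, 0x74, 0x68, 0x65, 0x72, 0x65,
];

/// Hypothetical helper: validate while decoding, substituting U+FFFD for
/// each invalid sequence, and return the resulting code points.
fn lossy_code_points(bytes: &[u8]) -> Vec<u32> {
    String::from_utf8_lossy(bytes)
        .chars()
        .map(|c| c as u32)
        .collect()
}

fn main() {
    for cp in lossy_code_points(&BYTES) {
        println!("{}", cp);
    }
}
```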

