evenChunks on a string - hasLength constraint fails?

Paul Backus snarwin at gmail.com
Tue Mar 14 18:41:50 UTC 2023


On Tuesday, 14 March 2023 at 08:21:00 UTC, amarillion wrote:
> I'm trying to understand why this doesn't work. I don't really 
> understand the error. If I interpret this correctly, it's 
> missing a length attribute on a string, but shouldn't length be 
> there?

By default, D's standard library treats a `string` as a range of 
Unicode code points (i.e., a range of `dchar`s), encoded in 
UTF-8. Because UTF-8 is a variable-length encoding, it's 
impossible to know how many code points there are in a `string` 
without iterating over it. The built-in `.length` of a `string` 
still exists, but it counts UTF-8 code units (bytes), not code 
points, so the range interface ignores it: as far as the 
standard library is concerned, `string` does not have a valid 
`.length` property, and `hasLength!string` is `false`.

This behavior is known as "auto decoding", and is described in 
more detail in this article by Jack Stouffer:

https://jackstouffer.com/blog/d_auto_decoding_and_you.html
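
A quick way to see this for yourself is to compare the two counts 
on a string containing a non-ASCII character. This is only a 
minimal sketch; the sample string `"héllo"` is my own, not taken 
from your program:

```d
import std.range.primitives : ElementType, hasLength, walkLength;
import std.utf : byCodeUnit;

void main()
{
    string s = "héllo"; // sample input: 'é' takes two bytes in UTF-8

    // The built-in array property counts UTF-8 code units (bytes).
    assert(s.length == 6);

    // Counting code points requires decoding the whole string.
    assert(s.walkLength == 5);

    // As a range, a string yields dchars and reports no length.
    static assert(is(ElementType!string == dchar));
    static assert(!hasLength!string);

    // The byCodeUnit wrapper does report a length.
    static assert(hasLength!(typeof(s.byCodeUnit)));
}
```

The `static assert`s are checked at compile time, so if the 
program builds at all, the constraint behaves as described.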

If you do not want the standard library to treat your `string` as 
a range of code points, you must use a wrapper like 
[`std.utf.byCodeUnit`][1] (to get a range of `char`s) or 
[`std.string.representation`][2] (to get a range of `ubyte`s). 
For example:

```d
import std.range : evenChunks; import std.utf : byCodeUnit;
auto parts = evenChunks(line.byCodeUnit, 2);
```
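
As a complete program, that might look something like the sketch 
below. The ASCII-only input `"abcdef"` is a placeholder I made up, 
since I don't know what your `line` contains:

```d
import std.algorithm.comparison : equal;
import std.range : evenChunks;
import std.string : representation;
import std.utf : byCodeUnit;

void main()
{
    string line = "abcdef"; // placeholder input, ASCII only

    // byCodeUnit yields chars and has a length, so evenChunks accepts it.
    auto parts = evenChunks(line.byCodeUnit, 2);
    assert(parts.front.equal("abc".byCodeUnit));
    parts.popFront();
    assert(parts.front.equal("def".byCodeUnit));

    // representation exposes the same data as immutable(ubyte)[].
    auto byteParts = evenChunks(line.representation, 2);
    assert(byteParts.front.equal("abc".representation));
}
```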

Of course, if you do this, there is a risk that you will split a 
code point in half and end up with invalid Unicode. If your 
program needs to handle Unicode input, you would be better off 
finding a different solution. For example, you could use 
[`std.range.primitives.walkLength`][3] to count the code points, 
compute the midpoint by hand, and split the range using 
[`std.range.chunks`][4]:

```d
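import std.range : chunks, walkLength;
size_t length = line.walkLength; // counts code points by decoding
// Round up so an odd length gives two chunks, not three (line must be non-empty).
auto parts = chunks(line, (length + 1) / 2);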
```
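
To make the difference between the two approaches concrete, here 
is a rough sketch using a made-up input, `"aéb"`, chosen so that 
the byte-wise midpoint falls inside the two-byte `é`. The rounding 
of the chunk size is my own choice for illustration; pick whatever 
suits your data:

```d
import std.algorithm.comparison : equal;
import std.range : chunks, evenChunks, walkLength;
import std.string : representation;

void main()
{
    string line = "aéb"; // made-up input: 3 code points, 4 code units

    // Splitting by code unit cuts 'é' (0xC3 0xA9) in half.
    ubyte[] firstHalf = [0x61, 0xC3];
    auto raw = evenChunks(line.representation, 2);
    assert(raw.front.equal(firstHalf));

    // Splitting by code point keeps 'é' intact.
    size_t n = line.walkLength;              // 3 code points
    auto parts = chunks(line, (n + 1) / 2);  // chunk size 2
    assert(parts.front.equal("aé"));
    parts.popFront();
    assert(parts.front.equal("b"));
    parts.popFront();
    assert(parts.empty);

    // Rounding the chunk size down gives size 1, and so three chunks.
    assert(chunks(line, n / 2).walkLength == 3);
}
```

Note that `chunks` requires a nonzero chunk size, so an empty 
`line` would need to be handled separately.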

[1]: https://phobos.dpldocs.info/std.utf.byCodeUnit.html
[2]: https://phobos.dpldocs.info/std.string.representation.html
[3]: https://phobos.dpldocs.info/std.range.primitives.walkLength.1.html
[4]: https://phobos.dpldocs.info/std.range.chunks.html


More information about the Digitalmars-d-learn mailing list