evenChunks on a string - hasLength constraint fails?
Paul Backus
snarwin at gmail.com
Tue Mar 14 18:41:50 UTC 2023
On Tuesday, 14 March 2023 at 08:21:00 UTC, amarillion wrote:
> I'm trying to understand why this doesn't work. I don't really
> understand the error. If I interpret this correctly, it's
> missing a length attribute on a string, but shouldn't length be
> there?
By default, D's standard library treats a `string` as a range of
Unicode code points (i.e., a range of `dchar`s), encoded in
UTF-8. Because UTF-8 is a variable-length encoding, it's
impossible to know how many code points there are in a `string`
without iterating over it. The array's built-in `.length` counts
code units (bytes), not code points, so as far as the standard
library is concerned, `string` does not have a valid `.length`
property, and `hasLength!string` is `false`.
This behavior is known as "auto decoding", and is described in
more detail in this article by Jack Stouffer:
https://jackstouffer.com/blog/d_auto_decoding_and_you.html
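As a rough illustration of the behavior described above, the following sketch checks what the range primitives report for a `string`. The sample text is a made-up value chosen so that the byte count and the code point count differ; it assumes a default Phobos build with auto decoding enabled.

```d
import std.range.primitives : ElementType, hasLength, walkLength;

void main()
{
    // Hypothetical sample: \u00E9 ('é') occupies two bytes in UTF-8.
    string s = "h\u00E9llo";

    // As a range, a string yields decoded code points (dchar) ...
    static assert(is(ElementType!string == dchar));
    // ... and is not considered to have a length.
    static assert(!hasLength!string);

    // The built-in array property counts UTF-8 code units (bytes).
    assert(s.length == 6);
    // Walking the range counts code points, which takes O(n) time.
    assert(s.walkLength == 5);
}
```

The `hasLength` check is the one that appears in the template constraint of `evenChunks`, which is why the call is rejected for a plain `string`.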
If you do not want the standard library to treat your `string` as
a range of code points, you must use a wrapper like
[`std.utf.byCodeUnit`][1] (to get a range of `char`s) or
[`std.string.representation`][2] (to get a range of `ubyte`s).
For example:
```d
import std.range, std.utf; // evenChunks, byCodeUnit
auto parts = evenChunks(line.byCodeUnit, 2);
```
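A slightly fuller sketch of both wrappers, assuming ASCII-only input (the `line` value here is made up for illustration):

```d
import std.algorithm.comparison : equal;
import std.range : evenChunks;
import std.range.primitives : hasLength, walkLength;
import std.string : representation;
import std.utf : byCodeUnit;

void main()
{
    string line = "abcdef"; // hypothetical ASCII-only input

    // byCodeUnit yields individual chars and reports a length,
    // so it satisfies the constraint on evenChunks.
    auto units = line.byCodeUnit;
    static assert(hasLength!(typeof(units)));

    auto parts = evenChunks(units, 2);
    assert(parts.walkLength == 2);
    assert(parts.front.equal("abc".byCodeUnit));

    // representation exposes the same bytes as immutable(ubyte)[].
    auto bytes = line.representation;
    auto byteParts = evenChunks(bytes, 2);
    assert(byteParts.front.equal("abc".representation));
}
```

Which wrapper to pick is mostly a matter of whether the rest of the code wants `char` or `ubyte` elements; both split at the same byte offsets.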
Of course, if you do this, there is a risk that you will split a
code point in half and end up with invalid Unicode. If your
program needs to handle Unicode input, you would be better off
finding a different solution. For example, you could use
[`std.range.primitives.walkLength`][3] to count the code points,
compute the midpoint by hand, and split the range using
[`std.range.chunks`][4]:
```d
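import std.range : chunks, walkLength;
size_t length = line.walkLength; // counts code points; O(n)
// Round up so that an odd length gives two chunks, not three.
auto parts = chunks(line, (length + 1) / 2);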
```
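To make the difference between the two approaches concrete, here is a minimal sketch that splits the same non-ASCII input both ways. The helper `halves` is hypothetical (it is not part of Phobos), and rounding the chunk size up and special-casing the empty string are assumptions about how odd and empty inputs should be handled.

```d
import std.algorithm.comparison : equal;
import std.range : chunks, evenChunks;
import std.range.primitives : walkLength;
import std.utf : byCodeUnit;

// Hypothetical helper: splits `line` into at most two chunks of
// code points; the first chunk gets the extra element, if any.
auto halves(string line)
{
    immutable n = line.walkLength; // decodes the whole string once
    // chunks() rejects a chunk size of 0, so an empty line uses 1.
    return chunks(line, n == 0 ? 1 : (n + 1) / 2);
}

void main()
{
    // Made-up input: 4 code points, 5 code units ('é' is 0xC3 0xA9).
    string line = "ab\u00E9c";
    assert(line.length == 5);
    assert(line.walkLength == 4);

    // Splitting by code unit cuts 'é' in half: the first chunk
    // ends with the lead byte 0xC3 and is not valid UTF-8.
    auto byUnits = evenChunks(line.byCodeUnit, 2);
    assert(byUnits.front.equal([0x61, 0x62, 0xC3]));

    // Splitting by code point keeps 'é' intact.
    auto byPoints = halves(line);
    assert(byPoints.front.equal("ab"));
    byPoints.popFront();
    assert(byPoints.front.equal("\u00E9c"));
}
```

The code-point version pays for a full decoding pass in `walkLength` before any splitting happens, which is the cost of not having a constant-time length.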
[1]: https://phobos.dpldocs.info/std.utf.byCodeUnit.html
[2]: https://phobos.dpldocs.info/std.string.representation.html
[3]: https://phobos.dpldocs.info/std.range.primitives.walkLength.1.html
[4]: https://phobos.dpldocs.info/std.range.chunks.html