How to pass an InputRange of dchars to a function that wants an Input range of chars?

Jonathan M Davis newsgroup.d at jmdavisprog.com
Thu Feb 20 02:48:48 UTC 2025


On Wednesday, February 19, 2025 11:13:24 AM MST realhet via Digitalmars-d-learn wrote:
> Hello,
>
> The problematic line is at the very bottom.
>
> I only managed to make it run by preceding the .byChar with
> .text, but that is unwanted because it would convert the whole
> InputRange.
>
> How can I do this dchar->char conversion on only the required
> number of chars? (The kwSearch function will pop only a few
> chars; it doesn't want a whole RandomAccessRange.)

I don't have time at the moment to decipher your code and figure out what
you're doing, but at a glance, it looks like you're expecting strings to be
treated as ranges of immutable char, and they're not.

Phobos treats all strings as ranges of dchar. We call it auto-decoding,
because it means that the range API is automatically decoding UTF-8 and
UTF-16 code units to UTF-32. It was an attempt to make Unicode handling
correct by default (by making it harder to accidentally split code points),
but it doesn't actually succeed at providing full Unicode-correctness (since
graphemes and normalization are a thing), and dealing with auto-decoding can
be pretty annoying. So, we'd like to get rid of it in the next major version
of Phobos and just treat all arrays as arrays of their actual element types,
but for now, we have to deal with the range API treating all arrays of char
and wchar as bidirectional ranges of dchar.
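Here is a minimal sketch of what auto-decoding looks like from the range API's point of view. It assumes a current D compiler with today's Phobos (where auto-decoding is still in place); the string contents are arbitrary.

```d
// Sketch: how the range API sees a string under auto-decoding.
import std.range.primitives : ElementType, ElementEncodingType, front,
    isBidirectionalRange, isRandomAccessRange, walkLength;

void main()
{
    string s = "héllo";

    // The array itself stores UTF-8 code units ...
    static assert(is(ElementEncodingType!string == immutable(char)));
    // ... but the range API reports its elements as dchar.
    static assert(is(ElementType!string == dchar));

    // Decoding makes it bidirectional but not random-access.
    static assert(isBidirectionalRange!string);
    static assert(!isRandomAccessRange!string);

    assert(s.length == 6);     // 'é' takes two UTF-8 code units
    assert(s.walkLength == 5); // walkLength counts decoded code points
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == 'h');
}
```

Note that `.length` still counts code units because it's the array property, whereas `walkLength` goes through the (decoding) range API.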

So, if you're looking to do anything with ranges of char, you can't use
strings directly. byChar is one way to wrap a string to get a range of char.
byCodeUnit would be another so long as it's an array of char specifically
(rather than an array of wchar or dchar - for those, byCodeUnit would give
you a range of wchar and dchar respectively). Neither of them actually
converts the underlying range. Rather, they wrap it and lazily convert the
elements as you access them.
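A short sketch of the two wrappers, again assuming current Phobos. The last part uses an infinite range of dchar to show that byChar only encodes the code units that are actually pulled, which is the "only the required number of chars" behavior asked about.

```d
// Sketch: wrapping strings so the range API sees code units, not dchar.
import std.array : array;
import std.range : ElementType, repeat, take;
import std.utf : byChar, byCodeUnit;

void main()
{
    // byCodeUnit on a string: a range of immutable(char), no decoding.
    auto cu = "héllo".byCodeUnit;
    static assert(is(ElementType!(typeof(cu)) == immutable(char)));
    assert(cu.length == 6); // code units, not code points

    // On a wstring, byCodeUnit yields wchar code units instead.
    static assert(is(ElementType!(typeof("héllo"w.byCodeUnit)) == immutable(wchar)));

    // On a dstring, byChar re-encodes each dchar to UTF-8 on demand.
    auto bc = "héllo"d.byChar;
    static assert(is(ElementType!(typeof(bc)) == char));
    assert(bc.array == "héllo");

    // Laziness: byChar accepts even an infinite range of dchar; only the
    // code units that take() pulls are ever encoded.
    auto firstTwo = repeat(dchar('é')).byChar.take(2);
    assert(firstTwo.array == "é"); // 'é' is two UTF-8 code units
}
```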

So you should probably either just make your code operate on ranges of
dchar, or you'll need to wrap your ranges using byChar or byCodeUnit in
order to get ranges of char. All range-based functions will treat your
strings as ranges of dchar. So, if you really need to have strings and be
treating them as ranges of char without wrapping them and without
potentially creating new strings from wrapped ranges, then you can't use any
range-based functions to do what you're doing.
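To illustrate the two options, here is a sketch with two hypothetical functions standing in for something like kwSearch: `countSpacesChar` only accepts ranges of char, while `countSpacesAny` accepts any character range and so takes a plain string as a range of dchar. Both names are made up for this example.

```d
// Sketch: requiring a range of char vs. accepting any character range.
import std.range.primitives : ElementType, isInputRange, empty, front, popFront;
import std.traits : isSomeChar;
import std.utf : byChar;

// Hypothetical: only accepts input ranges whose elements are char.
size_t countSpacesChar(R)(R r)
if (isInputRange!R && is(immutable(ElementType!R) == immutable(char)))
{
    size_t n;
    for (; !r.empty; r.popFront())
        if (r.front == ' ')
            ++n;
    return n;
}

// Hypothetical alternative: any character range, so a plain string is
// consumed through the range API as a range of dchar.
size_t countSpacesAny(R)(R r)
if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t n;
    for (; !r.empty; r.popFront())
        if (r.front == ' ')
            ++n;
    return n;
}

void main()
{
    string s = "a b c";

    // A plain string is a range of dchar, so it fails the char constraint ...
    static assert(!__traits(compiles, countSpacesChar(s)));
    // ... but wrapping it (no allocation) satisfies it.
    assert(countSpacesChar(s.byChar) == 2);
    // byChar on a dstring re-encodes lazily as the function pops elements.
    assert(countSpacesChar("a b c"d.byChar) == 2);

    // The dchar-friendly version takes the string directly.
    assert(countSpacesAny(s) == 2);
}
```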

If you're using byCodeUnit (or you use byChar on an array of char, which in
turn uses byCodeUnit), then you can use the source member on the result to
get the underlying string back at whatever point it is in the iteration, but
in general, if you pass a string wrapped by byCodeUnit to any range-based
function that returns its own range type, then you can't convert back to a
string without using something like std.conv.to to allocate a new string.
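A sketch of both cases, assuming current Phobos: getting the remaining string back through `source`, and falling back to std.conv.to after a lazy algorithm (filter here) has wrapped the range. The strings are arbitrary.

```d
// Sketch: recovering a string from byCodeUnit vs. after a lazy algorithm.
import std.algorithm.iteration : filter;
import std.conv : to;
import std.range.primitives : popFrontN;
import std.utf : byCodeUnit;

void main()
{
    // byCodeUnit keeps the string it wraps in its `source` member.
    auto r = "hello world".byCodeUnit;
    r.popFrontN(6);         // skip "hello "
    string rest = r.source; // no allocation: a slice of the original
    assert(rest == "world");

    // filter returns its own wrapper type, with no way back to the
    // original string, so getting a string means allocating a new one.
    auto noL = "hello world".byCodeUnit.filter!(c => c != 'l');
    string copy = noL.to!string;
    assert(copy == "heo word");
}
```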

Range-based functions which are eager rather than lazy (e.g. find) will
return the original range, but a large percentage of range-based functions
are lazy and will return wrapped ranges. So, depending on what you're doing,
it's going to be difficult to do a bunch of range-based operations and then
get a string as the end result without allocating a new string - even
if strings were treated as ranges of their actual element type.
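A sketch of the eager/lazy difference under current Phobos: find hands back the same range type it was given (so `source` still works on a byCodeUnit range), while map returns its own wrapper. The ASCII-only upper-casing in the lambda is just for brevity.

```d
// Sketch: eager find keeps the range type; lazy map wraps it.
import std.algorithm.iteration : map;
import std.algorithm.searching : find;
import std.conv : to;
import std.utf : byCodeUnit;

void main()
{
    // Eager: find returns the same range type it was given ...
    auto hit = "key=value".byCodeUnit.find('=');
    static assert(is(typeof(hit) == typeof("key=value".byCodeUnit)));
    // ... so the remaining string can be recovered without allocating.
    assert(hit.source == "=value");

    // On a plain string, find likewise hands back a string.
    string s = "key=value".find('=');
    assert(s == "=value");

    // Lazy: map returns its own wrapper type, so producing a string at
    // the end means allocating one (assumes ASCII lower-case input).
    auto upper = "key".byCodeUnit.map!(c => cast(char)(c - 32));
    assert(upper.to!string == "KEY");
}
```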

- Jonathan M Davis
