Handling arbitrary char ranges

ag0aep6g via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Wed Apr 20 15:44:37 PDT 2016


On 20.04.2016 23:59, Alex Parrill wrote:
> On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:
>> [...]
>
> First, you can't assign anything to a void[], for the same reason you
> can't dereference a void*. This includes the slice assignment that you
> are trying to do in `buf[0..minLen] = remainingData[0..minLen];`.

Not true. You can assign any dynamic array to a void[].

Regarding vector notation, the spec doesn't seem to mention how it 
interacts with void[], but dmd accepts this no problem:
----
int[] i = [1, 2, 3];
auto v = new void[](3 * int.sizeof);
v[] = i[];
----
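
To spell out both points, here is a small self-checking sketch. The helper name `copyIntoVoid` is made up for illustration, and the slice copy reflects what dmd accepts rather than anything the spec states:

```d
// Hypothetical helper: copies an int[] into a freshly allocated void[].
void[] copyIntoVoid(int[] src)
{
    auto dst = new void[](src.length * int.sizeof); // void[] length is in bytes
    dst[] = src[]; // slice copy; the byte lengths must match
    return dst;
}

void main()
{
    int[] i = [1, 2, 3];

    // Plain assignment: int[] converts implicitly to void[], no copy is made.
    void[] v = i;
    assert(v.ptr is cast(void*) i.ptr);
    assert(v.length == 3 * int.sizeof);

    // Slice copy into separately allocated memory.
    auto c = copyIntoVoid(i);
    assert(c.ptr !is v.ptr);
    assert(cast(int[]) c == [1, 2, 3]);
}
```

Note that the length of a void[] is counted in bytes, so the destination has to be sized as `src.length * int.sizeof`.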

[...]
> Second, don't use slicing on ranges (unless you need it). Not all ranges
> support it...

As far as I see, the slicing code is guarded by `static if (isArray!T)`. 
Arrays support slicing.

[...]
> Instead, use a loop (or maybe `put`) to fill the array.

That's what's done in the `else` path, no?

> Third, don't treat text as bytes; encode your characters.
>
>      auto schema = EncodingScheme.create("utf-8");
>      auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);
>
>      auto buf = new ubyte[](100);
>      auto currentPos = buf;
>      while(!range.empty && schema.encodedLength(range.front) <=
> currentPos.length) {
>          auto written = schema.encode(range.front, currentPos);
>          currentPos = currentPos[written..$];
>          range.popFront();
>      }
>      buf = buf[0..buf.length - currentPos.length];

You're "converting" chars to UTF-8 here, right? That's a no-op: char is 
already a UTF-8 code unit.
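
A minimal sketch of why no encoding step is needed: the chars of a string can be viewed as bytes directly. The helper name `utf8Bytes` is hypothetical; `std.string.representation` is the Phobos function doing the work:

```d
import std.string : representation;

// Hypothetical helper: views a string's code units as raw bytes.
// Nothing is encoded or copied.
immutable(ubyte)[] utf8Bytes(string s)
{
    return s.representation;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);   // length counts code units, not characters

    auto bytes = utf8Bytes(s);
    assert(bytes.length == 6);
    assert(bytes[1] == 0xC3 && bytes[2] == 0xA9); // UTF-8 encoding of U+00E9
    assert(bytes.ptr is cast(immutable(ubyte)*) s.ptr); // same memory
}
```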

> (PS there ought to be a range in Phobos that encodes each character,
> something like map maybe)

std.utf.byChar and friends:

https://dlang.org/phobos/std_utf.html#.byChar
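
As a rough sketch of how that could replace the encode loop quoted above, assuming the goal is simply to fill a ubyte buffer with UTF-8 (the helper name `fillBuffer` is made up for illustration):

```d
import std.range : chain;
import std.utf : byChar;

// Hypothetical helper: copies UTF-8 code units from r into buf until
// either one runs out; returns the number of bytes written.
// Caveat: if buf fills up, a multi-byte sequence may be cut in the middle.
size_t fillBuffer(R)(R r, ubyte[] buf)
{
    size_t n = 0;
    foreach (char c; r.byChar) // lazily yields UTF-8 code units
    {
        if (n == buf.length)
            break;
        buf[n++] = cast(ubyte) c;
    }
    return n;
}

void main()
{
    auto buf = new ubyte[](100);
    // chain over strings yields dchars; byChar re-encodes them to chars.
    auto n = fillBuffer(chain("h\u00E9llo", " ", "world"), buf);
    assert(n == 12);
    assert(cast(const(char)[]) buf[0 .. n] == "h\u00E9llo world");
}
```

This works for plain strings as well as for ranges of dchar, since byChar accepts both.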

