Handling arbitrary char ranges
Alex Parrill via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Apr 20 19:35:31 PDT 2016
On Wednesday, 20 April 2016 at 22:44:37 UTC, ag0aep6g wrote:
> On 20.04.2016 23:59, Alex Parrill wrote:
>> On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:
>>> [...]
>>
>> First, you can't assign anything to a void[], for the same reason you
>> can't dereference a void*. This includes the slice assignment that you
>> are trying to do in `buf[0..minLen] = remainingData[0..minLen];`.
>
> Not true. You can assign any dynamic array to a void[].
That's not assigning to the elements of a void[]; it just changes
what the slice points to and adjusts its length, like doing
`void* ptr = someOtherPtr;`.
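To illustrate the distinction, here is a minimal sketch. The helper `aliases` is a made-up name for this example, not something from Phobos; it only checks that a void[] refers to the same memory as an int[].

```d
// Hypothetical helper: true when `v` covers exactly the memory backing `a`.
bool aliases(const(void)[] v, const(int)[] a)
{
    return v.ptr is cast(const(void)*) a.ptr
        && v.length == a.length * int.sizeof;
}

void main()
{
    int[] i = [1, 2, 3];
    void[] v = i;   // rebinds v to i's memory; no element is copied
    assert(aliases(v, i));

    int[] j = [4, 5];
    v = j;          // rebinds again; i is left untouched
    assert(aliases(v, j));
    assert(i == [1, 2, 3]);
}
```

Note that the length of the void[] is counted in bytes, which is why the helper multiplies by `int.sizeof`.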
> Regarding vector notation, the spec doesn't seem to mention how
> it interacts with void[], but dmd accepts this no problem:
> ----
> int[] i = [1, 2, 3];
> auto v = new void[](3 * int.sizeof);
> v[] = i[];
> ----
That only seems to work with arrays, not with arbitrary ranges,
whether they are sliceable or not. Though see below.
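For a source that is not an array, one option is `std.algorithm.mutation.copy`, which works element by element. A rough sketch follows, assuming a ubyte[] buffer rather than the void[] from the original code, and assuming the buffer is large enough for the whole source. `fillPrefix` is a made-up helper name.

```d
import std.algorithm.iteration : map;
import std.algorithm.mutation : copy;
import std.range : iota;

// Hypothetical helper: copies every element of `src` into the front of
// `buf` and returns the filled part. Assumes `buf` has room for all of `src`.
ubyte[] fillPrefix(R)(R src, ubyte[] buf)
{
    auto rest = src.copy(buf);          // copy returns the unfilled remainder
    return buf[0 .. $ - rest.length];
}

void main()
{
    // A lazy range, not an array, so `buf[] = src[]` is not an option here.
    auto src = iota(1, 4).map!(x => cast(ubyte) x);
    auto buf = new ubyte[](8);
    auto filled = fillPrefix(src, buf);
    assert(filled == cast(ubyte[]) [1, 2, 3]);
}
```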
> [...]
>> Second, don't use slicing on ranges (unless you need it). Not all ranges
>> support it...
>
> As far as I see, the slicing code is guarded by `static if
> (isArray!T)`. Arrays support slicing.
>
> [...]
>> Instead, use a loop (or maybe `put`) to fill the array.
>
> That's what's done in the `else` path, no?
Yes, I missed the `static if` condition. My bad.
>> Third, don't treat text as bytes; encode your characters.
>>
>>     auto schema = EncodingScheme.create("utf-8");
>>     auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);
>>
>>     auto buf = new ubyte[](100);
>>     auto currentPos = buf;
>>     while(!range.empty && schema.encodedLength(range.front) <= currentPos.length) {
>>         auto written = schema.encode(range.front, currentPos);
>>         currentPos = currentPos[written..$];
>>         range.popFront();
>>     }
>>     buf = buf[0..buf.length - currentPos.length];
>
> You're "converting" chars to UTF-8 here, right? That's a nop.
> char is a UTF-8 code unit already.
In general, the input can be a range of chars, wchars, or dchars,
not only chars.
>> (PS there ought to be a range in Phobos that encodes each character,
>> something like map maybe)
>
> std.utf.byChar and friends:
>
> https://dlang.org/phobos/std_utf.html#.byChar
byChar would work. byWchar and byDchar might cause endianness
issues.
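A rough sketch of both cases, under some assumptions: the helper names `toUtf8Bytes` and `toUtf16LeBytes` are made up for this example, and little-endian is picked arbitrarily for the UTF-16 case. UTF-8 code units are single bytes, so no byte order is involved; 16-bit code units need an explicit byte order when written out as bytes.

```d
import std.bitmanip : nativeToLittleEndian;
import std.range : chain;
import std.string : representation;
import std.utf : byChar, byWchar;

// Hypothetical helper: UTF-8 bytes of any range of char, wchar, or dchar.
ubyte[] toUtf8Bytes(R)(R text)
{
    ubyte[] bytes;
    foreach (char c; text.byChar)
        bytes ~= cast(ubyte) c;
    return bytes;
}

// Hypothetical helper: UTF-16 bytes with an explicit little-endian order.
ubyte[] toUtf16LeBytes(R)(R text)
{
    ubyte[] bytes;
    foreach (wchar w; text.byWchar)
    {
        auto pair = nativeToLittleEndian(cast(ushort) w);
        bytes ~= pair[];
    }
    return bytes;
}

void main()
{
    // Mixed string, wstring, and dstring pieces; the chained range yields dchar.
    auto text = chain("h\u00E9", " "w, "w\u00F6"d);
    assert(toUtf8Bytes(text) == "h\u00E9 w\u00F6".representation);
    assert(toUtf16LeBytes("hi") == cast(ubyte[]) [0x68, 0x00, 0x69, 0x00]);
}
```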