What type does byGrapheme() return?

H. S. Teoh hsteoh at quickfur.ath.cx
Tue Dec 31 22:53:58 UTC 2019


On Tue, Dec 31, 2019 at 04:36:56PM -0500, Steven Schveighoffer via Digitalmars-d-learn wrote:
> On 12/31/19 4:22 PM, H. S. Teoh wrote:
[...]
> > 	import std;
> > 	void main() {
> > 		auto x = "Bla\u0301hbla\u0310h\u0309!";
> > 		auto r = x.byGrapheme;
> > 		writefln("%s", r.map!((ref g) => g[]).joiner.to!string);
> > 	}
[...]
> > What did I do wrong?
> 
> auto r = x.byGrapheme.array;

Haha, in my hurry I totally forgot about the .array. Mea culpa.
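
For the record, here is a sketch of the corrected program with the
.array put back in. The helper name roundTrip is made up for this
example, and I've added an extra .array before to!string so that the
final conversion is a plain dchar[] -> string transcode; treat it as an
illustration rather than the one true way to do it.

```d
import std;

// Re-encode a string grapheme by grapheme. roundTrip is a hypothetical
// helper name used only for this illustration.
string roundTrip(string s)
{
    // .array stores the Graphemes in a GC-allocated array, so each
    // element is an lvalue that outlives the slices taken from it below.
    auto graphemes = s.byGrapheme.array;

    // (ref g) avoids copying the Grapheme before slicing it with g[].
    // joiner flattens the per-grapheme slices into one range of dchar;
    // .array.to!string then transcodes that range to a UTF-8 string.
    return graphemes.map!((ref g) => g[]).joiner.array.to!string;
}

void main()
{
    auto x = "Bla\u0301hbla\u0310h\u0309!";
    writefln("%s", roundTrip(x));
}
```

The point of the intermediate array is only to give every Grapheme a
stable home while its slice is in use.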


[...]
> The fact that a Grapheme's return requires you keep the grapheme in
> scope for operations seems completely incorrect and dangerous IMO
> (note that operators are going to always have a ref this, even when
> called on an rvalue). So even though using ref works, I think the
> underlying issue here really is the lifetime problem.
[...]

After my wrong recollection of the history surrounding indexOf vs.
countUntil, I'm not sure I can rely on my memory anymore, :-P but AIUI
Dmitry implemented it this way because he wanted to avoid allocations
(GC or otherwise) in the most common case of a Grapheme containing just
a small number of code points (usually 1 or 2). When the code points no
longer fit in the storage inside the Grapheme struct itself, it would
quietly switch to malloc or some such for holding the data.  My guess is
that this is the reason for passing &this to the wrapper range returned
by opSlice(). And possibly it's also to allow mutation of the Grapheme
via the returned slice?
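
To make that guess a bit more concrete, here is a minimal sketch of the
small-buffer idea. SmallCodePoints, inlineCapacity and the GC-backed
spill are all invented for this example; this is NOT std.uni.Grapheme's
actual layout, just the general shape of "store a few code points
inline, go to the heap only when they no longer fit".

```d
import std.stdio : writeln;

// Hypothetical illustration only; not the real Grapheme implementation.
struct SmallCodePoints
{
    enum inlineCapacity = 4;

    private dchar[inlineCapacity] inline_; // used while the data is small
    private dchar[] heap_;                 // used once inline_ overflows
    private size_t length_;

    size_t length() const { return length_; }

    // True while the code points still live inside the struct itself.
    bool isInline() const { return heap_ is null; }

    void put(dchar c)
    {
        if (isInline && length_ == inlineCapacity)
        {
            // Inline storage is full: copy it to the heap. The GC is used
            // here for brevity; the post above guesses at malloc.
            heap_ = inline_[].dup;
        }
        if (isInline)
            inline_[length_] = c;
        else
            heap_ ~= c;
        ++length_;
    }

    dchar opIndex(size_t i) const
    {
        assert(i < length_);
        return isInline ? inline_[i] : heap_[i];
    }
}

void main()
{
    SmallCodePoints g;
    foreach (dchar c; "a\u0301")
        g.put(c);
    writeln(g.length, " ", g.isInline); // prints "2 true"

    foreach (dchar c; "\u0310\u0309\u0301")
        g.put(c);
    writeln(g.length, " ", g.isInline); // prints "5 false"
}
```

With a layout like this, the data may sit inside the struct or on the
heap depending on its size, so a slice that simply points back at the
struct can reach it either way, at the price of tying the slice's
validity to the struct's lifetime.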

Perhaps this whole approach should be looked at again. Certainly, unless
I'm missing something, it *ought* to be possible to implement Grapheme
in a way that doesn't require this scoped reference business.


T

-- 
The diminished 7th chord is the most flexible and fear-instilling chord. Use it often, use it unsparingly, to subdue your listeners into submission!

