Find Semantically Correct Word Splits in UTF-8 Strings
monarch_dodra via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Oct 1 10:09:56 PDT 2014
On Wednesday, 1 October 2014 at 11:47:41 UTC, Nordlöw wrote:
> On Wednesday, 1 October 2014 at 11:06:24 UTC, Nordlöw wrote:
>> I'm looking for a way to make my algorithm
>>
>
> Update:
>
> S[] findMeaningfulWordSplit(S)(S word, HLang[] langs = [])
>     if (isSomeString!S)
> {
>     for (size_t i = 1; i + 1 < word.length; i++)
>     {
>         const first = word.takeExactly(i).to!string;
Does that even work? takeExactly(i) pops exactly i *code points*,
whereas word.length counts *code units*. For non-ASCII input, i can
therefore exceed the number of code points actually in the string.
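To illustrate the mismatch, here is a minimal sketch. The sample string is an assumption chosen for illustration, not taken from the original code:

```d
import std.range : walkLength;

void main()
{
    string word = "åäö";           // three characters, two UTF-8 bytes each
    assert(word.length == 6);      // .length counts code units
    assert(word.walkLength == 3);  // walkLength decodes and counts code points
}
```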
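Something like this should work instead:
    for (auto second = str; !second.empty; second.popFront())
    {
        // second is always a suffix of str, so first is the matching prefix
        auto first = str[0 .. $ - second.length];
        ...
    }
    // special-case the (str, str[$ .. $]) split here (or adapt your loop)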
This would also be Unicode-correct, without increasing the original
complexity.
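For reference, a self-contained sketch of that loop might look like the following. The names `Split` and `codePointSplits` are hypothetical helpers introduced for illustration only, and the sample string is likewise an assumption:

```d
import std.range : empty, popFront;
import std.stdio : writefln;

/// Hypothetical pair type holding one prefix/suffix split.
struct Split
{
    string first;
    string second;
}

/// Hypothetical helper: every split of `str` on a code point boundary.
Split[] codePointSplits(string str)
{
    Split[] result;
    for (auto second = str; !second.empty; second.popFront())
    {
        // popFront on a string drops one whole code point, so the
        // remaining prefix always ends on a code point boundary.
        auto first = str[0 .. $ - second.length];
        result ~= Split(first, second);
    }
    // The loop exits once second is empty, so add that last split by hand.
    result ~= Split(str, str[$ .. $]);
    return result;
}

void main()
{
    foreach (s; codePointSplits("åäö"))
        writefln("[%s] [%s]", s.first, s.second);
}
```

Note that this sketch also yields the two splits with an empty half, which the original loop (i from 1 while i + 1 < word.length) skips; filter those out if both halves must be non-empty.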