Find Semantically Correct Word Splits in UTF-8 Strings
"Nordlöw" via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Oct 1 04:06:23 PDT 2014
I'm looking for a way to make my algorithm
    S[] findWordSplit(S)(S word,
                         HLang[] langs = [])
    {
        for (size_t i = 1; i + 1 < word.length; i++)
        {
            const first = word[0..i];
            const second = word[i..$];
            if (this.canMeanSomething(first, langs) &&
                this.canMeanSomething(second, langs))
            {
                return [first,
                        second];
            }
        }
        return typeof(return).init;
    }
work correctly when S is a (UTF-8) string, preferably in a lazy
manner, without first converting word to a dstring.
Currently this algorithm works as
"carwash" => ["car", "wash"]
and I would like it to work correctly and efficiently in my native
language as well, as in
"biltvätt" => ["bil", "tvätt"]
:)
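
One direction that might work, as a rough sketch only: step the split index by the width of each UTF-8 code point using std.utf.stride, so that every slice begins and ends on a code point boundary and no dstring is ever built. The free function canMeanSomething below is a hypothetical stand-in backed by a tiny hard-coded word list, the langs parameter is left out, and unlike the loop above this version tries every boundary that leaves both halves non-empty.

```d
import std.algorithm : canFind;
import std.utf : stride;

// Hypothetical stand-in for the real dictionary lookup (assumption).
bool canMeanSomething(string s)
{
    return ["car", "wash", "bil", "tvätt"].canFind(s);
}

// Tries each code point boundary of word as a split position.
string[] findWordSplit(string word)
{
    if (word.length == 0)
        return null;

    // Start after the first code point, then advance by the width of the
    // code point that begins at i, so i never lands inside a multi-byte
    // sequence.
    for (size_t i = stride(word, 0); i < word.length; i += stride(word, i))
    {
        string first = word[0 .. i];   // slices share memory with word
        string second = word[i .. $];
        if (canMeanSomething(first) && canMeanSomething(second))
            return [first, second];
    }
    return null;
}
```

With this word list, findWordSplit("biltvätt") should give ["bil", "tvätt"]. Note that stride moves by code points, not by graphemes, so a letter written as a base character plus a combining mark could still be split in the middle; std.uni.byGrapheme might be needed if that matters.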