Find Semantically Correct Word Splits in UTF-8 Strings
monarch_dodra via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Oct 1 09:44:23 PDT 2014
On Wednesday, 1 October 2014 at 11:06:24 UTC, Nordlöw wrote:
> I'm looking for a way to make my algorithm
>
> S[] findWordSplit(S)(S word,
>                      HLang[] langs = [])
> {
>     for (size_t i = 1; i + 1 < word.length; i++)
>     {
>         const first = word[0 .. i];
>         const second = word[i .. $];
>         if (this.canMeanSomething(first, langs) &&
>             this.canMeanSomething(second, langs))
>         {
>             return [first, second];
>         }
>     }
>     return typeof(return).init;
> }
>
> work correctly when S is a (UTF-8) string, without first
> eagerly encoding word to a dstring — that is, in a lazy manner.
>
> Currently this algorithm works as
>
> "carwash" => ["car", "wash"]
>
> and I would like it to work correctly and efficiently in my
> native language as well:
>
> "biltvätt" => ["bil", "tvätt"]
>
> :)
Out of curiosity, why exactly isn't it working in your "native
language"? If you avoid decoding in your "canMeanSomething", you
should encounter no problems.
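That said, if you want the split points to land only on code-point boundaries (so a slice never cuts the two UTF-8 code units of "ä" in "biltvätt" in half), you can step the index with std.utf.stride instead of ++. A minimal sketch — note that canMeanSomething is stubbed here as a toy word-list lookup for illustration, since the original's per-language lookup isn't shown:

```d
import std.utf : stride;
import std.algorithm.searching : canFind;

// Toy stand-in for the poster's canMeanSomething: membership in a
// small word list (hypothetical; the real one presumably consults
// per-language dictionaries).
bool canMeanSomething(const(char)[] s, const string[] words)
{
    return words.canFind(s);
}

string[] findWordSplit(string word, const string[] words)
{
    // stride(word, i) is the length in code units of the code point
    // starting at i, so i only ever lands on code-point boundaries and
    // both slices stay valid UTF-8 — no decoding of the whole string,
    // and no dstring conversion.
    for (size_t i = stride(word, 0); i < word.length; i += stride(word, i))
    {
        const first = word[0 .. i];
        const second = word[i .. $];
        if (canMeanSomething(first, words) && canMeanSomething(second, words))
            return [first, second];
    }
    return null;
}

void main()
{
    const dict = ["bil", "tvätt", "car", "wash"];
    assert(findWordSplit("biltvätt", dict) == ["bil", "tvätt"]);
    assert(findWordSplit("carwash", dict) == ["car", "wash"]);
}
```

Skipping interior bytes of a multi-byte code point also saves wasted dictionary lookups on slices that could never be valid words.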