D'ish similar_text?
Dgame
r.schuett.1987 at gmail.com
Thu May 10 20:08:04 UTC 2018
I'm in need for some sort of string similarity comparision. I've
found soundex but that didn't solved my needs. After some search
I found a C implementation of similar_text, but that is quite
ugly... I was able to let it work in D but it's still somewhat
messy. Is there any D implementation of similar_text? Here's my
current code which makes heavy use of pointer-arithmetic which
makes it hard to understand.
----
void similar_text_similar_str(char* txt1, size_t len1, char*
txt2, size_t len2, size_t* pos1, size_t* pos2, size_t* max) {
char* p, q;
char* end1 = txt1 + len1;
char* end2 = txt2 + len2;
size_t l;
*max = 0;
for (p = txt1; p < end1; p++) {
for (q = txt2; q < end2; q++) {
for (l = 0; (p + l < end1) && (q + l < end2) && (p[l]
== q[l]); l++) { }
if (l > *max) {
*max = l;
*pos1 = p - txt1;
*pos2 = q - txt2;
}
}
}
}
size_t similar_text_similar_char(char* txt1, size_t len1, char*
txt2, size_t len2) {
size_t sum;
size_t pos1, pos2, max;
similar_text_similar_str(txt1, len1, txt2, len2, &pos1,
&pos2, &max);
if ((sum = max) != 0) {
if (pos1 && pos2) {
sum += similar_text_similar_char(txt1, pos1, txt2,
pos2);
}
if ((pos1 + max < len1) && (pos2 + max < len2)) {
sum += similar_text_similar_char(txt1 + pos1 + max,
len1 - pos1 - max,
txt2 + pos2 + max, len2 - pos2 - max);
}
}
return sum;
}
size_t similar_text(in string s1, in string s2, out double
percent) {
immutable size_t sim = similar_text_similar_char(s1.dup.ptr,
s1.length, s2.dup.ptr, s2.length);
percent = sim * 200.0 / (s1.length + s2.length);
return sim;
}
----
More information about the Digitalmars-d-learn
mailing list