New Lookup Table (MixString)
Salih Dincer
salihdb at hotmail.com
Sun Sep 3 10:36:58 UTC 2023
On Saturday, 2 September 2023 at 14:20:58 UTC, Richard (Rikki)
Andrew Cattermole wrote:
> Lets see:
>
> O(n) search for alphabet index
I don't think speed is a big issue: an old text of about a thousand
pages that uses possibly 47 letters
(kutadgu-bilig-fergana-holograph.txt, ~2 MB) is converted in
under 1 second. That time includes reading from the file, finding
the counterparts, and writing to the file.
For example:
```d
import std.conv : to;
import std.stdio : File, write, writeln;

// MixString itself is not defined in this message.

enum abece
{
    b = "AEINRLİDKMUYTBSOÜŞZGÇHĞVCÖPFJXWÂÎÛĖĀĪŪĦŜŊĠŻṬẒḲĮ".to!(wchar[]),
    k = "aeınrlidkmuytbsoüşzgçhğvcöpfjxwâîûėāīūħŝŋġżṭẓḳį".to!(wchar[]),
    ele = "gusiocCOISUG".to!(wchar[])
}

void main()
{
    alias MSbyk = MixString!(wchar, abece.b);
    enum bütünSözlük =
        "aeınrlidkmuytbsoüşzgçhğvcöpfjxwâîûėāīūħŝŋġżṭẓḳį"; // abece.k.to!string;

    auto büyük = MSbyk(bütünSözlük);

    // Source:
    // https://archive.org/download/kutadgu-bilig-fergana-nushasi/681053_djvu.txt
    auto dosya = File("KutadguBilig.txt", "r");
    while (!dosya.eof)
    {
        foreach (wchar c; dosya.readln)
        {
            // A non-zero result is treated as a hit; the index is result - 1.
            if (auto result = büyük.nextIndexOf(c))
            {
                // The counterpart is taken from the upper 16 bits of the entry.
                wchar lookup = büyük.dict[result - 1] >> 16;
                lookup.write;
            } else {
                // No counterpart: pass the character through unchanged.
                c.write;
            }
        }
        writeln;
    }
} /*
pico at enpi:~/Projeler/NewLookup$ time ./newLookupTable > result.txt
real 0m0,875s
user 0m0,859s
sys 0m0,016s
*/
```
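To make the O(n) point from the quote concrete, here is a minimal sketch that does not use MixString at all (its definition is not in this message). It assumes two parallel alphabets, shortened to the first 29 letters of the strings above, and compares a linear scan with a prebuilt associative array. The names `lower`, `upper`, `upperLinear`, `buildTable` and `upperTable` are made up for this illustration.

```d
// Hypothetical stand-ins for the alphabets above (first 29 letters only).
// upper[i] is the counterpart of lower[i].
enum wstring lower = "aeınrlidkmuytbsoüşzgçhğvcöpfj"w;
enum wstring upper = "AEINRLİDKMUYTBSOÜŞZGÇHĞVCÖPFJ"w;
static assert(lower.length == upper.length);

// O(n) per character: scan the alphabet until the letter is found.
wchar upperLinear(wchar c)
{
    foreach (i, wchar k; lower)
        if (k == c)
            return upper[i];
    return c; // no counterpart: pass through unchanged
}

// Build the table once; each later lookup is a single hash probe.
wchar[wchar] buildTable()
{
    wchar[wchar] table;
    foreach (i, wchar k; lower)
        table[k] = upper[i];
    return table;
}

wchar upperTable(const wchar[wchar] table, wchar c)
{
    if (auto p = c in table)
        return *p;
    return c; // no counterpart: pass through unchanged
}

unittest
{
    auto table = buildTable();
    assert(upperLinear('i') == 'İ');        // dotted i maps to dotted İ
    assert(upperTable(table, 'ı') == 'I');  // dotless ı maps to dotless I
    assert(upperTable(table, '?') == '?');  // unknown characters are kept
}
```

With an alphabet of a few dozen letters the linear scan is short, which fits the sub-second timing above; the table version mainly matters if the alphabet or the input grows.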
On Saturday, 2 September 2023 at 14:20:58 UTC, Richard (Rikki)
Andrew Cattermole wrote:
>
> Unicode Demystified covers the standard method for doing this
> sort of lookup as well as how to do the case conversion
> correctly.
> https://www.amazon.com/Unicode-Demystified-Practical-Programmers-Encoding/dp/0201700522
Thank you, I will read the book you mentioned.
SDB@79