New Lookup Table (MixString)

Salih Dincer salihdb at hotmail.com
Sun Sep 3 10:36:58 UTC 2023


On Saturday, 2 September 2023 at 14:20:58 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
> Lets see:
>
> O(n) search for alphabet index

I don't think speed is a big issue: an old text of about a 
thousand pages, written with possibly 47 distinct letters 
(kutadgu-bilig-fergana-holograph.txt, ~2 MB), is converted in 
under 1 second. That time includes reading the file, finding 
each letter's counterpart, and writing the output.

For example:
```d
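import std.conv : to;
import std.stdio : File, write, writeln;

// MixString is the lookup-table type discussed in this thread;
// its definition is not repeated here.

enum abece
{
   // b: upper-case letters, k: lower-case letters (same order)
   b = "AEINRLİDKMUYTBSOÜŞZGÇHĞVCÖPFJXWÂÎÛĖĀĪŪĦŜŊĠŻṬẒḲĮ".to!(wchar[]),
   k = "aeınrlidkmuytbsoüşzgçhğvcöpfjxwâîûėāīūħŝŋġżṭẓḳį".to!(wchar[]),
   ele = "gusiocCOISUG".to!(wchar[])
}

void main()
{
   alias MSbyk = MixString!(wchar, abece.b);
   enum bütünSözlük =
      "aeınrlidkmuytbsoüşzgçhğvcöpfjxwâîûėāīūħŝŋġżṭẓḳį"; // abece.k.to!string;
   auto büyük = MSbyk(bütünSözlük);

   // Source: https://archive.org/download/kutadgu-bilig-fergana-nushasi/681053_djvu.txt
   auto dosya = File("KutadguBilig.txt", "r");
   while (!dosya.eof)
   {
     foreach(wchar c; dosya.readln)
     {
       // A non-zero result means c was found in the table
       if(auto result = büyük.nextIndexOf(c))
       {
         // The counterpart letter sits in the upper 16 bits of the entry
         wchar lookup = büyük.dict[result - 1] >> 16;
         lookup.write;
       } else {
         // Characters without a counterpart pass through unchanged
         c.write;
       }
     }
     writeln;
   }
} /*
pico@enpi:~/Projeler/NewLookup$ time ./newLookupTable > result.txt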

real  0m0,875s
user  0m0,859s
sys   0m0,016s
*/
```
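
For comparison with the O(n) search mentioned in the quote, here is 
a minimal sketch of the same kind of conversion done with a D 
associative array, which gives average constant-time lookup per 
character. This is not how MixString works: `buildTable` and 
`convert` are hypothetical names introduced only for this example, 
and the alphabet is a shortened subset of the one above.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of MixString): maps each letter in
// `from` to the letter at the same position in `to`.
wchar[wchar] buildTable(wstring from, wstring to)
{
    assert(from.length == to.length, "alphabets must have equal length");
    wchar[wchar] table;
    foreach (i, wchar c; from)
        table[c] = to[i];
    return table;
}

// Hypothetical helper: replaces every character that has an entry in
// the table and copies all other characters unchanged.
wstring convert(wstring text, const wchar[wchar] table)
{
    wchar[] result;
    result.reserve(text.length);
    foreach (wchar c; text)
    {
        if (auto p = c in table)
            result ~= *p;   // counterpart found
        else
            result ~= c;    // no counterpart, keep as is
    }
    return result.idup;
}

void main()
{
    // Shortened alphabets; dotless ı maps to I and dotted i maps to İ.
    auto table = buildTable("aeınrlidkmuytbsoüşzgçhğ"w,
                            "AEINRLİDKMUYTBSOÜŞZGÇHĞ"w);
    writeln(convert("kutadgu bilig"w, table)); // KUTADGU BİLİG
}
```

Whether this is actually faster than a linear scan over 47 letters 
would need to be measured; for an alphabet this small the 
difference may be negligible.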

On Saturday, 2 September 2023 at 14:20:58 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
>
> Unicode Demystified covers the standard method for doing this 
> sort of lookup as well as how to do the case conversion 
> correctly. 
> https://www.amazon.com/Unicode-Demystified-Practical-Programmers-Encoding/dp/0201700522

Thank you, I will read the book you mentioned.

SDB@79

