What library functionality would you most like to see in D?

Mehrdad wfunction at hotmail.com
Sun Aug 7 11:33:19 PDT 2011


On 8/7/2011 3:21 AM, Jonathan M Davis wrote:
> On Sunday 07 August 2011 14:08:06 Dmitry Olshansky wrote:
>> On 07.08.2011 12:09, Mehrdad wrote:
>>> A readText() function that would read a text file (**and** autodetect
>>> its encoding from its BOM) would be of great help.
>> Well the name is here, dunno if it meets your expectations:
>> http://d-programming-language.org/phobos/std_file.html#readText
> ...
> What Mehrdad wants is a way to read in a file with an encoding other than
> UTF-8, UTF-16, or UTF-32, have it autodetect the encoding by reading the file's
> BOM, and then convert it to whatever encoding the character type that
> readText is using uses.
Yeah, although I don't mean anything /other/ than those -- I only care 
about Unicode, but I think it should be auto-detected, not based on the 
template parameter.
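
To make the idea concrete, the detection step alone might look something 
like the sketch below. Bom and detectBom are names made up for 
illustration, not anything in Phobos, and it only looks at the BOM 
(no guessing for files that lack one).

```d
import std.algorithm.searching : startsWith;

/// Hypothetical enum (not part of Phobos) naming what a BOM can identify.
enum Bom { none, utf8, utf16le, utf16be, utf32le, utf32be }

/// Hypothetical helper: reports which Unicode BOM, if any, starts `data`.
Bom detectBom(const(ubyte)[] data)
{
    static immutable ubyte[] bomUtf32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] bomUtf32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] bomUtf8    = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] bomUtf16le = [0xFF, 0xFE];
    static immutable ubyte[] bomUtf16be = [0xFE, 0xFF];

    // UTF-32 LE is tested before UTF-16 LE because its BOM begins with FF FE.
    // Assumption: FF FE 00 00 is treated as UTF-32 LE, never as UTF-16 LE
    // text that happens to start with a NUL character.
    if (data.startsWith(bomUtf32le)) return Bom.utf32le;
    if (data.startsWith(bomUtf32be)) return Bom.utf32be;
    if (data.startsWith(bomUtf8))    return Bom.utf8;
    if (data.startsWith(bomUtf16le)) return Bom.utf16le;
    if (data.startsWith(bomUtf16be)) return Bom.utf16be;
    return Bom.none;
}
```

readText could then switch on the result and transcode to the caller's 
character type, instead of assuming the file already matches it.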


On 8/7/2011 6:21 AM, Andrei Alexandrescu wrote:
> I think we could and should change readText to do the BOM trick. It's 
> been on my mind forever.
I /do/ have an implementation, but it (1) only works on Windows, (2) was 
hastily written (no error checking or whatever), (3) doesn't work for 
UTF-16 BE (although it works for LE), and (4) only returns the result 
in UTF-8.
It's a starting point, though. An added bonus is that IsTextUnicode 
actually looks at the file data as well as the BOM, so the heuristic is 
rather nice.

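     import std.algorithm : startsWith;
     import std.utf : toUTF8;
     import core.sys.windows.windef : BOOL;
     static import std.file;

     pragma(lib, "advapi32.lib");
     extern(Windows) BOOL IsTextUnicode(in void* pBuffer, int cb, int* lpi);

     string readText(const(char)[] name)
     {
         auto data = cast(ubyte[]) std.file.read(name);
         int test = 0xFFFF;  // in: tests to run; out: tests that passed
         if (IsTextUnicode(data.ptr, cast(int) data.length, &test))
         {
             // 0x0088 = IS_TEXT_UNICODE_SIGNATURE | IS_TEXT_UNICODE_REVERSE_SIGNATURE:
             // a UTF-16 BOM was found, so skip its two bytes.
             auto text = cast(wchar[]) (test & 0x0088 ? data[2 .. $] : data);
             return text.toUTF8();
         }
         // Otherwise assume UTF-8; compare raw bytes so the BOM is not auto-decoded.
         static immutable ubyte[] utf8Bom = [0xEF, 0xBB, 0xBF];
         auto rest = data.startsWith(utf8Bom) ? data[3 .. $] : data;
         return (cast(char[]) rest).toUTF8();
     }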
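
Limitation (3) could probably be lifted without any help from Windows by 
assembling the code units by hand. A rough, untested-in-anger sketch, 
assuming the two BOM bytes have already been stripped; utf16beToUtf8 is 
a made-up name, not a Phobos function:

```d
import std.utf : toUTF8;

/// Hypothetical helper: converts big-endian UTF-16 bytes (BOM already
/// removed) into a UTF-8 string.
string utf16beToUtf8(const(ubyte)[] bytes)
{
    // A trailing odd byte cannot form a code unit; this sketch rejects it.
    if (bytes.length % 2 != 0)
        throw new Exception("UTF-16 data must contain an even number of bytes");

    auto units = new wchar[bytes.length / 2];
    foreach (i, ref unit; units)
    {
        // High byte first, so the result does not depend on host endianness.
        unit = cast(wchar) ((bytes[2 * i] << 8) | bytes[2 * i + 1]);
    }
    return units.toUTF8();
}
```

The same loop with the two bytes swapped would cover LE, which would 
also remove the need for the wchar[] cast above.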

