Want to read a whole file as utf-8

Namespace via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue Feb 3 16:56:54 PST 2015


On Tuesday, 3 February 2015 at 23:55:19 UTC, FG wrote:
> On 2015-02-04 at 00:07, Foo wrote:
>> How would I use decoding for that? Isn't there a way to read 
>> the file as UTF-8, or even better, as Unicode?
>
> Well, apparently the UTF-8-aware foreach loop still works just 
> fine.
> This program shows the file size and the number of Unicode 
> code points:
>
>     import core.stdc.stdio;
>     int main() @nogc
>     {
>         enum size_t bufSize = 64000;
>         char[bufSize] buffer;
>         size_t bytesRead, count;
>         FILE* f = fopen("test.d", "rb");
>         if (!f)
>             return 1;
>         scope(exit) fclose(f); // close the file on every return path
>         bytesRead = fread(buffer.ptr, 1, bufSize, f);
>         if (bytesRead == bufSize) {
>             // the buffer is full, so the file may not fit into it
>             printf("File is too big\n");
>             return 1;
>         }
>         if (!bytesRead)
>             return 2;
>         // a dchar loop variable makes foreach decode UTF-8 as it goes
>         foreach (dchar d; buffer[0..bytesRead])
>             count++;
>         printf("read %llu bytes, %llu unicode characters\n",
>             cast(ulong) bytesRead, cast(ulong) count);
>         return 0;
>     }
>
> Example output: read 838 bytes, 829 unicode characters
>
> (It would be more complicated if it had to process bigger 
> files.)
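For comparison, here is a small standalone sketch (not from the thread) of the same counting technique applied to an in-memory string. It relies on the fact that a `dchar` loop variable makes `foreach` decode UTF-8, and it cross-checks the result against Phobos' `std.utf.count`; the name `countCodePoints` is only illustrative.

```d
import std.stdio : writefln;
import std.utf : count;

/// Counts code points the same way as the foreach loop above:
/// a dchar loop variable makes foreach decode UTF-8 as it goes.
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // 'ï' and 'é' take two bytes each in UTF-8
    string s = "na\u00EFve caf\u00E9";
    writefln("%s bytes, %s code points", s.length, countCodePoints(s));
    // prints "12 bytes, 10 code points"
    assert(countCodePoints(s) == count(s)); // std.utf.count agrees
}
```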

Using a foreach loop is such a nice idea! Thank you very much. :)

That's my code now:
----
private:

static import m3.m3;
static import core.stdc.stdio;
alias printf = core.stdc.stdio.printf;

public:

@trusted
@nogc
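char[] readFile(in string filename) nothrow {
	import core.stdc.stdio : FILE, SEEK_END, SEEK_SET, fopen, fclose,
		fseek, ftell, fread;

	// fopen expects a null-terminated C string: string literals are,
	// but an arbitrary string slice is not guaranteed to be.
	FILE* f = fopen(filename.ptr, "rb");
	if (f is null)
		return null;
	scope(exit) fclose(f);

	fseek(f, 0, SEEK_END);
	immutable long pos = ftell(f);
	if (pos <= 0)
		return null; // ftell failed (-1) or the file is empty
	immutable size_t fsize = cast(size_t) pos;
	fseek(f, 0, SEEK_SET);

	char[] str = m3.m3.make!(char[])(fsize);
	// fsize items of 1 byte each, so fread returns a byte count
	immutable size_t bytesRead = fread(str.ptr, 1, fsize, f);

	return str[0 .. bytesRead];
}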

@trusted
@nogc
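dstring toUTF32(in char[] s) {
	// a UTF-8 string never holds more code points than bytes,
	// so s.length dchars are always enough room
	dchar[] r = m3.m3.make!(dchar[])(s.length);
	// the foreach index would be the byte offset into s, not the
	// position in r, so count the decoded code points separately
	size_t n = 0;
	foreach (dchar c; s) {
		r[n++] = c;
	}

	return cast(dstring) r[0 .. n];
}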

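@nogc
void main() {
	auto str = readFile("test_file.txt");
	if (str.length == 0)
		return; // the file could not be read, or it is empty
	scope(exit) m3.m3.destruct(str);

	auto str2 = str.toUTF32;
	// note: str2 is also allocated by m3.m3.make and is not released here
	printf("%d : %d\n", cast(int) str[0], cast(int) str2[0]);
}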
----
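Because m3 isn't shown here, the conversion is hard to try out on its own. Below is a minimal sketch of the same idea with `malloc`/`free` from `core.stdc.stdlib` standing in for `m3.m3.make` (an assumption, not how m3 works); `toUTF32Malloc` is a hypothetical name. The key point is that the output position is a separate counter, because the optional `foreach` index over a `char[]` is the byte offset, not the code point number.

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical helper: decodes UTF-8 into a malloc'ed UTF-32 buffer.
/// The caller releases the result with free(result.ptr).
dchar[] toUTF32Malloc(const(char)[] s)
{
    // UTF-8 never encodes more code points than it has bytes,
    // so s.length dchars are always enough room.
    auto p = cast(dchar*) malloc(s.length * dchar.sizeof);
    if (p is null)
        return null;
    size_t n = 0;
    // keep a separate output counter; the foreach index would be
    // the byte offset into s
    foreach (dchar c; s)
        p[n++] = c;
    return p[0 .. n];
}

void main()
{
    // "héllo": 6 bytes, 5 code points
    dchar[] r = toUTF32Malloc("h\u00E9llo");
    scope(exit) free(r.ptr);
    assert(r.length == 5);
    assert(r[1] == '\u00E9');
}
```

The buffer may be larger than needed (`s.length` dchars for `s.length` bytes); a real implementation could shrink it with `realloc` once `n` is known.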

m3 is my own module; the name stands for "manual memory management" 
(three m's, hence m3). If we end up using D (which now looks much 
more likely), it will be our core module for memory management.
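m3 itself isn't shown in the thread, so the following is only a hypothetical sketch of what a `make`/`destruct` pair in that spirit could look like, built on `malloc` and `free` from `core.stdc.stdlib`. The signatures are guesses modelled on the calls above (`make!(char[])(n)`, `destruct(arr)`), they handle plain data types only, and they are not m3's real API.

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical m3-style allocator: returns a T[] (T = U[]) of
/// `length` elements backed by malloc instead of the GC.
/// Plain data types only: no constructors or destructors are run.
T make(T : U[], U)(size_t length)
{
    auto p = cast(U*) malloc(length * U.sizeof);
    if (p is null)
        return null;
    U[] arr = p[0 .. length];
    arr[] = U.init; // malloc'ed memory is uninitialized
    return arr;
}

/// Hypothetical counterpart that releases memory obtained from make
/// and resets the slice so it cannot be used afterwards.
void destruct(T)(ref T[] arr)
{
    free(cast(void*) arr.ptr);
    arr = null;
}

void main()
{
    char[] buf = make!(char[])(16);
    assert(buf.length == 16);
    destruct(buf);
    assert(buf is null);
}
```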

