Reading unicode chars..

Tue Sep 2 16:20:38 PDT 2014

On 09/02/2014 02:13 PM, monarch_dodra wrote:

 > I'd suggest you create a range out of your std.stream.File, which reads
 > it byte by byte.

I was in the process of doing just that.

 > Then, you pass it to the "byDchar()" range, which will
 > auto decode those characters. If you really want to do it "character by
 > character".

I first started writing my own byDchar but then used std.utf.byDchar as 
you suggest. However, I had to resort to

1) Forcing attributes onto function calls, some of which I know are 
unsafe (see assumeHasAttribs() below). For example, I don't think getc() 
should be pure. (?) Also, how could all of its functions be nothrow? Is 
byDchar() asking too much of its users?

2) I also had to make StreamRange a template just to get attribute 
inference from the compiler.

import std.stdio;
import std.stream;
import std.utf;
import std.traits;

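/* Casts a function pointer or delegate to the same type with pure, @nogc
  * and nothrow added. The cast is unchecked: it only silences the
  * compiler, so the callee may still throw, allocate or have side
  * effects. */
auto assumeHasAttribs(T)(T t) pure
     if (isFunctionPointer!T || isDelegate!T)
{
     enum attrs = functionAttributes!T |
                  FunctionAttribute.pure_ |
                  FunctionAttribute.nogc |
                  FunctionAttribute.nothrow_;

     return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}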

/* This is a template just to take advantage of compiler's attribute
  * inference. */
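struct StreamRange()
{
     std.stream.File f;
     char c;
     bool done;    // true once the stream had nothing left to read

     this(std.stream.File f)
     {
         this.f = f;
         prime();
     }

     /* Reads the next byte into c, or marks the range as exhausted.
      * eof() is checked here, before each read, rather than in empty():
      * a seekable stream already reports eof() right after its last byte
      * has been read, so testing it in empty() would drop that byte. */
     private void prime()
     {
         done = assumeHasAttribs(&(f.eof))();

         if (!done) {
             c = assumeHasAttribs(&(f.getc))();
         }
     }

     @property bool empty() const
     {
         return done;
     }

     @property char front() const
     {
         return c;
     }

     void popFront()
     {
         prime();
     }
}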

auto streamRange()(std.stream.File file)
{
     return StreamRange!()(file);
}

void main()
{
     string fileName = "unicode_test_file";
     doWrite(fileName);
     doRead(fileName);
}

void doWrite(string fileName)
{
     auto file = std.stdio.File(fileName, "w");
     file.writeln("abcçd𝔸e„f");
}

void doRead(string fileName)
{
     auto range = byDchar(streamRange(new std.stream.File(fileName,
                                                          FileMode.In)));

     foreach (c; range) {
         writeln(c);
     }
}
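
On point 2, here is a stripped-down sketch of the attribute inference that 
the template trick relies on. The names plainTwice, inferredTwice and 
strictCaller are made up for illustration. An ordinary function only has 
the attributes written on it, while a function template (even one with an 
empty parameter list) gets pure, nothrow, @nogc and @safe inferred from 
its body.

```d
// Ordinary function: no attributes are written, so none apply.
int plainTwice(int x) { return 2 * x; }

// Template with an empty parameter list: the compiler infers
// pure, nothrow, @nogc and @safe from the body.
int inferredTwice()(int x) { return 2 * x; }

// A caller with strict attributes may only call functions that carry them.
int strictCaller(int x) pure nothrow @nogc @safe
{
    // return plainTwice(x);   // rejected: plainTwice is not marked pure etc.
    return inferredTwice(x);   // accepted thanks to inference
}

void main()
{
    assert(strictCaller(21) == 42);
}
```

Inference only helps when the body really satisfies the attributes, which 
is why the stream calls above still needed the cast.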

 > What's wrong with reading line by line, but processing the characters in
 > said lines 1 by 1? That works "out of the box".

Agreed.
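
For comparison, a minimal sketch of that line-by-line approach, assuming 
the same UTF-8 test file as above. The helper name readCodePoints is made 
up for illustration. It uses std.stdio.File.byLine and lets foreach decode 
each line into dchars.

```d
import std.stdio;

// Hypothetical helper: collects every code point of a UTF-8 text file,
// without the line terminators (byLine strips them by default).
dchar[] readCodePoints(string fileName)
{
    dchar[] result;

    foreach (line; File(fileName).byLine) {
        // A dchar loop variable over a char[] makes foreach decode
        // the UTF-8 bytes into whole code points.
        foreach (dchar c; line) {
            result ~= c;
        }
    }

    return result;
}

void main()
{
    string fileName = "unicode_test_file";

    {
        // Scoped so that the file is closed before it is read back.
        auto file = File(fileName, "w");
        file.writeln("abcçd𝔸e„f");
    }

    foreach (c; readCodePoints(fileName)) {
        writeln(c);
    }
}
```

No attribute casts are needed here because nothing in this version has to 
satisfy byDchar()'s requirements.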

Ali