Reading unicode chars..
Ali Çehreli via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Tue Sep 2 16:20:38 PDT 2014
On 09/02/2014 02:13 PM, monarch_dodra wrote:
> I'd suggest you create a range out of your std.stream.File, which reads
> it byte by byte.
I was in the process of doing just that.
> Then, you pass it to the "byDchar()" range, which will
> auto decode those characters. If you really want to do it "character by
> character".
I first started writing my own byDchar but then used std.utf.byDchar as
you suggest. However, I had to resort to two workarounds:
1) Adding attributes to function calls, some of which I know are unsafe
(see assumeHasAttribs() below). For example, I don't think getc() should
be pure. (?) Also, how could all of its functions be nothrow? Is
byDchar() asking too much of its users?
2) Making StreamRange a template just to get attribute inference from
the compiler.

import std.stdio;
import std.stream;
import std.utf;
import std.traits;

auto assumeHasAttribs(T)(T t) pure
    if (isFunctionPointer!T || isDelegate!T)
{
    enum attrs = functionAttributes!T |
                 FunctionAttribute.pure_ |
                 FunctionAttribute.nogc |
                 FunctionAttribute.nothrow_;
    return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}

/* This is a template just to take advantage of compiler's attribute
 * inference. */
struct StreamRange()
{
    std.stream.File f;
    char c;

    this(std.stream.File f)
    {
        this.f = f;
        prime();
    }

    private void prime()
    {
        if (!empty()) {
            c = assumeHasAttribs(&(f.getc))();
        }
    }

    @property bool empty() const
    {
        return assumeHasAttribs(&(f.eof))();
    }

    @property char front() const
    {
        return c;
    }

    void popFront()
    {
        prime();
    }
}

auto streamRange()(std.stream.File file)
{
    return StreamRange!()(file);
}

void main()
{
    string fileName = "unicode_test_file";
    doWrite(fileName);
    doRead(fileName);
}

void doWrite(string fileName)
{
    auto file = std.stdio.File(fileName, "w");
    file.writeln("abcçd𝔸e„f");
}

void doRead(string fileName)
{
    auto range = byDchar(streamRange(new std.stream.File(fileName,
                                                         FileMode.In)));
    foreach (c; range) {
        writeln(c);
    }
}

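
As a side note on point 2, here is a minimal sketch of why turning
something into a template helps. As far as I understand it, the compiler
infers pure, nothrow, @safe and @nogc for template functions but not for
ordinary ones. All names below (plainAdd, inferredAdd, caller) are made
up for illustration and are not part of the program above.

```d
// Hypothetical names throughout; only meant to illustrate inference.

// Ordinary function: attributes are NOT inferred, so it counts as
// impure, throwing, @system and GC-using unless marked otherwise.
int plainAdd(int x) { return x + 1; }

// Template with an empty parameter list: the compiler inspects the
// body and infers pure, nothrow, @safe and @nogc where possible.
int inferredAdd()(int x) { return x + 1; }

int caller(int x) pure nothrow @safe @nogc
{
    // return plainAdd(x);   // rejected: plainAdd carries no attributes
    return inferredAdd(x);   // accepted thanks to attribute inference
}

void main()
{
    assert(caller(1) == 2);
}
```

That is the same trick as the empty parentheses in StreamRange() and
streamRange() above.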
> What's wrong with reading line by line, but processing the characters in
> said lines 1 by 1? That works "out of the box".
Agreed.
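
For completeness, a minimal sketch of that line-by-line approach,
assuming std.stdio.File.byLine and the same test file name as above. The
codePoints() helper is a name I made up for illustration. Asking foreach
for a dchar loop variable makes it decode the UTF-8 code units of each
line.

```d
import std.stdio;

// Hypothetical helper (not part of the program above): decodes one
// line of UTF-8 into its code points.
dchar[] codePoints(const(char)[] line)
{
    dchar[] result;
    foreach (dchar c; line) {   // dchar loop variable: foreach decodes
        result ~= c;
    }
    return result;
}

void main()
{
    enum fileName = "unicode_test_file";

    {
        // Write the same test line as doWrite() above.
        auto output = File(fileName, "w");
        output.writeln("abcçd𝔸e„f");
    }

    auto input = File(fileName, "r");
    foreach (line; input.byLine()) {
        foreach (c; codePoints(line)) {
            writeln(c);   // one code point per line of output
        }
    }
}
```

No casts and no attribute tricks are needed this way, at the cost of
buffering a whole line at a time.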
Ali