d2sqlite3 db.run, where lies the bug?

Tue Apr 10 20:07:34 UTC 2018

On 04/10/2018 08:04 PM, Ralph Amissah wrote:
> The exact location of problem may be provided in the error statement
> "core.exception.UnicodeException at src/rt/util/utf.d(292): invalid
> UTF-8 sequence".
> 
[...]
> Mock problem string with test code follows (d2sqlite3 required):
> 
[... code ...]

A more minimal test case, reduced from your code:

----
module d2sqlite3_utf8.issue;
import d2sqlite3;
void main() {
    string[] info_tag = ["pass", "fault"];
    auto db = Database(":memory:");
    string _sql_statement = `SELECT '’’';`;
    db.run(_sql_statement);
    db.close;
}
----

From the exception's stack trace we see that 
`d2sqlite3.internal.util.byStatement(immutable(char)[]).ByStatement.findEnd` 
is the deepest non-Phobos function involved. So that's a good first spot 
to look for a bug. Let's check it out.

https://github.com/biozic/d2sqlite3/blob/2e8211946ae0e09646d561aeae1361a695adcc17/source/d2sqlite3/internal/util.d#L64-L83

And indeed, there's a bug in these lines:

----
auto tail = sql[pos .. $];
immutable offset = tail.countUntil(';') + 1;
pos += offset;
----

`pos` is used to slice the string `sql`, so it is interpreted as a 
number of UTF-8 code *units*. But then the result of `countUntil` is 
added to it, and on a `string`, `countUntil` counts code *points*. So a 
number of code points is mistaken for a number of code units. Whenever a 
multibyte character comes before the ';', `pos` ends up too small, so 
the next slice can start in the middle of a multibyte sequence. The next 
`countUntil` call then complains about broken UTF-8.
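
To make the mismatch concrete, here is a small standalone sketch (not 
taken from d2sqlite3) that runs both counts on the SQL string from the 
reduced test case. It assumes only Phobos; each ’ (U+2019) takes three 
UTF-8 code units.

----
import std.algorithm.searching : countUntil;
import std.utf : byCodeUnit;

void main()
{
    // Same SQL text as in the reduced test case.
    string sql = `SELECT '’’';`;

    // On a string, countUntil decodes and counts code points.
    immutable byPoint = sql.countUntil(';');
    // byCodeUnit turns decoding off, so the count is a valid slice index.
    immutable byUnit = sql.byCodeUnit.countUntil(';');

    assert(byPoint == 11);
    assert(byUnit == 15);
    assert(sql[byUnit] == ';');

    // byPoint + 1 == 12 is what the buggy code adds to pos. Index 12 is
    // a UTF-8 continuation byte in the middle of the second ’.
    assert((sql[byPoint + 1] & 0xC0) == 0x80);
}
----

So with this input the next slice `sql[12 .. $]` starts inside a 
multibyte sequence, and decoding it fails.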

This can be fixed by letting `countUntil` operate on code units 
instead:

----
import std.utf: byCodeUnit;
immutable offset = tail.byCodeUnit.countUntil(';') + 1;
----
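
As a rough check that code-unit offsets survive repeated slicing, here 
is a sketch of a statement splitter built around the fixed line. 
`splitStatements` is a hypothetical helper for illustration, not part of 
d2sqlite3, and it naively treats every ';' as a terminator, including 
one inside a string literal.

----
import std.algorithm.searching : countUntil;
import std.utf : byCodeUnit;

// Hypothetical helper, not d2sqlite3 code: splits after each ';'.
string[] splitStatements(string sql)
{
    string[] result;
    size_t pos = 0;
    while (pos < sql.length)
    {
        auto tail = sql[pos .. $];
        // Count code units, so the result is a valid index into tail.
        immutable idx = tail.byCodeUnit.countUntil(';');
        // No ';' left: take the remainder as the last statement.
        immutable offset = idx < 0 ? tail.length : idx + 1;
        result ~= tail[0 .. offset];
        pos += offset;
    }
    return result;
}

void main()
{
    auto parts = splitStatements(`SELECT '’’'; SELECT 1;`);
    assert(parts == [`SELECT '’’';`, ` SELECT 1;`]);
}
----

Searching for ';' by code unit is safe here because an ASCII byte never 
occurs inside a UTF-8 multibyte sequence.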

If you want, you can make a bug report or a pull request with the fix. 
Otherwise, if you're not up to that, I can make one.

[...]
>    - DMD64 D Compiler v2.074.1

That's rather old. I'd recommend updating if possible.