d2sqlite3 db.run, where lies the bug?
ag0aep6g
anonymous at example.com
Tue Apr 10 20:07:34 UTC 2018
On 04/10/2018 08:04 PM, Ralph Amissah wrote:
> The exact location of problem may be provided in the error statement
> "core.exception.UnicodeException at src/rt/util/utf.d(292): invalid
> UTF-8 sequence".
>
[...]
> Mock problem string with test code follows (d2sqlite3 required):
>
[... code ...]
A more minimal test case, reduced from your code:
----
module d2sqlite3_utf8.issue;
import d2sqlite3;
void main() {
    string[] info_tag = ["pass", "fault"];
    auto db = Database(":memory:");
    string _sql_statement = `SELECT '’’';`;
    db.run(_sql_statement);
    db.close;
}
----
From the exception's stack trace we see that
`d2sqlite3.internal.util.byStatement(immutable(char)[]).ByStatement.findEnd`
is the deepest non-Phobos function involved. So that's a good first spot
to look for a bug. Let's check it out.
https://github.com/biozic/d2sqlite3/blob/2e8211946ae0e09646d561aeae1361a695adcc17/source/d2sqlite3/internal/util.d#L64-L83
And indeed, there's a bug in these lines:
----
auto tail = sql[pos .. $];
immutable offset = tail.countUntil(';') + 1;
pos += offset;
----
`pos` is used to slice the string `sql`, so it is interpreted as a
number of UTF-8 code *units*. But then the result of `countUntil` is
added, and on a `string` `countUntil` counts code *points*. So a number
of code points is mistaken for a number of code units. Whenever the text
before the `;` contains a multibyte character, the next slice starts too
early and can split a multibyte sequence. The next call to `countUntil`
then complains about broken UTF-8.
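To make the mismatch concrete, here is a small standalone sketch that
needs only Phobos, not d2sqlite3. It uses the same statement as the test
case above, with `’` (U+2019, three UTF-8 code units) written as an
escape, and compares the two ways of counting up to the `;`:

```d
import std.algorithm.searching : countUntil;
import std.stdio : writeln;
import std.utf : byCodeUnit;

void main()
{
    // Same statement as in the reduced test case; each \u2019 is 3 bytes.
    string sql = "SELECT '\u2019\u2019';";

    // On a string, countUntil decodes and counts code points.
    immutable points = sql.countUntil(';');

    // On a byCodeUnit range it counts code units, i.e. valid slice indices.
    immutable units = sql.byCodeUnit.countUntil(';');

    writeln(sql.length); // 16
    writeln(points);     // 11
    writeln(units);      // 15
}
```

Adding `points + 1` (12) to a byte index, as `findEnd` does, lands in
the middle of the second `’`, which occupies bytes 11 to 13.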
This can be fixed by letting `countUntil` operate on code units
instead:
----
import std.utf: byCodeUnit;
immutable offset = tail.byCodeUnit.countUntil(';') + 1;
----
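As a quick sanity check of that approach outside of d2sqlite3, here is a
minimal sketch. `firstStatementLength` is a hypothetical helper I made
up for illustration; it is not the library's `ByStatement` code, it only
mimics the offset computation:

```d
import std.algorithm.searching : countUntil;
import std.utf : byCodeUnit;

// Hypothetical helper (not part of d2sqlite3): length in code units of
// the first statement in `sql`, including its terminating ';'.
// If there is no ';', the whole string is treated as one statement.
size_t firstStatementLength(string sql)
{
    immutable idx = sql.byCodeUnit.countUntil(';');
    return idx < 0 ? sql.length : cast(size_t) idx + 1;
}

void main()
{
    string sql = "SELECT '\u2019\u2019'; SELECT 1;";
    immutable end = firstStatementLength(sql);

    // The offset is a code unit count, so slicing with it is safe.
    assert(sql[0 .. end] == "SELECT '\u2019\u2019';");
    assert(sql[end .. $] == " SELECT 1;");
}
```

Since `;` is a single ASCII code unit and can never appear inside a
multibyte UTF-8 sequence, searching by code unit finds the same `;` as
searching by code point. Only the returned index differs.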
If you want, you can make a bug report or a pull request with the fix.
Otherwise, if you're not up to that, I can make one.
[...]
> - DMD64 D Compiler v2.074.1
That's rather old. I'd recommend updating if possible.