std.uri.decodeComponent decodes invalid UTF-8 [BUG]
kdevel
kdevel at vogtner.de
Wed Aug 6 05:47:10 UTC 2025
The bug is as follows:
On Tuesday, 5 August 2025 at 19:28:06 UTC, Richard (Rikki) Andrew
Cattermole wrote:
> [...]
> A URI is ASCII.
Sure. It is this:
%c0%af
> Any input to that function will be ASCII, it won't be UTF-8.
>
> The hex encoding is not UTF-8, it's its own encoding, that gets
> reencoded out to UTF-8.
Which is decoded by decodeComponent into
/
which is valid ASCII and valid UTF-8. But the mapping of
%c0%af -> /
is an invalid one: 0xC0 0xAF is an overlong two-byte form of
U+002F, which RFC 3629 prohibits. It is debatable whether
decodeComponent could legitimately have returned invalid UTF-8,
i.e. "\xc0\xaf".
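
To illustrate how a lenient decoder can arrive at "/", here is a
minimal sketch in Python (used only for brevity). The helper
naive_decode_2byte is hypothetical and is not claimed to be what
decodeComponent does internally; it only shows a two-byte decode
that checks the bit patterns but omits the overlong check:

```python
def naive_decode_2byte(b0: int, b1: int) -> int:
    """Decode 110xxxxx 10yyyyyy into a code point without an overlong check."""
    # Only the lead/continuation bit patterns are verified here,
    # not the magnitude of the resulting code point.
    if b0 & 0xE0 != 0xC0 or b1 & 0xC0 != 0x80:
        raise ValueError("not a two-byte UTF-8 pattern")
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


cp = naive_decode_2byte(0xC0, 0xAF)
print(f"U+{cp:04X} {chr(cp)!r}")  # prints: U+002F '/'

# A two-byte sequence must encode at least U+0080. Anything below that
# is an overlong form, so a conforming decoder has to reject it.
print(cp < 0x80)  # prints: True
```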
The bug is that decodeComponent decodes invalid UTF-8 without
notifying the user of that function. This behavior is a violation
of RFC 3629.
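
For comparison, a strict decoder could percent-decode to raw bytes
first and only then validate them as UTF-8. The following is a
minimal sketch in Python, not a proposal for the Phobos
implementation; strict_decode_component is a hypothetical name, and
the sketch relies on Python's strict UTF-8 codec to do the
validation:

```python
from urllib.parse import unquote_to_bytes


def strict_decode_component(component: str) -> str:
    """Percent-decode to raw bytes, then require well-formed UTF-8."""
    raw = unquote_to_bytes(component)
    # Python's strict UTF-8 codec raises UnicodeDecodeError for
    # ill-formed input, including overlong forms such as C0 AF.
    return raw.decode("utf-8", errors="strict")


print(strict_decode_component("%2f"))  # prints: /

try:
    strict_decode_component("%c0%af")
except UnicodeDecodeError as exc:
    # The caller is told about the invalid input instead of
    # silently receiving "/".
    print("rejected:", exc.reason)
```

With this approach the caller either gets valid UTF-8 or an error,
never a silently "repaired" character.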