std.uri.decodeComponent decodes invalid UTF-8 [BUG]

kdevel kdevel at vogtner.de
Wed Aug 6 05:47:10 UTC 2025


The bug is as follows:

On Tuesday, 5 August 2025 at 19:28:06 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
> [...]
> A URI is ASCII.

Sure. It is this:

    %c0%af

> Any input to that function will be ASCII, it won't be UTF-8.
>
> The hex encoding is not UTF-8, it's its own encoding, that gets 
> reencoded out to UTF-8.

Which is decoded by decodeComponent into

    /

which is valid ASCII and valid UTF-8. But the mapping of

    %c0%af -> /

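is an invalid one: the bytes C0 AF are an overlong two-byte encoding
of U+002F, which a conforming UTF-8 decoder must reject. It may be
debatable whether decodeComponent could legitimately have returned
invalid UTF-8, i.e. "\xc0\xaf".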
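
To illustrate how such a mapping can arise, here is a minimal sketch in
Python (not D, and not the actual std.uri code). The helper name
naive_decode_2byte is hypothetical. It combines the payload bits of a
two-byte sequence without checking for overlong forms, which is the
kind of omission that would turn C0 AF into a slash:

```python
def naive_decode_2byte(b0: int, b1: int) -> int:
    """Combine a two-byte UTF-8 sequence WITHOUT rejecting overlong forms.

    Hypothetical helper for illustration only; a conforming decoder
    must additionally require the result to be >= 0x80.
    """
    # Layout: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


cp = naive_decode_2byte(0xC0, 0xAF)
print(hex(cp), repr(chr(cp)))  # prints: 0x2f '/'
```

The lead byte C0 contributes only zero bits, so the result falls into
the ASCII range, where the one-byte form 2F is the only valid encoding.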

The bug is that decodeComponent decodes invalid UTF-8 without
notifying the user of that function. This behavior is a violation
of RFC 3629.
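
For comparison, a strict decoder could look roughly like the following
Python sketch. It is an illustration under stated assumptions, not a
proposal for the D implementation: strict_decode_component is a
hypothetical name, and the sketch relies on Python's
urllib.parse.unquote_to_bytes and its strict UTF-8 codec to signal the
error instead of silently producing a character.

```python
from urllib.parse import unquote_to_bytes


def strict_decode_component(component: str) -> str:
    """Percent-decode a URI component and validate the bytes as UTF-8.

    Hypothetical helper: raises UnicodeDecodeError when the decoded
    octets are not well-formed UTF-8 (e.g. an overlong sequence).
    """
    raw = unquote_to_bytes(component)  # "%c0%af" -> b"\xc0\xaf"
    return raw.decode("utf-8")         # strict error handling is the default


print(strict_decode_component("%2F"))  # a valid encoding of the slash

try:
    strict_decode_component("%c0%af")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

With this design the caller is told about the malformed input and can
decide whether to reject the URI or handle the raw bytes differently.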

