[Issue 15382] std.uri has an incorrect set of reserved characters
d-bugmail at puremagic.com
Sun Jan 24 22:43:55 UTC 2021
https://issues.dlang.org/show_bug.cgi?id=15382
Stefan <kdevel at vogtner.de> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |kdevel at vogtner.de
         Resolution|INVALID                     |---
--- Comment #3 from Stefan <kdevel at vogtner.de> ---
RFC 3986, § 2.2 [1], defines the following character
classes:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
The code in phobos/std/uri.d references these character classes
instead:
62 uflags['#'] |= URI_Hash;
66 uflags[c] |= URI_Alpha;
67 uflags[c + 0x20] |= URI_Alpha; // lowercase letters
69 foreach (c; '0' .. '9' + 1) uflags[c] |= URI_Digit;
70 foreach (c; ";/?:@&=+$,") uflags[c] |= URI_Reserved;
71 foreach (c; "-_.!~*'()") uflags[c] |= URI_Mark;
If encodeComponent is used, URI_Encode is invoked with
unescapedSet = URI_Alpha | URI_Digit | URI_Mark. As a result,
reserved characters that are also in the mark set, e.g. ! or (,
are not encoded.
The notion of mark characters stems from the obsoleted RFC 2396 [2].
RFC 3986 explains the changes in its Appendix D.2 [3].
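The overlap between the two definitions can be checked without calling
into Phobos at all. The following is only a minimal sketch: it copies
the RFC 3986 classes and the mark string quoted above into constants
and lists the reserved characters that also appear in the mark set.
The names genDelims, subDelims, markSet and reservedInMarkSet are
made up for this illustration and are not part of std.uri.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Character classes from RFC 3986, section 2.2.
enum genDelims = ":/?#[]@";
enum subDelims = "!$&'()*+,;=";

// Mark set as quoted above from std/uri.d (it originates in RFC 2396).
enum markSet = "-_.!~*'()";

// Hypothetical helper, not part of Phobos: returns the RFC 3986
// reserved characters that also appear in the mark set.
string reservedInMarkSet()
{
    string result;
    foreach (char c; genDelims ~ subDelims)
        if (markSet.canFind(c))
            result ~= c;
    return result;
}

void main()
{
    writeln(reservedInMarkSet()); // prints !'()*
}
```

Those five characters are the ones that an unescapedSet containing
URI_Mark lets through unencoded, although RFC 3986 classifies them
as sub-delims.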
[1] https://tools.ietf.org/html/rfc3986#section-2
[2] https://tools.ietf.org/html/rfc2396#section-2.3
[3] https://tools.ietf.org/html/rfc3986#appendix-D.2