auto-decoding
Uknown
sireeshkodali1 at gmail.com
Sun Apr 1 02:44:32 UTC 2018
On Sunday, 1 April 2018 at 01:19:08 UTC, auto wrote:
> What is auto decoding and why it is a problem?
Auto-decoding is essentially related to the UTF representation of
Unicode strings. In D, `char[]` and `string` represent UTF-8
strings, `wchar[]` and `wstring` represent UTF-16 strings and
`dchar[]` and `dstring` represent UTF-32 strings. You need to know
how UTF works in order to understand auto-decoding. Since in
practice most code deals with UTF-8, I'll explain with respect to
that.
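To make the three encodings concrete, here is a minimal sketch (the
sample text is just an illustration) showing that `.length` counts
code units of the string's own encoding, not characters:

```d
import std.stdio : writeln;

void main()
{
    // The same five-character text ("hello" with U+00E9 as the
    // second letter) in the three encodings.
    string  utf8  = "h\u00E9llo";   // UTF-8:  U+00E9 takes two `char`s
    wstring utf16 = "h\u00E9llo"w;  // UTF-16: U+00E9 takes one `wchar`
    dstring utf32 = "h\u00E9llo"d;  // UTF-32: U+00E9 takes one `dchar`

    // .length counts code units, not characters.
    writeln(utf8.length);   // 6
    writeln(utf16.length);  // 5
    writeln(utf32.length);  // 5
}
```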
Essentially, the problem comes down to the fact that not all
Unicode characters are representable by a single 8-bit `char` (for
UTF-8). Only the ASCII characters are represented the "normal" way,
one `char` each. UTF-8 uses the fact that the highest bit of a
`char` is never set in ASCII: for any other character, the leading
bits of the first `char` tell how many more `char`s that character
is encoded in. You can watch this video for a better
understanding[0]. By default though, if one were to traverse a
`char[]` looking for characters, they would get unexpected results
with Unicode data.
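As a rough sketch of what that looks like in practice (the sample
string is my own choice), this walks a string one `char` at a time,
checks the leading bits, and uses `std.utf.stride` to ask how many
`char`s each encoded character occupies:

```d
import std.stdio : writefln;
import std.utf : stride;

void main()
{
    // 'a' is ASCII (1 char), U+00E9 needs 2 chars, U+20AC needs 3.
    string s = "a\u00E9\u20AC";

    // Iterating by `char` yields raw code units, not characters.
    foreach (i, char c; s)
        writefln("s[%s] = 0x%02X", i, cast(ubyte) c);

    // The leading bits of each code unit say what role it plays.
    assert((s[0] & 0x80) == 0x00);  // 0xxxxxxx: ASCII, stands alone
    assert((s[1] & 0xE0) == 0xC0);  // 110xxxxx: starts a 2-char sequence
    assert((s[2] & 0xC0) == 0x80);  // 10xxxxxx: continuation char
    assert((s[3] & 0xF0) == 0xE0);  // 1110xxxx: starts a 3-char sequence

    // stride reports the length of the sequence starting at an index.
    assert(stride(s, 0) == 1);
    assert(stride(s, 1) == 2);
    assert(stride(s, 3) == 3);
}
```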
Auto-decoding tries to solve this by automatically applying the
algorithm to decode the characters to Unicode code points
(`dchar`s) whenever a string is used as a range. This is where my
knowledge ends though. I'll give you pros and cons of
auto-decoding.
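A small sketch of auto-decoding in action, as far as I understand
it: the range primitives from `std.range.primitives` treat a
`string` as a range of `dchar`, while `.length` and indexing still
work on raw `char`s.

```d
import std.range.primitives : ElementType, front, popFront, walkLength;

void main()
{
    string s = "h\u00E9llo";

    // As a range, a string yields decoded code points.
    static assert(is(ElementType!string == dchar));

    assert(s.length == 6);      // code units (no decoding)
    assert(s.walkLength == 5);  // code points (auto-decoded)
    assert(s[1] == 0xC3);       // indexing is not decoded either

    auto r = s;
    r.popFront();               // drops 'h' (one char)
    assert(r.front == '\u00E9');  // front decodes two chars into one dchar
    r.popFront();               // drops both chars of U+00E9
    assert(r.length == 3);      // "llo" remains
}
```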
Pros:
* It makes Unicode string handling much easier for
beginners.
* Much less effort in general, it seems to "just work™"
Cons:
* It makes string handling slow by default
* It may be the wrong thing, since you may not want Unicode
code-points, but graphemes instead.
* Auto-decoding throws exceptions on reaching invalid UTF
sequences, so string handling code in general can throw
exceptions (and cannot be `nothrow`).
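The last two cons can be sketched roughly like this.
`countCodePoints` is a hypothetical helper written only for this
example; it walks a string the way auto-decoding range code does.
The grapheme count uses `byGrapheme` from `std.uni`.

```d
import std.exception : assertThrown;
import std.range.primitives : empty, front, popFront, walkLength;
import std.uni : byGrapheme;
import std.utf : UTFException;

// Hypothetical helper: walks a string via the auto-decoding primitives.
size_t countCodePoints(string s)
{
    size_t n;
    while (!s.empty)
    {
        dchar c = s.front;  // decoding (and any exception) happens here
        s.popFront();
        ++n;
    }
    return n;
}

void main()
{
    // 'e' followed by U+0301 (combining acute accent):
    // one visible character, but two code points.
    string s = "e\u0301";
    assert(s.length == 3);                 // code units
    assert(countCodePoints(s) == 2);       // code points
    assert(s.byGrapheme.walkLength == 1);  // graphemes

    // 0xFF can never appear in valid UTF-8, so decoding it throws.
    immutable(ubyte)[] raw = [0x61, 0xFF];
    string bad = cast(string) raw;
    assertThrown!UTFException(countCodePoints(bad));
}
```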
If you want to stop auto-decoding, you can use
std.string.representation like this:

    import std.string : representation;
    auto no_decode = some_string.representation;

Now no_decode won't be auto-decoded, and you can use it in place
of some_string. You can also use std.utf.byCodeUnit to iterate
over code units without decoding, or std.uni.byGrapheme to iterate
by graphemes instead.
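For comparison, a minimal sketch of the different ways to walk the
same string (the sample text is my own). Besides `representation`,
it uses `byCodeUnit` from `std.utf` and `byGrapheme` from
`std.uni`:

```d
import std.range.primitives : walkLength;
import std.string : representation;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // "hello" with a combining acute accent (U+0301) after the 'e':
    // 7 code units, 6 code points, 5 graphemes.
    string s = "he\u0301llo";

    // representation reinterprets the string as raw bytes.
    auto bytes = s.representation;
    static assert(is(typeof(bytes) == immutable(ubyte)[]));
    assert(bytes.walkLength == 7);

    // byCodeUnit keeps the char type but skips decoding.
    assert(s.byCodeUnit.walkLength == 7);

    // The plain string is auto-decoded to code points.
    assert(s.walkLength == 6);

    // byGrapheme groups code points into user-visible characters.
    assert(s.byGrapheme.walkLength == 5);
}
```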
You should also read this blog post:
https://jackstouffer.com/blog/d_auto_decoding_and_you.html
And this forum post:
https://forum.dlang.org/post/eozguhavggchzzruzkwk@forum.dlang.org
[0]: https://www.youtube.com/watch?v=MijmeoH9LT4