auto-decoding

Uknown sireeshkodali1 at gmail.com
Sun Apr 1 02:44:32 UTC 2018


On Sunday, 1 April 2018 at 01:19:08 UTC, auto wrote:
> What is auto decoding and why it is a problem?

Auto-decoding is closely tied to the UTF representations of
Unicode strings. In D, `char[]` and `string` represent UTF-8
strings, `wchar[]` and `wstring` represent UTF-16 strings, and
`dchar[]` and `dstring` represent UTF-32 strings. You need to know
how UTF works in order to understand auto-decoding. Since in
practice most code deals with UTF-8, I'll explain with respect to
that. Essentially, the problem comes down to the fact that not all
Unicode characters are representable by a single 8-bit `char` (for
UTF-8). Only the ASCII characters are represented the "normal" way,
one `char` each. UTF-8 uses the fact that the highest bit of a
`char` is never set in ASCII: the leading bits of the first `char`
of a character tell how many `char`s that character is encoded in.
You can watch this video for a better understanding[0]. By default
though, if one were to traverse a `char[]` one `char` at a time
looking for characters, they would get unexpected results with
non-ASCII data.
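
To make the "leading bits" idea concrete, here is a minimal sketch. The helper `sequenceLength` is a made-up name for illustration and not part of Phobos. It only inspects the bit pattern of the first `char` and does no validation beyond that.

```d
import std.stdio : writefln;

// Hypothetical helper: how many `char`s a UTF-8 sequence occupies,
// judged only from the leading bits of its first code unit.
size_t sequenceLength(char first)
{
    if ((first & 0x80) == 0x00) return 1; // 0xxxxxxx: plain ASCII
    if ((first & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((first & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((first & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0; // continuation byte (10xxxxxx) or a byte that cannot start a sequence
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 (e with acute accent) needs two `char`s
    writefln("%s chars", s.length);
    foreach (char c; s) // iterate raw code units, no decoding
        writefln("0x%02X -> sequence length %s", cast(ubyte) c, sequenceLength(c));
}
```

The string has five characters but six `char`s, which is exactly the mismatch that trips up code walking a `char[]` one element at a time.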

Auto-decoding tries to solve this by automatically applying the
decoding algorithm, so that the range functions in Phobos see a
string as a sequence of Unicode code points (`dchar`s) rather than
`char`s. This is where my knowledge ends though. I'll give you the
pros and cons of auto-decoding.
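
As a rough illustration of what that looks like in practice (a small sketch, not taken from the video or the linked posts), the range primitives in `std.range.primitives` treat a `string` as a range of `dchar`, while `.length` and indexing still work on raw `char`s:

```d
import std.range.primitives : ElementType, front, walkLength;
import std.stdio : writeln;

// Range primitives see a `string` as a range of `dchar`, not `char`.
static assert(is(ElementType!string == dchar));

void main()
{
    string s = "h\u00E9llo";
    writeln(s.length);     // 6: code units, .length does not decode
    writeln(s.walkLength); // 5: code points, as seen through the range interface
    auto first = s.front;  // decoded on the fly
    static assert(is(typeof(first) == dchar));
}
```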

Pros:
  * It makes Unicode string handling much easier for
beginners.
  * Much less effort in general, it seems to "just work™"

Cons:
  * It makes string handling slow by default, since characters
get decoded even when the algorithm doesn't need it.
  * It may be the wrong thing, since you may not want Unicode
code points, but graphemes instead.
  * Auto-decoding throws exceptions on reaching invalid UTF
sequences, so string handling code in general can throw
exceptions (and cannot be marked `nothrow`).
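
The last point can be seen with a small sketch. `decodesCleanly` is a hypothetical helper written for this example; it walks a string through `front`/`popFront` the way range algorithms do and reports whether decoding ever threw.

```d
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;
import std.utf : UTFException;

// Hypothetical helper: walk a string through the range primitives and
// report whether every `front` call decoded successfully.
bool decodesCleanly(string s)
{
    try
    {
        for (auto r = s; !r.empty; r.popFront())
        {
            dchar c = r.front; // auto-decoding happens here and may throw
        }
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // The byte 0xFF never appears in valid UTF-8.
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62];
    writeln(decodesCleanly("abc"));
    writeln(decodesCleanly(cast(string) raw));
}
```

Because `front` on a `string` can throw, anything built on top of it inherits that possibility unless it sidesteps decoding.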

If you want to stop auto-decoding, you can use 
std.string.representation like this:

import std.string : representation;
auto no_decode = some_string.representation;

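Now no_decode won't be auto-decoded, and you can use it in place
of some_string in range code. Note that its elements are `ubyte`s
rather than `char`s. You can also use std.utf.byCodeUnit to skip
decoding while keeping `char` elements, or std.uni.byGrapheme to
iterate by graphemes instead.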
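
For comparison, here is a short sketch of the different views of the same string. `std.utf.byCodeUnit` and `std.uni.byGrapheme` are Phobos functions I'm adding alongside `representation`; the sample string is just an assumption chosen to make the counts differ.

```d
import std.range.primitives : walkLength;
import std.stdio : writeln;
import std.string : representation;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // 'e' followed by U+0301 (combining acute accent): one visible
    // character, two code points, three UTF-8 code units.
    string s = "e\u0301";

    writeln(s.representation.walkLength); // 3: ubyte elements, no decoding
    writeln(s.byCodeUnit.walkLength);     // 3: char elements, no decoding
    writeln(s.walkLength);                // 2: code points (auto-decoded)
    writeln(s.byGrapheme.walkLength);     // 1: graphemes
}
```

Which view is right depends on the job: code units for byte-level work, graphemes when "what the user sees as one character" matters.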

You should also read this blog post: 
https://jackstouffer.com/blog/d_auto_decoding_and_you.html

And this forum post: 
https://forum.dlang.org/post/eozguhavggchzzruzkwk@forum.dlang.org

[0]: https://www.youtube.com/watch?v=MijmeoH9LT4

