Let's stop parser Hell

Wed Aug 1 22:17:42 PDT 2012

On Thu, Aug 2, 2012 at 1:29 AM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> On Wednesday, August 01, 2012 22:47:47 Philippe Sigaud wrote:
>> I somehow thought that with UTF-8 you were limited to a part of
>> Unicode, and to another, bigger part with UTF-16.
>> I equated Unicode with UTF-32.
>> This is what completely warped my vision. It's good to learn something
>> new every day, I guess.
>
> I guess that that would explain why you didn't understand what I was saying. I
> was highly confused as to what was confusing about what I was saying, but it
> didn't even occur to me that you had that sort of misunderstanding. You really
> should get a better grip on Unicode if you want to be writing code that lexes
> or parses it efficiently (though it sounds like you're reading up on a lot
> already right now).

I knew about the 1/2/4-byte schemes and such. But somehow, in my mind,
string == (almost) ASCII-only characters.
Anyway, it all *clicked* into place right afterwards and your answers
are perfectly clear to me now.
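For anyone else who tripped over the same misconception: UTF-8, UTF-16 and UTF-32 can each encode every Unicode scalar value (U+0000 to U+10FFFF, minus the surrogates). They differ only in code-unit size and in how many units a given character needs. Below is a small illustration, written in Python purely for convenience rather than D; the helper `encoded_sizes` is hypothetical, made up for this sketch.

```python
def encoded_sizes(ch: str) -> tuple[int, int, int]:
    """Return the byte length of `ch` in UTF-8, UTF-16 and UTF-32.

    The "-le" codecs are used so that no byte-order mark is prepended.
    """
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")
    utf32 = ch.encode("utf-32-le")
    # All three encodings round-trip to the very same code point:
    # none of them is restricted to a subset of Unicode.
    assert utf8.decode("utf-8") == ch
    assert utf16.decode("utf-16-le") == ch
    assert utf32.decode("utf-32-le") == ch
    return len(utf8), len(utf16), len(utf32)


if __name__ == "__main__":
    # ASCII, Latin-1 range, BMP CJK, a non-BMP emoji, and the last code point.
    for ch in ["a", "\u00e9", "\u4e2d", "\U0001F600", "\U0010FFFF"]:
        u8, u16, u32 = encoded_sizes(ch)
        print(f"U+{ord(ch):04X}: UTF-8 {u8} B, UTF-16 {u16} B, UTF-32 {u32} B")
```

This prints 1/2/4 bytes for U+0061, 2/2/4 for U+00E9, 3/2/4 for U+4E2D, and 4/4/4 for both U+1F600 and U+10FFFF. UTF-8 uses 1 to 4 bytes per code point, UTF-16 uses 2 or 4 (a surrogate pair), and UTF-32 always uses 4. Only the cost per character changes, never the coverage.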