Encodings
Jonathan M Davis
jmdavisProg at gmx.com
Sun Apr 8 15:03:20 PDT 2012
On Sunday, April 08, 2012 23:36:23 Nathan M. Swan wrote:
> For most of the string processing I do, I read/write text in
> UTF-8 and convert it to UTF-32 for processing (with std.utf), so
> I don't have to worry about encoding. Is this a good or bad
> paradigm? Is there a better way to do this? What method do all of
> you use?
>
> Just curious, NMS
It depends on what you're doing. Either UTF-8 or UTF-32 may be faster,
depending on which functions you use and what your memory requirements are.
A UTF-32 string has the advantage of being a random-access range, so it works
with a number of functions that a UTF-8 string won't work with. But UTF-32
also takes considerably more memory (four times as much when all of your
characters are ASCII), which can be a problem.
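As a rough sketch of that trade-off (the byteSize helper and the sample text
are made up for illustration, and this assumes a reasonably current Phobos):

```d
import std.conv : to;
import std.range : isRandomAccessRange;

// Hypothetical helper: bytes occupied by the code units of a string.
size_t byteSize(S)(S s)
{
    return s.length * typeof(s[0]).sizeof;
}

void main()
{
    string utf8 = "h\u00E9llo";        // U+00E9 takes two UTF-8 code units
    dstring utf32 = utf8.to!dstring;   // one code unit per code point

    // Phobos iterates a string as a range of dchar, decoding as it goes,
    // so it cannot offer random access. A dstring can.
    static assert(!isRandomAccessRange!string);
    static assert( isRandomAccessRange!dstring);

    assert(utf8.length == 6);          // length in code units
    assert(utf32.length == 5);         // length in code points
    assert(utf32[1] == '\u00E9');      // constant-time indexing by code point

    assert(byteSize(utf8) == 6);
    assert(byteSize(utf32) == 20);     // 5 code points * 4 bytes each
}
```

Indexing utf8[1] would still compile, but it gives you a single code unit (the
first byte of the two-byte sequence), not the character.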
I think that the most common approach is to just operate on UTF-8 unless
another encoding is needed (e.g. UTF-32 is required because random access is
needed). And in plenty of cases, you end up operating on generic ranges anyway
if you use range-based functions on strings and don't call std.array.array on
the results.
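For instance, something along these lines stays a lazy range over the UTF-8
string until array is called (lettersOnly is just a name that I made up for
the example):

```d
import std.algorithm : equal, filter;
import std.array : array;
import std.uni : isAlpha;

// Hypothetical helper: lazily keeps only the alphabetic code points.
auto lettersOnly(string s)
{
    return s.filter!isAlpha;
}

void main()
{
    string text = "a1b2c3";

    // Lazy: the UTF-8 is decoded on the fly and no new string is allocated.
    auto letters = lettersOnly(text);
    assert(letters.equal("abc"));

    // Eager: array on a range of dchar allocates a dchar[], i.e. UTF-32.
    auto copied = letters.array;
    static assert(is(typeof(copied) == dchar[]));
    assert(copied == "abc"d);
}
```

So the data only becomes UTF-32 at the point where you ask for an array.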
You're going to have to profile your code to see whether doing your string
processing primarily in UTF-8 or primarily in UTF-32 is more efficient.
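A minimal timing sketch, assuming a Phobos recent enough to have
std.datetime.stopwatch. The countL workload, the sample text and the iteration
count are all placeholders, so substitute your real processing:

```d
import std.algorithm : count;
import std.array : replicate;
import std.conv : to;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// Hypothetical workload: counts 'l', which visits every code point.
size_t countL(S)(S s)
{
    return s.count('l');
}

void main()
{
    string utf8 = "h\u00E9llo w\u00F6rld ".replicate(10_000);
    dstring utf32 = utf8.to!dstring;

    size_t sink;   // accumulates results so that the work has a visible effect
    void onUtf8()  { sink += countL(utf8); }
    void onUtf32() { sink += countL(utf32); }

    // Runs each function 100 times and reports the total time for each.
    auto times = benchmark!(onUtf8, onUtf32)(100);
    writeln("UTF-8:    ", times[0]);
    writeln("UTF-32:   ", times[1]);
    writeln("checksum: ", sink);
}
```

The numbers will vary with the compiler, the flags and the data, so measure
with text that resembles what your program actually handles.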
- Jonathan M Davis