Need to do some "dirty" UTF-8 handling
Nick Sabalausky
a at a.a
Sat Jun 25 02:00:43 PDT 2011
Sometimes I need to bring data into a string, and need to be able to treat
it as an actual "string", but don't actually care if the entire thing is
technically valid UTF-8 or not, don't care if invalid bytes don't get
preserved right, and can't have any UTF exceptions being thrown regardless
of the input. Yeah, I know that's sloppy, but sometimes that's good enough
and proper handling may be far more trouble than what's needed. (For
example: Processing HTML from arbitrary URLs. It's pretty much guaranteed
you'll come across stuff that's wrong or even has the encoding type
improperly set. But it's usually more important for the process to succeed
than for it to be perfectly accurate.)
Far as I can tell, this seems to currently be impossible with Phobos (unless
you're *extremely* meticulous about watching what your entire codebase does
with the data), which is a major pain when such a need arises.
Anyone have a good workaround? For instance, maybe a function that'll take
in a byte array and convert *all* invalid UTF-8 sequences to a user-selected
valid character?
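
To make the request concrete, here is a rough sketch of the kind of function I mean. It is only an illustration under my own assumptions: the name sanitizeUtf8 and its parameters are made up and are not part of Phobos. It walks the bytes, copies every well-formed UTF-8 sequence unchanged, and swaps each offending byte for a replacement character, so it never throws on bad input.

```d
import std.stdio : writeln;

/// Hypothetical helper (not part of Phobos): returns a copy of `input` in
/// which every byte that does not belong to a well-formed UTF-8 sequence is
/// replaced by `replacement`. Malformed input never causes an exception.
string sanitizeUtf8(const(ubyte)[] input, dchar replacement = '\uFFFD')
{
    // Smallest code point that legitimately needs a sequence of each length;
    // anything below it is an overlong encoding.
    static immutable uint[5] minCp = [0, 0, 0x80, 0x800, 0x10000];

    char[] result;
    size_t i = 0;

    while (i < input.length)
    {
        immutable ubyte b = input[i];
        size_t len;   // expected length of the sequence starting at i
        uint cp;      // code point accumulated so far

        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else                         { len = 0; } // stray continuation or invalid lead byte

        // The whole sequence must fit in the input...
        bool ok = len != 0 && i + len <= input.length;

        // ...and every trailing byte must look like 10xxxxxx.
        for (size_t k = 1; ok && k < len; ++k)
        {
            immutable ubyte c = input[i + k];
            if ((c & 0xC0) == 0x80)
                cp = (cp << 6) | (c & 0x3F);
            else
                ok = false;
        }

        // Reject overlong encodings, surrogates, and values above U+10FFFF.
        if (ok)
            ok = cp >= minCp[len] && cp <= 0x10FFFF
                 && !(cp >= 0xD800 && cp <= 0xDFFF);

        if (ok)
        {
            // Well-formed: copy the bytes through unchanged.
            result ~= cast(const(char)[]) input[i .. i + len];
            i += len;
        }
        else
        {
            // Malformed: emit the replacement (appending a dchar to a char[]
            // encodes it as UTF-8) and resynchronise at the next byte.
            result ~= replacement;
            i += 1;
        }
    }
    return result.idup;
}

void main()
{
    ubyte[] raw = [0x48, 0x69, 0xFF, 0x21]; // "Hi", one invalid byte, "!"
    writeln(sanitizeUtf8(raw, '?'));        // prints "Hi?!"
}
```

This replaces one character per bad byte rather than one per bad sequence, which is the simplest policy and fits the "good enough" goal here. The caller is assumed to pass a valid replacement character.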