Splitting up large dirty file

Jonathan M Davis newsgroup.d at jmdavisprog.com
Wed May 16 10:30:34 UTC 2018


On Wednesday, May 16, 2018 08:57:10 Dennis via Digitalmars-d-learn wrote:
> I thought it wouldn't be hard to crudely split this file using
> D's range functions and basic string manipulation, but the
> combination of being too large for a string and having invalid
> encoding seems to defeat most simple solutions.

D is designed with the idea that a string is valid UTF-8, a wstring is valid
UTF-16, and a dstring is valid UTF-32. For various reasons, that doesn't
always hold true like it should, but pretty much all of Phobos is written
with that assumption and will generally throw an exception on invalid data. If
you're ever dealing with a different encoding (or with invalid Unicode), you
really need to use integral types like ubyte (e.g. by using
std.string.representation or by reading the data in as ubytes rather than as
a string) and not try to use character types like char or string. If you try
to use char or string with invalid UTF-8 without having it throw any
exceptions, you're pretty much guaranteed to fail.
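
As a rough sketch of the ubyte approach: the code below splits raw bytes on a
newline byte without ever treating them as characters, so no UTF decoding or
validation gets involved. splitOnByte is a hypothetical helper made up for this
example (it is not part of Phobos), and the sample bytes are invented. It also
assumes the data already fits in memory; for a file that's too large for
that, something like File.byChunk, which yields ubyte[] buffers, would
probably be the place to start.

```d
import std.algorithm.iteration : splitter;
import std.array : array;
import std.string : representation;

/// Hypothetical helper (not in Phobos): splits raw bytes on a delimiter
/// byte. Because the element type is ubyte, nothing is decoded as UTF-8.
ubyte[][] splitOnByte(ubyte[] data, ubyte delim)
{
    return data.splitter(delim).array;
}

void main()
{
    // "ab", then 0xFF (a byte that never occurs in valid UTF-8),
    // then '\n', then "cd".
    ubyte[] raw = [0x61, 0x62, 0xFF, 0x0A, 0x63, 0x64];

    auto parts = splitOnByte(raw, 0x0A);
    assert(parts.length == 2);
    assert(parts[0].length == 3 && parts[0][2] == 0xFF);

    // A string that is known to be valid can be viewed as bytes
    // with std.string.representation.
    immutable(ubyte)[] bytes = "cd".representation;
    assert(parts[1] == bytes);
}
```

Once a piece is known to contain only valid UTF-8, it can be cast back to a
string at that point, rather than using string for the whole file up front.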

- Jonathan M Davis


