D1: UTF8 char[] casting to wchar[] array cast misalignment ERROR

via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue Jun 17 05:54:37 PDT 2014


On Tuesday, 17 June 2014 at 02:27:43 UTC, jicman wrote:
>
> Greetings!
>
> I have a bunch of files: plain ASCII, UTF8 and UTF16, with and 
> without a BOM (Byte Order Mark).  I had, I thought, a nice way 
> of figuring out which encoding a file used (ASCII, UTF8 
> or UTF16) when the BOM was missing: read the content and 
> apply the std.utf.validate function to the char[] or 
> wchar[] string.  The problem is that lately I am running into 
> a wall with the "array cast misalignment" error when casting to 
> wchar[], i.e.:
>
> auto text = cast(string) file.read();
> wchar[] temp = cast(wchar[]) text;

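If the length of the data in bytes is odd, it cannot be valid 
UTF16, because every UTF16 code unit is two bytes wide. That is 
also what triggers the error: the cast to `wchar[]` fails when 
the byte length is not a multiple of `wchar.sizeof`. You can 
check for that, and skip the test for UTF16 in this case.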
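
A minimal sketch of that length check, assuming the data is held as 
`ubyte[]` and that `std.utf.validate` throws on invalid input. The 
helper name `isValidUtf16` is made up for illustration and is not 
part of Phobos:

```d
import std.utf;

// Hypothetical helper: true if `raw` passes std.utf.validate as UTF16.
bool isValidUtf16(ubyte[] raw)
{
    // An odd byte count cannot be reinterpreted as wchar[]; attempting
    // the cast is what raises "array cast misalignment".
    if (raw.length % wchar.sizeof != 0)
        return false;

    try
    {
        // The length is now a multiple of wchar.sizeof, so the cast is safe.
        validate(cast(wchar[]) raw);
        return true;
    }
    catch (Exception e)
    {
        // validate reported malformed UTF16 (e.g. an unpaired surrogate).
        return false;
    }
}
```

Note that the cast reinterprets the bytes in the machine's native 
byte order, so this only recognises UTF16 stored in that order.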

Another thing: it is better not to cast the data to `string` 
before you know that it's actually UTF8. Keep it as `ubyte[]` 
instead; this way you don't need all the casts inside the 
if-blocks.
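
Roughly what I mean, as a sketch rather than a drop-in replacement. 
The helper names `loadRaw` and `isValidUtf8` are hypothetical; only 
`std.file.read` and `std.utf.validate` come from Phobos:

```d
import std.file;
import std.utf;

// Hypothetical helper: load the file as raw bytes, making no claim
// about its encoding yet.
ubyte[] loadRaw(string path)
{
    return cast(ubyte[]) read(path);
}

// Hypothetical helper: true if `raw` passes std.utf.validate as UTF8.
bool isValidUtf8(ubyte[] raw)
{
    try
    {
        // char is one byte wide, so this cast can never be misaligned.
        validate(cast(char[]) raw);
        return true;
    }
    catch (Exception e)
    {
        // validate reported malformed UTF8.
        return false;
    }
}
```

Only once a check like this passes would you cast the bytes to 
`string`. Keep in mind that BOM-less UTF16 text containing only ASCII 
characters is also valid UTF8 (it just contains NUL bytes), so a 
passing UTF8 check alone is not conclusive.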

