std.file: read, readText and UTF-8 decoding

Uranuz neuranuz at gmail.com
Thu Sep 21 15:29:17 UTC 2023


Hello!
I have a strange problem. I am trying to parse XML files and 
extract some information from them.
I use the dxml library by Jonathan M Davis for this. The problem 
is that I have multiple XML files made by different people 
around the world. Some of these files were created with a Byte 
Order Mark (BOM), and some without one. dxml expects no BOM at the 
start of the string.
At first I tried to read the file with std.file.readText. It looks 
like it doesn't decode the file in any way and doesn't remove the 
BOM, so dxml then fails to parse it. This seemed strange to me, 
because I expected a "text" function to decode the data to UTF-8. 
Then I read that this behavior is at least documented:
"""
...However, no width or endian conversions are performed. So, if 
the width or endianness of the characters in the given file 
differ from the width or endianness of the element type of S, 
then validation will fail.
"""
So that's OK, but I realized that "readText" is not useful for me.
So I tried plain "read", which returns "void[]". The problem 
is that I still don't understand which method I should use to 
convert this to a string with proper UTF-8 decoding, BOM removal, 
and so on.
Could you please help me clarify this?
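
To make the question concrete, here is a minimal sketch of the kind of conversion I have in mind. It assumes the files are UTF-8 (with or without a BOM) and uses std.encoding.getBOM to detect the mark. The names readUtf8NoBom and stripUtf8Bom are hypothetical helpers of mine, not part of Phobos or dxml, and I am not sure this is the recommended way:

```d
import std.encoding : BOM, getBOM;
import std.file : read;
import std.utf : validate;

/// Hypothetical helper: drops a leading UTF-8 BOM (EF BB BF), if any,
/// and checks that the remaining bytes are valid UTF-8.
string stripUtf8Bom(immutable(ubyte)[] bytes)
{
    // getBOM only inspects the start of the data; it does not consume it.
    auto bom = getBOM(bytes);
    if (bom.schema == BOM.utf8)
        bytes = bytes[bom.sequence.length .. $];

    auto text = cast(string) bytes;
    validate(text); // throws UTFException if the data is not valid UTF-8
    return text;
}

/// Hypothetical helper: reads the whole file (assumed to be UTF-8)
/// and returns its contents without the BOM.
string readUtf8NoBom(string path)
{
    // read returns a freshly allocated void[], so the cast to immutable
    // is safe as long as no other reference to the buffer is kept.
    return stripUtf8Bom(cast(immutable(ubyte)[]) read(path));
}
```

The result of readUtf8NoBom could then be passed to dxml. Files in UTF-16 or UTF-32 are not converted by this sketch; they would simply fail validation.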
P.S. The readText function looks odd in std.file, because you cannot 
specify an encoding to decode the file with, and the logic of how it 
decodes is unclear...
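
As far as I can tell from the documentation quoted above, the only choice readText offers is the element type S, so a UTF-16 file would have to be read as a wstring and transcoded by hand. A rough sketch under that assumption (utf16ToUtf8 and readUtf16AsUtf8 are hypothetical names of mine, and the file is assumed to be UTF-16 in the machine's native byte order):

```d
import std.file : readText;
import std.utf : toUTF8;

/// Hypothetical helper: drops a leading U+FEFF, if present,
/// and transcodes the UTF-16 text to UTF-8.
string utf16ToUtf8(wstring text)
{
    if (text.length > 0 && text[0] == '\uFEFF')
        text = text[1 .. $];
    return toUTF8(text);
}

/// Hypothetical helper: readText!wstring validates the file as UTF-16
/// but performs no width or endian conversion, so a file in the
/// opposite byte order is not handled here.
string readUtf16AsUtf8(string path)
{
    return utf16ToUtf8(readText!wstring(path));
}
```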

