Parsing D files with non-unicode characters

Wed Nov 7 04:43:19 UTC 2018

On Tuesday, 6 November 2018 at 01:19:17 UTC, Roland Hadinger 
wrote:
> On Tuesday, 6 November 2018 at 00:48:34 UTC, Arun 
> Chandrasekaran wrote:
>>
>> Thanks! Can't we preserve the comments? Comments are 
>> invaluable, especially in the header files. We generate 
>> documentation using doxygen.
>
> If by 'preserve' you mean 'keep the non-UTF-8 encoding as-is', 
> then no, what I suggested wouldn't work.

This did the trick. The script below uses 
https://github.com/BYVoid/uchardet to detect each file's character 
set, then converts the file to UTF-8 with iconv:

# <DIR> is a placeholder for the root directory to search
for dir in $(find <DIR> -name include -type d); do
     pushd "$dir"

     for file in *; do
          # detect the source encoding with uchardet and convert to UTF-8
          iconv -f "$(uchardet "$file")" -t UTF-8 "$file" > t
          /bin/mv t "$file"

          # if the encoding is SHIFT-JIS iconv converts \ to ¥. Restore it back.
          sed -i 's,¥,\\,g' "$file"

          # convert .h to .d file
          dstep "$file"
     done

     popd
done
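
One caveat with the script above: the sed step rewrites every ¥, including a legitimate one in a file that was never Shift-JIS. A minimal sketch of a helper that restores the backslash only for Shift-JIS input follows. The function name to_utf8 is hypothetical, and the encoding names in the case pattern are an assumption about what uchardet prints, so check them against your own output.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original script.
# Usage: to_utf8 ENCODING FILE
# Converts FILE from ENCODING to UTF-8 in place.
to_utf8() {
    enc=$1
    file=$2
    tmp=$(mktemp) || return 1

    # Convert into a temporary file; leave FILE untouched if iconv fails.
    if ! iconv -f "$enc" -t UTF-8 "$file" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    case $enc in
        SHIFT_JIS|SHIFT-JIS|SJIS)
            # Assumed spellings of the Shift-JIS encoding name.
            # iconv may turn byte 0x5C into a yen sign; map it back to a backslash.
            sed 's,¥,\\,g' "$tmp" > "$file"
            ;;
        *)
            # Any other encoding: keep the converted text as-is.
            cat "$tmp" > "$file"
            ;;
    esac
    rm -f "$tmp"
}
```

Inside the inner loop it would be called as to_utf8 "$(uchardet "$file")" "$file", replacing the iconv, mv and sed lines.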