Any library with string encoding/decoding support?
Adam D. Ruppe
destructionator at gmail.com
Mon Jan 20 05:46:15 PST 2014
On Monday, 20 January 2014 at 08:33:09 UTC, ilya-stromberg wrote:
> Do you know any library with string encoding/decoding support?
> I need more encodings than provides `std.encoding`.
I wrote one that does a bit more decoding, but it has no encoding
support at all. (I wrote it for my web scraper and email reader,
so all I cared about was getting everything to UTF-8.)
https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff/blob/master/characterencodings.d
auto s = convertToUtf8(your_raw_data, "current_encoding");
If you want something full featured, GNU iconv isn't hard to use
from D:
import core.stdc.errno;
import std.string : toStringz;
// link against libiconv; not needed where iconv is built into libc (e.g. glibc)
pragma(lib, "iconv");
extern(C) {
    alias void* iconv_t;
    iconv_t iconv_open(const char* tocode, const char* fromcode);
    int iconv_close(iconv_t cd);
    size_t iconv(iconv_t cd,
        char** inbuf, size_t* inbytesleft,
        char** outbuf, size_t* outbytesleft);
}
auto i = iconv_open("UTF-8", toStringz("CP1252"));
if(i == cast(void*) -1) throw new Exception("iconv open failed");
scope(exit) iconv_close(i);
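/* get input pointer and length ready
   (content is assumed to be the array holding the raw input bytes) */
char* input = cast(char*) content.ptr;
size_t inputLength = content.length;
/* Allocate an output buffer with 4x the size of the input buffer */
size_t outputLength = content.length * 4;
// keep the output buffer around as a slice and get a pointer to it for the lib
auto startingOutputBuffer = new char[outputLength];
char* outputBuffer = startingOutputBuffer.ptr;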
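while(inputLength) {
    auto ret = iconv(i, &input, &inputLength, &outputBuffer, &outputLength);
    if(ret == cast(size_t) -1) {
        // check errno. errno == EILSEQ (84 on Linux) means the input has a
        // byte sequence that is invalid in the source charset, i.e. wrong charset
        throw new Exception("iconv conversion failed");
    }
}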
// the number of bytes remaining in the output buffer is in outputLength here,
// so we do original buffer size minus remaining buffer size
outputLength = (content.length * 4) - outputLength;
// then slice it to get the result (idup makes the immutable copy a string needs)
string convertedContent = startingOutputBuffer[0 .. outputLength].idup;
Note that GNU libiconv is, I think, LGPL licensed (the library itself; the
iconv command-line program that ships with it is GPL).
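
For reference, the fragments above could be collected into one function along these lines. This is only a sketch under some assumptions: the name iconvToUtf8 and its parameters are made up for illustration, the input is taken as a slice of raw bytes, and the 4x output buffer is assumed to be large enough (so E2BIG is not handled separately). Linking is left to the build: with glibc nothing extra should be needed, elsewhere something like -L-liconv probably is.

```d
import core.stdc.errno : errno, EILSEQ;
import std.string : toStringz;

extern(C) {
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd,
        char** inbuf, size_t* inbytesleft,
        char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

// Hypothetical helper (not part of any library): converts raw bytes in the
// charset named by fromCharset into a UTF-8 string using iconv.
string iconvToUtf8(const(ubyte)[] content, string fromCharset)
{
    auto cd = iconv_open("UTF-8", toStringz(fromCharset));
    if (cd == cast(iconv_t) -1)
        throw new Exception("iconv_open failed for " ~ fromCharset);
    scope(exit) iconv_close(cd);

    // iconv wants non-const pointers, but it only reads from the input buffer
    char* input = cast(char*) content.ptr;
    size_t inputLength = content.length;

    // assumption: 4 output bytes per input byte is enough room
    auto output = new char[content.length * 4 + 4];
    char* outputPtr = output.ptr;
    size_t outputLeft = output.length;

    while (inputLength > 0)
    {
        auto ret = iconv(cd, &input, &inputLength, &outputPtr, &outputLeft);
        if (ret == cast(size_t) -1)
        {
            // read errno right away, before anything else can change it
            const err = errno;
            if (err == EILSEQ)
                throw new Exception("input is not valid " ~ fromCharset);
            throw new Exception("iconv conversion failed");
        }
    }

    // bytes written = buffer size minus what iconv reports as still free
    return output[0 .. output.length - outputLeft].idup;
}
```

Called with the CP1252 bytes 63 61 66 E9 and "CP1252" as the charset, this should return "café" as UTF-8. Throwing on any iconv error keeps the loop from spinning forever on bad input, which is the main thing the bare fragments leave open.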