odd behavior of split() function

Jonathan M Davis jmdavisProg at gmx.com
Fri Jun 7 00:29:33 PDT 2013


On Friday, June 07, 2013 09:18:57 Bedros wrote:
> I would like to split "A+B+C+D" into "A", "B", "C", "D"
> 
> but when using split() I get
> 
> "A+B+C+D", "B+C+D", "C+D", "D"
> 
> 
> the code is below
> 
> 
> import std.stdio;
> import std.string;
> import std.array;
> 
> int main()
> {
>       string [] str_list;
>       string test_str = "A+B+C+D";
>       str_list = test_str.split("+");
>       foreach(item; str_list)
>               printf("%s\n", cast(char*)item);
> 
>       return 0;
> }

That would be because of your misuse of printf. If you had used

foreach(item; str_list)
    writeln(item);

you would have been fine. D string literals do happen to have a null character 
one past their end so that you can pass them directly to C functions, but D 
strings in general are _not_ null-terminated, and printf expects its string 
arguments to be null-terminated. If you want to convert a D string to a 
null-terminated string, you need to use std.string.toStringz, not a cast. You 
should pretty much never cast a D string to char* or const char* or any variant 
thereof. So, you could have done

printf("%s\n", toStringz(item));

but I don't know why you'd want to use printf rather than writeln or writefln - 
both of which (unlike printf) are typesafe and understand D types.
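
For reference, here is a minimal sketch of the corrected program showing both 
options side by side. The variable names are just illustrative, and printf is 
imported from core.stdc.stdio explicitly on the assumption that a current 
compiler no longer makes it available through std.stdio.

```d
import core.stdc.stdio : printf;
import std.array : split;
import std.stdio : writeln;
import std.string : toStringz;

void main()
{
    string testStr = "A+B+C+D";
    string[] parts = testStr.split("+");

    // Preferred: writeln knows the length of each slice.
    foreach (item; parts)
        writeln(item);

    // If printf is really required: toStringz yields a
    // null-terminated copy, so printf stops where the slice ends.
    foreach (item; parts)
        printf("%s\n", toStringz(item));
}
```

Both loops should print A, B, C and D on separate lines.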

You got

"A+B+C+D", "B+C+D", "C+D", "D"

because the original string (being a string literal) had a null character one 
past its end, and each of the strings returned by split was a slice of that 
original string. printf blithely ignored the actual boundaries of each slice 
and kept reading until it hit the next null character in memory, which - 
because they were all slices of the same string literal - happened to be the 
one at the end of the original string literal. The strings printed differed 
because each slice started at a different position in the underlying array.
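
If you want to see the slicing for yourself, something like the following 
sketch should do it. It prints where each piece returned by split starts 
inside the original string, assuming (as described above) that split hands 
back slices of its input rather than copies. The variable names are made up 
for the example.

```d
import std.array : split;
import std.stdio : writefln;

void main()
{
    string testStr = "A+B+C+D";
    string[] parts = testStr.split("+");

    foreach (item; parts)
    {
        // Distance from the start of testStr to the start of this slice.
        auto offset = item.ptr - testStr.ptr;
        writefln("%s: offset %s, length %s", item, offset, item.length);
    }
}
```

Each piece has length 1 but begins at a different offset in the same 
memory, which is why a function that ignores the length and reads up to the 
null character prints everything from that offset to the end of the literal.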

- Jonathan M Davis

