[challenge] can you break wstring's back?

Steven Schveighoffer schveiguy at yahoo.com
Tue Nov 23 19:08:04 PST 2010


I am working on a string implementation that enforces the correct  
restrictions on a string (bi-directional range, etc), and I came across  
what I feel is a bug.

However, I don't know enough about UTF to construct a test case that
proves the function wrong.

In std.array, there are separate functions for array.popBack(), depending  
on whether the array is a char[], a wchar[], or any other array type.  The  
char[] and wchar[] popBacks are drastically different.

However, there is only one back() function for narrow strings, which
supposedly handles both char[] and wchar[].  It appears to decode 1, 2, 3,
or 4 elements depending on the bit pattern, and it looks only at the least
significant 8 bits of each element to decide how many.  Does this make
sense for wstring?  I would think a wstring is decoded differently from a
string; otherwise, why have two different popBacks?
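
To make the concern concrete, here is a rough sketch of how the two
encodings differ when finding the last code point from the end of an
array.  The helper names are hypothetical and this is not the std.array
code; it only states the UTF-8 and UTF-16 rules as defined by Unicode.

```d
// Sketch only: hypothetical helpers, not the std.array implementation.

// UTF-8: a continuation byte has the form 10xxxxxx.
bool isUtf8Continuation(char c)
{
    return (c & 0xC0) == 0x80;
}

// UTF-16: the second unit of a surrogate pair lies in 0xDC00 .. 0xDFFF.
bool isLowSurrogate(wchar c)
{
    return c >= 0xDC00 && c <= 0xDFFF;
}

// Number of code units occupied by the last code point of a UTF-8 string.
size_t lastStrideUtf8(const(char)[] s)
{
    assert(s.length > 0);
    size_t n = 1;
    // Walk back over at most three continuation bytes.
    while (n < 4 && n < s.length && isUtf8Continuation(s[$ - n]))
        ++n;
    return n;
}

// Number of code units occupied by the last code point of a UTF-16 string.
size_t lastStrideUtf16(const(wchar)[] s)
{
    assert(s.length > 0);
    return (s.length >= 2 && isLowSurrogate(s[$ - 1])) ? 2 : 1;
}

unittest
{
    assert(lastStrideUtf8("a") == 1);
    assert(lastStrideUtf8("\u00E9") == 2);        // two bytes in UTF-8
    assert(lastStrideUtf8("\U00010000") == 4);    // four bytes in UTF-8
    assert(lastStrideUtf16("\u00E9"w) == 1);      // one unit in UTF-16
    assert(lastStrideUtf16("\U00010000"w) == 2);  // surrogate pair
}
```

The UTF-8 rule inspects the top two bits of a byte, while the UTF-16 rule
is a range check on the whole 16-bit unit, so one bit test cannot serve
both.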

I don't know how to construct a string that exposes the issue.  Is there
one?  If so, can you prove it with a unit test?

Hint: the bit pattern of the end of the string must 'trick' the function  
into using the wrong number of elements, because ones that happen to match  
the correct number of elements needed will not cause an error (after  
deciding how many elements to decode, the data is passed to the decode  
function, which should do the right thing).
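
As a starting point for the search, the sketch below models the check as
described above: it tests only the low 8 bits of a wchar against the UTF-8
continuation pattern.  Both functions are hypothetical and the model is an
assumption about the behavior, not a copy of the std.array source, so it
only suggests where to look and does not prove a failure.

```d
// Hypothetical model of the described check, NOT the std.array source:
// only the low 8 bits of the code unit are compared with 10xxxxxx.
bool lowByteLooksLikeContinuation(wchar c)
{
    return (c & 0xC0) == 0x80;
}

// The real UTF-16 rule: a trailing unit belongs to a pair only when it is
// a low surrogate (0xDC00 .. 0xDFFF).
bool isLowSurrogate(wchar c)
{
    return c >= 0xDC00 && c <= 0xDFFF;
}

// A unit is a candidate for tricking the modeled check when the low-byte
// test and the UTF-16 rule disagree about stepping back further.
bool isCandidate(wchar c)
{
    return lowByteLooksLikeContinuation(c) != isLowSurrogate(c);
}

unittest
{
    assert(!isCandidate('a'));                // 0x0061: both say one unit
    assert(isCandidate('\u0080'));            // low byte is 1000_0000
    assert(isCandidate('\u20AC'));            // low byte 0xAC is 1010_1100
    assert(isCandidate(cast(wchar) 0xDC00));  // low surrogate, low byte 0x00
}
```

Under this model, a wstring ending in a unit such as 0x0080 or 0x20AC
would be worth trying in a unit test against the real back().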

As a bonus, can you write a correct wstring.back function so I can include  
it in my string struct? :)

-Steve

