Get Character At?
Chris Nicholson-Sauls
ibisbasenji at gmail.com
Tue Apr 24 18:04:19 PDT 2007
Derek Parnell wrote:
> On Tue, 24 Apr 2007 11:56:19 -0400, okibi wrote:
>
>> Derek Parnell Wrote:
>>
>>> On Tue, 24 Apr 2007 10:30:16 -0400, okibi wrote:
>>>
>>>> Is there a getCharAt() function for D?
>>> Get a character from what? A string, a file, a console screen, ... ?
>>>
>>> --
>>> Derek Parnell
>>> Melbourne, Australia
>>> "Justice for David Hicks!"
>>> skype: derek.j.parnell
>> Such as this:
>>
>> char[] text = "This is a test sentence.";
>>
>> int loc = 5;
>>
>> char num5 = text.getCharAt(loc);
>>
>> Something along those lines.
>
> Because char[] represents a UTF-8 encoded Unicode string, to get the Nth
> character (the first character is at position 1), try this ...
>
> import std.stdio;
> import std.utf;
>
> T getCharAt(T)(T pText, uint pPos)
> {
>     size_t lUTF_Index;
>     uint lStride;
>
>     // Firstly, find out where the character starts in the string.
>     lUTF_Index = std.utf.toUTFindex(pText, pPos-1);
>
>     // Then find out its width (in bytes)
>     lStride = std.utf.stride(pText, lUTF_Index);
>
>     // Return the character encoded in UTF format.
>     return pText[lUTF_Index .. lUTF_Index + lStride];
> }
>
> void main()
> {
>     char[] text = "a\ua034bcdef";
>     uint loc = 4;
>     writefln("%s", getCharAt(text, loc)); // shows "c"
>     writefln("%s", text[loc-1]); // correctly fails
> }
>
>
> If you just use 'text[loc]', you may not get the correct character, and you
> only get a single UTF-8 code unit (a code point fragment) anyway.
>
> Remember that char[] is not an array of characters. It is an array of UTF-8
> code units (code point fragments, each 1 byte wide), and a UTF-8 encoded
> character (code point) can take from 1 to 4 of them.
>
Which is why I tend to bite the bullet and just use dchar[] for general-purpose
things. I only use char[] in cases where I know it's "safe" to do so (that is, cases
where I know what the input will be, and know it will be within the single-byte (ASCII)
range). That said, it's a darn good thing Phobos has std.utf and Tango has
tango.text.convert.Utf, otherwise we'd often be in a pickle. (Avoiding potential tango.io joke.)
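
To make the dchar[] point concrete, here is a minimal sketch reusing the sample
text from Derek's example. It assumes Phobos and uses only the 'd' string-literal
suffix and std.utf.toUTF32; the variable names (text, narrow, wide) are made up
for illustration. Note that the index is zero-based here, unlike the one-based
getCharAt above.

```d
import std.stdio;
import std.utf;

void main()
{
    // The 'd' suffix makes this a UTF-32 literal: one dchar per code point.
    auto text = "a\ua034bcdef"d;

    // Plain indexing is therefore character indexing.
    assert(text.length == 7);
    assert(text[3] == 'c');

    // The same text as UTF-8 is 9 bytes long, because \ua034 needs three.
    auto narrow = "a\ua034bcdef";
    assert(narrow.length == 9);

    // Converting once at the boundary gives the same direct indexing.
    auto wide = std.utf.toUTF32(narrow);
    assert(wide.length == 7);
    assert(wide[3] == 'c');

    writefln("%s", wide[3]);
}
```

The trade-off is memory: every character takes four bytes, in exchange for
constant-time indexing without having to walk the string.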
-- Chris Nicholson-Sauls