Text in D article

Daniel Keep daniel.keep.lists at gmail.com
Sat Nov 18 17:19:27 PST 2006



Pierre Rouleau wrote:
> Pierre Rouleau wrote:
> 
>> Daniel Keep wrote:
>>
>>> Here's a draft of an article which, hopefully, will explain some of the
>>> details of how text in D works.  Any constructive criticism is welcomed,
>>> along with edits or corrections.
>>>
>>
>> As someone who has not been coding in D except for trying out some D
>> every so often, I find:
>>
>> - the discussion of Unicode and its support in D clear and useful
>> - the description of the use of printf and string confusing:
>>
>> You wrote::
>>
>>    Back before D had the std.stdio.writefln method, most examples used
>>    the old C function printf. This worked fine until you tried to output
>>    a string::
>>
>>       printf(“Hello, World!\n”);
>>
>>    The above statement was very likely to print out garbage that left
>>    many people scratching their heads. The reason is that C uses
>>    NUL-terminated strings, whereas D uses true arrays. In other words:
>>
>>    - Strings in C are a pointer to the first character. A string ends at
>>      the first NUL character.
>>    - Strings in D are a pointer to the first character, followed by a
>>      length. There is no terminating character.
>>
>>    And that's the problem: printf is looking for a terminator that
>>    doesn't necessarily exist.
>>
>>
>> That would lead me to believe that I could not use printf to print a
>> string literal.  But then I just wrote and compiled the following D
>> code::
>>
>>   int
>>   main()
>>   {
>>      printf("Hello!\n");
>>      printf("Bye!\n");
>>      return 1;
>>   }
>>
>> But it prints just fine.  So, something must be missing in your
>> explanation or my understanding.  I'll have to read more about D to
>> understand.
>>
>> Just my 2 cents,
>>
>> -- 
>> P.R.

Read down a little bit further: it points out that you want to use
std.string.toStringz to ensure that the NUL terminator exists.

It also admits that the example actually DOES work, simply because dmd
sticks the NUL terminator on the end of all string literals.  But as
someone already pointed out, if what you're dealing with is NOT a string
literal (a slice of another string, say, or something read from disk),
then the terminator won't be there and the code will choke.
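
A minimal sketch of the slice case, assuming a current D2 compiler (so
printf comes from core.stdc.stdio, which is not where it lived in 2006).
firstWord is a made-up helper for illustration, not something from the
article:

```d
import core.stdc.stdio : printf;
import std.string : indexOf, toStringz;

// Hypothetical helper: returns a slice of its argument.  The slice shares
// memory with the original string and has no NUL terminator of its own.
string firstWord(string s)
{
    auto i = s.indexOf(' ');
    return i < 0 ? s : s[0 .. i];
}

void main()
{
    string greeting = "Hello, World!";
    string word = firstWord(greeting);   // the slice "Hello,"

    // Wrong: printf ignores the slice's length and keeps reading until it
    // happens to hit a NUL somewhere past the end of the slice.
    // printf("%s\n", word.ptr);

    // Right: toStringz hands back a pointer to NUL-terminated data.
    printf("%s\n", toStringz(word));

    // Also fine: tell printf the length explicitly.
    printf("%.*s\n", cast(int) word.length, word.ptr);
}
```

Both working calls print only the slice; the commented-out one is the
mistake the article is warning about.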

I should probably reorganise the section to be clearer on this.  I used
that (wrong) example because an example that actually fails would be
somewhat longer, and would probably make people think "Ok, so why can't
I pass slices to C functions?  Are they not really strings?"

> 
> And BTW, the line::
> 
>   printf(“Hello, World!\n”);
> 
> does not compile because of the non ASCII characters used for quoting.

Damnit... every time I go to write prose that option's off, and every
time I write code examples it's ON.  I swear OOo is out to get me >_<

> So other questions come to mind:

Off the top of my head:

> - Can D source code contain Unicode characters freely?

- Yup, you betcha!

> - If so, how is it done?

- Use a text editor that supports saving files in UTF-8.  The lexical
spec also lists UTF-16 and UTF-32 (either byte order) as valid source
encodings.

> - If not, how can we define a Unicode string literal?

- If you don't have access to a Unicode-enabled editor, you can use
escape sequences with \uXXXX (or \UXXXXXXXX for higher Unicode code points.)
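
As a rough sketch of the escape route (again assuming a current D2
compiler, and with code points I picked purely for illustration), the
source below is plain ASCII yet produces non-ASCII strings:

```d
import std.stdio : writeln;

void main()
{
    // U+00E9 (e with acute accent) fits in four hex digits, so \u works.
    string accented = "caf\u00E9";

    // U+1D11E (musical G clef) lies above U+FFFF, so it needs the
    // eight-digit \U form.
    string clef = "\U0001D11E";

    // Both are stored as UTF-8: U+00E9 takes two code units, U+1D11E four.
    assert(accented.length == 5);
    assert(clef.length == 4);

    writeln(accented, " ", clef);
}
```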

> - Does D have a Unicode string type like, say Python, or is it better at
> specifying them?

- That's *all* D has.  Remember, char, wchar and dchar hold UTF-8,
UTF-16 and UTF-32 code units, which are the three main ways of storing
Unicode text.  Internally, Python (as of 2.x) uses UTF-16 or UTF-32,
depending on how the interpreter was built.
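
To make the three encodings concrete, here is a small sketch, assuming
a current D2 compiler where string, wstring and dstring are the usual
aliases for arrays of char, wchar and dchar.  The same single code point
has a different array length in each type:

```d
import std.stdio : writeln;

void main()
{
    // One code point, U+1D11E, stored three ways.
    string  utf8  = "\U0001D11E";    // array of char
    wstring utf16 = "\U0001D11E"w;   // array of wchar
    dstring utf32 = "\U0001D11E"d;   // array of dchar

    // .length counts code units, not characters.
    writeln(utf8.length);    // 4 UTF-8 code units
    writeln(utf16.length);   // 2 UTF-16 code units (a surrogate pair)
    writeln(utf32.length);   // 1 UTF-32 code unit
}
```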

> - How do we handle internationalization of presentation strings in D?
> - gettext support...

I don't know if gettext would work in D, simply because I've never seen
it tried.  D doesn't have any *direct* support for this, though.

(Then again, I have yet to see *any* programming language that does.)

> - Do we have to use text codecs (as in Python for example)?

D has no built-in support for converting between code pages, as far as I
know.  You need to download and use a conversion library like iconv for
that.

> This information would fit quite nicely in an article describing text in D.

I may have to restructure it into two sections: a "What the... it's a
borken!" section and a "Q&A" section.

Thanks for the feedback.

	-- Daniel

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/


