Text in D article

Sat Nov 18 20:52:48 PST 2006

Daniel Keep wrote:

> 
> Pierre Rouleau wrote:
> 
>>Pierre Rouleau wrote:
>>
>>
>>>Daniel Keep wrote:
>>>
>>>
>>>>Here's a draft of an article which, hopefully, will explain some of the
>>>>details of how text in D works.  Any constructive criticism is welcomed,
>>>>along with edits or corrections.
>>>>
>>>
>>>As someone who has not been coding in D except for trying out some D
>>>every so often, I find:
>>>
>>>- the discussion of Unicode and its support of D clear and useful
>>>- the description of the use of printf and string confusing:
>>>
>>>You wrote::
>>>
>>>   Back before D had the std.stdio.writefln method, most examples used
>>>   the old C function printf. This worked fine until you tried to output
>>>   a string::
>>>
>>>      printf(“Hello, World!\n”);
>>>
>>>   The above statement was very likely to print out garbage that left
>>>   many people scratching their heads. The reason is that C uses
>>>   NUL-terminated strings, whereas D uses true arrays. In other words:
>>>
>>>   - Strings in C are a pointer to the first character. A string ends at
>>>     the first NUL character.
>>>   - Strings in D are a pointer to the first character, followed by a
>>>     length. There is no terminating character.
>>>
>>>   And that's the problem: printf is looking for a terminator that
>>>   doesn't necessarily exist.
>>>
>>>
>>>That would lead me to believe that I could not use printf to print a
>>>string literal.  But then I just wrote and compiled the following D
>>>code::
>>>
>>>  int
>>>  main()
>>>  {
>>>     printf("Hello!\n");
>>>     printf("Bye!\n");
>>>     return 1;
>>>  }
>>>
>>>But it prints just fine.  So, something must be missing in your
>>>explanation or my understanding.  I'll have to read more about D to
>>>understand.
>>>
>>>Just my 2 cents,
>>>
>>>-- 
>>>P.R.
> 
> 
> Read down a little bit further: it points out that you want to use
> std.string.toStringz to ensure that the NUL terminator exists.
> 

I saw that.  My point was that the article should be a little clearer as 
to why you would want to use it.  As an introduction to text processing 
in D, and a treatment of the different string formats (NUL-terminated or 
length-based), it should tell a newbie the implications of the code he 
writes and the effect of transformations such as slices.

> It also admits that the example actually DOES work, simply because dmd
> sticks the NUL terminator on the end of all string literals.  But as
> someone already pointed out, if what you're dealing with is NOT a string
> literal: a slice of another string, or something read from disk, then it
> won't be there and the code will choke.
> 
> I should probably reorganise the section to be clearer on this.  I used
> that (wrong) example because an example that actually fails would be
> somewhat longer, and probably make people think "Ok, so why can't I use
> slices to C functions?  Are they not really strings?"
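
For what it's worth, a failing example need not be long.  Here is a 
minimal sketch, assuming a present-day D compiler and Phobos (the 
variable names are mine, not from the article).  It takes a slice of a 
literal and hands it to printf both ways:

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    string text = "Hello, World!";
    string slice = text[0 .. 5];    // "Hello"; the next byte is ',', not NUL

    // Unsafe: slice.ptr points into text, so a C function would keep
    // reading past the end of the slice until it finds a NUL.
    // printf("%s\n", slice.ptr);

    // Safe: toStringz returns a pointer to a NUL-terminated copy.
    printf("%s\n", toStringz(slice));

    // Also safe: pass the length explicitly with the "%.*s" format.
    printf("%.*s\n", cast(int) slice.length, slice.ptr);

    // The C view of the converted string has the same length as the slice.
    assert(strlen(toStringz(slice)) == slice.length);
}
```

The slice is still a perfectly good D string; it is only the C side 
that needs the terminator.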

> 
> 
>>And BTW, the line::
>>
>>  printf(“Hello, World!\n”);
>>
>>does not compile because of the non-ASCII characters used for quoting.
> 
> 
> Damnit... every time I go to write prose that option's off, and every
> time I write code examples it's ON.  I swear OOo is out to get me >_<

I also like reStructuredText myself...  but writing extra symbols is a 
little trickier...

> 
>>So other questions come to mind:
> Off the top of my head:
>>- Can D source code contain Unicode characters freely?
> - Yup, you betcha!
>>- If so, how is it done?
> - Use a text editor that supports saving files in UTF-8.  I'm not sure
> off the top of my head if UTF-16 and UTF-32 are supported directly...

Readers might be interested to know that they can use these directly in 
the source code file.  They may also wonder whether non-ASCII characters 
are acceptable in things such as variable names.

>>- If not, how can we define a Unicode string literal?
> - If you don't have access to a Unicode-enabled editor, you can use
> escape sequences with \uXXXX (or \UXXXXXXXX for higher Unicode code points.)
>>- Does D have a Unicode string type like, say Python, or is it better at
>>specifying them?
> - That's *all* D has.  Remember, char, wchar and dchar correspond to
> UTF-8, UTF-16 and UTF-32 which are the three main ways of storing
> Unicode text.  Internally, Python uses UTF-16.
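
A short sketch of the escape sequences and the three string types may 
help here.  It assumes a present-day D compiler; the variable names are 
only for illustration.  The lengths are counted in code units, which is 
why the same character gives different numbers:

```d
import std.stdio : writeln;

void main()
{
    // U+00E9 (e with acute accent), written with an escape so the
    // source file itself stays pure ASCII.
    string  s = "\u00E9";     // char[]  : UTF-8 code units
    wstring w = "\u00E9"w;    // wchar[] : UTF-16 code units
    dstring d = "\u00E9"d;    // dchar[] : UTF-32 code units

    assert(s.length == 2);    // two bytes in UTF-8
    assert(w.length == 1);
    assert(d.length == 1);

    // A code point above U+FFFF needs the eight-digit \U form.
    string  hs = "\U0001D11E";    // musical symbol G clef
    wstring hw = "\U0001D11E"w;
    dstring hd = "\U0001D11E"d;

    assert(hs.length == 4);   // four UTF-8 code units
    assert(hw.length == 2);   // a UTF-16 surrogate pair
    assert(hd.length == 1);   // always one UTF-32 code unit

    writeln(s);               // display depends on the console encoding
}
```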
> 
> 
>>- How do we handle internationalization of presentation strings in D?
>>- gettext support...
> 
> 
> I don't know if gettext would work in D, simply because I've never seen
> it tried.  D doesn't have any *direct* support for this, tho.

I can't see why it would not.  Can we have a function named '_()' in D?
Since the gettext philosophy is to write all presentation strings in 
English, the code can be written in ASCII-only files, and since the 
strings are Unicode, the translated strings could contain any symbol at 
runtime.
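
To make the idea concrete, here is a rough sketch of what I have in 
mind, assuming a present-day D compiler.  It is not real gettext: the 
`catalog` table and the lookup are hypothetical stand-ins for a message 
catalog, and only show that a function named `_` can wrap the English 
strings:

```d
import std.stdio : writeln;

// Hypothetical in-memory catalog; real gettext would load it from a file.
string[string] catalog;

// Returns the translation of msgid, or msgid itself when none is known.
string _(string msgid)
{
    if (auto translated = msgid in catalog)
        return *translated;
    return msgid;
}

void main()
{
    catalog["Hello"] = "Bonjour";

    writeln(_("Hello"));      // found in the catalog
    writeln(_("Goodbye"));    // not found: falls back to the English text

    assert(_("Hello") == "Bonjour");
    assert(_("Goodbye") == "Goodbye");
}
```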

One aspect is the string formatting.  Does D support string formatting 
similar to Python's dictionary-based formatting, like:

a_dict = {'person_name': 'Daniel'}
a_string = 'Hello %(person_name)s ! How are you?' % a_dict

Python dictionaries are very useful for that purpose.  Translating 
presentation strings works better when the entire string context is 
available to the person doing the natural language translation.  As far 
as I am concerned, this is an important feature for a programming language 
used to write (client-side) applications.
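
Something along these lines is what I would hope to be able to write 
in D.  This is only a sketch, assuming a present-day D compiler and 
Phobos; `formatNamed` is a hypothetical helper of my own, not a library 
function, and it handles only the `%(name)s` form with an associative 
array playing the role of the Python dictionary:

```d
import std.array : replace;
import std.stdio : writeln;

// Hypothetical helper: substitutes every "%(key)s" placeholder in tmpl
// with the matching value from the associative array.
string formatNamed(string tmpl, string[string] values)
{
    string result = tmpl;
    foreach (key, value; values)
        result = result.replace("%(" ~ key ~ ")s", value);
    return result;
}

void main()
{
    string[string] aDict = ["person_name": "Daniel"];
    string aString = formatNamed("Hello %(person_name)s ! How are you?", aDict);

    assert(aString == "Hello Daniel ! How are you?");
    writeln(aString);
}
```

The translator sees the whole sentence with a named placeholder, and is 
free to move the placeholder wherever the target language needs it.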

> 
> (Then again, I'm yet to see *any* programming language that does.)
> 
Support for gettext does not have to be built into the language.  It is 
enough that the language does not preclude using gettext.

> 
>>- Do we have to use text codecs (as in Python for example)?
> 
> 
> D has no built-in support for converting between code pages, as far as I
> know.  You need to download and use a conversion library like iconv to
> convert between code pages.
> 

> 
> Thanks for the feedback.
> 

You're welcome.

--

Pierre