phobos / tango / ares

Sat Feb 10 18:52:46 PST 2007

Sean Kelly wrote:
> Kevin Bealer wrote:
>>
>> Okay -- I'm really sorry if any of this seems to have a negative tone. 
>> I hesitate to write this since I have a lot of respect for the Tango 
>> design in general, but there are a couple of friction points I've 
>> noticed.
>>
>> 1. writefln / format replacements
>>
>> Concerning standard output and string formatting, in phobos I can do 
>> these operations:
>>
>>   writefln("%s %s %s", a, b, c);
>>   format("%s %s %s", a, b, c);
>>
>> How do I do these in Tango?  The change to "{0} {1}" stuff is fine 
>> with me, in fact I like it, but this syntax:
>>
>>   Stdout.formatln("{0} {1} {2}", a, b, c);
>>   Format!(char).convert("{0} {1} {2}", a, b, c);
>>
>> is awkward.  And these statements are used *all the time*.  In a 
>> recent toy project I wrote, I used Stdout 15 times, compared to using 
>> "foreach" only 8 times.  I also use the "format to string" idiom a lot 
>> (oddly enough, not in that project), and it's even more awkward.
> 
> The conversion modules seem to have slightly spotty API documentation, 
> but I think this will work for the common case:
> 
> Formatter( "{0} {1} {2}", a, b, c );

Okay, I didn't see this possibility, that actually looks like a decent 
syntax; I withdraw the paragraphs in question, subject to the (zig zag) 
example below. :)

> The Stdout design is the result of a lengthy discussion involving 
> overload rules and expected behavior.  I believe two of the salient 
> points were that the default case should be the simplest to execute, and 
> that the .format method call provided a useful signifier that an 
> explicit format was being supplied.  That said, I believe that the 
> default output format can be called via:
> 
> Stdout( a, b, c );
> 
> or the "whisper" syntax:
> 
> Stdout( a )( b )( c );

Okay -- there is a problem for new users who try to print strings with 
"%" somewhere in them; this solves that problem, which is nice.

>> That's why I think phobos really did the "Right Thing" by keeping 
>> those down to one token.  Second, the fact that the second one does 
>> exactly what the first does but you need to build a template, etc., is 
>> annoying.  I kept asking myself if I was doing the right thing because 
>> it seemed like I was using too much syntax for this kind of operation 
>> (I'm still not sure it's the best way to go -- is it?)

So am I, but in D I often don't have to.  Maybe I'm getting spoiled.

> Do you consider the Formatter instance to be sufficient or would it be 
> more useful to wrap this behavior in a free function?  I'll admit that, 
> being from a C++ background I'm quite used to customizing the library 
> behavior to suit my particular use style, but I can understand the 
> desire for "out of the box" convenience.

Hmm... given these two statements:

1. char[] zig = Formatter("{0} {1}", "ciao", "bella");
2. char[] zag = Formatter("{0} {1}", "one", "two");

Questions:

A. If these are done sequentially, will zig be affected by the 
processing of 'zag'?  (I.e. because of buffer sharing.)

B. Will doing 1 and 2 from different threads affect zig or zag?

If the answer to both A and B is "NO", then I have no problem with using 
Formatter.  I don't care about having a free function specifically (e.g. 
for getting a function pointer); I just want safety, efficiency and 
clean syntax.

The documentation for Sprint suggests that both 1 and 2 are dangerous; I 
don't know whether Formatter is like Sprint in that regard.
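To make question A concrete, here is a minimal sketch in modern D (Phobos, not Tango) of the hazard being asked about. `ReusingFormatter` is a hypothetical type, not Tango's Formatter or Sprint: it formats into one reused buffer and returns a slice of it, so a later call overwrites the text an earlier result still points at. `std.format.format`, by contrast, allocates a new string per call. The sketch uses Phobos `%s` specifiers rather than the `{0}` style, and says nothing about thread safety (question B).

```d
import std.format : format, sformat;

// Hypothetical formatter (NOT Tango's API): it writes into one reused
// internal buffer and hands back a slice of that buffer, the kind of
// buffer sharing the Sprint documentation warns about.
struct ReusingFormatter
{
    private char[64] buf;

    char[] opCall(Args...)(string fmt, Args args)
    {
        // sformat fills buf from the start and returns the used slice;
        // no copy is made, so the result aliases buf.
        return sformat(buf[], fmt, args);
    }
}

void main()
{
    ReusingFormatter f;
    char[] zig = f("%s %s", "ciao", "bella");
    char[] zag = f("%s %s", "one", "two");
    // The second call overwrote the start of the shared buffer,
    // so zig no longer reads "ciao bella".
    assert(zig[0 .. zag.length] == zag);
    assert(zig != "ciao bella");

    // std.format.format returns a freshly allocated string each time,
    // so the first result is unaffected by the second call.
    string zig2 = format("%s %s", "ciao", "bella");
    string zag2 = format("%s %s", "one", "two");
    assert(zig2 == "ciao bella" && zag2 == "one two");
}
```

A formatter that returns freshly allocated strings answers "NO" to question A at the cost of one allocation per call; a buffer-reusing one avoids the allocation but makes every earlier result volatile.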

>> 2. toString and toUtf8 (collisions)
>>
>> The change of the terminology is actually okay with me.
>>
>> But phobos has a way of using toString as both a method and a 
>> top-level function name, all over the place.  This gets really clumsy 
>> because you can never use the top-level function names when writing a 
>> class unless you fully qualify them.
>>
>> For example, std.cpuid.toString() always has to be fully qualified 
>> when called from a class, and seems nondescriptive anyway.  All the 
>> std.conv.toString() functions are nice, but it's easy to call the 
>> in-class toString() by accident.
>>
>> For the utf8 <--> utf16 and similar, it's frustrating to have to do this:
>>
>> dchar[] x32 = ...;
>> char[] x8 = tango.text.convert.Utf.toUtf8(x32);
>>
>> But you have to fully qualify if you are writing code in any class or 
>> struct.  If these were given another name, like makeUtf8, then these 
>> collisions would not happen.
> 
> One aspect of the Mango design that has carried forward into Tango is 
> that similar functions are typically intended to live in their own 
> namespace for the sake of clarity.  Previously, most/all of the free 
> functions were declared in structs simply to prevent collisions, but 
> this had code bloat issues so the design was changed.  Now, users are 
> encouraged to use the aliasing import to produce the same effect:
> 
> import Utf = tango.text.convert.Utf;
> 
> Utf.toUtf8( x32 );
> 
> I'll admit it's not as convenient as simply importing and using the 
> functions, but it does make the origin of every function call quite 
> clear.  I personally avoid "using" in C++ for exactly this reason--if 
> I'm using an external routine I want to know what library it's from by 
> inspection.
> 
> 
> Sean
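The collision and the aliasing-import workaround can be sketched in modern D, with Phobos's `std.utf.toUTF8` standing in for Tango's `Utf.toUtf8` (a substitution for illustration only; `Document` and its members are hypothetical). D does not overload a member function against a module-level function of the same name, so inside the class an unqualified call finds only the member.

```d
import Utf = std.utf;   // renamed import: the module's functions are reached as Utf.xxx

// Hypothetical class used only to illustrate the name collision.
class Document
{
    dstring title = "tin of beans"d;

    // A member named toUTF8 hides every module-level toUTF8 inside the class.
    string toUTF8() { return "Document"; }

    string titleUtf8()
    {
        // An unqualified toUTF8(title) would resolve to the zero-argument
        // member above and fail to compile; the renamed import names the
        // intended function explicitly.
        return Utf.toUTF8(title);
    }
}

void main()
{
    auto doc = new Document;
    assert(doc.titleUtf8() == "tin of beans");
    assert(doc.toUTF8() == "Document");
}
```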

This is not earth-shaking to me, so the current way is not a big deal, 
but what I want to avoid is what I think of as the Java naming effect, 
where you need to do this:

System.out.print(foo);

... to print something.  To me, the design of a programming language or 
library is like a natural language.  In English we say "tin can", but we 
always say "can" when there is no ambiguity.  You never say "I want to 
buy a tin can of beans".  (I think that in the UK they say "tin of 
beans" instead, but it's the same idea.)

My view is for the common things to be simple and the complex things to 
be as simple as possible.  The extra formality of spelling out the full 
names of things is something that people find comfort in (*), but I 
would as soon do without in D.

(*) I think people find comfort in it because they have been abused by 
other languages.  In C and C++ land, I agree --- if you do a '#define 
binary 1' in an include file somewhere, you can kill an algorithm in 
another file that is a dozen includes up the chain -- I found exactly 
this definition in a file at my job, and it was an 'interesting' problem 
to debug.  Working on large C and C++ projects breeds a kind of paranoia 
about symbol tables that I can completely relate to.

Sometimes the combination of #include and #define is a lot like "come 
from" in the way that it messes with the debugging process.

http://en.wikipedia.org/wiki/Come_from

But again, sorry if I'm being nitpicky.

Kevin