Rosetta Commatizing numbers

Tue May 30 21:31:14 PDT 2017

On Tuesday, 30 May 2017 at 10:54:49 UTC, Solomon E wrote:
> I ran into a Rosetta code solution in D that had obvious 
> errors. It's like the author or the previous editor wasn't even 
> trying to do it right, like a protest against how many detailed 
> rules the task had. I assumed that's not the way we want to do 
> things in D.
> ...
> Does anyone have any thoughts about this? Did I do right by D?

I'd say the previous version (by bearophile) suited the task much 
better, but neither is perfect.

As a general note, consider the following paragraph of the 
problem statement:

"Some of the commatizing rules (specified below) are arbitrary, 
but they'll be a part of this task requirements, if only to make 
the results consistent amongst national preferences and other 
disciplines."

This literally means that, while there are complex rules in the 
real world for commatizing numbers, the problem is kept simple by 
enforcing strict rules.  The minute concerns of the Real World, 
like "Current New Zealand dollar format overrides old Zimbabwe 
dollar format", are irrelevant to the formal problem being 
solved.  Perhaps the example inputs section ("Strings to be used 
as a minimum") is misleading, but that's what they are: 
examples, not general rules.  By the way, as it's a wiki page, 
the problem statement text could also be improved ;) .

Why?  For example, look at the Indian numbering system where 
commatizing is visibly different 
(https://en.wikipedia.org/wiki/Indian_numbering_system) - and we 
don't know whether the string should use it or not without the 
context.  Or consider that hexadecimal numbers are usually split 
in groups of four digits, not three - and we don't know whether a 
[0-9]+ number is decimal or hexadecimal without the context.  
See, trying to provide an ultimate solution to real-world 
commatizing, while keeping it a single function without the 
context, can't possibly succeed.

What can be done, then?  Well, the page authors already did the 
difficult part for us: they extracted the essence of a complex 
real-world problem into a small set of formal rules, which are 
now the formal problem statement.  Now comes the easy part: to do 
exactly what is asked in the problem statement.  The flexibility 
comes from having function parameters.  If we have a solution to 
a formal problem, using it for the real-world version of the 
problem is either just specifying the right parameters 
(hopefully), or changing the function if the real world gets too 
complex for it.  In the latter case, the shorter and more 
readable the existing solution is, the faster we can change the 
function to suit our real-world case.
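
To illustrate what I mean by flexibility through parameters, here 
is a minimal sketch.  It is not bearophile's code and it does not 
implement the full task rules: the name commatize, the parameter 
names start, step and sep, and the example strings are my own 
assumptions, and the function only groups the first run of digits 
that begins with a non-zero digit.

```d
import std.ascii : isDigit;
import std.stdio : writeln;

/// Simplified, hypothetical sketch (NOT the full task rules):
/// groups the first run of digits found at or after `start`
/// that begins with a non-zero digit.
string commatize(string s, size_t start = 0, size_t step = 3,
                 string sep = ",")
{
    if (step == 0)
        return s;

    // Find the first non-zero digit at or after `start`.
    size_t lo = start;
    while (lo < s.length && !(isDigit(s[lo]) && s[lo] != '0'))
        ++lo;
    if (lo >= s.length)
        return s;  // nothing to group

    // Extend to the end of that digit run.
    size_t hi = lo;
    while (hi < s.length && isDigit(s[hi]))
        ++hi;

    // Copy the run, inserting `sep` every `step` digits,
    // counted from the right end of the run.
    string grouped;
    foreach (i; lo .. hi)
    {
        if (i > lo && (hi - i) % step == 0)
            grouped ~= sep;
        grouped ~= s[i];
    }
    return s[0 .. lo] ~ grouped ~ s[hi .. $];
}

void main()
{
    // Defaults: groups of three, separated by a comma.
    writeln(commatize("1234567"));
    // The caller, not the function, picks special settings.
    writeln(commatize("x=1234567890", 0, 5, " "));
    writeln(commatize("Z$100000000000000", 0, 3, "."));
}
```

The point is that nothing in the function body knows about any 
particular input: all the variation lives in the arguments.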

-----

Now, where is the old version wrong?  Turns out it just calls the 
function with default parameters for every line of input - which 
is wrong since the first two input lines need to be handled 
specially.  Well, that's what the function parameters are for.  
To have a correct solution, we have to use custom parameters for 
the first two lines of input.  The function itself is fine.

Your solution addresses this problem by special-casing the inputs 
inside the function, perhaps because of the misleading inputs 
section in the problem statement.  That's a wrong approach.  
First, it introduces magic numbers 33 and 36 into the code, which 
is a bad programming practice (see here: 
https://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants).
Second, it's plain wrong.  According to the problem statement, we 
don't have these rules for every possible line of >33 standalone 
decimals, or >36 characters in total.  We just have to call our 
function with a concrete set of custom parameters for one 
concrete example, and another set of parameters for another 
example.  That's to demonstrate that our function accepts and 
makes proper use of custom parameters!  Special-casing example 
inputs inside the function is not a solution: if we go down this 
path, the perfect solution would be a bunch of "if" statements 
for every possible example input producing the respective example 
outputs, and an empty function for all other possible inputs.

So, how do we call with special parameters?  Currently, we can 
look at every other language except C# as inspiration: ALGOL 68, 
J, Java, Perl 6, Phix, Racket, and REXX.  Your solution also has 
a good way to check example inputs: a unittest block.  It even 
shows one of D's strengths compared to other languages.  And 
there, you do use custom parameters to check that the function 
works.  A good approach would be to put all the examples in the 
unittest instead of reading them from a file.  This way, the 
program will be immediately usable and runnable: no need to 
create an additional arbitrarily-named file just to test it.
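
Roughly, the layout I have in mind looks like the sketch below.  
The commatize here is only a simplified stand-in so that the 
block compiles on its own (it groups the first run of digits that 
begins with a non-zero digit, nothing more), and the strings are 
made-up examples rather than the ones from the task page.  The 
real examples and their expected outputs would go into the 
unittest block in the same way.

```d
import std.ascii : isDigit;

/// Simplified, hypothetical stand-in (NOT the full task rules).
string commatize(string s, size_t start = 0, size_t step = 3,
                 string sep = ",")
{
    if (step == 0)
        return s;

    // First non-zero digit at or after `start`.
    size_t lo = start;
    while (lo < s.length && !(isDigit(s[lo]) && s[lo] != '0'))
        ++lo;
    if (lo >= s.length)
        return s;

    // End of that digit run.
    size_t hi = lo;
    while (hi < s.length && isDigit(s[hi]))
        ++hi;

    // Insert `sep` every `step` digits, counted from the right.
    string grouped;
    foreach (i; lo .. hi)
    {
        if (i > lo && (hi - i) % step == 0)
            grouped ~= sep;
        grouped ~= s[i];
    }
    return s[0 .. lo] ~ grouped ~ s[hi .. $];
}

unittest
{
    // Default parameters.
    assert(commatize("1234567") == "1,234,567");
    assert(commatize("no digits here") == "no digits here");

    // Custom parameters, chosen per example by the caller.
    assert(commatize("x=1234567890", 0, 5, " ")
           == "x=12345 67890");
    assert(commatize("Z$100000000000000", 0, 3, ".")
           == "Z$100.000.000.000.000");

    // A start position that skips the first number.
    assert(commatize("12345 and 67890", 5)
           == "12345 and 67,890");
}

void main() {}
```

Compiling with the -unittest switch (for example, 
"rdmd -unittest file.d") runs the checks before main; a failed 
assert stops the program with the offending line number.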

-----

All in all, the only thing I'd change in bearophile's solution is 
to remove the file reading loop, add the unittest block from your 
solution instead, and place all the examples there.  Printing the 
result does not seem imperative on Rosettacode, and there are at 
least some entries in D which already use unittest for checking 
the problem requirements (for example, 
https://rosettacode.org/wiki/Sorting_algorithms/Cocktail_sort#D).

Lastly, please note that Rosettacode supports multiple versions 
in a single language (example: 
http://rosettacode.org/wiki/99_Bottles_of_Beer#D).  As 
bearophile's version certainly has its merits, I strongly suggest 
keeping it available, either merged with your current version to 
produce the right solution, or as a second version.

Ivan Kazmenko.