eliminate junk from std.string?

Mon Jan 10 17:29:01 PST 2011

On 1/10/11 2:41 AM, bearophile wrote:
> Lars T. Kyllingstad:
>
>> My suggestions for things to remove:
>>
>> hexdigits, digits, octdigits, lowercase, letters, uppercase, whitespace
>>   - What are these arrays useful for?
>>
>> capwords()
>>   - It tries to do too much.
>>
>> zfill()
>>   - The ljustify(),rjustify(), and center() functions
>>     should instead take an optional padding character
>>     that defaults to a space.
>>
>> maketrans(), translate()
>>   - I don't even understand what these do.
>>
>> inPattern(), countchars(), removechars()
>>   - Pattern matching is std.regex's charter.
>>
>> squeeze(), succ(), tr(), soundex(), column()
>>   - I am having a very hard time imagining myself ever
>>     using these functions...
>
> I agree with about nothing you have said :-)
>
> How much string processing do you do day by day? I am using most of
> those things... If you are used to using Python or Ruby you probably
> find most of those things useful. If Andrei removes arrays like
> lowercase, letters, uppercase, I will have to write them myself in
> code.

The arrays letters, uppercase, and lowercase aren't all that useful 
because they only make sense for ASCII. Besides, they should be encoded 
as functions.
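
To make the ASCII limitation concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the D code under discussion). The names is_lower_table and is_lower_func are hypothetical, made up for this illustration. A fixed table can only answer for the characters it lists, while a function can consult Unicode character properties.

```python
import string

# ASCII-only table, analogous to the `lowercase` array discussed above.
ASCII_LOWERCASE = frozenset(string.ascii_lowercase)


def is_lower_table(ch: str) -> bool:
    """Membership test against the fixed ASCII table (hypothetical helper)."""
    return ch in ASCII_LOWERCASE


def is_lower_func(ch: str) -> bool:
    """Function-based test that uses Unicode character properties."""
    return ch.islower()


samples = ["a", "é", "я", "A"]
table_results = [is_lower_table(c) for c in samples]
func_results = [is_lower_func(c) for c in samples]
# table_results == [True, False, False, False]
# func_results  == [True, True, True, False]
```

The table version silently reports "é" and "я" as not lowercase, which is the kind of result that only makes sense if the input is known to be ASCII.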

> ljustify(),rjustify(), and center() are very useful, even if
> they may be improved in some ways.

Hmmm. I suspected everyone's list will be different :o). I personally 
think the justification and centering functions are rarely useful - how 
often does one need to justify plain text? If you generate HTML, the
markup will do that for you, and if you generate some nice text, then the
font will be proportional, so the functions are useless.

Nevertheless, I ported them (and also fixed them - they were broken for 
anything non-ASCII, which probably is telling of the extent of their usage).
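
As an illustration of how justification can go wrong for non-ASCII input, here is a minimal Python sketch. It assumes the bug comes from measuring width in encoded bytes rather than in characters; that is one plausible cause, not a description of what the old std.string code actually did. Both function names are hypothetical. The second function also takes the optional padding character suggested earlier in the thread.

```python
def rjust_bytes(s: str, width: int) -> str:
    """Buggy variant: computes padding from the UTF-8 byte length."""
    pad = max(0, width - len(s.encode("utf-8")))
    return " " * pad + s


def rjust_chars(s: str, width: int, fill: str = " ") -> str:
    """Computes padding from the number of code points.

    `fill` is the optional padding character; it defaults to a space.
    """
    pad = max(0, width - len(s))
    return fill * pad + s


word = "naïve"                      # 5 code points, 6 bytes in UTF-8
short = rjust_bytes(word, 8)        # only 2 spaces of padding: 7 columns
exact = rjust_chars(word, 8)        # 3 spaces of padding: 8 columns
zero_padded = rjust_chars("42", 5, "0")  # "00042", covering the zfill() case
```

Counting code points is still an approximation: combining marks and wide characters would need grapheme or display-width handling, which this sketch leaves out.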

What are your use cases for these three functions?

> maketrans() and translate() (as
> other things) come from Python string functions, and I have used them
> a hundred times in string processing code. I have used squeeze() some
> times. soundex is not hurting, because even if it's not commonly
> necessary, its name is easy to understand and it's not easy to miss
> for something different, so it doesn't add much noise to the library.
> And I've seen that it's easy to implement soundex wrongly, while the
> one in the std.string is correct.

I think maketrans/translate are okay (if a bit arcane) but they need to 
be ported to Unicode.

Python apparently does mind Unicode as of 3.x, although I'm not sure 
exactly what the semantics are: 
http://stackoverflow.com/questions/3031045/how-come-string-maketrans-does-not-work-in-python-3-1. 
One odd thing is that you'd expect a dynamic language like Python to 
dynamically detect ASCII vs. non-ASCII. The example shows that Python 
rejects string-based translation tables even when they are, in fact, ASCII.
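
For reference, a short sketch of how translation tables look in current Python 3, where the table is built by a method on the type being translated: str.maketrans for text and bytes.maketrans for raw bytes. The sample strings are made up for illustration.

```python
# Text: the table maps code points, so non-ASCII characters work.
text_table = str.maketrans("aeé", "AEÉ")
text_result = "café au lait".translate(text_table)
# text_result == "cAfÉ Au lAit"

# Bytes: a separate 256-entry table, built from bytes arguments only.
byte_table = bytes.maketrans(b"ae", b"AE")
byte_result = b"cafe".translate(byte_table)
# byte_result == b"cAfE"

# Mixing the two kinds is rejected rather than detected dynamically,
# even though the arguments here are plain ASCII.
try:
    bytes.maketrans("ae", "AE")
    mixing_rejected = False
except TypeError:
    mixing_rejected = True
```

The two kinds of table are kept strictly apart by type, which matches the behavior described above: ASCII-only content in a str does not make it acceptable where bytes are required.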

> I agree that too much stuff is generally bad in a library, because
> searching for something requires more time if there are more items to
> search into. In Bugzilla I have three or four bug reports that ask
> for few small changes in std.string (like removing chop and keeping
> chomp). But please don't remove too much. In a library more is often
> better.

I think we should remove all functions that rely on patterns represented 
as strings: inPattern, countchars, removechars, squeeze, munch.

Representing patterns as a convention on top of otherwise untyped 
strings doesn't seem like a good solution for D. We should either go with
regex or with a simple pattern structure and a helper function. That way 
people can say e.g. munch(s, pattern("[0-9]")).
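
A rough sketch of the "simple pattern structure and a helper function" idea, written in Python for brevity. Everything here is hypothetical: the Pattern type, the pattern() helper, and the tiny bracket syntax it accepts are assumptions for illustration, not an existing or proposed std.string API. Since Python strings are immutable, this munch returns the matched prefix and the remainder instead of modifying its argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A typed character-set pattern (hypothetical)."""
    chars: frozenset
    negate: bool = False

    def matches(self, ch: str) -> bool:
        return (ch in self.chars) != self.negate


def pattern(spec: str) -> Pattern:
    """Parse a small "[...]" syntax: single chars, a-z ranges, leading ^."""
    if len(spec) < 2 or spec[0] != "[" or spec[-1] != "]":
        raise ValueError("pattern must look like [...]")
    body = spec[1:-1]
    negate = body.startswith("^")
    if negate:
        body = body[1:]
    chars = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # Range such as 0-9: add every code point from low to high.
            lo, hi = ord(body[i]), ord(body[i + 2])
            chars.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return Pattern(frozenset(chars), negate)


def munch(s: str, pat: Pattern):
    """Split s into (longest prefix matching pat, remainder)."""
    n = 0
    while n < len(s) and pat.matches(s[n]):
        n += 1
    return s[:n], s[n:]


digits, rest = munch("123abc", pattern("[0-9]"))
# digits == "123", rest == "abc"
```

The point of the separate type is that a malformed pattern fails once, in pattern(), and a function taking a Pattern cannot be handed an arbitrary string by mistake.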

Andrei