Creeping Bloat in Phobos

Dmitry Olshansky via Digitalmars-d digitalmars-d at puremagic.com
Sun Sep 28 14:13:08 PDT 2014


29-Sep-2014 00:44, Uranuz wrote:
>> It's Tolstoy actually:
>> http://en.wikipedia.org/wiki/War_and_Peace
>>
>> You don't need byGrapheme for a simple DSL. In fact, as long as the DSL is
>> simple enough (ASCII only) you may safely avoid decoding. If it's in
>> Russian you might want to decode. Even in this case there are ways to
>> avoid decoding; it may involve a bit of writing, as for a typical
>> short novel ;)
>
> Yes, my mistake ;) I was thinking about *Crime and Punishment* but
> wrote *War and Peace*. Don't know why. Maybe because it is longer.
>

Admittedly both are way too long for my taste :)

> Thanks for the useful links. Since we are talking about the standard library,
> I think that some standard approach should be provided for common
> tasks: searching, sorting, parsing, splitting strings. I see that
> currently we have a lot of ways of doing similar things with strings. I
> think this is partly a documentation problem.

Some of this is historical, in particular std.string is way older than
std.algorithm.

> When I am parsing
> text I can't understand why I need to use all of these range interfaces
> instead of just manipulating the raw narrow string. We have several
> modules for working with strings: std.range, std.algorithm, std.string,
> std.array,

std.range publicly imports std.array, thus I really do not see why we
still have std.array as a standalone module.

> std.utf and I can't see how they help me to solve my
> problems. On the contrary, they just create a new problem for me: thinking
> about them in order to find the *right* way.

There is no *right* way; every level of abstraction has its uses. Also
there is a bit of a trade-off between performance and easy/obvious/nice code.
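
To make the "levels of abstraction" concrete, here is a minimal sketch that counts the same string three ways: as UTF-8 code units, as code points, and as graphemes. The sample string is my own illustration; the calls are std.string.representation, std.range.walkLength and std.uni.byGrapheme.

```d
import std.range : walkLength;
import std.string : representation;
import std.uni : byGrapheme;

void main()
{
    // Illustrative input: 'e' followed by U+0301 (combining acute accent).
    string s = "e\u0301";

    // Level 1: raw UTF-8 code units (1 byte for 'e', 2 for U+0301).
    assert(s.representation.length == 3);

    // Level 2: code points, counted by decoding the narrow string.
    assert(s.walkLength == 2);

    // Level 3: graphemes, i.e. what a reader perceives as one character.
    assert(s.byGrapheme.walkLength == 1);
}
```

Each level costs more than the previous one, which is the performance side of the trade-off.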

> So I spend most of my time thinking
> about it instead of solving my task.

It takes time to get accustomed to a standard library. See also std.conv
and std.format. String processing is indeed scattered shotgun-style across
all of Phobos.

> It is hard for me to accept that we don't need to decode to do some
> operations. What is annoying is that I always need to think of
> the code point length that I should show to the user and the byte length
> that is used to slice the char array. It's very easy to confuse them and do
> something wrong.

As long as you use decoding primitives you keep getting back proper
indices automatically. That must be what some folks considered the correct
way to do Unicode until it became apparent to everybody that Unicode is way
more than this.
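
A minimal sketch of what "proper indices" means in practice, assuming a hypothetical helper firstCodePoints (the name and the sample strings are mine, not from Phobos): it counts in code points for the user but slices by the byte index that std.utf.stride hands back.

```d
import std.utf : stride;

// Hypothetical helper: returns the slice holding the first n code points of s.
// The caller counts in code points; the slice boundary is a code unit index.
string firstCodePoints(string s, size_t n)
{
    size_t i = 0;
    while (n > 0 && i < s.length)
    {
        i += stride(s, i); // number of code units in the sequence starting at i
        --n;
    }
    return s[0 .. i];      // i always lands on a code point boundary
}

void main()
{
    string s = "Привет";   // 6 code points, 12 UTF-8 code units
    assert(s.length == 12);
    assert(firstCodePoints(s, 3) == "При");
    assert(firstCodePoints(s, 3).length == 6);
}
```

Note that this is still code-point level only: it will happily cut between a base letter and its combining mark.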

>
> I see that it is all complicated: we have 3 character types and more than
> 5 modules for trivial manipulations on strings, with tens of functions.
> It all goes to hell.

There are many tools, but when I write parsers I actually use almost
none of them. Well, nowadays I'm going to use the stuff in std.uni like
CodePointSet, utfMatcher etc. std.regex makes some use of these already,
but prior to that std.utf.decode was my lone workhorse.
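
For illustration only, a rough sketch of the kind of hand-written scanning loop that std.utf.decode supports. The helper leadingWord and its inputs are hypothetical, not code from std.regex or any real parser; std.uni.isAlpha is used here as the classifier.

```d
import std.uni : isAlpha;
import std.utf : decode;

// Hypothetical helper: returns the leading run of alphabetic code points.
string leadingWord(string input)
{
    size_t i = 0;
    while (i < input.length)
    {
        size_t next = i;
        dchar c = decode(input, next); // decodes one code point, advances next
        if (!isAlpha(c))
            break;
        i = next;                      // commit only after the check passes
    }
    return input[0 .. i];
}

void main()
{
    assert(leadingWord("имя = 42") == "имя");
    assert(leadingWord("name=1") == "name");
    assert(leadingWord("42") == "");
}
```

decode throws a UTFException on malformed input, so a real parser would have to decide how to handle that.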

> But I haven't even started to do my job. And we
> don't have a *standard* way to deal with it in the std lib. At least this way
> is not documented enough.

Well, on the bright side, consider that C has lots of broken functions in
its stdlib, and even some that are _never_ safe like "gets" ;)

-- 
Dmitry Olshansky

