How to get a substring?

Sun Oct 27 01:53:46 PDT 2013

On Sunday, 27 October 2013 at 08:35:11 UTC, Jonathan M Davis wrote:
> On Sunday, October 27, 2013 09:14:28 Nicolas Sicard wrote:
>> On Sunday, 27 October 2013 at 07:44:06 UTC, Jakob Ovrum wrote:
>> > On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
>> >> Dumb Newbie Question: I've searched through the library
>> >> reference, but I haven't figured out how to extract a
>> >> substring from a string. I'd like something like
>> >> string.substring("Hello", 0, 2) to return "Hel", for example.
>> >> What method am I looking for? Thanks!
>> >
>> > There are a lot of good answers in this thread but I also think
>> > they miss the real issue here.
>> 
>> I don't think so. It's indeed worth noticing that Phobos'
>> algorithms work with Unicode nicely, but:
>> a) working on indices is sometimes the actual functionality you
>> need
>
> Sometimes, but it usually isn't. If you find that you frequently
> need to use indices for a string, then you should probably rethink
> how you're using strings. Phobos aims at operating on ranges, which
> rarely means using indices, and _very_ rarely means using indices on
> strings. In general, indices only get used on strings when you're
> trying to optimize a particular algorithm for strings and make sure
> that you slice the string so that the result is a string rather than
> a wrapper range.

+1

Also, I think that if users need to compute UTF indices in their own
code, it's indicative of either a) Phobos lacking an (optimized)
algorithm, or b) the user doing something so niche that Phobos can't
aim to cover it generically.
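
To make the difference between code unit indices and range operations
concrete, here is a minimal sketch. The helper name firstCodePoints is
made up for illustration and is not part of Phobos; it also assumes
that n does not exceed the number of code points in the string.

```d
import std.algorithm : equal, find;
import std.range : take;
import std.utf : toUTFindex;

// Hypothetical helper (not in Phobos): the first n code points of s,
// returned as a slice of s. Assumes s holds at least n code points.
string firstCodePoints(string s, size_t n)
{
    // toUTFindex maps a code point index to a code unit index.
    return s[0 .. s.toUTFindex(n)];
}

void main()
{
    string s = "Hello";

    // Slicing indexes code units; the end index is exclusive.
    assert(s[0 .. 3] == "Hel");

    // With non-ASCII text, code units and code points diverge.
    string t = "h\u00E9llo";     // U+00E9 takes two UTF-8 code units
    assert(t.length == 6);       // length counts code units

    // Range-based: take yields code points, but as a wrapper range.
    assert(t.take(3).equal("h\u00E9l"));

    // find returns a slice of the original string, no index needed.
    assert(t.find('l') == "llo");

    // Index-based, but converted so the slice ends on a boundary.
    assert(firstCodePoints(t, 3) == "h\u00E9l");
}
```

Note that take gives back a wrapper range while find and the slice
give back a string, which is the distinction Jonathan describes above.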

> Sure, indexing strings can be very useful, but the way that Phobos
> is designed does not lend itself to using string indices (quite the
> opposite in fact), and in my experience, using string indices is
> rarely needed even when doing heavy string manipulation.

And Phobos is better off for it!

I don't know if we do a good enough job of educating users about 
Unicode and its implications though, assuming this is a 
responsibility of the D community towards new D users.

>> c) do they really handle grapheme clusters? (I don't know)
>
> I believe that that sort of thing is properly supported by the
> updated std.uni in 2.064, but it is the sort of thing that you have
> to code for. Phobos as a whole operates on ranges of dchar - which is
> correct most of the time but not enough when you need full-on
> grapheme support. I haven't yet looked in detail at what std.uni now
> provides though. I just know that it's added some grapheme support.
>
> - Jonathan M Davis

The new std.uni supports all the grapheme-related functionality
you would ever need (as far as I can tell), but the nice thing is
that most code doesn't need to use it to be Unicode-correct, i.e.
you don't need to be "aware" of grapheme clusters to avoid breaking
them in the vast majority of code domains.
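
For the cases where you do need to be aware of them, here is a rough
sketch of how code units, code points and graphemes differ. It assumes
a Phobos recent enough to provide std.uni.byGrapheme alongside
graphemeStride; the helper names graphemeCount and firstGrapheme are
made up for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme, graphemeStride;

// Hypothetical helper: number of grapheme clusters in s.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

// Hypothetical helper: the first grapheme cluster of s, as a slice.
// Assumes s is not empty.
string firstGrapheme(string s)
{
    // graphemeStride returns the cluster's length in code units.
    return s[0 .. graphemeStride(s, 0)];
}

void main()
{
    // 'e' followed by a combining acute accent (U+0301), then 'x'.
    string s = "e\u0301x";

    assert(s.length == 4);          // UTF-8 code units
    assert(s.walkLength == 3);      // code points (dchar)
    assert(graphemeCount(s) == 2);  // grapheme clusters

    // Slicing on a grapheme boundary keeps the accent with its base.
    assert(firstGrapheme(s) == "e\u0301");
}
```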