string splitting funcs
spir
denis.spir at gmail.com
Sat Jan 22 11:52:17 PST 2011
While we're tweaking std.string:
When writing string libs or types (like Text recently), I implement 3
string splitting methods. This may or may not be useful for D's string
module.
The core point is: what to do with empty parts? They may be generated when:
* the separator is present at either end of the source string
* successive separators occur in the source string
Thus,
split("--abc----def----", "--")
basically returns
["","abc","","def","",""]
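To check this against what Phobos does today, here is a small sketch. It
assumes std.array.split with an explicit string separator, and it uses
exactly four dashes between abc and def so that the separators pair up
with nothing left over.

```d
import std.array : split;

void main()
{
    // Separator at both ends, and doubled in the middle and at the end.
    auto parts = "--abc----def----".split("--");

    // Every boundary between two separators, or between a separator and
    // an end of the string, yields an empty part.
    assert(parts == ["", "abc", "", "def", "", ""]);
}
```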
This may or may not be what we expect. But why? I ended up concluding there
are 2 distinct use cases where we need to split a string:
1. it is like a record (fields)
2. it is like a list (elements)
In the first case, we want to keep empty fields so that each field has a
constant index, and sometimes empty fields are meaningful. For instance,
in name--phone--email, when phone is absent, we still want email as
third field.
In the case of a list, instead, empty elements are most commonly
irrelevant, and are often just a by-product of a flexible (not always
formal) grammar. For instance, lists of words / numbers / tokens; or more
simply lines: we will rarely keep blank ones for further processing.
This leads to 2 different string splitting funcs, eg
string[] listElements (string sep)
string[] recordFields (string sep)
(names discussable ;-)
recordFields is symmetric to join, since it keeps every part.
listElements may simply filter recordFields' results, or instead drop
empty elements on the fly.
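A minimal sketch of the two funcs, written as free functions on top of
std.array.split rather than as methods. The extra source parameter and
the filtering approach are my assumptions for illustration, not a
proposed final design.

```d
import std.algorithm : filter;
import std.array : array, join, split;

// Record view: keep every part, so each field has a constant index.
string[] recordFields(string source, string sep)
{
    return source.split(sep);
}

// List view: same split, but empty parts are dropped afterwards.
string[] listElements(string source, string sep)
{
    return source.split(sep).filter!(part => part.length > 0).array;
}

void main()
{
    // name--phone--email with the phone missing: email stays third.
    auto fields = recordFields("name----email", "--");
    assert(fields == ["name", "", "email"]);

    // Keeping empty parts is what makes the split reversible with join.
    assert(fields.join("--") == "name----email");

    // As a list, only the non-empty elements survive.
    assert(listElements("--abc----def----", "--") == ["abc", "def"]);
}
```

Filtering after the split is the simplest variant; dropping empty parts
on the fly would avoid building the intermediate array.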
Finally, there is a third, different, use case, which may well be the
most common one, and requires yet another func:
string[] split (string whitespace=" \t\n")
which splits on any whitespace. Usually, the expected behaviour is that
any combination or repetition of ws chars is considered a single
separator; but ws at start/end does still generate an empty part.
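Here is a rough sketch of that third func. It reads the behaviour
described above as "a run of ws chars counts as one separator, and ws at
either end still produces an empty part". The name splitWhitespace and
the source parameter are hypothetical, and the scan is byte-wise, so it
assumes the whitespace set contains only ASCII characters.

```d
import std.string : indexOf;

string[] splitWhitespace(string source, string whitespace = " \t\n")
{
    string[] parts;
    size_t start = 0;   // where the current part begins
    size_t i = 0;

    while (i < source.length)
    {
        if (whitespace.indexOf(source[i]) >= 0)
        {
            // Close the current part (possibly empty at the very start).
            parts ~= source[start .. i];

            // Treat the whole run of ws chars as a single separator.
            while (i < source.length && whitespace.indexOf(source[i]) >= 0)
                ++i;
            start = i;
        }
        else
        {
            ++i;
        }
    }

    // Final part; empty when the source ends with whitespace.
    parts ~= source[start .. $];
    return parts;
}

void main()
{
    assert(splitWhitespace("a \t b") == ["a", "b"]);
    assert(splitWhitespace(" a \t b\n") == ["", "a", "b", ""]);
}
```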
Makes sense?
Denis
_________________
vita es estrany
spir.wikidot.com