compile-time regex redux
kenny
funisher at gmail.com
Wed Feb 7 09:04:10 PST 2007
Walter Bright wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> Walter Bright wrote:
>>> But I think we now have good reasons to revisit this, at least for
>>> compile time use only. For example:
>>>
>>> ("aa|b" ~~ "ababb") would evaluate to "ab"
>>>
>>> I expect one would generally only see this kind of thing inside
>>> templates, not user code.
>>
>> The more traditional way is to mention the string first and pattern
>> second, so:
>>
>> ("ababb" ~~ "aa|b") // match this guy against this pattern
>>
>> And I think it returns "b" - juxtaposition has a higher priority than
>> "|", so your pattern is "either two a's or one b". :o)
>
> My bad. Some more things to think about:
>
> 1) Returning the left match, the match, the right match?
> 2) Returning values of parenthesized expressions?
> 3) Some sort of sed-like replacement syntax?
>
> An alternative is to have the compiler recognize std.Regexp names as
> being built-in.
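To make the precedence point from the quoted exchange concrete, here is a small sketch. It assumes the std.regex module from later D2 standard libraries, which postdates this thread, so treat it as an illustration rather than anything proposed above. Concatenation binds tighter than "|", so the pattern means "aa" or "b", and the first match in "ababb" is the single "b" at index 1. The pre, hit, and post properties correspond to the left match, the match, and the right match from point 1.

```d
import std.regex;

void main()
{
    // ctRegex builds the matcher at compile time from a constant pattern.
    auto re = ctRegex!("aa|b");
    auto m = matchFirst("ababb", re);

    assert(m.hit == "b");    // the match itself
    assert(m.pre == "a");    // everything before the match
    assert(m.post == "abb"); // everything after the match
}
```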
Walter, I don't hate regex -- I just don't use it. It seems to me that figuring out regex syntax takes longer than writing quick for/while statements, and I usually forget cases in regex too...
Just being able to write ordinary D code with compile-time variables would be so much easier for me, and it would only require one template function instead of 35 to parse a simple string. For example:
1. A while back, I needed something very quickly to remove whitespace. It took me much less time with loops than it ever would have with a regex. I want to be able to do the same in templates, if possible. I will try to reproduce this later, but I think it will require a lot of templates.
2. What about building associative arrays out of a string? I have this function from existing code. It didn't take too long to write. I want to be able to write something like this in templates to build assoc arrays dynamically.
I know I'm asking for a lot, but the way templates handle strings is still kinda weird to me. Would string parsing in this sort of way be absolutely impossible with templates? I have not had good luck with it. Perhaps I missed something...
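For what it's worth, later D compilers added compile-time function execution (CTFE), which lets an ordinary function written with loops be evaluated at compile time. The following is only a minimal sketch, assuming a D2 compiler. The helper name collapseSpaces is made up for illustration and is not the whitespace code from this post.

```d
// Hypothetical helper: collapses each run of spaces and tabs into a
// single space, using a plain loop instead of templates or a regex.
string collapseSpaces(string s)
{
    string result;
    bool lastWasSpace = false;
    foreach (char c; s)
    {
        bool isSpace = (c == ' ' || c == '\t');
        if (isSpace)
        {
            // keep only the first space of a run
            if (!lastWasSpace)
                result ~= ' ';
        }
        else
        {
            result ~= c;
        }
        lastWasSpace = isSpace;
    }
    return result;
}

// Using the call to initialize an enum forces compile-time evaluation.
enum collapsed = collapseSpaces("a  \t b   c");
static assert(collapsed == "a b c");

void main() {}
```

The same function can still be called at run time, so one definition serves both uses.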
EXAMPLES BELOW
--- whitespace removal ---
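char[] t = text.dup;
char[] new_text;
uint len = new_text.length = t.length;
new_text.length = 0;
// normalize line endings and tabs
t = replace(t, "\r\n", "\n");
t = replace(t, "\r", "\n");
t = replace(t, "\t", " ");
int i = 0;
len = t.length;
// blank out "//" comments that start the text or follow a space or newline
while(i < len) {
    // i + 1 < len keeps t[i+1] in bounds on the last character
    if(i + 1 < len && t[i] == '/' && t[i+1] == '/') {
        if(i == 0 || t[i-1] == ' ' || t[i-1] == '\n') {
            while(i < len) {
                if(t[i] == '\n') {
                    break;
                }
                t[i++] = '\n';
            }
        }
    }
    i++;
}
// drop newlines (and the spaces after them), turn other control
// characters into spaces, and collapse runs of spaces
for(i = 0; i < len; i++) {
    if(t[i] < 0x20) {
        if(t[i] == '\n') {
            i++;
            while(i < len && t[i] == ' ') {
                i++;
            }
            i--;
        } else {
            t[i] = ' ';
            i--;
        }
    } else if(!(t[i] == ' ' && i > 0 && t[i-1] == ' ')) {
        new_text ~= t[i];
    }
}
// trim one leading and one trailing space; the length checks guard empty input
if(new_text.length > 0 && new_text[0] == ' ') {
    // slice to length, not length-1, so the last character is kept
    new_text = new_text[1 .. length];
}
if(new_text.length > 0 && new_text[length-1] == ' ') {
    new_text.length = new_text.length-1;
}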
--- ASSOC ARRAY BUILDING ---
char[][char[]] parse_options(char[] text) {
    char[][char[]] options;
    text = strip(text);
    uint text_len = text.length;
    uint i = 0;
    // peel off matching outer braces; i < text_len guards empty input
    while(i < text_len && text[i] == '{' && text[text_len-1] == '}') {
        text_len--;
        i++;
    }
    if(i > 0) {
        text = strip(text[i .. text_len]);
        text_len = text.length;
        i = 0;
    }
    for(;i < text_len; i++) {
        if(text[i] != ' ' && text[i] != ',') {
            // { label: "options, yeah", label2: `variable`, label3: {label1: "lala:", label2: `variable2`}}
            //   ^^^^^
            uint start = i;
            while(text[i] != ':') {
                if(text[i] == ' ') {
                    log_warning!("found a space in your label... expecting ':' in '^'", text[start .. i]);
                }
                if(++i >= text_len) {
                    log_error!("expected label... but not found '^'", text[start .. i]);
                    goto return_options;
                }
            }
            char[] label = strip(text[start .. i]);
            // { label: "options, yeah", label2: `variable`, label3: {label1: "lala:", label2: `variable2`}}
            //        ^
            // skip the ':' and any spaces, checking the bound before each read
            i++;
            while(i < text_len && text[i] == ' ') {
                i++;
            }
            if(i >= text_len) {
                log_error!("label has no value '^'", text);
                goto return_options;
            }
            uint def_start = i++;
            switch(text[def_start]) {
            case '{':
                // { label: "options, yeah", label2: `variable`, label3: {label1: "lala:", label2: `variable2`}}
                //                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                uint scopee = 1;
                // i already points at the first character after '{', so
                // examine it before advancing (otherwise "{}" is missed)
                while(true) {
                    if(i >= text_len) {
                        log_error!("cannot find end to text string in label '^'", label);
                        goto return_options;
                    }
                    if(text[i] == '{') {
                        scopee++;
                    } else if(text[i] == '}') {
                        if(scopee == 1) {
                            break;
                        }
                        scopee--;
                    }
                    // skip text
                    if(text[i] == '"' || text[i] == '\'' || text[i] == '`') {
                        char delim = text[i];
                        i++;
                        if(i >= text_len) {
                            log_error!("cannot find end to text string in label '^'", label);
                            goto return_options;
                        }
                        while(text[i] != delim || (text[i] == delim && text[i-1] == '\\')) {
                            if(++i >= text_len) {
                                log_error!("cannot find end to text string in label '^'", label);
                                goto return_options;
                            }
                        }
                    }
                    i++;
                }
                options[label] = strip(text[def_start .. i+1]);
                assert(strip(text[def_start .. i+1])[0] == '{');
                assert(strip(text[def_start .. i+1])[length-1] == '}');
                break;
            case '"', '`', '\'':
                // { label: "options, yeah", label2: `variable`, label3: {label1: "lala:", label2: `variable2`}}
                //           ^^^^^^^^^^^^^            ^^^^^^^^
                char delim = text[def_start];
                char[] string = "";
                // an opening delimiter at the very end has no closing partner
                if(i >= text_len) {
                    log_error!("cannot find end to text string in label '^'", label);
                    goto return_options;
                }
                while(text[i] != delim || (text[i] == delim && text[i-1] == '\\')) {
                    if(text[i] == delim && text[i-1] == '\\') {
                        // escaped delimiter: overwrite the backslash just appended
                        string[length-1] = delim;
                    } else {
                        string ~= text[i];
                    }
                    if(++i >= text_len) {
                        log_error!("cannot find end to text string in label '^'", label);
                        goto return_options;
                    }
                }
                options[label] = string;
                break;
            default:
                // { label: "options, yeah", label2: variable, label3: {label1: "lala:", label2: `variable2`}}
                //                                   ^^^^^^^^
                // check the bound before reading text[i]
                while(i < text_len && text[i] != ' ' && text[i] != ',') {
                    i++;
                }
                options[label] = text[def_start .. i];
            }
        }
    }
return_options:
    return options;
}