compile-time regex redux

kenny funisher at gmail.com
Wed Feb 7 09:04:10 PST 2007


Walter Bright wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> Walter Bright wrote:
>>> But I think we now have good reasons to revisit this, at least for 
>>> compile time use only. For example:
>>>
>>>     ("aa|b" ~~ "ababb") would evaluate to "ab"
>>>
>>> I expect one would generally only see this kind of thing inside 
>>> templates, not user code.
>>
>> The more traditional way is to mention the string first and pattern 
>> second, so:
>>
>> ("ababb" ~~ "aa|b") // match this guy against this pattern
>>
>> And I think it returns "b" - juxtaposition has a higher priority than 
>> "|", so your pattern is "either two a's or one b". :o)
> 
> My bad. Some more things to think about:
> 
> 1) Returning the left match, the match, the right match?
> 2) Returning values of parenthesized expressions?
> 3) Some sort of sed-like replacement syntax?
> 
> An alternative is to have the compiler recognize std.Regexp names as 
> being built-in.
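
For reference, the precedence point and question 1 can be checked against a library regex. This is only a sketch, and it assumes the std.regex module from present-day Phobos (ctRegex, matchFirst and the pre/hit/post properties), none of which is the std.Regexp being discussed above. Note that ctRegex only builds the matcher at compile time; the match itself still runs at run time.

```d
import std.regex : ctRegex, matchFirst;

void main() {
    // Assumption: modern Phobos. The pattern is compiled at compile time.
    auto re = ctRegex!(`aa|b`);
    auto m = matchFirst("ababb", re);

    // "aa|b" means "two a's, or one b", so the first match is the first b.
    assert(m.pre == "a");    // text to the left of the match
    assert(m.hit == "b");    // the match itself
    assert(m.post == "abb"); // text to the right of the match
}
```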

Walter, I don't hate regex -- I just don't use it. It seems to me that figuring out regex syntax takes longer than writing a few quick for/while loops, and I usually forget cases in regex too...

Just being able to write ordinary D code that operates on compile-time variables would be so much easier for me, and it would only require one template function instead of 35 to parse a simple string. For example:

1. A while back, I needed something very quickly to remove whitespace. It took me much less time with loops than it ever would have with a regex. I want to be able to do the same in templates, if possible. I will try to reproduce this later, but I think it will require a lot of templates.
2. What about building associative arrays out of a string? I have this function from existing code. It didn't take too long to write. I want to be able to write something like this in templates to build assoc arrays dynamically.

I know I'm asking for a lot, but the way templates handle strings is still kinda weird to me. Would string parsing in this sort of way be absolutely impossible with templates? I have not had good luck with it. Perhaps I missed something...


EXAMPLES BELOW

--- whitespace removal ---
char[] t = text.dup;
char[] new_text;
uint len = new_text.length = t.length;
new_text.length = 0;

t = replace(t, "\r\n", "\n");
t = replace(t, "\r", "\n");
t = replace(t, "\t", " ");

int i = 0;
len = t.length;
while(i < len) {
	if(t[i] == '/' && i+1 < len && t[i+1] == '/') {
		if(i == 0 || t[i-1] == ' ' || t[i-1] == '\n') {
			while(i < len) {
				if(t[i] == '\n') {
					break;
				}
				
				t[i++] = '\n';
			}
		}
	}
	
	i++;
}

for(i = 0; i < len; i++) {
	if(t[i] < 0x20) {
		if(t[i] == '\n') {
			i++;
			while(i < len && t[i] == ' ') {
				i++;
			}
			i--;
		} else {
			t[i] = ' ';
			i--;
		}
	} else if(!(t[i] == ' ' && i > 0 && t[i-1] == ' ')) {
		new_text ~= t[i];
	}
}

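// drop one leading space (guard against an empty result)
if(new_text.length > 0 && new_text[0] == ' ') {
	new_text = new_text[1 .. length];
}

// drop one trailing space
if(new_text.length > 0 && new_text[length-1] == ' ') {
	new_text.length = new_text.length-1;
}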
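
A rough sketch of what the compile-time version could look like, assuming a compiler that can evaluate ordinary functions at compile time (this is written for present-day D, not the compiler I am using now). collapseSpaces is a made-up name, and it is a simplification of the code above: it turns tabs and newlines into spaces instead of dropping them, and it does not strip // comments.

```d
// Hypothetical helper: collapse runs of whitespace into single spaces
// and trim both ends, written as a plain loop.
string collapseSpaces(string s) {
    string result;
    foreach (c; s) {
        // treat tabs and line breaks as spaces
        char ch = (c == '\t' || c == '\n' || c == '\r') ? ' ' : c;
        // skip a space that would start the result or repeat a space
        if (ch == ' ' && (result.length == 0 || result[$ - 1] == ' '))
            continue;
        result ~= ch;
    }
    // at most one trailing space can remain; drop it
    if (result.length > 0 && result[$ - 1] == ' ')
        result = result[0 .. $ - 1];
    return result;
}

// Assigning to an enum forces evaluation at compile time.
enum collapsed = collapseSpaces("  a \t b\n\nc  ");
static assert(collapsed == "a b c");

void main() {
    // The same function still works at run time.
    assert(collapseSpaces("x   y") == "x y");
}
```

One function instead of a pile of recursive templates, which is the point I was trying to make.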


--- ASSOC ARRAY BUILDING ---

char[][char[]] parse_options(char[] text) {
	char[][char[]] options;
	text = strip(text);
	uint text_len = text.length;
	
	uint i = 0;
	while(i < text_len && text[i] == '{' && text[text_len-1] == '}') {
		text_len--;
		i++;
	}
	
	if(i > 0) {
		text = strip(text[i .. text_len]);
		text_len = text.length;
		i = 0;
	}
	
	for(;i < text_len; i++) {
		if(text[i] != ' ' && text[i] != ',') {
			
			// { label: "options, yeah", label2: `variable`, label3: {label1: "lala:", label2: `variable2`}}
			//   ^^^^^
			uint start = i;
			while(text[i] != ':') {
				
				if(text[i] == ' ') {
					log_warning!("found a space in your label... expecting ':' in '^'", text[start .. i]);
				}
				
				if(++i >= text_len) {
					log_error!("expected label... but not found '^'", text[start .. i]);
					goto return_options;
				}
			}
			
			char[] label = strip(text[start .. i]);
			
			// { label: "options, yeah", label2: `variable`, label3: {label1: "lala:", label2: `variable2`}}
			//         ^
			i++;
			
			while(i < text_len && text[i] == ' ') {
				i++;
			}
			
			if(i >= text_len) {
				log_error!("label has no value '^'", text);
				goto return_options;
			}
			
			uint def_start = i++;
			switch(text[def_start]) {
			case '{':
				// { label: "options, yeah", label2: `variable`, label3: {label1: "lala:", label2: `variable2`}}
				//                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
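				uint scopee = 1;
				// the loop below pre-increments i, so restart at the opening brace
				// to avoid skipping the first character inside it
				i = def_start;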
				while(true) {
					if(++i >= text_len) {
						log_error!("cannot find end to text string in label '^'", label);
						goto return_options;
					}
					
					if(text[i] == '{') {
						scopee++;
					} else if(text[i] == '}') {
						if(scopee == 1) {
							break;
						}
						
						scopee--;
					}
					
					// skip text
					if(text[i] == '"' || text[i] == '\'' || text[i] == '`') {
						char delim = text[i];
						i++;
						if(i >= text_len) {
							log_error!("cannot find end to text string in label '^'", label);
							goto return_options;
						}
						while(text[i] != delim || (text[i] == delim && text[i-1] == '\\')) {
							if(++i >= text_len) {
								log_error!("cannot find end to text string in label '^'", label);
								goto return_options;
							}
						}
					}
				}
				
				options[label] = strip(text[def_start .. i+1]);
				assert(strip(text[def_start .. i+1])[0] == '{');
				assert(strip(text[def_start .. i+1])[length-1] == '}');
				
			break;
			case '"', '`', '\'':
				// { label: "options, yeah", label2: `variable`, label3: {label1: "lala:", label2: `variable2`}}
				//           ^^^^^^^^^^^^^            ^^^^^^^^
				char delim = text[def_start];
				char[] string = "";
				while(text[i] != delim || (text[i] == delim && text[i-1] == '\\')) {
					if(text[i] == delim && text[i-1] == '\\') {
						string[length-1] = delim;
					} else {
						string ~= text[i];
					}
					
					if(++i >= text_len) {
						log_error!("cannot find end to text string in label '^'", label);
						goto return_options;
					}
				}
				
				options[label] = string;
			break;
			default:
				// { label: "options, yeah", label2: variable, label3: {label1: "lala:", label2: `variable2`}}
				//                                   ^^^^^^^^
				while(i < text_len && text[i] != ' ' && text[i] != ',') {
					i++;
				}
				options[label] = text[def_start .. i];
			}
		}
	}
	
return_options:
	return options;
}
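
And a much smaller sketch of the same idea evaluated at compile time, again assuming present-day D where plain functions and associative arrays can be used during compile-time evaluation. parsePairs is a made-up name; it only handles flat, unquoted "label: value" pairs, with none of the nesting, quoting or error reporting of parse_options above.

```d
// Hypothetical helper: parse "label: value, label2: value2" into an
// associative array. No nesting, no quoted strings, no error reporting.
string[string] parsePairs(string text) {
    string[string] result;
    size_t i = 0;
    while (i < text.length) {
        // skip separators between pairs
        while (i < text.length && (text[i] == ' ' || text[i] == ','))
            i++;
        if (i >= text.length)
            break;

        // read the label up to ':'
        size_t start = i;
        while (i < text.length && text[i] != ':')
            i++;
        if (i >= text.length)
            break; // label without ':'; silently ignored in this sketch
        string label = text[start .. i];

        // skip ':' and any spaces before the value
        i++;
        while (i < text.length && text[i] == ' ')
            i++;

        // read the value up to the next space or comma
        size_t valStart = i;
        while (i < text.length && text[i] != ' ' && text[i] != ',')
            i++;
        result[label] = text[valStart .. i];
    }
    return result;
}

// static assert forces the calls to run at compile time.
static assert(parsePairs("a: 1, b: two")["a"] == "1");
static assert(parsePairs("a: 1, b: two")["b"] == "two");

void main() {
    auto opts = parsePairs("width: 80, mode: fast");
    assert(opts["width"] == "80");
    assert(opts["mode"] == "fast");
}
```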


