removing ansi control escape characters from a string

Adam D. Ruppe destructionator at gmail.com
Fri May 31 18:32:40 PDT 2013


On Saturday, 1 June 2013 at 01:08:46 UTC, Timothee Cour wrote:
> removes escape codes for coloring, etc.

Getting all these is a very difficult task because the escape 
sequences aren't all well defined.

But you should get pretty good results by just filtering out 
everything from "\033[" up to and including the first character 
that is >= 'A'.

"\033" is often written ^[ by the shell. If you hold ctrl and 
press the [ key, it sends the same character as pressing the esc 
key, which is \033.
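The Ctrl+[ trick comes down to plain ASCII arithmetic. By the usual 
terminal convention, Ctrl plus a key sends the key's code with 
everything above the low five bits cleared. '[' is 0x5B, and 
0x5B & 0x1F is 0x1B: 27 in decimal, \033 in octal, the ESC 
character. Here is a small D sketch of that convention (ctrlOf is 
a hypothetical helper, not from any library):

```d
// Hypothetical helper: the control character a terminal
// conventionally sends for Ctrl+key, found by keeping only the
// low five bits of the key's ASCII code.
char ctrlOf(char key) {
    return cast(char)(key & 0x1F);
}

// ctrlOf('[') == '\x1b'  (ESC, 27, octal 033)
// ctrlOf('C') == '\x03'  (the byte Ctrl+C sends)
```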


// Wrapped in a function (the name is arbitrary) so it compiles
// on its own; the logic is unchanged.
string stripEscapes(string inputString) {
    string outputString;
    bool inEscape = false;
    bool justEnteredEscape = false;
    foreach(c; inputString) {
        if(justEnteredEscape) {
            justEnteredEscape = false;
            if(c == '[')
                inEscape = true;
            else {
                // NOTE: this is actually likely wrong but probably
                // good enough
                outputString ~= c;
            }
        } else if(inEscape) {
            if(c >= 'A')
                inEscape = false;
            // otherwise we want to skip this character, since it is
            // part of e.g. a color sequence
        } else if(c == '\033') {
            justEnteredEscape = true;
        } else {
            if(c == 8) continue; // skip backspace
            if(c == 0) continue; // and so on for whatever else you
                                 // don't want....

            outputString ~= c;
        }
    }

    // outputString should be ok now
    return outputString;
}




I didn't actually run that code since I don't have test data 
available, but I think it will strip out the majority of the 
escape sequences you'll see on an output stream.
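If you do have test data, here is a self-contained way to try the 
idea. Note that it swaps in a different technique from the loop 
above: a regular expression via D's std.regex. It assumes the 
sequences follow the ECMA-48 CSI layout (ESC, '[', parameter bytes 
0x30 to 0x3F, intermediate bytes 0x20 to 0x2F, then one final byte 
0x40 to 0x7E). stripCsi is a hypothetical name.

```d
import std.regex : regex, replaceAll;

// Sketch, not the loop above: remove ECMA-48 CSI sequences with a
// regex. Pattern: ESC '[' params* intermediates* final-byte.
string stripCsi(string s) {
    // "\x1b" puts a real ESC character into the pattern;
    // "\\[" is an escaped '[' for the regex engine.
    auto csi = regex("\x1b\\[[0-?]*[ -/]*[@-~]");
    return replaceAll(s, csi, "");
}

// stripCsi("\x1b[1;31mred\x1b[0m plain") == "red plain"
```

The loop's c >= 'A' test approximates the same final-byte rule; 
ECMA-48's range starts one character earlier, at '@' (0x40). Like 
the loop, this leaves ESC O sequences alone.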


Now, I said the if(justEnteredEscape) part is wrong but probably 
good enough. The reason it is probably wrong is that some 
terminals will use other characters there, especially on input. 
Terminal input is a huge mess.

For example, if I hit F1 on my xterm, the sequence it sends is 
^[OP. We're only looking for ^[[ there.
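One way to extend the filter to xterm's F1-style keys is to also 
drop ESC O followed by one character (the SS3 form). This sketch 
rests on a loud assumption: it treats every ESC O in the input as 
a key sequence, which is exactly the guess that can't be verified 
from the bytes alone. The function and state names are 
hypothetical, and it ends a CSI sequence on the ECMA-48 final-byte 
range '@' to '~' rather than >= 'A'.

```d
// Hypothetical filter: drops CSI (ESC '[' ... final byte) and
// SS3 (ESC 'O' + one char). ASSUMPTION: every ESC 'O' really is
// SS3; a user pressing Esc, then typing O, would be eaten too.
string stripCsiAndSs3(string input) {
    enum State { text, sawEsc, inCsi, inSs3 }
    string output;
    State state = State.text;
    foreach (c; input) {
        final switch (state) {
        case State.text:
            if (c == '\x1b') state = State.sawEsc;
            else output ~= c;
            break;
        case State.sawEsc:
            if (c == '[') state = State.inCsi;
            else if (c == 'O') state = State.inSs3;
            else { output ~= c; state = State.text; } // drop only the ESC
            break;
        case State.inCsi:
            // a final byte in '@'..'~' ends the sequence
            if (c >= '@' && c <= '~') state = State.text;
            break;
        case State.inSs3:
            // assumed: SS3 is followed by exactly one character
            state = State.text;
            break;
        }
    }
    return output;
}
```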

The problem is how do you tell the difference between xterm 
sending ^[OP and the user hitting escape, then typing O and P?

This is why unix sucks btw... you really can't. Real apps tell 
the difference by looking at the time delay. In fact, try opening 
vim or something and typing <esc> OP really quickly in xterm 
(assuming your xterm sends the same sequences as mine; it might 
not! This is where the termcap and terminfo databases come in, 
and omg that's painful).

Anyway, if you type it really quickly, vim will pop open its help 
screen! If you type it a little slower, it will go to command 
mode, then open a new line and type P.

The reason is that if you type it fast, the application has no 
way to tell whether you typed it manually or xterm sent the F1 
key.
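The time-delay trick real apps use can be sketched like this, 
assuming each input byte comes with an arrival timestamp. 
TimedByte, escStartsSequence and the threshold value are 
illustrative assumptions, not any terminal library's API.

```d
// Illustrative type: one input byte plus when it arrived.
struct TimedByte {
    char c;     // the byte received
    long atMs;  // arrival time in milliseconds
}

// True if the ESC at index i looks like the start of a key sequence
// (the next byte arrived within thresholdMs), false if it looks
// like a lone Escape key press (or i is not an ESC at all).
bool escStartsSequence(const(TimedByte)[] input, size_t i, long thresholdMs) {
    if (input[i].c != '\x1b' || i + 1 >= input.length)
        return false;
    return input[i + 1].atMs - input[i].atMs <= thresholdMs;
}
```

vim exposes its own version of this threshold as the 
'ttimeoutlen' option.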




And that's the problem you'll face with the log file. Unless it 
has timing information, you can't use that method. So there's no 
way for you to tell.

Also, btw, those codes have variable length and content, so you'd 
really have to understand them to strip them (see my terminal.d 
[1] for an example of this; it is a lot of code). So even if it 
were totally possible, that's a lot of work just for filtering a 
log file.
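To give a feel for the variable length and content, here is a 
small sketch that splits a single CSI sequence such as \033[1;31m 
into its numeric parameters and final byte. It is a hypothetical 
helper (Csi and parseCsi are made-up names) and far less complete 
than terminal.d: it assumes plain ';'-separated decimal 
parameters, treats an empty parameter as 0 (real defaults depend 
on the command), and ignores private-mode markers like '?' and 
intermediate bytes.

```d
import std.algorithm : map, splitter;
import std.array : array;
import std.conv : to;

// Hypothetical result type: parameters plus the final byte.
struct Csi {
    int[] params;
    char finalByte;
}

// Parse one complete CSI sequence, e.g. "\x1b[1;31m".
Csi parseCsi(string seq) {
    assert(seq.length >= 3 && seq[0] == '\x1b' && seq[1] == '[');
    string paramText = seq[2 .. $ - 1]; // between "ESC [" and final byte
    Csi result;
    result.finalByte = seq[$ - 1];
    if (paramText.length)
        result.params = paramText.splitter(';')
            .map!(p => p.length ? p.to!int : 0) // empty param -> 0 (assumed)
            .array;
    return result;
}
```

For example, \033[1;31m (bold, red) gives params [1, 31] and final 
byte 'm', while \033[H (cursor home) has no parameters at all.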

If the majority is output or regular key input, though, you don't 
have to worry too much about this. The color sequences are pretty 
well defined and by far the most common in output, and the code 
above should handle them.

[1] https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff/blob/master/terminal.d


More information about the Digitalmars-d-learn mailing list