removing ansi control escape characters from a string
Adam D. Ruppe
destructionator at gmail.com
Fri May 31 18:32:40 PDT 2013
On Saturday, 1 June 2013 at 01:08:46 UTC, Timothee Cour wrote:
> removes escape codes for coloring, etc.
Getting all these is a very difficult task because the escape
sequences aren't all well defined.
But, you should get pretty good results by just filtering out
everything from "\033[" through the first char that is >= 'A',
which is the char that ends the sequence.
"\033" is often written ^[ by the shell. If you hold ctrl and
press the [ key, it sends the same character as pressing the esc
key, which is \033.
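To make the notation concrete, here is a small sketch showing that the octal, hex and decimal spellings all name the same ESC character, and what a typical color sequence looks like byte by byte. The variable name `red` and the choice of sequence are just for illustration.

```d
import std.stdio : writefln;

void main() {
    // Octal 033, hex 1B and decimal 27 are all the same ESC character.
    static assert('\033' == '\x1b');
    static assert('\033' == 27);

    // ESC [ 3 1 m is the usual "switch foreground to red" sequence.
    string red = "\033[31m";
    assert(red.length == 5);
    assert(red[0] == '\033' && red[1] == '[' && red[$ - 1] == 'm');

    // Dump the raw bytes in hex so the ESC byte is visible.
    writefln("%(%02x %)", cast(const(ubyte)[]) red);
}
```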
string outputString;
bool inEscape = false;
bool justEnteredEscape = false;
foreach(c; inputString) {
    if(justEnteredEscape) {
        justEnteredEscape = false;
        if(c == '[')
            inEscape = true;
        else {
            // NOTE: this is actually likely wrong but prolly good enough
            outputString ~= c;
        }
    } else if(inEscape) {
        // a char >= 'A' ends the sequence (and is itself dropped)
        if(c >= 'A')
            inEscape = false;
        // otherwise we want to skip this character, since it is
        // part of e.g. a color sequence
    } else if(c == '\033') {
        justEnteredEscape = true;
    } else {
        if(c == 8) continue; // skip backspace
        if(c == 0) continue; // and so on for whatever else you don't want....
        outputString ~= c;
    }
}
// outputString should be ok now
I didn't actually run that code since I don't have test data
available but I think it will work to strip out the majority of
the escape sequences you'll see on an output stream.
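In place of real test data, a few hand-made inputs can exercise the loop. The sketch below wraps it in a function so it can be called; the name `stripEscapes` and the sample strings are my own assumptions, not from any library.

```d
import std.stdio : writeln;

// Hypothetical helper name; the body is the filtering loop from the post.
string stripEscapes(string inputString) {
    string outputString;
    bool inEscape = false;
    bool justEnteredEscape = false;
    foreach(char c; inputString) {
        if(justEnteredEscape) {
            justEnteredEscape = false;
            if(c == '[')
                inEscape = true;
            else
                outputString ~= c; // not ESC [, so the char passes through
        } else if(inEscape) {
            if(c >= 'A')
                inEscape = false; // final char of the sequence, dropped
        } else if(c == '\033') {
            justEnteredEscape = true;
        } else {
            if(c == 8 || c == 0) continue; // skip backspace and NUL
            outputString ~= c;
        }
    }
    return outputString;
}

void main() {
    // color sequences are removed
    assert(stripEscapes("\033[31mred\033[0m plain") == "red plain");
    assert(stripEscapes("\033[1;32mok\033[0m") == "ok");
    // backspace is dropped
    assert(stripEscapes("a\bb") == "ab");
    // anything other than ESC [ is not recognized: only the ESC is dropped
    assert(stripEscapes("\033OP") == "OP");
    writeln(stripEscapes("\033[31mred\033[0m plain"));
}
```

The last assert shows the limitation of only looking for ESC followed by '[': the ESC byte is removed, but the characters after it stay in the output.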
Now I said the if(justEnteredEscape) part is wrong, but probably
good enough. The reason it is probably wrong is some terminals
will use other characters there, especially on input. Terminal
input is a huge mess.
For example, if I hit F1 on my xterm, the sequence it sends is
^[OP. We're only looking for ^[[ there.
The problem is how do you tell the difference between xterm
sending ^[OP and the user hitting escape, then typing O and P?
This is why unix sucks btw... you really can't. Real apps tell
the difference by looking at the time delay. In fact, if you open
vim or something and type <esc> OP really quickly in xterm, vim
will pop open its help screen! (That's assuming your xterm sends
the same sequences as mine - it might not! This is where the
termcap and terminfo databases come in and omg that's painful.)
Whereas if you type it a little slower, it will go to command
mode, then open a new line and type P.
The reason is that if you hit it fast, the application has no way
to tell whether you were doing it manually or xterm was sending
the F1 key.
And that's the problem you'll face with the log file. Unless it
has timing information, you can't use that method. So there's no
way for you to tell.
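If a log did carry timing information, the time-delay check could be sketched roughly like this. Everything here is an assumption for illustration: the `TimedChar` record, the `escStartsSequence` name, and the 50 ms cutoff are made up, and real programs choose their own values.

```d
import core.time : Duration, msecs;
import std.stdio : writeln;

// Hypothetical log record: a character plus the delay since the previous one.
struct TimedChar {
    char c;
    Duration gap;
}

// Assumed cutoff, not taken from any particular program.
enum Duration escTimeout = 50.msecs;

// True if the ESC at index i is followed so quickly by another character
// that it was probably sent by the terminal as part of one sequence.
bool escStartsSequence(const(TimedChar)[] input, size_t i) {
    if(input[i].c != '\033') return false;
    if(i + 1 >= input.length) return false; // lone ESC at the end
    return input[i + 1].gap < escTimeout;
}

void main() {
    // xterm sending F1: all three chars arrive together
    auto f1 = [TimedChar('\033', 0.msecs), TimedChar('O', 0.msecs),
               TimedChar('P', 0.msecs)];
    // a person pressing esc, then O, then P
    auto typed = [TimedChar('\033', 0.msecs), TimedChar('O', 400.msecs),
                  TimedChar('P', 150.msecs)];
    writeln(escStartsSequence(f1, 0));    // true
    writeln(escStartsSequence(typed, 0)); // false
}
```

A fast typist who beats the cutoff would still be classified as a terminal sequence, which is the same mistake vim makes in the F1 example.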
Also btw those codes have variable length and content, so you'd
really have to understand them to strip them (see my terminal.d
[1] for an example of this, it is a lot of code), so even if it
was totally possible, that's a lot of work for filtering a log
file.
If the majority is output or regular key input though, you don't
have to worry too much about this. The color sequences are pretty
well defined and by far the most common in output, and this should
handle them.
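If only the color sequences matter, a narrower alternative is a regular expression that matches just those. This is a different technique from the hand-written loop, sketched with Phobos' std.regex; the function name `stripColors` is my own, and it deliberately leaves every non-color sequence alone.

```d
import std.regex : regex, replaceAll;
import std.stdio : writeln;

// Hypothetical helper: removes only color (SGR) sequences, i.e.
// ESC [ followed by digits and semicolons and a final 'm'.
string stripColors(string s) {
    // "\033" puts a literal ESC char in the pattern; "\\[" is an escaped '['.
    auto sgr = regex("\033\\[[0-9;]*m");
    return replaceAll(s, sgr, "");
}

void main() {
    assert(stripColors("\033[1;31merror\033[0m: disk full") == "error: disk full");
    // other sequences, like clear screen (ESC [ 2 J), are left untouched
    assert(stripColors("\033[2Jhello") == "\033[2Jhello");
    writeln(stripColors("\033[32mdone\033[0m"));
}
```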
[1] https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff/blob/master/terminal.d