tolf and detab

Jonathan M Davis jmdavisprog at gmail.com
Sat Aug 7 22:18:07 PDT 2010


On Saturday 07 August 2010 21:59:50 Andrei Alexandrescu wrote:
> Very nice. Here's how I'd improve removeTabs:
> 
> #!/home/andrei/bin/rdmd
> import std.conv;
> import std.file;
> import std.getopt;
> import std.stdio;
> import std.string;
> 
> void main(string[] args)
> {
>      uint tabSize = 8;
>      getopt(args, "tabsize|t", &tabSize);
>      foreach(f; args[1 .. $])
>          removeTabs(tabSize, f);
> }
> 
> void removeTabs(int tabSize, string fileName)
> {
>      auto file = File(fileName);
>      string output;
>      bool changed;
> 
>      foreach(line; file.byLine(File.KeepTerminator.yes))
>      {
>          int lastTab = 0;
> 
>          while(lastTab != -1)
>          {
>              const tab = line.indexOf('\t');
>              if(tab == -1)
>                  break;
>              const numSpaces = tabSize - tab % tabSize;
>              line = line[0 .. tab] ~ repeat(" ", numSpaces)
>                  ~ line[tab + 1 .. $];
>              lastTab = tab + numSpaces;
>              changed = true;
>          }
> 
>          output ~= line;
>      }
> 
>      file.close();
>      if (changed)
>          std.file.write(fileName, output);
> }

Ah. I needed to close the file. I pretty much always just use readText(), so I 
didn't catch that. Also, it does look like detecting whether the file changed was 
a bit simpler than I thought that it would be. Quite simple really. Thanks.

> Very nice! You may as well guard the write with an if (result !=
> fileStr). With source control etc. in the mix it's always polite to not
> touch files unless you are actually modifying them.

Yes. That would be good. It's the kind of thing that I forget - probably because 
most of the code that I write generates new files rather than updating pre-
existing ones.
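
For reference, a guarded write along those lines could look roughly like the 
sketch below. The toLF name and the plain "\r\n" replacement are assumptions 
made for illustration (only fileStr and result come from the suggestion above), 
so treat it as a sketch rather than the actual program from earlier in the 
thread.

```d
import std.array : replace;
import std.file : readText, write;

// Hypothetical helper (the name is an assumption): converts Windows line
// endings to Unix ones, but only rewrites the file if something changed.
void toLF(string fileName)
{
    string fileStr = readText(fileName);
    string result = fileStr.replace("\r\n", "\n");

    // Leave the file untouched when the contents are already correct, so
    // that source control and build tools don't see a spurious modification.
    if (result != fileStr)
        write(fileName, result);
}

void main(string[] args)
{
    foreach (f; args[1 .. $])
        toLF(f);
}
```

The comparison costs one pass over two strings that are already in memory, 
which is cheap next to reading and writing the file.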

> 
> This makes me think we should have a range that detects and replaces
> patterns lazily and on the fly. I've always thought that loading entire
> files in memory and working on them is "cheating" in some sense, and a
> range would help with replacing patterns in streams.

It would certainly be nice to have a way to reasonably process files with ranges 
without having to load the whole thing into memory at once. Most of the time, I 
wouldn't care too much, but if you start processing large files, having the whole 
thing in memory could be a problem (especially if you have multiple versions of 
it which were created along the way as you were manipulating it). Haskell does 
lazy loading of files by default and doesn't load the data until you read the 
appropriate part of the string. It shouldn't be all that hard to do something 
similar with D and ranges. The hard part would be doing all of the processing 
of the file's data without ever loading it all into memory (let alone loading 
it multiple times). I'm not sure that 
you could do that without explicitly processing a file line by line, writing it 
to disk after each line is processed, since you could be doing an arbitrary set 
of operations on the data. It could be interesting to try and find a solution for 
that though.
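
As a rough illustration of the explicit line-by-line approach, here is a 
minimal sketch that expands tabs while holding only one line in memory at a 
time. The names expandTabs and removeTabsStreaming and the ".tmp" suffix are 
assumptions made up for this example, it assumes tabSize is greater than zero, 
and like the code quoted above it counts columns in code units rather than in 
displayed characters.

```d
import std.file : remove, rename;
import std.getopt : getopt;
import std.stdio : File;
import std.typecons : Yes;

// Hypothetical helper: returns a copy of line with each tab replaced by
// enough spaces to reach the next tab stop.
char[] expandTabs(const(char)[] line, size_t tabSize)
{
    char[] result;
    size_t column = 0;
    foreach (char c; line)
    {
        if (c == '\t')
        {
            immutable numSpaces = tabSize - column % tabSize;
            foreach (_; 0 .. numSpaces)
                result ~= ' ';
            column += numSpaces;
        }
        else
        {
            result ~= c;
            ++column;
        }
    }
    return result;
}

// Hypothetical streaming variant of removeTabs: each line is written to a
// temporary file as soon as it has been processed.
void removeTabsStreaming(size_t tabSize, string fileName)
{
    immutable tmpName = fileName ~ ".tmp"; // assumed temp-file naming
    bool changed;

    {
        auto input = File(fileName, "rb");
        auto output = File(tmpName, "wb");
        foreach (line; input.byLine(Yes.keepTerminator))
        {
            auto expanded = expandTabs(line, tabSize);
            if (expanded != line)
                changed = true;
            output.write(expanded);
        }
    } // both File handles go out of scope here, which closes the files

    // Only replace the original when something was actually modified.
    if (changed)
        rename(tmpName, fileName);
    else
        remove(tmpName);
}

void main(string[] args)
{
    size_t tabSize = 8;
    getopt(args, "tabsize|t", &tabSize);
    foreach (f; args[1 .. $])
        removeTabsStreaming(tabSize, f);
}
```

This only works because tab expansion never needs to look beyond the current 
line. An arbitrary set of operations on the data would not decompose this 
neatly, which is exactly the hard part.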

> 
> Looking very good, thanks. I think we should feature these and a
> few others as examples on the website.

Well, I, for one, much prefer the ability to program in a manner that's closer to 
telling the computer to do what I want rather than having to tell it how to do 
what I want (the replace end-of-line character program being a prime example). 
It makes life much simpler. Ranges certainly help a lot in that regard too. And 
having good example code of how to program that way could help encourage people 
to program that way and use std.range and std.algorithm and their ilk rather 
than trying more low-level solutions which aren't as easy to understand.

- Jonathan M Davis

