A safer File.readln
Markus Laker via Digitalmars-d
digitalmars-d at puremagic.com
Sun Jan 22 13:29:39 PST 2017
It's pretty easy to DoS a D program that uses File.readln or
File.byLine:
msl at james:~/d$ prlimit --as=4000000000 time ./tinycat.d tinycat.d
#!/usr/bin/rdmd
import std.stdio;
void main(in string[] argv) {
    foreach (const filename; argv[1..$])
        foreach (line; File(filename).byLine)
            writeln(line);
}
0.00user 0.00system 0:00.00elapsed 66%CPU (0avgtext+0avgdata
4280maxresident)k
0inputs+0outputs (0major+292minor)pagefaults 0swaps
msl at james:~/d$ prlimit --as=4000000000 time ./tinycat.d /dev/zero
0.87user 1.45system 0:02.51elapsed 92%CPU (0avgtext+0avgdata
2100168maxresident)k
0inputs+0outputs (0major+524721minor)pagefaults 0swaps
msl at james:~/d$
This trivial program, which runs in about 4MiB when asked to print
itself, chewed up 2GiB of memory in about three seconds when
handed an infinitely long input line, and would have kept going
if prlimit hadn't killed it.
D is in good company: C++'s getline() and Perl's diamond operator
have the same vulnerability.
msl at james:~/d$ prlimit --as=4000000000 time ./a.out tinycat.cpp
#include <fstream>
#include <iostream>
#include <string>
int main(int const argc, char const *argv[]) {
    for (auto i = 1; i < argc; ++i) {
        std::ifstream fh {argv[i]};
        for (std::string line; getline(fh, line, '\n'); )
            std::cout << line << '\n';
    }
    return 0;
}
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata
2652maxresident)k
0inputs+0outputs (0major+113minor)pagefaults 0swaps
msl at james:~/d$ prlimit --as=4000000000 time ./a.out /dev/zero
1.12user 1.76system 0:02.92elapsed 98%CPU (0avgtext+0avgdata
1575276maxresident)k
0inputs+0outputs (0major+786530minor)pagefaults 0swaps
msl at james:~/d$ prlimit --as=4000000000 time perl -wpe '' tinycat.d
#!/usr/bin/rdmd
import std.stdio;
void main(in string[] argv) {
    foreach (const filename; argv[1..$])
        foreach (line; File(filename).byLine)
            writeln(line);
}
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata
3908maxresident)k
0inputs+0outputs (0major+192minor)pagefaults 0swaps
msl at james:~/d$ prlimit --as=4000000000 time perl -wpe '' /dev/zero
Out of memory!
Command exited with non-zero status 1
4.82user 2.34system 0:07.43elapsed 96%CPU (0avgtext+0avgdata
3681400maxresident)k
0inputs+0outputs (0major+919578minor)pagefaults 0swaps
msl at james:~/d$
But I digress.
What would a safer API look like? Perhaps we'd slip in a maximum
line length as an optional argument to readln, byLine and friends:
enum size_t MaxLength = 1 << 20; // 1MiB
fh.readln(buf, MaxLength);
buf = fh.readln(MaxLength);
auto range = fh.byLine(MaxLength);
Obviously, we wouldn't want to break compatibility with existing
code by demanding a maximum line length at every call site.
Perhaps the default maximum length should change from its current
value -- infinity -- to something like 4MiB: longer than lines in
most text files, but still affordably small on most modern
machines.
What should happen if readln encountered an excessively long
line? Throw an exception?
Markus