Is this code "D-ish" enough?

H. S. Teoh hsteoh at quickfur.ath.cx
Thu Aug 8 10:00:13 PDT 2013


On Wed, Aug 07, 2013 at 08:25:33PM +0200, lafoldes wrote:
> Hi, this is one of my attempts to code in D:
> 
> Given a binary file, which contains pairs of a 4 byte integer and a
> zero-terminated string. The number of pairs is known. The task is to
> read the pairs, sort them by the integer ("id") part, and write out
> the result with line numbers to the console.
> 
> I tried to do this using the component paradigm, without extra
> classes, helper functions, etc. I've ended up with this code:
> 
> {
>   uint count = 1000;
> 
>   auto f = File("binary.dat", "rb");
> 
>   uint id[1];
>   int lineNo = 0;
> 
>   repeat(0, count).
>   map!(t => tuple(f.rawRead(id)[0], f.readln('\0'))).
>   array.
>   sort!("a[0] < b[0]").
>   map!(t => format("%6d %08x  %s\n", lineNo++, t[0], t[1])).
>   copy(stdout.lockingTextWriter);
> 
>   stdout.flush();
> }
> 
> Is this code "D-ish" enough?
> 
> There are things I don't like here:
> 
> - the dummy repeat at the beginning of the component chain. The only
> purpose of it is to produce the needed number of items.

This is a bad idea. You should be constructing a range that spans until
EOF, not some arbitrary count.


> Moreover repeat doesn't produce sortable range, so the array is needed
> as well.

You can't sort a one-pass sequence of values. It's only natural to
require storage (in an array, or some other data structure) in order to
be sortable.
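As a rough sketch of what that means in code (the names Countdown and
sortedCopy are made up for illustration, they are not in Phobos): a
one-pass range does not satisfy sort's requirements, so it gets copied
into an array first.

```d
import std.algorithm : sort;
import std.array : array;
import std.range : isInputRange, isRandomAccessRange;

// Hypothetical one-pass range: yields n, n-1, ..., 1 and nothing else.
struct Countdown
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

// It can be iterated, but offers no indexing, so sort cannot work on it.
static assert(isInputRange!Countdown && !isRandomAccessRange!Countdown);

// Hypothetical helper: store an input range of ints in an array, then sort it.
int[] sortedCopy(R)(R input) if (isInputRange!R)
{
    auto buf = input.array; // materialize the one-pass range
    sort(buf);              // arrays are random-access, so this is allowed
    return buf;
}
```

Calling sortedCopy(Countdown(3)) should give [1, 2, 3].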


> - I don't know how to do this when the number of items is not known.
> How to repeat until EOF?

You should write a range that consumes exactly the amount of data you
need from the stream. First of all, you should recognize that your input
file has a different structure than just a mere sequence of bytes or
pages. For maximum readability/maintainability, you should make this
explicit by defining a structure to contain this data:

	struct Record {
		uint id;
		char[] str;
	}

Next, you should write a function that takes a File and returns a range
of Records. Maybe something like this:

	// Warning: untested code
	auto getRecords(File f) {
		static struct Result {
			File f;
			this(File _f) {
				f = _f;
				readNext(); // get things going
			}
			@property bool empty() { return f.eof; }
			Record front;
			void popFront() { readNext(); }
			private void readNext() {
				union U {
					uint id;
					ubyte[uint.sizeof] raw;
				}
				U u;
				f.rawRead(u.raw);
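				// Ask for char[] so the result matches Record.str
				// (the default is string). Note that readln keeps
				// the trailing '\0' terminator in the result.
				auto str = f.readln!(char[])('\0');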
				front = Record(u.id, str);
			}
		}
		return Result(f);
	}

Phobos isn't *quite* at the point where you don't have to write custom
code. :)

Once you have this, your code becomes:

	{
		// needs std.algorithm, std.array, std.format, std.stdio
		File("binary.dat", "rb")
			.getRecords()
			.array	// this is necessary! you can't sort a one-pass range
			.sort!((a,b) => a.id < b.id)
			.map!(t => format("%08x  %s\n", t.id, t.str))
			.copy(stdout.lockingTextWriter);

		stdout.flush();	// this is probably also necessary
	}
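
To try a pipeline like this without a real data file, a small test file
in the layout described in the question (a 4-byte integer followed by a
zero-terminated string, repeated) can be produced along these lines.
This is only a sketch: writeRecords is a made-up helper, and it assumes
the ids are stored in the machine's native byte order, which the
original post does not specify.

```d
import std.stdio : File;

// Hypothetical helper: write (id, zero-terminated string) pairs to path.
// Assumption: ids are written in native byte order.
void writeRecords(string path, const uint[] ids, const string[] strs)
{
    assert(ids.length == strs.length);
    auto f = File(path, "wb");
    foreach (i, id; ids)
    {
        f.rawWrite((&id)[0 .. 1]); // the 4 raw bytes of the id
        f.rawWrite(strs[i]);       // the string bytes
        f.rawWrite("\0");          // the terminator
    }
    // f is closed when it goes out of scope
}
```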

If you want line numbers, you can use zip to pair up each record with a
line number:

	// Warning: untested code
	{
		File("binary.dat", "rb")
			.getRecords()
			.array	// this is necessary! you can't sort a one-pass range
			.sort!((a,b) => a.id < b.id)
			.zip(sequence!"n"(0))
			.map!(t => format("%6d %08x  %s\n", t[1], t[0].id, t[0].str))
			.copy(stdout.lockingTextWriter);

		stdout.flush();	// this is probably also necessary
	}
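
The same numbering idea can be checked on in-memory data, without any
file I/O. The sketch below is an illustration only: numberedLines is a
made-up name, the Record here uses string instead of char[] to keep the
example short, and iota stands in for sequence as the source of line
numbers.

```d
import std.algorithm : map, sort;
import std.array : array;
import std.format : format;
import std.range : iota, zip;

// Simplified record for this example (string instead of char[]).
struct Record
{
    uint id;
    string str;
}

// Hypothetical helper: sort by id, then prefix each record with its index.
string[] numberedLines(Record[] recs)
{
    auto sorted = recs.dup;                  // leave the caller's array alone
    sort!((a, b) => a.id < b.id)(sorted);
    return zip(iota(sorted.length), sorted)  // pairs of (line number, record)
        .map!(t => format("%6d %08x  %s", t[0], t[1].id, t[1].str))
        .array;
}
```

For the input [Record(0x2a, "b"), Record(1, "a")] this should produce
"     0 00000001  a" followed by "     1 0000002a  b".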


> - The variables id, and lineNo outside the chain.

Yeah, those are bad. In my example code above, I got rid of them.


> - rawRead() needs an array, even if there is only one item to read.

This is a std.stdio limitation. But it can be worked around using a
union as I did above.
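
Another way to get around it, if you'd rather not declare a union, is to
slice a pointer to the variable so that rawRead sees a one-element
array. A minimal sketch (readUint is a made-up helper, and it assumes
the value is stored in native byte order):

```d
import std.stdio : File;

// Hypothetical helper: read one uint from f.
// Returns false if a full value could not be read (e.g. at end of file).
bool readUint(File f, out uint value)
{
    // (&value)[0 .. 1] is a uint[] of length 1 that aliases `value`.
    auto got = f.rawRead((&value)[0 .. 1]);
    return got.length == 1; // rawRead returns the part it actually filled
}
```

The return value doubles as an end-of-input check, since rawRead returns
a shorter slice when the file runs out.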


> - How to avoid the flush() at the end?
[...]

Why would you want to? You do have to flush stdout if you want output to
be written immediately, because it's a buffered output stream.
Forgetting to flush() is OK if your program exits shortly after, since
the runtime exit code will flush any unflushed buffers. But doing it
explicitly is probably better, and necessary if your program isn't going
to exit and you want the output flushed right away.


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.

