Signed word lengths and indexes

Justin Spahr-Summers Justin.SpahrSummers at gmail.com
Thu Jun 17 00:38:36 PDT 2010


On Thu, 17 Jun 2010 03:27:59 -0400, Kagamin <spam at here.lot> wrote:
> 
> Justin Spahr-Summers Wrote:
> 
> > This sounds more like an issue with file offsets being longs, 
> > ironically. Using longs to represent zero-based locations in a file is 
> > extremely unsafe. Such usages should really be restricted to short-range 
> > offsets from the current file position, and fpos_t used for everything 
> > else (which is presumably available in std.c.stdio).
> 
> 1. Ironically, the issue is not the file offset's signedness. You still hit the bug with a ulong offset.

How so? Subtracting a size_t from a ulong offset will only cause 
problems if the size_t value is larger than the offset. If that's the 
case, then the issue remains even with a signed offset.
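To make that concrete, here is a minimal sketch in C rather than D. The
helper names are hypothetical, and fixed-width types stand in for ulong
and long. Neither version checks its inputs, and neither produces a
usable file position when the length exceeds the offset.

```c
#include <stdint.h>

/* Hypothetical helper: step back `len` bytes from an unsigned offset.
 * No check is made, so the result wraps modulo 2^64 when len > offset. */
static uint64_t rewind_unsigned(uint64_t offset, uint64_t len) {
    return offset - len;
}

/* Hypothetical helper: the same step with a signed offset.
 * No check is made, so the result is negative when len > offset. */
static int64_t rewind_signed(int64_t offset, int64_t len) {
    return offset - len;
}
```

Calling rewind_unsigned(4, 10) gives a huge position near 2^64, and
rewind_signed(4, 10) gives -6. Both are bugs; the signed one is just
easier to spot after the fact.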

> 2. A signed offset is twice as safe as an unsigned one, as you can detect
> an underflow bug (and, maybe, overflow).

The solution with unsigned values is to make sure that they won't 
underflow *before* performing the arithmetic - and that's really the 
proper solution anyway.
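A minimal sketch of checking first, again in C with a hypothetical
helper name (checked_rewind is not a library function):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: computes offset - len only when it cannot underflow.
 * Returns false and leaves *result untouched if len > offset. */
static bool checked_rewind(uint64_t offset, uint64_t len, uint64_t *result) {
    if (len > offset) {
        return false;   /* would land before the start of the file */
    }
    *result = offset - len;
    return true;
}
```

The caller has to handle the failure case explicitly, which is the
point: the bad position is never computed in the first place.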

> With an unsigned offset you get an exception if the filesystem doesn't
> support sparse files, so Linux will keep silent.

I'm not sure what this means. Can you explain?

> 3. A signed offset is consistent/type-safe in the case of the seek function, as it doesn't arbitrarily mutate between signed and unsigned.

My point was about signed values being used to represent zero-based 
indices. Obviously there are applications for a signed offset *from the 
current position*. It's seeking to a signed offset *from the start of 
the file* that's unsafe.
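As a rough sketch of that split (C, hypothetical helper name): the
absolute position stays unsigned, the relative delta is signed, and the
combination is checked in both directions before it is applied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: applies a signed delta to an unsigned absolute
 * position. Returns false if the result would fall before position 0
 * or past UINT64_MAX; *out is only written on success. */
static bool apply_delta(uint64_t pos, int64_t delta, uint64_t *out) {
    if (delta < 0) {
        /* Magnitude of delta, computed in unsigned arithmetic so that
         * INT64_MIN is handled without signed overflow. */
        uint64_t back = (uint64_t)0 - (uint64_t)delta;
        if (back > pos) {
            return false;
        }
        *out = pos - back;
    } else {
        if ((uint64_t)delta > UINT64_MAX - pos) {
            return false;
        }
        *out = pos + (uint64_t)delta;
    }
    return true;
}
```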

> 4. Choosing unsigned for file offset is not dictated by safety, but by stupidity: "hey, I lose my bit!"

You referred to 32-bit systems, correct? I'm sure there are 32-bit 
systems out there that need to be able to access files larger than two 
gigabytes.
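For the arithmetic behind that: a signed 32-bit offset tops out at
2^31 - 1 bytes (just under 2 GiB), while an unsigned one reaches
2^32 - 1 (just under 4 GiB). A small C sketch, with hypothetical helper
names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can this byte position be held in a signed 32-bit offset? */
static bool fits_signed32(uint64_t offset) {
    return offset <= (uint64_t)INT32_MAX;    /* 2^31 - 1 */
}

/* Hypothetical helper: can it be held in an unsigned 32-bit offset? */
static bool fits_unsigned32(uint64_t offset) {
    return offset <= (uint64_t)UINT32_MAX;   /* 2^32 - 1 */
}
```

A position 3 GiB into a file (3221225472) passes fits_unsigned32 but
fails fits_signed32.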

> I AM an optimization zealot, but unsigned offsets are plain dead
> freaking stupid.

It's not an optimization. Unsigned values logically correspond to disk 
and memory locations.

