Signed word lengths and indexes

Kagamin spam at here.lot
Thu Jun 17 03:41:33 PDT 2010


Justin Spahr-Summers Wrote:

> > 1. Ironically the issue is not in file offset's signedness. You still hit the bug with ulong offset.
> 
> How so? Subtracting a size_t from a ulong offset will only cause 
> problems if the size_t value is larger than the offset. If that's the 
> case, then the issue remains even with a signed offset.

Maybe you didn't see the testcase:
ulong a;
ubyte[] b;
a+=-b.length; // go a little backwards

or

seek(-b.length, SEEK_CUR, file);

> > 2. Signed offset is two times safer than unsigned as you can detect
> > underflow bug (and, maybe, overflow).
> 
> The solution with unsigned values is to make sure that they won't 
> underflow *before* performing the arithmetic - and that's really the 
> proper solution anyways.

If you rely on client code to be correct, you get a security issue. And the client doesn't necessarily use your language or your compiler. Or he can turn off overflow checks for performance. Or he can use the same unsigned variable for both signed and unsigned offsets, so the checks for underflow become useless.

> > With unsigned offset you get exception if the filesystem doesn't
> > support sparse files, so the linux will keep silence.
> 
> I'm not sure what this means. Can you explain?

This means that you have a subtle bug: the wrapped offset lands far past the end of the file, and on a filesystem that supports sparse files (as Linux's do) the seek and write succeed silently instead of raising an error.

> > 3. Signed offset is consistent/type-safe in the case of the seek function as it doesn't arbitrarily mutate between signed and unsigned.
> 
> My point was about signed values being used to represent zero-based 
> indices. Obviously there are applications for a signed offset *from the 
> current position*. It's seeking to a signed offset *from the start of 
> the file* that's unsafe.

To catch this in the case of a signed offset you need only one check. In the case of unsigned offsets you have to watch for underflows throughout the entire application code, even where it's not related to file seeks - just in order to fix an issue that can be fixed separately.

> > 4. Choosing unsigned for file offset is not dictated by safety, but by stupidity: "hey, I lose my bit!"
> 
> You referred to 32-bit systems, correct? I'm sure there are 32-bit 
> systems out there that need to be able to access files larger than two 
> gigabytes.

I'm talking about 64-bit file offsets, which are 64-bit on 32-bit systems too.
As to file size limitations, there's no difference between signed and unsigned lengths. File sizes have no tendency to stick to the 4-gig mark. If you need to handle files larger than 2 gigs, you also need to handle files larger than 4 gigs.

> > I AM an optimization zealot, but unsigned offsets are plain dead
> > freaking stupid.
> 
> It's not an optimization. Unsigned values logically correspond to disk 
> and memory locations.

They don't. Memory locations are a *subset* of the size_t value range. That's why you have bounds checks. And the problem is the usage of these locations: the memory bus doesn't perform computations on the addresses, the application does - it adds, subtracts, mixes signeds with unsigneds, has various type system holes or kludges, library design issues, good practices in use, etc. In other words, it gets a little more complex than just locations.


More information about the Digitalmars-d mailing list