D - Unsafe and doomed

Mon Jan 6 05:40:08 PST 2014

On Monday, 6 January 2014 at 04:16:56 UTC, H. S. Teoh wrote:
> Since a null pointer implies that there's some kind of logic 
> error in
> the code, how much confidence do you have that the other 99 
> concurrent
> requests aren't being wrongly processed too?

That doesn't matter if the service isn't critical; it only 
matters if it destructively writes to a database. You can also 
shut down parts of the service rather than the entire service.

> Based on this, I'm inclined to say that if a web request process
> encountered a NULL pointer, it's probably better to just reset 
> back to a known-good state by restarting.

In many cases it might be, but it should be up to the project 
management or the organization to set the policy, not the 
language designer. This is an issue I have with many of the "C++ 
wannabe languages". They enforce policies that shouldn't be set 
at the level of a tool (it could be a compiler option though). My 
pet peeve is Go and its banning of assert() because many 
programmers use it in an inappropriate manner. In D you have the 
overloading of conditionals and others. With Ada and Rust it is 
OK, because they exist to enforce a policy for existing 
organizations (DoD, Mozilla). Programming languages that claim 
to be general-purpose should be more adaptable.

> No, usually you'd set things up so that if the webserver goes 
> down, an init script would restart it. Restarting is 
> preferable, because it resets the program back to a known-good 
> state.

The program might be written in such a way that you know that it 
is in a good state when you catch the null exception.

> careless bug, but a symptom of somebody attempting to inject a 
> root
> exploit?  Blindly continuing will only play into the hand of the
> attacker.

Protection against root exploits should be done at a lower level 
(jail).

> The thing is, a null pointer error isn't just an exceptional 
> condition
> caused by bad user data; it's a *logic* error in the code. It's 
> a sign
> that something is wrong with the program logic.

And so is array-out-of-bounds, or division-by-zero.

> Tell the client not to do that again? *That* sounds like the 
> formula for
> a DoS vector (a rogue client deliberately sending the crashing 
> request
> over and over again).

What else can you do? You return an error and block subsequent 
requests if appropriate.

In a networked computer game you log misbehaviour, you drop the 
client after a random delay and you can block the offender. What 
you do not want is to disable the entire service. It is better to 
run a somewhat faulty service that entertains and retains your 
customers than to shut down until a bug fix appears. If it takes 
15-30 seconds to bring the server back up then you cannot afford 
to reset all the time.
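
A rough sketch of that kind of policy in Python, just to illustrate the idea. The class name, the strike limit and the delay range are all made up for this example and are not taken from any real game server:

```python
import logging
import random

log = logging.getLogger("gameserver")


class OffenderPolicy:
    """Hypothetical per-client strike counter (illustration only)."""

    def __init__(self, max_strikes=3, min_delay=1.0, max_delay=10.0):
        self.max_strikes = max_strikes
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.strikes = {}     # client_id -> number of logged offences
        self.blocked = set()  # client_ids to refuse on reconnect

    def is_blocked(self, client_id):
        return client_id in self.blocked

    def report(self, client_id, reason):
        """Log the offence and return the delay (seconds) before dropping."""
        count = self.strikes.get(client_id, 0) + 1
        self.strikes[client_id] = count
        log.warning("client %s misbehaved (%s), strike %d",
                    client_id, reason, count)
        if count >= self.max_strikes:
            self.blocked.add(client_id)
        # A random delay makes it harder for the client to tell which
        # request triggered the drop.
        return random.uniform(self.min_delay, self.max_delay)
```

The connection code would call report() when a client sends something bogus, close that one connection after the returned delay, and check is_blocked() on reconnect. Everybody else keeps playing.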

I can point to many launches of online computer games that have 
resulted in massive losses due to servers going down during the 
first few weeks. That is actually one good reason not to use C++ 
in game servers: the lack of robustness to failure. In some 
domains the ability to keep the service running, and the ability 
to turn off parts of the service, is more important than 
correctness. What you want is a log of player resources so that 
you can restore game balance after a failure.

> data and start over. This is a case of a problem with the 
> *code*, which
> means you cannot trust the program will continue doing what you

That depends on how the program is written and in which area the 
null exception happened. It might even be a known bug that might 
take a long time to locate and fix, but that is known to be 
innocent.

> things will still work the way you think they work, will only 
> lead to
> your program running the exploit code that has been injected 
> into the
> corrupted stack.

Pages with the execute bit set should be write-protected. You can 
only jump into existing code; injection of code isn't really 
possible. So if the existing code is unknown to the attacker, that 
attack vector is weak.

> The safest recourse is to reset the program back to a known 
> state.

I see no problem with trapping None-failures in pure Python and 
keeping the service running. The places where it can happen tend 
to be when you are looking up a non-existing object in a 
database. Quite innocent if you can backtrack all the way down to 
the request handler and return an appropriate status code.
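
Something like the following is what I have in mind. It is only a sketch: the in-memory USERS dict stands in for a database, and the function names and the choice of status 500 are assumptions made for the example:

```python
import logging

log = logging.getLogger("service")

# Hypothetical stand-in for a database table.
USERS = {1: {"name": "alice"}}


def find_user(user_id):
    # Returns None when the object does not exist.
    return USERS.get(user_id)


def render_profile(user_id):
    user = find_user(user_id)
    # Deliberate bug for illustration: no None check, so an unknown id
    # raises TypeError ('NoneType' object is not subscriptable).
    return "profile of " + user["name"]


def handle_request(user_id):
    """Return (status, body); a None-failure fails only this request."""
    try:
        return 200, render_profile(user_id)
    except (TypeError, AttributeError):
        # Log with traceback so the bug can be fixed later, then answer
        # with an error status instead of taking the whole service down.
        log.exception("None-failure while handling user_id=%r", user_id)
        return 500, "internal error"
```

Catching TypeError this broadly can of course hide other bugs, so whether it is acceptable is exactly the kind of policy decision the project has to make. The point is only that the failure unwinds to the handler and the next request is served normally.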

If you use the safe subset of D, why should it be different?