HTTP frameworks benchmark focused on D libraries

Sun Sep 20 20:43:54 UTC 2020

With my lib, the -version=embedded_httpd_threads build should 
give more consistent results in tests like this.

The process pool it uses by default in a dub build is more crash
resilient, but it does have a habit of dropping excess concurrent
connections. This forces them to retry, which slaughters
benchmarks like this. It will have something like a 5 ms 99th
percentile (2x faster than the same test with the threads version
btw), but then that final 1% of responses can take several
seconds to complete (indeed, with 256 concurrent on my box it
takes a whopping 30 seconds!). Even with only about 40
concurrent, there's a final 1% spike, but it is more like 10 ms
so it isn't so noticeable. With hundreds it grows fast.
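
To make the arithmetic concrete, here is a minimal Python sketch of how
a small retried tail can hide behind a good 99th percentile. The sample
is made up for illustration (990 requests at 5 ms, 10 at a hypothetical
3000 ms); it is not data from the benchmark.

```python
import math
import statistics

def summarize(latencies_ms):
    """Return the nearest-rank 99th percentile, mean, and max of a sample."""
    ordered = sorted(latencies_ms)
    # Nearest-rank p99: the smallest value with at least 99% of samples at or below it.
    p99_index = math.ceil(99 * len(ordered) / 100) - 1
    return {
        "p99": ordered[p99_index],
        "mean": statistics.fmean(ordered),
        "max": ordered[-1],
    }

# Hypothetical sample: most requests are fast, 1% were dropped and retried.
sample = [5.0] * 990 + [3000.0] * 10
stats = summarize(sample)
print(stats)
```

The p99 stays at 5 ms while the mean lands near 35 ms and the worst
case is 3 seconds, so which number a benchmark reports matters a lot.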

That tail is probably what you're seeing here. The thread build
accepts more smoothly and thus evens it out, giving a nicer
benchmark number... but in my experience it actually performs
worse on average in real-world deployments and is not as
resilient to buggy code segfaulting. With processes, the
individual handler respawns and resets that individual connection
with no other requests affected. With threads, the whole server
must respawn, which also often slips by unnoticed but is more
likely to disrupt unrelated users.
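
The isolation argument can be sketched in a few lines of Python. This
is only an illustration of the general idea, not how the library
implements its process pool: the handler strings and run_handler are
hypothetical names, and os.abort() stands in for a segfault in buggy
handler code.

```python
import subprocess
import sys

# Hypothetical handler bodies: one dies abruptly, one behaves.
CRASHING_HANDLER = "import os; os.abort()"
HEALTHY_HANDLER = "print('ok')"

def run_handler(source):
    """Run one handler in its own process and report how it ended."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()

# The crash is confined to the child; the parent keeps serving.
crashed = run_handler(CRASHING_HANDLER)
healthy = run_handler(HEALTHY_HANDLER)
print("crashed handler exit code:", crashed[0])
print("healthy handler result:", healthy)
```

Had both handlers been threads in one process, the abort would have
taken the second request down with it.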

There is a potential "fix" for the process handler to complete
these benchmarks more smoothly too, but it comes at a cost. Even
in the long retry cases, at least the client has some feedback:
it knows its connection is not accepted and can respond
appropriately. At a minimum, it won't be shoveling data at you
yet. The "fix" breaks this, though - you accept ALL the
connections, even if you are too busy to actually process them.
This leads to more inbound data, potentially worsening the
existing congestion and leaving users more likely to just hang.
At least the unaccepted connection is specified (by TCP) to retry
later automatically, but if it is accepted, acknowledged, yet
unprocessed, it is unclear what the client should do. Odds are the
user will just be left hanging until the browser decides to time
out and display its error, which can actually take longer than
the TCP retry window.

My threads version does it this way anyway though. So it'd 
probably look better on the benchmark.
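
For a rough sense of why a dropped connection attempt costs whole
seconds rather than milliseconds, here is a small Python sketch of an
exponential retry schedule. It assumes a 1 second initial timeout that
doubles on every attempt; that is an assumption for illustration, and
the real values depend on the client's OS and its settings.

```python
def retry_times(initial_timeout_s=1.0, retries=5):
    """Cumulative seconds at which each connection retry would be sent."""
    times = []
    elapsed = 0.0
    timeout = initial_timeout_s
    for _ in range(retries):
        elapsed += timeout   # wait out the current timeout, then retry
        times.append(elapsed)
        timeout *= 2         # assumed doubling backoff
    return times

schedule = retry_times()
print(schedule)  # [1.0, 3.0, 7.0, 15.0, 31.0]
```

Under those assumptions, a connection that gets dropped a few times in
a row waits several seconds or more, while one that is accepted on the
first try never sees any of this.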

But BTW stuff like this is why I don't put too much stock in
benchmarks. Even if you aren't "cheating" by checking length
instead of path and other tricks like that (which btw I think are
totally legitimate in some cases; I said recently I see it as a
*strength* when you can do that), it still leaves some nuance on
the ground. Is it crash resilient? Debuggable when it crashes? Is
it compatible with third-party libraries, or does it force you to
choose from ones that share your particular event loop, at risk
of blocking the whole server when you disobey? Does it *actually*
provide the scalability it claims under real-world conditions,
or did it optimize to the controlled conditions of benchmarks at
the expense of dynamic adaptation to reality?

Harder to measure those.