[ENet-discuss] ENet scalability benchmark data
Espen Overaae
minthos at gmail.com
Tue Mar 4 13:09:17 PST 2008
Yes, I've thought about streamlining the I/O in ENet to improve
throughput and might do it later as I get more familiar with the
insides of the library. If I tried now, it would certainly not qualify
as "cleanly". It's not an urgent need for me, especially as I don't
know whether 2 k clients is a realistic number for one machine to
serve. I hope it is, and this benchmark was just meant to give me some
kind of indication of that.
I think there's a fairly simple change that can improve performance a bit:
place all packets to be delivered to the application into a single buffer, so
that to check for traffic the application just reads that buffer instead of
checking each peer individually. It's only a partial solution, of course,
since it only deals with incoming packets.
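Roughly what I mean, sketched with made-up names (this is just the shape of
the idea, not how ENet is actually laid out internally): the protocol code
appends every deliverable packet to one queue on the host, and the application
drains that queue instead of polling each peer.

    #include <enet/enet.h>

    /* hypothetical per-host FIFO of packets awaiting delivery to the application */
    typedef struct PendingPacket
    {
        ENetPeer *   peer;       /* peer the packet arrived from */
        enet_uint8   channelID;  /* channel it arrived on */
        ENetPacket * packet;     /* the packet itself */
        struct PendingPacket * next;
    } PendingPacket;

    typedef struct
    {
        PendingPacket * head;    /* dequeue from the head ... */
        PendingPacket * tail;    /* ... enqueue at the tail */
    } PendingQueue;

    /* called by the protocol code whenever a packet becomes deliverable */
    static void pending_queue_push (PendingQueue * queue, PendingPacket * entry)
    {
        entry -> next = NULL;
        if (queue -> tail != NULL)
            queue -> tail -> next = entry;
        else
            queue -> head = entry;
        queue -> tail = entry;
    }

    /* called by the application; checking for traffic is O(1) instead of O(peers) */
    static PendingPacket * pending_queue_pop (PendingQueue * queue)
    {
        PendingPacket * entry = queue -> head;
        if (entry == NULL)
            return NULL;                 /* nothing waiting */
        queue -> head = entry -> next;
        if (queue -> head == NULL)
            queue -> tail = NULL;
        return entry;
    }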
Espen Overaae
On 3/4/08, Lee Salzman <lsalzman1 at cox.net> wrote:
> Note that those sorts of benchmarks are somewhat artificial, in that they
> go against the grain of ENet's design.
>
> Servicing ENet 1,000 times a second is going to eat up a bunch
> of processing time just doing user -> kernel -> user transitions for
> normal system calls, regardless. The enet_host_check_events() function
> was introduced in 1.2 to combat this by ensuring you never transition
> into the kernel when all you want is just to eat up the current batch of
> packets.
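> For illustration, a service loop built around it looks something like this
> (handle_event() stands in for whatever the application does with an event):
>
>     ENetEvent event;
>
>     /* consume everything already queued; no socket I/O, no kernel transition */
>     while (enet_host_check_events (host, & event) > 0)
>         handle_event (& event);
>
>     /* only then pay for one real service call, which may hit the socket */
>     if (enet_host_service (host, & event, 0) > 0)
>     {
>         handle_event (& event);
>
>         /* servicing may have queued more events; drain those cheaply too */
>         while (enet_host_check_events (host, & event) > 0)
>             handle_event (& event);
>     }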
>
> Also, ENet is designed for maximum fairness to a smaller number of peers,
> rather than for supporting an absolutely huge number of peers, to keep some
> places in the code simpler. It iterates over the entire list of peers,
> giving each one a shot in turn, as opposed to "first come first serve",
> which would otherwise let some peers monopolize the link. This is
> antagonistic to allocating 8,000 peers and only using 2,000 of them. If you
> only want to use 2,000 peers, then only allocate 2,000 peers.
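> Concretely (the port and bandwidth values below are placeholders, and this
> is the 1.2 enet_host_create() signature):
>
>     ENetAddress address;
>
>     address.host = ENET_HOST_ANY;
>     address.port = 12345;                     /* placeholder port */
>
>     /* size peerCount to what you will actually use, not 8,000 */
>     ENetHost * server = enet_host_create (& address, 2000,
>                                           0 /* incoming bandwidth: unlimited */,
>                                           0 /* outgoing bandwidth: unlimited */);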
>
> Though if these bottlenecks are truly objectionable, then there are
> some mods that can be done (at the cost of complexity):
>
> Just keep a list of peers over which there are packets or events to
> dispatch, and a list of peers over which there are packets to send out.
> Replace the iteration in those two circumstances with removal of peers from
> these lists, and push the peers to the back of the list again as necessary
> to ensure fairness if there is still remaining stuff to handle. Ideally
> these need to be dynamically sized ring buffers to keep the cost of
> shuffling pointers in and out of the lists sane.
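> The shape of that, as a rough sketch only (names invented here, not actual
> ENet internals): a ring of peers pending dispatch, drained from the front,
> with a peer pushed back onto the end whenever it still has work left, so no
> single peer monopolizes the link.
>
>     /* hypothetical ring buffer of peers that have events or packets pending;
>        capacity tracks the host's peer count, so each peer appears at most once */
>     typedef struct
>     {
>         ENetPeer ** peers;
>         size_t      capacity;
>         size_t      head, count;
>     } PeerRing;
>
>     static void peer_ring_push (PeerRing * ring, ENetPeer * peer)
>     {
>         ring -> peers [(ring -> head + ring -> count) % ring -> capacity] = peer;
>         ++ ring -> count;
>     }
>
>     static ENetPeer * peer_ring_pop (PeerRing * ring)
>     {
>         if (ring -> count == 0)
>             return NULL;                        /* nothing to dispatch */
>         ENetPeer * peer = ring -> peers [ring -> head];
>         ring -> head = (ring -> head + 1) % ring -> capacity;
>         -- ring -> count;
>         return peer;
>     }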
>
> Though, this becomes complicated and messy when dealing with timed events
> like resends or periodic pings, which is precisely why I avoided it. You
> can't just have a separate list of peers for timed events, or otherwise they
> will not get serviced in a fair order if you push them onto the back of the
> "has stuff to send" list. So peers waiting on acknowledgements pretty much
> have to live within that list, and get repeatedly pushed to the back until
> they get the acknowledgements. Pings would have to be handled
> specially/separately, since pings are essentially always waiting to be
> sent, so you would lose the inherent fairness of pinging only on the peer's
> actual turn, and instead have to either send all pings before all other
> traffic, or after. But given that pings are piggy-backed on normal packets,
> it becomes trickier yet, with clients only being in the "needs a ping" list
> sometimes and sometimes not.
>
> At some point the performance gained versus the complexity of the above just
> isn't worth it for something that is intended to be a simple, efficient
> library for modest games. If you want to make an "MMOG" with it, then some
> slight re-engineering of these issues might be necessary. Though, I am not
> opposed to patches if someone wants to cleanly implement the above.
>
>
> Lee
>
>
> Espen Overaae wrote:
> > I've been running some benchmarks to see how many clients I could get
> > a single server process to serve, and ran into some interesting
> > bottlenecks.
> >
> > I call enet_host_service 1000 times per second. I have 4 channels.
> > Each connected client sends a small packet every 100 ms, and gets
> > various amounts of data in return every 100 ms.
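> > Roughly, each client does something like this every 100 ms (the payload
> > and channel number are placeholders; serverPeer is the peer returned by
> > enet_host_connect()):
> >
> >     const char payload [] = "ping";           /* placeholder small payload */
> >     ENetPacket * packet = enet_packet_create (payload, sizeof (payload),
> >                                               0 /* unreliable */);
> >     enet_peer_send (serverPeer, 0 /* one of the 4 channels */, packet);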
> >
> > First, the scenario with no connections and no traffic:
> > With max peers at a low number like 64, CPU usage is 0%.
> > With max peers at about 2 k, CPU usage is 2-3%.
> > With max peers at about 8 k, CPU usage is about 9%, with most of it
> > being spent in this function:
> > 72.74% - enet_protocol_dispatch_incoming_commands
> > Dropping the polling rate to 100 Hz reduces CPU usage to 1% for 8 k max peers.
> >
> >
> > When I connect a bunch of peers and start sending data all over the place:
> >
> > With 8 k max peers and a 100 Hz polling rate, the server stays
> > responsive until about 2 k clients, and uses about 90% CPU.
> > Profiling shows nearly 25% of this time is spent in ENet:
> > 12.83% - enet_protocol_dispatch_incoming_commands
> > 9.31% - enet_protocol_send_outgoing_commands
> >
> > With 8 k max peers and a 1 kHz polling rate, the server is more
> > responsive overall, but still only handles about 2 k clients, and CPU
> > usage rises to about 150% (the server is multithreaded and running on
> > a quad-core).
> > Profiling shows more than 50% of this time is spent in ENet, which
> > translates to about 80% CPU usage for the thread servicing ENet.
> > The big culprits, according to gprof, are:
> > 27.35% - enet_protocol_dispatch_incoming_commands
> > 26.32% - enet_protocol_send_outgoing_commands
> >
> > Creating two server processes with 2 k max peers each and a 1 kHz
> > polling rate allows me to connect a total of 3.5 k clients, spread
> > over the two processes, before the servers become unresponsive. CPU use
> > with two server processes is about 150% for each process, 40% system
> > (kernel - I guess this is the time spent inside system calls) time, and
> > only 5% idle (the remaining 55% probably spent in the client processes
> > and other background processes).
> > The profiler still shows about 50% of the time spent in ENet:
> > 29.43% - enet_protocol_dispatch_incoming_commands
> > 18.00% - enet_protocol_send_outgoing_commands
> >
> >
> > These numbers do not show how much time is spent in system calls and
> > how much is spent in actual ENet code; they only show the grand total
> > of time spent within the listed functions and all their subfunctions. I
> > assume much of it is spent in ENet. Looking at the ENet code, I assume
> > increasing the number of channels would increase the CPU time spent in
> > ENet. The total throughput in these tests has been a few megabits per
> > second, most of it unreliable. Responsiveness is simply measured by the
> > perceived time it takes to connect a batch of 500 new clients and by
> > how many of them fail to connect.
> >
> > The server processes did some computations on the data transmitted.
> > Previously I did essentially nothing with the data, and the profiler
> > showed ENet using a greater share of the CPU time, but the total
> > number of clients I could connect remained fairly constant. Even with
> > 4 server processes and no computations, the servers became unresponsive
> > when the total client count approached 4 k.
> >
> > To get useful profiling information from multiple threads, I used
> > this: http://sam.zoy.org/writings/programming/gprof.html
> >
> >
> > Espen Overaae