[ENet-discuss] ENet scalability benchmark data
Espen Overaae
minthos at gmail.com
Tue Mar 4 13:09:17 PST 2008
Yes, I've thought about streamlining the I/O in ENet to improve
throughput and might do it later as I get more familiar with the
insides of the library. If I tried now, it would certainly not qualify
as "cleanly". It's not an urgent need for me, especially as I don't
know whether 2 k clients is a realistic number for one machine to
serve. I hope it is, and this benchmark was just meant to give me some
kind of indication of that.
I think there's a fairly simple change that can improve performance a bit:
place all packets to be delivered to the application into a single buffer, so
that to check for traffic the application just reads that buffer instead of
checking each peer individually. It's only a partial solution, of course,
since it only deals with incoming packets.
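Roughly what I mean, sketched with made-up names (this is just the shape of
the idea, not how ENet is actually laid out internally): the protocol code
appends every deliverable packet to one queue on the host, and the application
drains that queue instead of polling each peer.

    #include <enet/enet.h>

    /* hypothetical per-host FIFO of packets awaiting delivery to the application */
    typedef struct PendingPacket
    {
        ENetPeer *   peer;       /* peer the packet arrived from */
        enet_uint8   channelID;  /* channel it arrived on */
        ENetPacket * packet;     /* the packet itself */
        struct PendingPacket * next;
    } PendingPacket;

    typedef struct
    {
        PendingPacket * head;    /* dequeue from the head ... */
        PendingPacket * tail;    /* ... enqueue at the tail */
    } PendingQueue;

    /* called by the protocol code whenever a packet becomes deliverable */
    static void pending_queue_push (PendingQueue * queue, PendingPacket * entry)
    {
        entry -> next = NULL;
        if (queue -> tail != NULL)
            queue -> tail -> next = entry;
        else
            queue -> head = entry;
        queue -> tail = entry;
    }

    /* called by the application; checking for traffic is O(1) instead of O(peers) */
    static PendingPacket * pending_queue_pop (PendingQueue * queue)
    {
        PendingPacket * entry = queue -> head;
        if (entry == NULL)
            return NULL;                 /* nothing waiting */
        queue -> head = entry -> next;
        if (queue -> head == NULL)
            queue -> tail = NULL;
        return entry;
    }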
Espen Overaae
On 3/4/08, Lee Salzman <lsalzman1 at cox.net> wrote:
> Note that those sorts of benchmarks are somewhat artificial, in that they
> go against the grain of ENet's design.
>
> Servicing ENet 1,000 times a second is going to eat up a bunch
> of processing time just doing user -> kernel -> user transitions for
> normal system calls, regardless. The enet_host_check_events() function
> was introduced in 1.2 to combat this by ensuring you never transition
> into the kernel when all you want is just to eat up the current batch of
> packets.
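> For illustration, a service loop built around it looks something like this
> (handle_event() stands in for whatever the application does with an event):
>
>     ENetEvent event;
>
>     /* consume everything already queued; no socket I/O, no kernel transition */
>     while (enet_host_check_events (host, & event) > 0)
>         handle_event (& event);
>
>     /* only then pay for one real service call, which may hit the socket */
>     if (enet_host_service (host, & event, 0) > 0)
>     {
>         handle_event (& event);
>
>         /* servicing may have queued more events; drain those cheaply too */
>         while (enet_host_check_events (host, & event) > 0)
>             handle_event (& event);
>     }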
>
> Also, ENet is designed for maximum fairness to a smaller number of peers,
> rather than for supporting an absolutely huge number of peers, to keep some
> places in the code simpler. It iterates over the entire list of peers,
> giving each one a shot in turn, as opposed to "first come first serve",
> which would otherwise let some peers monopolize the link. This is
> antagonistic to allocating 8,000 peers and only using 2,000 of them. If you
> only want to use 2,000 peers, then only allocate 2,000 peers.
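> Concretely (the port and bandwidth values below are placeholders, and this
> is the 1.2 enet_host_create() signature):
>
>     ENetAddress address;
>
>     address.host = ENET_HOST_ANY;
>     address.port = 12345;                     /* placeholder port */
>
>     /* size peerCount to what you will actually use, not 8,000 */
>     ENetHost * server = enet_host_create (& address, 2000,
>                                           0 /* incoming bandwidth: unlimited */,
>                                           0 /* outgoing bandwidth: unlimited */);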
>
> Though if these bottlenecks are truly objectionable, then there are
> some mods that can be done (at the cost of complexity):
>
> Just keep a list of peers over which there are packets or events to
> dispatch, and a list of peers over which there are packets to send out.
> Replace the iteration in those two circumstances with removal of peers from
> these lists, and push the peers to the back of the list again as necessary
> to ensure fairness if there is still remaining stuff to handle. Ideally
> these need to be dynamically sized ring buffers to keep the cost of
> shuffling pointers in and out of the lists sane.
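> The shape of that, as a rough sketch only (names invented here, not actual
> ENet internals): a ring of peers pending dispatch, drained from the front,
> with a peer pushed back onto the end whenever it still has work left, so no
> single peer monopolizes the link.
>
>     /* hypothetical ring buffer of peers that have events or packets pending;
>        capacity tracks the host's peer count, so each peer appears at most once */
>     typedef struct
>     {
>         ENetPeer ** peers;
>         size_t      capacity;
>         size_t      head, count;
>     } PeerRing;
>
>     static void peer_ring_push (PeerRing * ring, ENetPeer * peer)
>     {
>         ring -> peers [(ring -> head + ring -> count) % ring -> capacity] = peer;
>         ++ ring -> count;
>     }
>
>     static ENetPeer * peer_ring_pop (PeerRing * ring)
>     {
>         if (ring -> count == 0)
>             return NULL;                        /* nothing to dispatch */
>         ENetPeer * peer = ring -> peers [ring -> head];
>         ring -> head = (ring -> head + 1) % ring -> capacity;
>         -- ring -> count;
>         return peer;
>     }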
>
> Though, this becomes complicated and messy when dealing with timed events
> like resends or periodic pings, which is precisely why I avoided it. You
> can't just have a separate list of peers for timed events, or otherwise they
> will not get serviced in a fair order if you push them onto the back of the
> "has stuff to send" list. So peers waiting on acknowledgements pretty much
> have to live within that list, and get repeatedly pushed to the back until
> they get the acknowledgements. Pings would have to be handled
> specially/separately, since pings are essentially always waiting to be
> sent, so you would lose the inherent fairness of pinging only on the peer's
> actual turn, and instead have to either send all pings before all other
> traffic, or after. But given that pings are piggy-backed on normal packets,
> it becomes trickier yet, with clients only being in the "needs a ping" list
> sometimes and sometimes not.
>
> At some point the performance gained versus the complexity of the above just
> isn't worth it for something that is intended to be a simple, efficient
> library for modest games. If you want to make an "MMOG" with it, then some
> slight re-engineering of these issues might be necessary. Though, I am not
> opposed to patches if someone wants to cleanly implement the above.
>
>
> Lee
>
>
> Espen Overaae wrote:
> > I've been running some benchmarks to see how many clients I could get
> > a single server process to serve, and ran into some interesting
> > bottlenecks.
> >
> > I call enet_host_service 1000 times per second. I have 4 channels.
> > Each connected client sends a small packet every 100 ms, and gets
> > various amounts of data in return every 100 ms.
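> > Roughly, each client does something like this every 100 ms (the payload
> > and channel number are placeholders; serverPeer is the peer returned by
> > enet_host_connect()):
> >
> >     const char payload [] = "ping";           /* placeholder small payload */
> >     ENetPacket * packet = enet_packet_create (payload, sizeof (payload),
> >                                               0 /* unreliable */);
> >     enet_peer_send (serverPeer, 0 /* one of the 4 channels */, packet);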
> >
> > First, the scenario with no connections and no traffic:
> > With max peers at a low number like 64, CPU usage is 0%.
> > With max peers at about 2 k, CPU usage is 2-3%.
> > With max peers at about 8 k, CPU usage is about 9%, with most of it
> > being spent in this function:
> > 72.74% - enet_protocol_dispatch_incoming_commands
> > Dropping the polling rate to 100 Hz reduces CPU usage to 1% for 8 k max peers.
> >
> >
> > When I connect a bunch of peers and start sending data all over the place:
> >
> > With 8 k max peers and a 100 Hz polling rate, the server stays
> > responsive until about 2 k clients, and uses about 90% CPU.
> > Profiling shows nearly 25% of this time is spent in ENet:
> > 12.83% - enet_protocol_dispatch_incoming_commands
> > 9.31% - enet_protocol_send_outgoing_commands
> >
> > With 8 k max peers and a 1 kHz polling rate, the server is more
> > responsive overall, but still only handles about 2 k clients, and CPU
> > usage rises to about 150% (the server is multithreaded and running on
> > a quad-core).
> > Profiling shows more than 50% of this time is spent in ENet, which
> > translates to about 80% CPU usage for the thread servicing ENet.
> > The big culprits, according to gprof, are:
> > 27.35% - enet_protocol_dispatch_incoming_commands
> > 26.32% - enet_protocol_send_outgoing_commands
> >
> > Creating two server processes with 2 k max peers each and a 1 kHz
> > polling rate allows me to connect a total of 3.5 k clients, spread
> > over the two processes, before the servers become unresponsive. CPU use
> > with two server processes is about 150% for each process, 40% system
> > (kernel - I guess this is the time spent inside system calls) time, and
> > only 5% idle (the remaining 55% probably spent in the client processes
> > and other background processes).
> > The profiler still shows about 50% of the time spent in ENet:
> > 29.43% - enet_protocol_dispatch_incoming_commands
> > 18.00% - enet_protocol_send_outgoing_commands
> >
> >
> > These numbers do not show how much time is spent in system calls and
> > how much is spent in actual ENet code; they only show the grand total
> > of time spent within the listed functions and all their subfunctions. I
> > assume much of it is spent in ENet. Looking at the ENet code, I assume
> > increasing the number of channels would increase the CPU time spent in
> > ENet. The total throughput in these tests has been a few megabits per
> > second, most of it unreliable. Responsiveness is simply measured by the
> > perceived time it takes to connect a batch of 500 new clients and by
> > how many of them fail to connect.
> >
> > The server processes did some computations on the data transmitted.
> > Previously I did essentially nothing with the data, and the profiler
> > showed ENet using a greater share of the CPU time, but the total
> > number of clients I could connect remained fairly constant. Even with
> > 4 server processes and no computations, the servers became unresponsive
> > when the total client count approached 4 k.
> >
> > To get useful profiling information from multiple threads, I used
> > this: http://sam.zoy.org/writings/programming/gprof.html
> >
> >
> > Espen Overaae