BatchConn - sendmmsg/recvmmsg in Go

Hey all,

this time I want to talk about using sendmmsg and recvmmsg in Go and their impact on performance compared to sendmsg/recvmsg.

sendmmsg aims to increase performance by reducing the number of syscalls required to write data to the network using sockets. It does this by transmitting multiple packets, up to X, in one syscall. recvmmsg works in a similar way, reading multiple packets in a single syscall. So this looks quite promising. If we assume that the context switches caused by syscalls limit the actual performance of sending and receiving data over sockets, this may result in a huge performance increase. Let’s check it out!
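To make the potential saving concrete, here is a minimal sketch of the arithmetic. The helper name syscallsNeeded and the batch sizes are just assumptions for illustration, not values from my benchmark:

```go
package main

import "fmt"

// syscallsNeeded returns how many syscalls it takes to send `packets`
// packets when a single call can carry at most `batch` packets.
// batch = 1 models sendmsg, batch > 1 models sendmmsg.
func syscallsNeeded(packets, batch int) int {
	if packets <= 0 {
		return 0
	}
	if batch < 1 {
		batch = 1
	}
	// Integer ceiling division: a partially filled last batch still costs a call.
	return (packets + batch - 1) / batch
}

func main() {
	const packets = 1_000_000
	for _, batch := range []int{1, 8, 64, 1024} {
		fmt.Printf("batch=%4d -> %7d syscalls\n", batch, syscallsNeeded(packets, batch))
	}
}
```

The number of syscalls shrinks linearly with the batch size, but that only helps if the syscall overhead is really the bottleneck.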

My first try (which was really hacky btw…) was to wrap the actual write on sockets with a C implementation of sendmmsg in cgo. Well, this was as awkward as it sounds. But after some time, I found that Go already provides a wrapper for sendmmsg and recvmmsg: the WriteBatch and ReadBatch methods of ipv4.PacketConn in the golang.org/x/net/ipv4 package. Using them is more or less straightforward, and much easier than implementing some weird C code. I created a small benchmark tool that sends and reads UDP packets as fast as possible, using either sendmsg/recvmsg or sendmmsg/recvmmsg, depending on the passed parameters.

I performed the following benchmark using two servers (Intel Xeon Silver 4114 CPU with 10x 2.20GHz, 48GB of DDR4 RAM with a clock rate of 2666MHz), connected via 10Gbit/s direct links using Intel X722 network cards. The actual outcome was quite surprising. I started by testing sendmmsg and analyzed the amount of outgoing traffic on the sending node. Using sendmmsg, the benchmarking tool was able to transfer around 5Gbit/s of traffic with a packet size of 1400 bytes. Hm wait, I think I remember this number. So I continued by benchmarking sendmsg, which gave a very similar result: also around 5Gbit/s. Wow, I wasn’t expecting this at all. Since there is no real benefit of using sendmmsg in this example (and, if I’m not doing something completely wrong, also in other use-cases), I didn’t look further into the receiving performance of recvmmsg compared to recvmsg, because my use-cases require sending data at high rates, which was not improved by sendmmsg over sendmsg.
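For a sense of scale, the measured rate can be converted into packets per second. A rough sketch, assuming the 1400 bytes are pure payload and ignoring UDP/IP/Ethernet headers; packetsPerSecond is a hypothetical helper and not part of the benchmark tool:

```go
package main

import "fmt"

// packetsPerSecond returns how many packets per second are needed to sustain
// bitsPerSecond when every packet carries payloadBytes bytes.
// Header overhead is deliberately ignored (assumption).
func packetsPerSecond(bitsPerSecond float64, payloadBytes int) float64 {
	return bitsPerSecond / float64(payloadBytes*8)
}

func main() {
	pps := packetsPerSecond(5e9, 1400)
	fmt.Printf("%.0f packets/s\n", pps)
	fmt.Printf("%.2f µs per packet\n", 1e6/pps)
}
```

At 5Gbit/s this comes out at roughly 446,000 packets per second, which is about 2.24 µs per packet on average, regardless of whether the packets are handed to the kernel one by one or in batches.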

So what could limit the actual performance of sendmmsg in Go, so that we do not see any increase? I guess there is some overhead in preparing the actual messages in a way that the native sendmmsg syscall can handle. But that’s just an assumption, not a proven fact.

Final note: Please keep in mind that my implementation was not optimized any further, and there may be tricks to improve the performance of WriteBatch and ReadBatch as used in this article which I’m not aware of. So in case you know any such tricks, please let me know.

As always, please mail me any feedback you have. I really appreciate any kind of comments or additional information, and I will update this article with any helpful input I get.

Cheers, Marten