IO Completion Ports

At the place I’m currently working, I spend a lot of time working on Windows servers. The servers are very heavily multithreaded to deal with loads of clients simultaneously. However, as the received wisdom is that having one thread per connection is bad, we use a thread pool to better distribute work amongst threads.

One of the issues with using thread pools is the thundering herd problem: lots of threads in a pool are all waiting for some work, and when that work turns up, they all jump on it. Only one can win, but in the meantime you have a great big fight as they all try to acquire the exclusive lock. Eventually it all settles down, but you’ve probably thrown away any performance gained from pooling right there in the dusty cloud of context switching.

On Windows, a solution to this problem lies in the interesting concept of “IO Completion Ports.” These are queue objects (represented as a HANDLE) which can be associated with file or network HANDLEs, upon which many threads can wait. When an input or output operation on an associated HANDLE completes (for example a network write), one (and only one) of the waiting threads is woken up to deal with the event. Additionally, user events can be enqueued on the port, so the waiting threads can be made to do non-IO specific operations too.
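To make the association step concrete, here is a minimal sketch of reading a file through a completion port. The function name `read_via_port` and the completion key of 1 are my own inventions for illustration; the Win32 calls are the real ones. It uses a single thread for brevity, where a real server would have several threads blocked in GetQueuedCompletionStatus. The `#ifdef` is only there so the snippet compiles to nothing on non-Windows machines.

```cpp
#ifdef _WIN32  // Windows-only sketch; compiles to nothing elsewhere
#include <windows.h>

// Hypothetical helper: reads up to `len` bytes from the start of `path`
// via a completion port. Returns the byte count, or -1 on any error.
long read_via_port(const wchar_t* path, char* buf, DWORD len) {
    // The file must be opened for overlapped IO to be used with a port.
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (file == INVALID_HANDLE_VALUE) return -1;

    // Passing NULL as the existing port creates a new port and associates
    // the file with it in one call. The key (1) is an arbitrary tag.
    HANDLE port = CreateIoCompletionPort(file, NULL, 1, 0);
    if (port == NULL) { CloseHandle(file); return -1; }

    OVERLAPPED ov = {};  // zeroed, so the read starts at offset 0
    long result = -1;
    // ReadFile may finish immediately or report ERROR_IO_PENDING; in both
    // cases a completion packet is queued on the port.
    if (ReadFile(file, buf, len, NULL, &ov) ||
        GetLastError() == ERROR_IO_PENDING) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* done = NULL;
        // Block until the read completes; `done` should point back at `ov`.
        if (GetQueuedCompletionStatus(port, &bytes, &key, &done, INFINITE) &&
            done == &ov) {
            result = static_cast<long>(bytes);
        }
    }
    CloseHandle(port);
    CloseHandle(file);
    return result;
}
#endif
```

The same pattern applies to sockets: associate the socket HANDLE with the port, issue overlapped reads and writes, and let whichever thread is woken deal with the result.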

I don’t know of an equivalent on a non-Windows operating system, but then I’m a little out of touch: my last venture into internet programming on Linux was for my MUD back at university.

This is all great, and they’re pretty easy to use, once you’ve worked out their many subtleties. Firstly, the documentation is rubbish. There’s no decent overview documentation, and some of the reference material is plain misleading. After some email nagging from me, Microsoft have improved it a little, but they seem to have changed the way they document return values, making them almost as confusing as they used to be.

There are some unusual edge cases you need to deal with. The GetQueuedCompletionStatus function blocks a thread until there’s work for it to do. Depending on the return value, the number of bytes and the overlapped structure, there are a lot of possible “reasons” for the function to have returned. Here are all the possible cases, deciphered:

Return value	`OVERLAPPED`	number of bytes	Description
zero	`NULL`	n/a	The call to GetQueuedCompletionStatus failed, and no data was dequeued from the IO port. This usually indicates an error in the parameters to GetQueuedCompletionStatus, or that the wait timed out.
zero	non-`NULL`	n/a	The call to GetQueuedCompletionStatus failed, but data was read or written. The thread must deal with the data (possibly freeing any associated buffers), but there is an error condition on the underlying `HANDLE`. Usually seen when the other end of a network connection has been forcibly closed but there's still data in the send or receive queue.
non-zero	`NULL`	n/a	This condition doesn't happen due to IO requests, but is useful to use in combination with PostQueuedCompletionStatus as a way of indicating to threads that they should terminate.
non-zero	non-`NULL`	zero	End of file for a file `HANDLE`, or the connection has been gracefully closed (for network connections). The `OVERLAPPED` buffer has still been used and must be deallocated if necessary.
non-zero	non-`NULL`	non-zero	"num bytes" of data have been transferred to or from the buffer associated with the `OVERLAPPED` structure. The direction of the transfer is dependent on the call made to the IO port; it's up to the user to remember if it was a read or a write (usually by stashing extra data in the `OVERLAPPED` structure). The thread must deallocate the structure as necessary.
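The table above boils down to a small decision tree, which can be sketched as a plain function. The enum and its names are made up for this illustration; they're not part of any Windows header. The three inputs are the return value of GetQueuedCompletionStatus (as a bool), the `OVERLAPPED` pointer it handed back, and the byte count.

```cpp
// Hypothetical names for the five rows of the table, top to bottom.
enum class Completion {
    CallFailed,   // zero,     NULL:     nothing was dequeued
    IoFailed,     // zero,     non-NULL: IO completed with an error
    UserPacket,   // non-zero, NULL:     posted by hand, e.g. "please exit"
    EndOfStream,  // non-zero, non-NULL, zero bytes
    Transferred   // non-zero, non-NULL, some bytes moved
};

// ok:         true if GetQueuedCompletionStatus returned non-zero
// overlapped: the OVERLAPPED pointer it returned (may be null)
// bytes:      the number of bytes it reported (only meaningful on success)
inline Completion classify(bool ok, const void* overlapped, unsigned long bytes) {
    if (!ok) {
        return overlapped ? Completion::IoFailed : Completion::CallFailed;
    }
    if (!overlapped) {
        return Completion::UserPacket;
    }
    return bytes == 0 ? Completion::EndOfStream : Completion::Transferred;
}
```

In a worker loop you would call something like this straight after GetQueuedCompletionStatus returns and switch on the result, remembering that every non-`NULL` case still leaves you responsible for the `OVERLAPPED` structure.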

That’s about as succinct as I can get it. Hopefully that will be useful to somebody somewhere — I know I didn’t find the Microsoft documentation much use.

Though their name seems to suggest they’re all about IO requests, IO Completion Ports can also be used for general purpose thread pools. By creating an IO Completion Port unassociated with any sockets, files or so on, you can schedule “work” by calling PostQueuedCompletionStatus on the port, passing in anything you like in the OVERLAPPED structure, number of bytes and completion key.

One final gotcha, and a bit of a “wtf” moment: When creating unassociated IO Completion Ports, watch out for the parameters to CreateIoCompletionPort. The first HANDLE you pass has to be INVALID_HANDLE_VALUE, whereas the second has to be NULL. You’d think if both HANDLEs were basically saying “no handle”, convention would have it that they’d be the same, but no. The only thing I can think of is that the second parameter is marked as “optional”, so NULL is more appropriate, but it does make you wonder why INVALID_HANDLE_VALUE isn’t NULL. It seems to be for legacy reasons.
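Putting the unassociated port and PostQueuedCompletionStatus together, a general purpose pool might look roughly like the sketch below. `WorkItem`, `worker` and `run_pool_demo` are names I've invented for the example, and the "work" is just bumping a counter. It follows the convention from the table that a packet with a `NULL` `OVERLAPPED` means "terminate". Error handling is minimal, and as before the `#ifdef` only keeps the snippet inert on non-Windows machines.

```cpp
#ifdef _WIN32  // Windows-only sketch; compiles to nothing elsewhere
#include <windows.h>
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical work item: extra data stashed alongside the OVERLAPPED.
struct WorkItem : OVERLAPPED {
    std::atomic<int>* counter;
};

// Each worker blocks on the port until a packet arrives.
static void worker(HANDLE port) {
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* ov = nullptr;
        GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE);
        if (ov == nullptr) return;  // shutdown packet, or the call failed
        WorkItem* item = static_cast<WorkItem*>(ov);
        item->counter->fetch_add(1);  // the "work"
        delete item;                  // the worker owns the item
    }
}

// Runs `jobs` items across `threads` workers; returns how many ran,
// or -1 if the port could not be created.
int run_pool_demo(int threads, int jobs) {
    // The gotcha: INVALID_HANDLE_VALUE first, NULL second.
    HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    if (port == NULL) return -1;

    std::atomic<int> done{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) pool.emplace_back(worker, port);

    for (int i = 0; i < jobs; ++i) {
        WorkItem* item = new WorkItem();  // value-initialised, so zeroed
        item->counter = &done;
        if (!PostQueuedCompletionStatus(port, 0, 0, item)) delete item;
    }
    // One NULL-OVERLAPPED packet per worker tells each one to exit.
    // Packets are dequeued in FIFO order, so these follow the work.
    for (int i = 0; i < threads; ++i) {
        PostQueuedCompletionStatus(port, 0, 0, NULL);
    }

    for (std::thread& t : pool) t.join();
    CloseHandle(port);
    return done.load();
}
#endif
```

Calling `run_pool_demo(4, 100)` should run all 100 items across four threads. In anything real you'd want to check the return value of the shutdown posts too, since a worker that never sees its terminate packet will block forever.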

Ironically, after all the work on IO Completion Ports, one of our major components does use a one thread per connection model, and has scaled beautifully. A reminder to check your assumptions before embarking on a lengthy new implementation.


About Matt Godbolt