Imagine you were making a simple asynchronous HTTP request API. If you were coding in C you might sketch out something a little like1:
// Starts an asynchronous HTTP fetch of the given url.
// Returns a positive integer request handle to be used
// to identify this request.
int StartRequest(const char* url);
// Blocks waiting for a request to complete. Returns the
// number of bytes received.
int WaitForRequest(int handle_id);
// ... APIs to read data from the request here ...
// Note that we're done with a handle, freeing up resources
// associated with it, and returning the handle to the pool
// of available handles.
void CloseRequest(int handle_id);
All is good until you realise that sometimes you need to be able to cancel an
in-progress request. Say for example, a user hits the cancel button in a web
browser. You tinker around with your implementation, and then realise you can
reuse CloseRequest
— you clarify this in the comment:
// Note that we're done with a handle, freeing up resources
// associated with it, and returning the handle to the pool
// of available handles. If the request is still
// in-progress, it is cancelled. Any threads blocked on
// this request will be released: WaitForRequest will
// return -1 to indicate the cancellation.
void CloseRequest(int handle_id);
Cool! We now have a simple API. But wait - there’s a threading problem waiting to bite us later on. Have a quick look and see if you can spot it.
Got it? If so — congratulations, go to the top of the class.
If not, don’t fret as you’re in good company: Microsoft missed this in their Windows Internet (WinInet) API design too.
handling. To see where the problem lies, consider this situation:
// Cancelling example.
int handle = -1; // an invalid handle
// Fetches a URL, waits for it to download, then does some
// form of (unspecified) processing on it.
void ProcessUrl(const char* url) {
handle = StartRequest(url);
int error = WaitForRequest(handle);
if (error >= 0) {
// ...process data here...
CloseHandle(handle);
handle = -1;
} else {
// cancelled!
}
}
// Assume this is called on receipt of a UI event,
// asynchronous to the fetching of HTTP requests.
void BrowseToUrl(const char* url) {
if (handle >= 0) {
// Cancel any existing connection.
CloseHandle(handle);
handle = -1;
}
// Create a new thread to process this URL.
StartNewThread(ProcessURL, url);
}
“Aha,” you might say, “there’s no lock on the handle
variable so there’s a
race there. Where’s my prize?” And indeed you’d be right. There’s no locking
there.
But how would you add a lock?
// Thread-safe version?
void ProcessUrl(const char* url) {
LockMutex();
handle = StartRequest();
int error = WaitForRequest(handle);
UnlockMutex();
...
}
void BrowseToUrl(const char* url) {
LockMutex();
if (handle >= 0) {
CloseHandle(handle);
handle = -1;
}
UnlockMutex();
...
}
Job done. Except…you’re now unable to cancel an in-progress request!
The mutex is locked for the duration of the whole StartRequest
and
WaitForRequest
calls. If BrowseToUrl
is called on another thread it will
be unable to cancel the existing request: It requires the mutex that the
requesting thread holds — a mutex that will be held for the duration of the
request!
There is no place you can place the lock and not end up with a race during
cancellation. The only way you can do this is to have an atomic
UnlockMutexAndWaitForRequest
function2. In the absence of such a
function, the best you can do is something like:
// Supports cancellation. Very nearly thread safe.
void ProcessUrl(const char* url) {
LockMutex();
handle = StartRequest();
int temp_handle = handle;
UnlockMutex(); // fingers crossed we don't yield here.
int error = WaitForRequest(temp_handle);
...
}
The race is still there, but it’s now just at the point between unlocking the
mutex and starting the request. Another thread could cancel the request between
us taking a copy of the handle in temp_handle
and starting to wait on it.
I mentioned before this is a problem in the Windows Inet library. So now to the really interesting bit: why is this not picked up by more people?
The key is that the handles provided by WinInet are not recycled - a unique
handle (modulo the number of possible values of handles there are) is given out
every time. In the race condition described above, that means in the worst case
we try to WaitForRequest
on an invalid handle, and provided the implementation
is rigourous in checking the validity of handles, we’ll just get an “invalid
handle” type of error.
This hides the problem — it’s still there, but the non-recycled handles prevent it from ever biting.
Unfortunately, on Windows Mobile devices, the handles are aggressively
reused3. In the case described above, that could mean that between us
taking a copy of the handle in temp_handle
and us waiting on it; another
thread could have cancelled our request, started a new one — receiving the
same handle that the original thread thinks it owns. When the scheduler
allows the original thread to run, we will call WaitForRequest
on the other
thread’s handle — with disastrous results.
So is there a way around this? Thankfully, yes — though it’s not perfect. More on this in another post, though!
Matt Godbolt is a C++ developer working in Chicago for Aquatic. Follow him on Mastodon.