Solving the threading problem in WinInet
In my previous post, I introduced a simple asynchronous HTTP request API which suffered from a quite major problem: it was impossible to cancel an in-progress request in a thread-safe manner. As it happens, the Windows Internet library suffers the same problem: though it’s hidden as desktop Windows doesn’t re-use handle values. However, Windows CE and Windows Mobile do; and so this problem does affect these embedded OSes.
Take a look at this code snippet, designed to show up the problem:
volatile int handle = -1;
void ThreadA() {
handle = StartRequest("http://google.com/");
int result = WaitForRequest(handle);
if (result >= 0) {
// We got the request ok. Process it here...
CloseRequest(handle);
handle = -1;
}
void ThreadB() {
Sleep(); // Wait a bit (gives Thread A a chance to run)
CloseRequest(handle); // Cancel Thread A's request
handle = StartRequest("http://yahoo.com/");
// ... process this request instead ...
CloseRequest(handle);
}
It’s pretty clear that the two threads will race on handle. As discussed
before, with the interface we have to work with, there’s no way of introducing a
mutex that protects handle without making it impossible to cancel an
in-progress request.
However, assuming that the API reliably gives an error if an invalid handle is passed to it or an invalid operation is carried out on an existing handle, there’s a workaround:
using std::set;
set<int> active_handles;
// Tries to register a handle as being in-use. Returns true
// if the handle is safe to use, and false if the handle is
// already in use by another thread.
bool TryRegisterHandle(int handle) {
LockMutex();
if (active_handles.find(handle) != active_handles.end()) {
// Although the system has given us 'handle' as a valid
// new handle to use, another thread currently believes
// it owns this handle. It is not safe for the current
// thread to use the handle.
UnlockMutex();
return false;
}
// It's safe to use this handle: prevent any other threads
// from using it for the duration this thread believes it
// has ownership by keeping a note of this handle in the
// active set.
active_handles.insert(handle);
UnlockMutex();
return true;
}
void UnregisterHandle(int handle) {
LockMutex();
active_handles.remove(handle);
UnlockMutex();
}
The idea here is that we keep track of which handles we think we own and can safely use, regardless of the handle the system gives us. If we get a handle that another thread of ours is already using, we back off and try again. Writing a couple of utility functions to hide this gives something like:
void SafeStartRequest(const char* url) {
int new_handle = -1;
// Loop forever until we get a handle we can use.
for (;;) {
new_handle = StartRequest(url);
if (TryRegisterRequest(handle)) {
// Safe to use; break out of the loop.
break;
}
// Another thread thinks it owns this handle even though
// we got it fair and square from the system. It's not
// safe to use it! Do not close the request - the thread
// that thinks it owns it will close it. Sleep a bit to
// give other threads a chance to run, then loop around
// and get a fresh new handle, hopefully not one another
// thread thinks it owns.
Sleep();
}
return new_handle;
}
// Close the handle, then mark it as available to other
// threads. Should only be called by the owner of the handle.
void SafeCloseRequest(int handle) {
CloseRequest(handle);
UnregisterRequest(handle);
}
It’s certainly not elegant, and is non-obvious. It doesn’t even save you from all possible problems — what if the handles are system-wide and another program which doesn’t know about your registration system is running? What if the first thread has just created a new handle when the cancellation happens? If it gets caught at just the right time, the firstthread gets the second thread’s handle and uses it without error, oblivious to the fact that it should have been cancelled1.
Experience has shown that this is usually “good enough”. The original code becomes:
volatile int handle = -1;
void ThreadA() {
handle = SafeStartRequest("http://google.com/");
int result = WaitForRequest(handle);
if (result >= 0) {
// We got the request ok. Process it here...
SafeCloseRequest(handle);
handle = -1;
}
}
void ThreadB() {
Sleep();
// Cancel thread A's request.
// NB this does not use SafeCloseRequest as we don't "own"
// the handle.
CloseRequest(handle); // Cancel Thread A's request
handle = SafeStartRequest("http://yahoo.com/");
// ... process this request instead ...
SafeCloseRequest(handle);
}
Ideally though, you’d go back to the drawing board with your design and separate the cancellation from the closing of handles. Unfortunately, if you’re using the Windows Internet library on a mobile device, you’re stuck with this workaround.
-
This eventuality (and similar ones) can be covered with judicious use of a thread-safe variable storing “am I cancelled”, checked and acted upon before any handle usage. But this article is already long enough! ↩
Filed under: Coding
Posted at 17:55:00 BST on 7th July 2009.
Threading issues in the Windows Internet API
Imagine you were making a simple asynchronous HTTP request API. If you were coding in C you might sketch out something a little like:
For brevity I’ve ignored the actual reading of HTTP data, and any form of error handling.
// Starts an asynchronous HTTP fetch of the given url.
// Returns a positive integer request handle to be used
// to identify this request.
int StartRequest(const char* url);
// Blocks waiting for a request to complete. Returns the
// number of bytes received.
int WaitForRequest(int handle_id);
// ... APIs to read data from the request here ...
// Note that we're done with a handle, freeing up resources
// associated with it, and returning the handle to the pool
// of available handles.
void CloseRequest(int handle_id);
All is good until you realise that sometimes you need to be able to cancel an
in-progress request. Say for example, a user hits the cancel button in a web
browser. You tinker around with your implementation, and then realise you can
reuse CloseRequest — you clarify this in the comment:
// Note that we're done with a handle, freeing up resources
// associated with it, and returning the handle to the pool
// of available handles. If the request is still
// in-progress, it is cancelled. Any threads blocked on
// this request will be released: WaitForRequest will
// return -1 to indicate the cancellation.
void CloseRequest(int handle_id);
Cool! We now have a simple API. But wait - there’s a threading problem waiting to bite us later on. Have a quick look and see if you can spot it.
Got it? If so — congratulations, go to the top of the class.
If not, don’t fret as you’re in good company: Microsoft missed this in their Windows Internet (WinInet) API design too.
To see where the problem lies, consider this situation:
// Cancelling example.
int handle = -1; // an invalid handle
// Fetches a URL, waits for it to download, then does some
// form of (unspecified) processing on it.
void ProcessUrl(const char* url) {
handle = StartRequest(url);
int error = WaitForRequest(handle);
if (error >= 0) {
// ...process data here...
CloseHandle(handle);
handle = -1;
} else {
// cancelled!
}
}
// Assume this is called on receipt of a UI event,
// asynchronous to the fetching of HTTP requests.
void BrowseToUrl(const char* url) {
if (handle >= 0) {
// Cancel any existing connection.
CloseHandle(handle);
handle = -1;
}
// Create a new thread to process this URL.
StartNewThread(ProcessURL, url);
}
“Aha,” you might say, “there’s no lock on the handle variable so there’s a
race there. Where’s my prize?” And indeed you’d be right. There’s no locking
there.
But how would you add a lock?
// Thread-safe version?
void ProcessUrl(const char* url) {
LockMutex();
handle = StartRequest();
int error = WaitForRequest(handle);
UnlockMutex();
...
}
void BrowseToUrl(const char* url) {
LockMutex();
if (handle >= 0) {
CloseHandle(handle);
handle = -1;
}
UnlockMutex();
...
}
Job done. Except…you’re now unable to cancel an in-progress request!
The mutex is locked for the duration of the whole StartRequest and
WaitForRequest calls. If BrowseToUrl is called on another thread it will
be unable to cancel the existing request: It requires the mutex that the
requesting thread holds — a mutex that will be held for the duration of the
request!
There is no place you can place the lock and not end up with a race during
cancellation. The only way you can do this is to have an atomic
UnlockMutexAndWaitForRequest function1. In the absence of such a
function, the best you can do is something like:
// Supports cancellation. Very nearly thread safe.
void ProcessUrl(const char* url) {
LockMutex();
handle = StartRequest();
int temp_handle = handle;
UnlockMutex(); // fingers crossed we don't yield here.
int error = WaitForRequest(temp_handle);
...
}
The race is still there, but it’s now just at the point between unlocking the
mutex and starting the request. Another thread could cancel the request between
us taking a copy of the handle in temp_handle and starting to wait on it.
I mentioned before this is a problem in the Windows Inet library. So now to the really interesting bit: why is this not picked up by more people?
The key is that the handles provided by WinInet are not recycled - a unique
handle (modulo the number of possible values of handles there are) is given out
every time. In the race condition described above, that means in the worst case
we try to WaitForRequest on an invalid handle, and provided the implementation
is rigourous in checking the validity of handles, we’ll just get an “invalid
handle” type of error.
This hides the problem — it’s still there, but the non-recycled handles prevent it from ever biting.
Unfortunately, on Windows Mobile devices, the handles are aggressively
reused2. In the case described above, that could mean that between us
taking a copy of the handle in temp_handle and us waiting on it; another
thread could have cancelled our request, started a new one — receiving the
same handle that the original thread thinks it owns. When the scheduler
allows the original thread to run, we will call WaitForRequest on the other
thread’s handle — with disastrous results.
So is there a way around this? Thankfully, yes — though it’s not perfect. More on this in another post, though!
Filed under: Coding
Posted at 23:12:00 BST on 3rd July 2009.
Samba PDC problems after changing server IP
Samba is great, but is a real pain to configure, and very hard to diagnose problems. There’s plenty of log information to plough through once you’ve remembered how to enable it, but whenever I find myself having to change anything about my setup, I’ll spend at least a couple of hours scratching my head wondering what on earth is going on. In the interest of saving somebody else from this pain…if you find yourself banging your head against a wall primary domain controller (PDC) having weird symptoms like:
- Long hangs in Windows when browsing the domain.
- Logins fail intermittently.
- Trying to joining a machine to the domain fails with the error “domain unavailable” even though the domain is definitely there.
And you see errors in log.nmbd of the form:
domain_master_node_status_fail:
Doing a node status request to the domain master browser
for workgroup PROFACTOR at IP 192.168.200.220 failed.
Cannot sync browser lists.
…then there might be a simple explanation. If you’ve recently changed your server’s IP address, there are a number of places where the old IP will still be lurking, possibly causing the issues above.
nmbd caches browse master information in /var/cache/samba/browse.dat — your PDC’s old IP
address will be listed here, and when nmbd starts up, it sees it as an existing, distinct
domain master browser that then doesn’t respond to it any more. Stop nmbd and then delete the
file.
In a similar way the WINS resolution system caches IPs — delete /var/lib/samba/wins.dat.
Restart the samba servers, and hopefully all will be well.
Filed under: Coding
Posted at 22:15:00 BST on 10th September 2008.
Another weekend and some more data recovery
This weekend:
- I became a year older — boo. I got some lovely presents — yay!
- My son smiled at me for the first time — yay!
- I was able to find an O2 shop that had a 16GB iPhone in stock — yay!
- I set up my Dad’s new PC, and it seems to work — yay!
- I was able to recover his data from his old broken disk — yay!
- I took an extra day off (today!) to spend more time with my family — yay!
So, all in all, pretty damn funky weekend.
The data recovery was achieved in the end by booting a ‘backup‘ Windows 2000 ISO on QEMU, and then buying R-Studio Data Recovery and running that on the data. A bit of a disappointment, but I ran out of time either rolling my own or getting either ScroungeNTFS or testdisk to work.
As I write this I’m now waiting for a 150GB file copy from the virtual disk to a cleanly formatted 1TB drive to give to my Dad. Because I’m an idiot, I originally recovered the data onto the virtual disk, and not the final destination.
To add extra fun, rather than copy the data out somehow from the virtual booted Windows 2000, I’m copying using:
- A network block device server to read the data block-by-block from the virtual disk.
- The network block device client mounting that image as
/dev/nbd0. kpartxto get the partition out from the raw block device as/dev/mapper/nbd0p1.- Finally, mounting
/dev/mapper/nbd0p1as an NTFS disk (natively in Linux), and copying using nautilus. (Though Malc points out rsync might have been a better choice.)
Posted at 22:20:00 BST on 18th August 2008.