Woke up this morning…
May 5th, 2009

…to the sound of a webserver in trouble. I run an update service for a software company. They need to distribute software and data updates to their existing customers, and were tired of endlessly burning and mailing out DVDs. So I set up a web service and made a nice REST API for them, so they can upload files, queue up those files for download by specific customers, and so on. The software on the customer site just checks the server, and downloads whatever is queued up for it. Authentication is based on a dongle that is shipped to the customer. It works great, with very little effort on anyone’s part. It’s a lot cheaper than shipping out DVDs, too.
While testing, and indeed in the first few months of the service being live, I saw the update files were all pretty small — a couple of MB usually — so like the idiot that I am, I wrote the code for sending an update via the web service like this:
- In the webserver child process, read the entire file into memory
- Output that content to the webserver via CGI
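Roughly, the pattern looks like this in Python. This is a sketch of the two steps, not the actual code from the service; the function name and the content type are made up for illustration.

```python
import sys


def send_update_naively(path, out=None):
    """Read the whole file into memory, then write it out as a CGI response.

    Returns the number of body bytes written.
    """
    if out is None:
        out = sys.stdout.buffer  # a CGI script talks to the webserver over stdout

    with open(path, "rb") as f:
        body = f.read()  # the entire file now sits in this process's memory

    # CGI response: headers, a blank line, then the body.
    out.write(b"Content-Type: application/octet-stream\n")
    out.write(b"Content-Length: " + str(len(body)).encode("ascii") + b"\n")
    out.write(b"\n")
    out.write(body)
    out.flush()
    return len(body)
```

The f.read() call is the culprit: memory use grows with the size of the file, which nobody notices while the files are a couple of MB.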
Not a problem, works like a charm. Then they uploaded a 200MB update.
The child process reads in the 200MB file, thus growing to that size in memory. The webserver is apache, and it gets that 200MB of content from the child process, thus also growing to 200MB in size. For some reason I don’t quite understand, apache does NOT let go of the child process, even though it demonstrably has all its output. So I now have 400MB worth of memory in use, for a single download. Five customers started up their application, and at 400MB apiece the 2GB of memory in the server melted like snow in a name-writing contest. The memory can’t be shared, because the child process handles putting it in memory; the kernel can’t possibly know it’s all the same bunch of bits. It can’t even be shared between apache and its child process, because the CGI interface is too crude for that. It’s just a pipe; forget about mmap or anything like that.
The machine eventually ran itself out of swap, and started killing processes. That’s when I noticed, of course. I threw a ton of swap at the machine to keep it going, found the code that caused the crazy memory usage, put a brown paper bag over my head, and started looking for a better solution.
The obvious way to handle this is to have the kernel send the file to the remote end, and have the userspace process (apache in this case) wash its hands of the whole thing. Linux can do this with the sendfile system call. It takes an input and an output file descriptor, and copies data from one to the other inside the kernel, without the file contents ever passing through userspace memory. The calling process simply waits for sendfile() to return.
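To make that concrete, here is a minimal sketch of calling sendfile from Python on Linux, via os.sendfile. The helper name is made up, and it assumes a blocking, already-connected socket. This only illustrates the system call; it is not how apache does it internally.

```python
import os


def send_file_via_kernel(sock, path):
    """Send the file at `path` over the connected socket `sock` with sendfile(2).

    Returns the number of bytes sent. The file contents are never read into
    this process's memory; the kernel moves them to the socket directly.
    """
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # os.sendfile(out_fd, in_fd, offset, count) may send fewer bytes
            # than requested, so keep going from the current offset.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break  # file was truncated underneath us
            sent += n
    return sent
```

However large the file is, the process calling this stays the same size.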
But how can I do this from an apache subprocess? It doesn’t have access to the remote end’s socket file descriptor; only apache has that. That’s where mod_xsendfile comes in. It examines headers coming from subprocesses, looking for an X-Sendfile header. If found, it takes the value (everything after the colon) to be a filename to send to the remote end. The module then discards the content sent by the subprocess, and calls sendfile() instead. So instead of reading in the whole file and outputting it back to apache, I now return only the full path to the file in an X-Sendfile header, and the whole thing is solved. Apache now needs next to no memory per update it sends out, so it’s a lot more scalable.
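The subprocess side of this shrinks to a few lines. Again a Python sketch rather than the real code: the function name, the content type and the example path are invented, and the sanity checks on the path are my own addition.

```python
import os
import sys


def xsendfile_response(path, out=None):
    """Emit a CGI response that asks mod_xsendfile to deliver `path`.

    Only headers are written; the file itself is never opened here.
    """
    if out is None:
        out = sys.stdout
    # Refuse anything that could smuggle extra headers into the response,
    # and insist on a full path, as described above.
    if "\r" in path or "\n" in path:
        raise ValueError("path must not contain line breaks")
    if not os.path.isabs(path):
        raise ValueError("path must be absolute")

    out.write("X-Sendfile: " + path + "\n")
    out.write("Content-Type: application/octet-stream\n")
    out.write("\n")  # blank line ends the headers; no body follows
    out.flush()


if __name__ == "__main__":
    # Hypothetical location of a queued update file.
    xsendfile_response("/srv/updates/update-1.bin")
```

On the apache side the module has to be loaded and switched on with the XSendFile On directive for the location in question; without that, the header is just passed through to the client and no file is sent.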
Apache has a fairly crufty old codebase, and it’s not the most dynamic of projects, but I still don’t understand why mod_xsendfile isn’t a standard module. This should be in every apache distribution; it solves this problem beautifully. Nevertheless, the module is available, and it works. There’s life in the old girl yet!
I should note that lighttpd has this functionality built in, as a part of the included mod_fastcgi. This is perhaps no surprise — lighttpd is specifically intended to be more lightweight and scalable, and perform better, than apache.