So it's been quite a while since tracksperanto-web went into production, with more than 300 files converted so far with great results. I hope it has already spared matchmovers worldwide a few sleepless nights, and nothing could be better.
I would like to give a little breakdown of how Tracksperanto on the web is put together (it's an interesting case for background processing). The problem with this kind of application (converters, to be specific) is that converting something might take quite some time. To mitigate that, I decided to write the web UI for Tracksperanto so that requests never block on processing, which means it makes extensive use of multitasking.
On my servers I am still using FCGI since it works and frees me from any additional process management (to reload apps it's enough to restart the webserver, no monit, no god - no nothin). However, when you want to do background processing in Ruby you need to make choices - and fork() was the best choice for this case. When I start the app, the following happens:
- Lighttpd starts up and spins off the FCGI dispatcher, supported by rackup. To make Rack and Sinatra function properly I had to rewrite the lighttpd fix middleware like this.
- The rackup script, in turn, fork()s a dispatcher process. The beauty of this is that the dispatcher process will die when lighttpd tells the FCGI process to quit.
- Then the work is split. The main FCGI process presents the default Sinatra app (supported by Rack::Cache). When a new upload is done, the file is saved by the Sinatra process and a new record in the database is made.
- The dispatcher process constantly polls the database for new jobs. If one is found, it is fetched and marked in the DB as busy within a single transaction (we need a concurrent database for this), and then the processing itself happens.
- The dispatcher process spins up a new forked child just to process the job, again via fork(). Tracksperanto jobs use a lot of processor time and a lot of memory, and with Ruby nobody can guarantee that jobs won't leak memory. fork() is highly beneficial in this way since after the job has been completed the forked worker will just die off, releasing any memory that has been consumed in the process. When I switch to REE one day the gain will be even bigger since the Tracksperanto code will be shared copy-on-write.
- While processing is taking place, the worker process writes the status into memcached. Tracksperanto is designed so that every component can report its own status and a simple progress bar can be constructed to display the current state. So basically we constantly (many times per second) write the status of the job into memcached: percent complete and the last status message. To let the user see how processing is going, I've made an action in Sinatra that quickly polls memcached for the status and returns it to the polling JavaScript as a JSON hash.
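The status reporting from the last step can be sketched roughly as follows. This is only an illustration: `MemoryStore`, `StatusReporter`, `status_json` and the key format are hypothetical names, and the in-memory store merely stands in for a memcached client so that the example runs on its own (in the real setup the worker and the web process are separate processes sharing memcached).

```ruby
require "json"

# Hypothetical stand-in for a memcached client: anything that responds
# to #set(key, value) and #get(key) could be dropped in instead.
class MemoryStore
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

# Worker side: writes the progress of one job into the store.
class StatusReporter
  def initialize(store, job_id)
    @store = store
    @key = "job-status-#{job_id}" # assumed key format
  end

  def report(percent, message)
    payload = { "percent" => percent.to_i, "message" => message }
    @store.set(@key, JSON.generate(payload))
  end
end

# Web side: what a polling action could hand back to the JavaScript.
# Falls back to a placeholder when the worker has not reported yet.
def status_json(store, job_id)
  store.get("job-status-#{job_id}") ||
    JSON.generate("percent" => 0, "message" => "Waiting in queue")
end

store = MemoryStore.new
reporter = StatusReporter.new(store, 42)
reporter.report(35.7, "Parsing trackers")
puts status_json(store, 42)
```

Because the worker stores the status already serialized, the polling action only has to fetch one key and pass the string through, which keeps it cheap enough to be hit many times per second.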
This scheme has the following benefits:
- Status reporting does not load the database (not needed and the information is hardly crucial).
- Zero memory leakage (whatever a job leaks is released when its worker process exits).
- No Ruby daemon processes (the daemons gem is the worst piece of crap I've ever seen, overspinning processes and almost always failing either to restart or to stop the processes that have been spun up).
- Start/stop control is tied into the webserver.
For jobs in the database I am using a hand-rolled solution resembling delayed_job.
Here's how the Rackup file looks:
require File.expand_path(File.dirname(__FILE__) + "/../app")
require File.expand_path(File.dirname(__FILE__) + "/../lighty_workaround")
require 'rack/cache'

worker_controller_pid = fork do
  log_path = TW_ROOT + "/log/worker.log"
  main_logger = Logger.new(log_path, "daily")
  ActiveRecord::Base.establish_connection(TW_CONFIG["db"])

  loop do
    # We might have lost the connection when we forked, so reconnect
    ActiveRecord::Base.verify_active_connections!
    job = Job.find_first_pending

    # Process the actual job in the child,
    # so that any memory that gets overconsumed will be freed
    # upon exit independently of the Ruby GC. We can easily do this
    # since we don't need to receive any info back from the processor
    if job
      child_pid = fork do
        # Internally reopen the logger
        worker_logger = Logger.new(log_path)
        # Ensure all sockets are reinstated in the child process
        MessageBus.reconnect!
        ActiveRecord::Base.verify_active_connections!
        # Run the job
        job.exec!
        # Goodbye
        worker_logger.warn "Completed the job #{job.to_param}"
        # And exit explicitly and nicely
        exit!(0)
      end
      main_logger.warn("Spun up worker #{child_pid} for #{job.to_param}")
      # Spin up a thread that will wait for the child to terminate
      # to avoid <defunct> children
      Thread.new { Process.waitpid(child_pid) }
    end

    sleep(1) # Do not torture the DB all the time
  end
end

# Establish a thread that will wait for the worker manager process to quit
Thread.new { Process.waitpid(worker_controller_pid) }

builder = Rack::Builder.new do
  use Rack::ShowExceptions
  use LightyWorkaround
  use Rack::Cache, :metastore => 'file:/tmp/rack/cache/meta',
                   :entitystore => 'file:/tmp/rack/cache/body'
  map "/" do
    run TW.new # Run the Sinatra app
  end
end

run builder
A few important snags:
Job.find_first_pending actually runs the transactional code that will also mark the job as being in progress if one is found (to prevent other dispatchers from hijacking it).
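The real implementation of that method is not shown here, so the following is only a sketch of the "find and mark busy in one atomic step" idea. The `JobQueue` class is hypothetical, the job table lives in memory, and a Mutex stands in for the database transaction; with an actual database the same guarantee would have to come from a transaction and its locking.

```ruby
# Hypothetical in-memory job table; the Mutex plays the role
# that the database transaction plays in the real application.
class JobQueue
  Job = Struct.new(:id, :state)

  def initialize
    @jobs = []
    @lock = Mutex.new
  end

  def enqueue(id)
    @lock.synchronize { @jobs << Job.new(id, :pending) }
  end

  # Find the oldest pending job and mark it busy in one atomic step,
  # so that two dispatchers can never pick up the same job.
  # Returns nil when nothing is pending.
  def find_first_pending
    @lock.synchronize do
      job = @jobs.find { |j| j.state == :pending }
      job.state = :busy if job
      job
    end
  end
end

queue = JobQueue.new
3.times { |i| queue.enqueue(i + 1) }

# Four competing "dispatchers" each try to claim one job
claimed = 4.times.map { Thread.new { queue.find_first_pending } }.map(&:value).compact
puts claimed.map(&:id).sort.inspect
```

Each of the three jobs is handed out exactly once and the fourth caller gets nil, which is the property the dispatcher loop relies on.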
Thread.new { Process.waitpid(child_pid) } uses Ruby threads for what they do best - waiting for something to happen. In this case, we monitor every process that has been spun up and ensure that it gets reaped properly once it exits (otherwise it would linger as a <defunct> zombie). The same is done for the loop process controlled by the FCGI dispatcher.
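Here is a small standalone sketch of that waiter-thread pattern. The helper name `reap_in_background` is made up for the example, and the child does nothing except exit; it needs a platform where fork() is available.

```ruby
# Reap a child process from a background thread so the caller never blocks.
# Returns the waiter thread; its value is the child's Process::Status.
def reap_in_background(pid)
  Thread.new { Process.waitpid2(pid).last }
end

child_pid = fork { exit!(0) } # a short-lived child standing in for a worker
waiter = reap_in_background(child_pid)

# Only for the demo: a real dispatcher would carry on instead of joining
status = waiter.value
puts "Child #{child_pid} reaped, success: #{status.success?}"
```

Ruby's standard library also offers Process.detach(pid), which sets up this kind of waiter thread for you.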
# Ensure all sockets are reinstated in the child process
MessageBus.reconnect!
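If you have done any work with Ruby subprocesses you should know that open sockets (and thus the Ruby objects holding them) cannot be relied on after you fork(): the child inherits the parent's connections, so both processes end up talking over the same socket and each of them has to open its own. MessageBus is a simple wrapper around the Memcached calls that we use, and ActiveRecord has to reconnect as well.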
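One way to make such a wrapper notice a fork by itself is to remember the pid that opened the connection. The sketch below is an assumption for illustration and not how MessageBus is written: `ForkAwareClient` and `ensure_fresh!` are hypothetical names, and no real socket is opened.

```ruby
# Hypothetical client wrapper: remembers which process opened the connection
# and opens a fresh one when it finds itself running in a different process.
class ForkAwareClient
  attr_reader :owner_pid

  def initialize
    connect
  end

  def reconnect!
    connect
  end

  # Meant to be called before every use of the connection
  def ensure_fresh!
    reconnect! unless @owner_pid == Process.pid
  end

  private

  def connect
    # A real client would open its socket here
    @owner_pid = Process.pid
  end
end

client = ForkAwareClient.new
reader, writer = IO.pipe

pid = fork do
  reader.close
  stale = client.owner_pid != Process.pid # connection was opened by the parent
  client.ensure_fresh!
  writer.puts "stale=#{stale} fresh=#{client.owner_pid == Process.pid}"
  writer.close
  exit!(0)
end

writer.close
puts reader.read
Process.waitpid(pid)
```

Calling reconnect! explicitly right after fork(), as the rackup file does, achieves the same thing with less machinery; the pid check only helps when you cannot control where the fork happens.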
# Run the job
job.exec!
The exec! method will actually capture and record any exceptions that might occur during processing. When such a thing happens, the Job class itself will see to it that the file which failed gets emailed to me. The same happens when a job completes successfully, but then no attachment is sent. To send messages with attachments, I am using the pony gem. Also, a special status flag is written to memcached so that the user gets a concise error message.
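The shape of such a method might look like the sketch below. Everything in it is hypothetical (`SketchJob`, the state names, the notifier lambda that stands in for sending an email); it only illustrates the idea that exec! never lets an exception escape and records the outcome instead.

```ruby
# Hypothetical job: class name, states and notifier are illustrative only.
class SketchJob
  attr_reader :state, :error_message

  def initialize(notifier, &work)
    @notifier = notifier
    @work = work
    @state = :pending
  end

  # Never raises: a failure is recorded on the job and reported instead
  def exec!
    @work.call
    @state = :complete
    @notifier.call("Job completed")
  rescue StandardError => e
    @state = :failed
    @error_message = "#{e.class}: #{e.message}"
    @notifier.call("Job failed: #{@error_message}")
  end
end

messages = []
notifier = ->(text) { messages << text } # stand-in for sending an email

SketchJob.new(notifier) { :ok }.exec!
failing = SketchJob.new(notifier) { raise ArgumentError, "unknown file format" }
failing.exec!

puts messages.inspect
puts failing.state
```

Keeping the rescue inside the job means the forked worker can always reach its final log line and exit cleanly, whatever the conversion did.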
To prevent abuse, I do some security-by-obscurity and the code for tracksperanto-web will not be published (but you can get a peek at it if you contact me privately).