julik live

The sad state of human arrogance today

Lately an article has been making the rounds on Hacker News in which eevee laments his tribulations installing Discourse.

I've read it cover to cover, and I must admit it did strike a few chords with me, especially because I have been writing Ruby web apps for a good decade now, with Rails and without. Most of the points made in the article are valid to some extent, but I profoundly disagree with the overall tone and the message.

First let's look at the narrative:

  • Ruby is not a primary language eevee is familiar with (he's primarily a Python/Rust guy from what I could see - I had been reading his other articles just a few days prior).
  • He does not have a lot of experience deploying a modern Ruby stack in production for web apps.
  • Through his experience with Python he probably misses the fact that the Python deployment story is just as painful as Ruby's at this point. Moreover, some horrible relics of Ruby packaging (gemsets) still exist at large in the Python world (virtualenv).
  • He picked a project which is explicitly pushing the envelope in its usage of fancy tools, and thus indeed wants to have its mother and the kitchen sink for dependencies. This is not because you need fancy tools, but because for a modern web app you do need search, you do need a job queue, you do need push messaging.
  • Exactly because the developers of Discourse (whom I admire greatly) realise that the dependency story in Discourse is effin hard, they suggest, loudly and clearly, deploying it via images or containers. Eevee chose to use neither of these approaches, and facing the consequences of this decision proved to be a world of pain (exactly as predicted).
  • He has a complex Linux configuration, which to me (from my uneducated perspective at least) looks like a Linux build that has been accrued over the years with all sorts of manual tweaks (90% of them probably having to do with X11 and sound of course), and migrated over and over - as a result of which you indeed end up with a 32 bit runtime on top of a 64 bit kernel. This, for tools that assume a default configuration for most situations, is a recipe for butthurt.
  • He also had to use a PostgreSQL extension, which does not ship as a default part of the package.

Instead of raising my hands in violent agreement with him, or rebutting his post point by point, I would like to look at the actual reasons why this stuff happens.


Let's start at the beginning. Deploying a web app written in a fancy language is indeed difficult. Fancy, for our purposes, is anything that is not PHP, WAR-packaged Java, or .NET. This is simply because the environments for the new fancy languages vary greatly. Platform support varies greatly. All sorts of things vary greatly. And not nearly enough people give those environments a tough enough grind to provide a good experience for most users of the actual products written in those languages. Most distributions skimp on updating the environments on time. Most distributions also skimp on providing compilers by default, and you need compilers because those languages do not assume that you will have the entirety of Ubuntu statically linked into an Apache module (which is the case with PHP), or carry 80% of the runtime with your core OS (which is the case with .NET), or carry precompiled dependencies in a machine-independent format (which is the case with Java, and is actually a compelling case at that).

When you choose to use a product written in a fancy language you have to realize that you will, in one way or another, have to carry the burden of this language not being widely used. Consequently, there will be some tinkering involved if you do not want to use pre-packaged distributions (which eevee chose not to do), because once you skip the pre-packaged configuration you are effectively setting up the product the way you would on a development workstation. This is a very different use case from just uploading the forum to the server and seeing it load its index.php.

This is a fact for Ruby. This is a fact for Python. This is a fact for modern Smalltalks. Probably the only exceptions are Go and Rust, because they can statically link everything and their kitchen sink into the binaries - but then again, good luck running that in a mixed 32/64 bit setup, when you link to mixed binaries.
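If you are not sure whether you are sitting on such a mixed setup, it is cheap to check. What follows is a minimal sketch in Python (any of the fancy runtimes can do the same), using only the standard library. The function names and the list of 64 bit machine names are my own assumptions for illustration, and `platform.machine()` reports whatever the OS tells the process, so a 32 bit personality or chroot can hide the mismatch.

```python
import platform
import struct

# Machine names commonly reported by 64 bit kernels. This list is an
# assumption for the sketch and is certainly not exhaustive.
SIXTY_FOUR_BIT_MACHINES = {"x86_64", "amd64", "aarch64", "arm64", "ppc64le", "s390x"}


def interpreter_bits():
    """Pointer width of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def is_mixed_setup(bits, machine):
    """True when a 32 bit runtime sits on a machine that reports as 64 bit."""
    return bits == 32 and machine.lower() in SIXTY_FOUR_BIT_MACHINES


if __name__ == "__main__":
    bits = interpreter_bits()
    machine = platform.machine()
    print(f"runtime: {bits} bit, machine: {machine}")
    if is_mixed_setup(bits, machine):
        print("mixed 32/64 bit setup: expect native extensions to be fussy")
```

Anything that compiles or links native code will care about the first number, while most install scripts only look at the second.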

Where the actual issue lies

The fundamental problem here is one crucial component of today's computing, which, in my opinion, is not receiving nearly the attention it deserves. It is the ABI, and the linker that exercises this ABI. While the various C++ committees and development groups for the modern strict compiled languages are spending time figuring out how to make this work

 auto auto<Auto&auto[*auto]> var;

they are intentionally leaving linking and dependency tracking by the wayside (it is unspecified, it is left to the host OS, insert-random-excuse-to-avoid-platform-holy-wars-here). I'd wager that 80% of the complexity of Ruby deployment, including the Windows horror story, has to do with the fact that you need architecture- and OS-specific machine code, compiled on the machine, and linkable for the machine, as part of your infrastructure.

It can take different forms - for example, before you get shared packages for PHP, someone actually had to go and issue a ./configure command which takes about 80 lines describing all the various extensions and libraries that the resulting PHP blob will include. It is a painful process. It is as painful as (or more painful than) installing 60 gems and seeing that 1 of them does not build because of some .a, or because some .so is 1 minor version below the requirement, or because headers are missing.
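You can at least find out up front which shared libraries the dynamic loader will resolve on a given box. Below is a rough sketch in Python using the standard `ctypes.util.find_library` and `ctypes.CDLL`. The helper name `probe_shared_library` and the library names in the usage part are assumptions picked for illustration. Note what it does not tell you: nothing about headers, static archives, or whether the minor version is the one your extension wants.

```python
import ctypes
import ctypes.util


def probe_shared_library(name):
    """Return (name, location, status) for a library such as 'expat'.

    `location` is whatever find_library reports (a file name or a path,
    depending on the platform), or None when the lookup fails.
    """
    location = ctypes.util.find_library(name)
    if location is None:
        return (name, None, "not found by the library lookup")
    try:
        # Actually loading it catches architecture mismatches, for example
        # a 64 bit .so offered to a 32 bit runtime.
        ctypes.CDLL(location)
    except OSError as exc:
        return (name, location, f"found but failed to load: {exc}")
    return (name, location, "ok")


if __name__ == "__main__":
    # Library names are examples only; substitute what your build needs.
    for lib in ("expat", "z"):
        print(probe_shared_library(lib))
```

It is a diagnostic and not a fix, but it turns "gem 47 of 60 exploded" into a question you can answer before you start.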

To make PHP available on your distro of choice, someone actually had to go and enter that command. I personally spent a couple of days of my life exclusively figuring out why a particular version of PHP would not link to the Apache server it was meant for, just because the versions of expat were mismatched.

Today's ABIs and linkers and shared library workflows are shit. They suck for a million reasons, and I do not see any movement within the computing community at large to change this situation or think up a replacement solution. Why there has to be a trailer of guano dumped on the authors of Discourse - who have not created this situation in the first place - is beyond me.

Database extensions

The story with the extension is also typical. See, most web apps these days want to provide full-text search. Even though I never used Discourse myself, I can guess with 99% certainty that the extension eevee hates installing so much is nothing other than an FTS engine. Do you know why there has to be a database extension for this? Well, the reason is this: up until now there has been no decent plug-in full-text search system for C-based runtimes. If you want FTS, basically the only shrink-wrapped solution that is widely used is ElasticSearch. ElasticSearch implies Lucene. Lucene implies a JVM. The circle has closed.

Maybe MySQL is better in this respect? Wait, no, it isn't - you also have to link (see this ABI shit hitting home again?) a plugin into it to provide search. Yes, there is a built-in fulltext feature. No, I (personally) have no idea what it does regarding language support, for instance. Deploy an app with a table of stemmed words and search it (like some widely used forum engines have been doing for an eternity)? It will likely provide a bad user experience, but it will work. Is it the approach to take for a product that is pushing the envelope, technically (and Discourse is pushing it)? No. Especially if it says on the tin "our runtime is complex, better deploy it wholesale".
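For the curious, the stemmed-words-table approach boils down to something like the sketch below. It is written in Python with an in-memory dict standing in for the database table, and everything in it (the `StemIndex` class, the suffix list, the crude stemmer) is made up for illustration and not taken from any actual forum engine. The stemmer only strips a few English suffixes, which is exactly the kind of language support problem mentioned above.

```python
import re
from collections import defaultdict

# A deliberately crude, English-only suffix list (an assumption for this
# sketch). Real stemmers are far more involved.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on non-alphanumerics, stem every word."""
    return [crude_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


class StemIndex:
    """Maps each stem to the set of post ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, post_id, body):
        for stem in tokenize(body):
            self.postings[stem].add(post_id)

    def search(self, query):
        """Return ids of posts containing every stem of the query."""
        stems = tokenize(query)
        if not stems:
            return set()
        result = set(self.postings.get(stems[0], set()))
        for stem in stems[1:]:
            result &= self.postings.get(stem, set())
        return result


if __name__ == "__main__":
    index = StemIndex()
    index.add(1, "Installing gems failed")
    index.add(2, "The linker fails on missing headers")
    print(index.search("fail"))
    print(index.search("headers linker"))
```

No ranking, no phrase search, no stop words, no languages other than a caricature of English. It works, and it is why nobody pushing the envelope wants to ship it.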

There were efforts to implement FTS as separate extensions for Python and Ruby. They usually have locking issues. They are marvelous. There is not nearly enough work going into them to supplant the ElasticSearch monopoly or to displace the status quo with those database extensions we need to install.

RVM

Because fancy languages iterate often, and distros ship them in a state which is 50% complete and obsolete by half a decade, you might want to use a version manager. Version managers sometimes suck, and most of the time contain more magic than you would want. This is the reason why I personally removed all rbenv/rvm solutions from my servers and just run The Damn Version Of Ruby that the project needs. You tell me PHP is better? Wait until the project you want to run needs PHP 5.x and you have 5.(x-1). Good luck convincing your shared hosting provider to update.

If you want sanity you just install the Ruby your project needs. If you want multiple Rubies be prepared to manage them. This is not something horrible, this is reality.

Also, for some entertainment, try some parallel JVM versions (because various Java projects also have different JVM version requirements) and parallel Python builds.

Equilibrium

When we choose to write a project in a fancy stack, it provides tremendous productivity boosts. If we then write on the packaging that "This thing is hard to install, better use an image/container", this writing is there for a reason. When you want something that is "copy the binary to the server, run it" then by all means - go for it, use a product written in a runtime that supports this use case explicitly, from the outset (like Go). It might or might not be the product that you want feature-wise, but it will spare you a lot of grief - especially if you do not know and do not want to know the details of the runtime the project uses. For the developers of a project, though, it comes down to equilibrium: at which point do the size and the complexity of the stack outweigh the ease of development? Each project balances it differently. I maintain a whole ecosystem of gems that should still work on Ruby 1.8.7 because I chose to do so, and it was right for the goals I have set for that specific project.

The cloud deployment engine

It is a very valid use case, and a very valid way to earn money. As a matter of fact, the first stepping stones for this have already been laid, by the name of Heroku. It costs you shitloads of money, exactly so that you do not have to think about all of those things and manage them. Ditto for sandstorm.io. However, I keep about a dozen Ruby apps running in production (super-small scale all the way to super-cloud-scale) and I do not encounter the issues eevee is describing - because I know that most of those problems are easily avoidable, but you have to know the runtime to make decisions based on that knowledge in the first place.

By the way - Heroku buildpacks are also a pain in the butt, because linkers.

Moral

Having said all that, a very simple message will suffice: bashing a particular product (which is in fact great), built using a runtime you do not know, then bashing the containerisation product that is the best of what we have now, and then bashing Rails as a framework, and screaming "I want to FTP my scripts to my shared hosting like we did in 2001" does not make anyone happy, does not offer any solutions, and does not invite discussion. I'd wager a large cloud product eevee was working on was not using FTP-to-server as its deployment mechanism either. Nor did it probably use a 100% stock Python distribution with 100%-stock wheels/eggs/whatevers.

As a matter of fact insulting all of the Ruby devs on the planet is not going to solve those problems. Going back to PHP will not solve those problems.

Doing something about the shitty ABI and linkers we have chained ourselves to is. Maybe the JVM is the answer after all.

