The Pain of Upgrades: How It's Mucked Up and Complicated

I started the weekend with one simple objective - update my blog. Fun, quick, easy and painless.

After all, writing is the perfect remedy for most things and a good way to calm the nerves after surviving a long work week, traffic and a few beers on a Friday. Ah meeean... what could be more relaxing than striking a couple of keys while basking in the gentle glow of the sunrise early on a Saturday morning? Nothing, right? Ok. Maybe there are a few things better, but that's really beside the point.

Let's get back on track. Writing. Yes, good ol' writing. On a regular day, updating the blog is as elementary as 1-2-3. It only involves me musing, putting words to those thoughts and hitting publish, and just like that an idea is broadcast onto the internet. However, this week I decided, in my infinite wisdom, to add a few more steps to the sequence.

  1. Take a snapshot. In layman's terms, make a copy of the host system and the application's files.
  2. Update the server OS to the latest available release.
  3. Update the blog software to the latest release.
  4. Write a new blog post.

Literally, one, two, three, four. A couple more tasks, but even in the worst case it should only add an hour or two to the workflow, right? Not a big deal in the grand scheme of things. Oh, how wrong I was about to be. Very wrong! In reality, this turned out to be two days' worth of effort. But weekends are only two days long!! Exactly. Argh... I knew there was a reason I only did this once a year.
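On paper, the whole plan fits in a handful of commands. A rough sketch of what I had in mind, assuming a DigitalOcean droplet and a dnf-based distro (both hinted at later in this post); the droplet ID, paths and the blog's upgrade procedure are placeholders of mine, not the real thing:

```bash
# 1. Snapshot the droplet before touching anything (doctl is DigitalOcean's CLI).
#    The droplet ID is a placeholder.
doctl compute droplet-action snapshot 12345678 \
  --snapshot-name "pre-upgrade-$(date +%F)" --wait

# 2. Update the server OS to the latest available release.
sudo dnf upgrade --refresh -y && sudo reboot

# 3. Update the blog software (stand-in for whatever the app's documented
#    upgrade steps actually are).
cd /opt/blog && git pull && npm install --production
sudo systemctl restart blog.service

# 4. Write a new blog post. The easy part, in theory.
```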

Day One

First come the complications of updating the operating system (OS) on the server and determining its compatibility with the underlying hypervisor used by the chosen cloud provider. This is a necessary check since, in this modern age, everything is virtualized. The dependency chain can get worse still if you are using a cloud provider like DigitalOcean, which implements special optimizations on the hosting side that require hooks to be plugged into the new kernel so that everything runs smoothly. That means checking not only that the OS version is supported but also that the specific kernel version shipped with that OS version is supported too. Neglect these basic checks and, after the nice update and reboot, your network may be gone and other mysterious things will start happening to you and your service. All this checking and rechecking becomes time consuming.
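Before anything else, that means a round of version spelunking: confirm what the droplet is actually running and compare it against whatever support matrix the provider publishes. A minimal sketch, assuming a Fedora/RHEL-style guest; these commands only read local state:

```bash
# Which OS release and kernel is this machine actually on?
cat /etc/os-release    # distro name and version
uname -r               # running kernel version

# Older DigitalOcean droplets could have their kernel managed from the
# control panel rather than by the guest, so also check which kernel
# grub intends to boot before trusting the upgrade.
sudo grubby --default-kernel
```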

Anyways. Checks completed. Moving on. OMG, F@#!$k. My OS developers release a new version every six months, plus each release bundles all those security patches because the Russians or others keep hacking everyone nowadays, and all of a sudden I am three major releases behind. Now I also have to figure out whether an upgrade straight to the latest release from a version so far back is even possible. Of course not. You have to hop to an intermediate release and then to the latest shiny new thing. Geez. This is starting to feel like work. What did I get myself into... wasn't this supposed to be relaxing and more about writing than stringing commands together?
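For the record, the hop itself is not hard, just tedious. On a Fedora-style system (an assumption; the post never names the distro, but the six-month release cycle and the Red Hat Bugzilla trawling later point that way) each hop looks roughly like this:

```bash
# Jumping several releases means stepping through a supported intermediate
# version first. Repeat this block once per hop; 24 is a placeholder.
sudo dnf upgrade --refresh -y                     # be fully patched on the current release
sudo dnf install -y dnf-plugin-system-upgrade     # offline upgrade plugin
sudo dnf system-upgrade download --releasever=24  # fetch packages for the next release
sudo dnf system-upgrade reboot                    # reboot into the offline upgrade

# ...then come back and repeat with the next --releasever until you reach
# the shiny new thing.
```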

Ok. Half a day gone, but I think I got through that. One failed upgrade. One restore later and a retry. Voila! Running on the shiny new thing now. Oh, heck. This service runs on an application which itself runs on Node.js. However, the updated version of the application depends on several npm modules which no longer support earlier non-LTS versions of Node. Guess what? I have to update the middleware to Node 6+. Simple enough, I think, and because I'm smart I'm going to take a shortcut and just bundle the app upgrade into this step as well. Two for the price of one.

A few minutes later and I've downloaded binaries, unzipped a few things and moved around a few symlinks. Everything looks good so far. Manual tests pass, and the database and blog run, but... hmm, when I start the service it keeps failing. systemd indicates that the start limit has been exceeded for this service. Why?
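Before getting to the why, here is roughly what those few minutes of Node surgery looked like; the version number and install prefix are illustrative, not the exact ones I used:

```bash
# Drop a prebuilt Node binary under /usr/local and repoint a symlink at it.
cd /usr/local
sudo curl -fsSLO https://nodejs.org/dist/v6.10.0/node-v6.10.0-linux-x64.tar.xz
sudo tar -xJf node-v6.10.0-linux-x64.tar.xz
sudo ln -sfn /usr/local/node-v6.10.0-linux-x64 /usr/local/node

# /usr/local/node/bin gets added to PATH via a snippet in /etc/profile.d,
# which is fine for interactive shells... (foreshadowing).
```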

Is it systemd? Is it Node? What the heck is the message about /usr/bin/env: node: No such file or directory anyway? Why is it showing up in the logs even though the path to node is updated in the shell environment and in /etc/profile.d? Dubious errors. Intermittent outages. Hours pass by. I'm anxious for a fix, but Google is useless. You know you are in a desperate situation when you find yourself reading Red Hat Bugzilla mail trails. Then a stroke of genius comes out of thin air. I should just put a symlink to node in /bin. Never needed it before, but it can't hurt to try. Look at that, it works :)
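In hindsight the symlink makes perfect sense: systemd does not source the shell environment or /etc/profile.d when it launches a unit, so the service's PATH never saw the new Node location even though every interactive shell did. A sketch of the fix and the cleanup, assuming Node landed under /usr/local/node as above and a hypothetical blog.service unit:

```bash
# Give the '#!/usr/bin/env node' shebang something it can find on the
# default PATH that systemd hands to units.
sudo ln -s /usr/local/node/bin/node /bin/node

# Clear the start-limit counter accumulated during the crash loop,
# then start the service again and watch the logs.
sudo systemctl reset-failed blog.service
sudo systemctl start blog.service
journalctl -u blog.service -f
```

The less hacky fix would have been an explicit Environment=PATH=... line or an absolute path in the unit's ExecStart, but the symlink got me to the popcorn faster.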

But now the day is done. Time to eat popcorn and enjoy TV.

Day Two

Whew. It seems I made it. Things are working. Fingers crossed. Life's good and all is rosy. I even managed to change the colours on my theme and push it to GitHub. Man, things must really be going my way; seriously, I haven't made a code commit in six months, not even so much as to tweak a CSS file. Finally, something refreshing, and there is still time left to muse and actually write. Yes, write. My initial goal, remember?

Oh-oh. I can't upload images? Whaaaaaaaaaaattt!!! What's wrong now? What is "/var/lib/nginx/tmp/client_body/0000000009" failed (13: Permission denied) supposed to mean? Huh??? The internet says it's an issue with my nginx configuration? For God's sake, why would I suddenly need to meddle with that again? Nah. The problem could be some SELinux context, as I'm starting to think I know more about what's going on than those people on the internet. I frantically browse through old articles and my personal wiki pages. C'mon, nobody really remembers those set-context commands by heart. Do they?

But deep down I'm hoping it has nothing to do with SELinux. Besides, why would the application developers suddenly decide to write files to a new temporary directory? Time to investigate. Let me try the obvious thing: actually looking at which user owns the directory and what permissions are applied to it. Immediately, the cause of my new frustration is clear and the solution is executed. The bloody OS update reverted the owner and permissions on the client upload directory. A random side effect of upgrading. I pray there are no new surprises out there lurking in dark corners, just waiting, ready to get me.
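For anyone hitting the same wall, the whole investigation boils down to a couple of look-before-you-chown commands. The nginx worker user differs between distros, so treat nginx:nginx below as an assumption:

```bash
# Who does nginx run as, and who owns the client body temp directory?
ps aux | grep '[n]ginx: worker'
ls -ld /var/lib/nginx/tmp/client_body

# If SELinux were the culprit, the audit log would have AVC denials; it didn't.
sudo ausearch -m avc -ts recent

# The actual fix: the OS upgrade reset ownership, so hand the directory
# back to the nginx worker user.
sudo chown -R nginx:nginx /var/lib/nginx
sudo systemctl restart nginx
```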

It's evening. I am finally writing. It's a rant. A rant about something I expected to take less than an hour. It dawns on me that maybe I should just give up on this whole manage-the-server-and-everything-on-my-own gig. What I really want is something like AWS Lambda, Heroku or GCP. Just take my code and run it, handle all that provisioning, updating and upgrading in the background, and let me sleep at night.

Nooooooooooo, but, but... but this is the future we dreamed about and it should all just work and be easy. I'm a techie worth his salt, not some sci-fi novel writer. I'm a Solution Architect, a former Systems Administrator. If I can't do this, no one can. I need to keep my feet wet. I can beat this. I can solve this. I can't and I won't just purchase a subscription to some Everything-as-a-Service (*aaS) to make the upgrade pain go away by running to someone else. I smell the solution in the air... it's... it's Docker, LXC, Puppet or even OpenShift, which has Kubernetes built in. Awesomeness. I need to go get some of that magic in my life. I need it now. Right now.

Then I read It's The Future on Medium and it all came crashing down. The future might be now, but no technology is a panacea, even if it looks like humanity discovered the God particle a few years ago. Instead, all weekend I found myself feeling like...

I just want to launch an app. Sigh. Fuck, OK, deep breaths. Jesus. OK, what’s Paxos?

And in the end I found myself reading How it feels to learn JavaScript in 2016, which reminded me that we humans, no matter how good we have it, always, always find a way to muck it up and make it complicated.


Article originally published on nbranche.org and shared on Medium.