10/25/2016

Horror Stories from the Development Trenches

web development, source control, multi-environment hosting

Why Multiple Developer Workflow is Important

Some of our clients experience major culture shock when they first come on board with us. Often, they’ve had previous experience working with a freelancer or have been taking care of their site on their own. They are used to developing live on their website. Want something changed? Edit a file on the server, or download it with FTP, edit it, and upload it back to the server. Super easy, right?

Um, kinda?

There can be a downside to that ease, and all too often clients don’t recognize that developing live on the site might cause problems. Then we come in with our multiple developer process, talking about things like “version control” and “development, testing, and production environments”. I bet it’s really overwhelming to be a client at that point. After all, they know what they’ve been doing and it’s been working for them! Maybe for years! Why should they change now?

Well. There’s a tricky little phrase in that last paragraph, one that fundamentally alters the challenges of building and maintaining a website. The tricksy bit is “multiple developer”. The process gets much more complicated when multiple people have to be able to work on a website at the same time. Let me drop a scenario on you, and we’ll recap at the end.

Romulus and Remus Development Agency

From Bad… to Worse… to Even Worse… to… You Get It

Imagine that Romulus is the lead developer working with a new client. Everyone he talks to has a different set of files for the site. The previous developer (from a different company) made a zip file a couple of months ago, but that contains lots of files named “index.html.old” and “index.html.bak” and “index.html.old.bak”. Romulus asks the client to send over the code from the live site, but when he tries to use those files, it is quickly obvious that they are not the same version as what’s currently on the site.

Eventually, the client gives Romulus access to the site and he FTPs all the files down. After taking a couple of hours to get everything downloaded, he realizes that he doesn’t have the database, so he can’t run the site locally. The client gives him the credentials to use the database on the server remotely; as he is testing out some code, he accidentally sets all of the email addresses in the database to testing@test.com. The client doesn’t have any database backups, and it takes a week to get most of the user accounts restored.

After that mistake the client really doesn’t want Romulus accessing the production database anymore. They also don’t want to give him their live data because it has information about their users in it, so the client spends two weeks trying — and failing — to figure out how to strip the user data effectively. Finally, they just give Romulus access to the control panel for their website. He’s unfamiliar with this host’s control panel and when he goes to make a backup of the database, he deletes it by mistake.

They still have no backups.

Remus jumps in to recover after that gaffe. He apologizes heartily for Romulus’s mistakes and puts in a bunch of hours getting the site restored… again… and gets Romulus back on board. Romulus is in the middle of coding a change that’s needed on the website next week, when suddenly there’s an important marketing announcement that needs to be made right now, and that requires some quick code changes.

Romulus tries to separate out the changes for next week, even enlisting Remus to assist, but the two of them accidentally end up uploading part of the code for next week’s change. The site is now broken… just as the marketing announcement is being made. They both forgot to make backups of some of the files that they were going to be overwriting, and now they are running around trying to find someone with a clean backup of the files.

Remus apologizes to the client (again) for breaking the site. The client is kind of uncomfortable working with them after multiple outages, but Romulus and Remus are committed to helping their (mostly still on board) client. They’ve been working on making a major update to the site, a total branding refresh. The client has seen and approved the designs, and they are really excited to refresh their site. Romulus and Remus have a team of people working on the refresh, each developing a piece. Everyone completes their work and they go to put it on a temporary server for the client to review, but the pieces don’t all play nicely together. One developer used Angular.js, another used Ember.js, another developer did a whole bunch of work in jQuery 2, but another developer needs to use jQuery 1 to make their pieces work. It’s a mess, and the client is furious because there are going to be major delays while Romulus and Remus sort it all out.

The client misses their sales goals for the quarter and starts looking for a new vendor to support their site.

Then the site goes down again. This time, however, it’s totally not Romulus’s fault. Romulus and Remus scratch their heads, trying to figure out what happened. Calls to the hosting company show that there’s nothing wrong on their side with the server; they didn’t change anything. Calls to the client show that there have been no changes on their side in weeks. Romulus has been on vacation trying to recover from all of the stress of the last few weeks, so he knows that, this time anyway, he didn’t do anything. No one knows what’s going wrong. Thankfully, Romulus has a backup of the site from right before he went on vacation. The client is going to lose some content, but they just want their site back up.

It takes a couple of hours to transfer all the files back up to the server, and everything is working like a champ. Two hours later it goes down again, and the hosting vendor is saying that some of the files look corrupted. So Romulus transfers all the files back up, and the site is back online… and then it’s down. After several days of the site see-sawing like this, Remus realizes that a hacker is injecting ransomware onto the server using a bug in the outdated PHP framework that the site uses. The attacker has had Romulus and Remus chasing their tails for a week.

The Promised Recap

None of this has ever happened at Bōwst; most of it is dramatized (and somewhat hyperbolized) second- or third-hand horror stories. The reality, though, is that most of that stuff does happen. Regularly. It can be really easy to make a mistake and break something, and then it can be very difficult to figure out why things are broken, especially if you don’t have a process to guard against it.

So what do we do at Bōwst to prevent treating our clients like our poor fictionalized client? Well, a lot actually! But for the purposes of this post I’m going to focus on two major pieces: source control, and multi-environment hosting companies.

Source Control

What is source control? Most non-developers have probably either never heard of it or don’t know what it does. Source control allows one or more developers to work on a collection of code in a coordinated fashion. Everyone checks out their code from a central source, and as they work on it they make “commits” back to that central source. Those commits are just a way of integrating the changed bits back to the central source.

What does that get us? There are a ton of benefits to this approach, but let’s focus on three things. First, everyone knows where to go to get the code for the site. There’s never a disagreement about what the correct code is because it’s all right there in the repo. Second, it makes it easy for multiple developers to coordinate their changes. The horror show detailed above, where everyone was using different code, is less likely to happen because everyone is now sharing the same code through source control during development. Third, and most important, with source control there’s a log of every change made to the code. You can review the changes if something is broken, and revert to a non-broken state quickly. You also know who to talk to about the broken bits.
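To make those three benefits a little more concrete, here is a toy sketch in Python. It is not how Git or any real source control system works internally, and the `Repo` and `Commit` classes are invented purely for illustration. It just shows the idea of one shared history where every change has an author and can be undone.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    author: str
    message: str
    files: dict  # snapshot of file name -> contents after this change


@dataclass
class Repo:
    """Hypothetical, heavily simplified stand-in for a central repository."""
    history: list = field(default_factory=list)

    def commit(self, author, message, changes):
        # Record a new snapshot: the previous snapshot plus the changed files.
        snapshot = dict(self.history[-1].files) if self.history else {}
        snapshot.update(changes)
        self.history.append(Commit(author, message, snapshot))

    def current(self):
        # The one agreed-upon version of the site's code.
        return self.history[-1].files if self.history else {}

    def revert_last(self, author):
        # Undo the latest change by committing the previous snapshot again,
        # so the mistake and its fix both stay visible in the log.
        if not self.history:
            raise ValueError("nothing to revert")
        previous = self.history[-2].files if len(self.history) > 1 else {}
        message = f"Revert: {self.history[-1].message}"
        self.history.append(Commit(author, message, dict(previous)))


repo = Repo()
repo.commit("romulus", "Add home page", {"index.html": "<h1>Welcome</h1>"})
repo.commit("remus", "Marketing banner", {"index.html": "<h1>Welcome</h1><p>Sale!"})
repo.revert_last("romulus")

# The log answers "what changed, and who do I talk to about it?"
log = [(c.author, c.message) for c in repo.history]
```

After the revert, `repo.current()` is back to the original home page, and `log` still lists all three changes along with who made them.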

If you’re not a developer, source control can feel really complicated and confusing. In our experience, however, it truly is a critical component of any well-maintained site.

Multi-Environment Hosting Companies

Source control would have made some of the drama above less severe, but it wouldn’t be enough to prevent all the disasters.

In order to have a well-maintained site, it is vital to have at least a test version and a live version of your site. (Ideally? You should have a development version as well.)

However, simply having these versions of your site isn’t enough. You also need to be able to use them in a coordinated fashion; that’s the super important bit. There are companies who have developed integrated systems to make this easy for developers. When we’re building Drupal sites we like to work with Acquia or Pantheon because they’ve put so much work into making it super simple to move changes from development to testing (where the client can review and approve changes), and then on to production (which is only updated with code that’s been approved). It’s literally as easy as a couple of clicks, or clicking and dragging the code, files and/or database you want to each environment.
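If it helps to picture that flow, here is a small Python sketch of the development, testing, and production hand-off. The `Pipeline` class and its method names are made up for this example and have nothing to do with how Acquia or Pantheon actually implement their platforms. The only point is the rule that production accepts nothing the client hasn’t approved on testing.

```python
ENVIRONMENTS = ["development", "testing", "production"]


class Pipeline:
    """Hypothetical model of a multi-environment workflow."""

    def __init__(self):
        self.deployed = {env: None for env in ENVIRONMENTS}
        self.approved = set()

    def deploy_to_development(self, version):
        self.deployed["development"] = version

    def approve(self, version):
        # The client signs off on a version they reviewed on testing.
        if self.deployed["testing"] != version:
            raise ValueError(f"{version} is not on testing")
        self.approved.add(version)

    def promote(self, source):
        # Copy whatever is running in `source` to the next environment.
        if source == ENVIRONMENTS[-1]:
            raise ValueError("production is the last environment")
        target = ENVIRONMENTS[ENVIRONMENTS.index(source) + 1]
        version = self.deployed[source]
        if version is None:
            raise ValueError(f"nothing deployed to {source}")
        if target == "production" and version not in self.approved:
            raise PermissionError(f"{version} has not been approved")
        self.deployed[target] = version
        return target


site = Pipeline()
site.deploy_to_development("v2")
site.promote("development")      # development -> testing

try:
    site.promote("testing")      # not approved yet, so this is refused
    blocked = False
except PermissionError:
    blocked = True

site.approve("v2")               # client reviews and approves on testing
site.promote("testing")          # testing -> production
```

The first attempt to push to production is blocked; only after the approval step does "v2" reach the live site.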

There’s a tremendous amount of goodness that goes into these platforms, including integrated source control and easily scheduled backups!

Romulus and Remus Revisited

To illustrate our point, let’s revisit poor Romulus and Remus, but this time let’s give them the right tools.

Romulus and Remus get started by having their new client add them to the existing team on Acquia, thereby giving them access to both the source repository and the development and staging instances of the site.

Romulus pulls the development site down using Acquia’s Dev Desktop and has a local development instance up and running in just a few minutes. That includes a local development copy of the database and any necessary files for the site, which makes the first two database problems Romulus ran into far less likely. As an added level of protection, Acquia supports both ad hoc and scheduled database backups.

Another benefit is that the integrated source repository lets Romulus and Remus be confident that they are working with the right code for the site. The code repository is what is deployed to the development, staging, and production servers through the Acquia Cloud web interface. There will be no confusion about whether they have the right versions of code.

With these tools, Romulus is able to start working efficiently within a few hours instead of a few weeks. He still hits the problem of the priority marketing change needing to be made, but instead of needing to manually remove his in-progress work, he simply commits it to a new branch in source control. After that, he creates a new branch based on the production code, makes the priority change, and has it up on staging for the client to review within a few minutes. The client reviews the change and Remus pushes the change to the production server with just a click and drag. No site downtime, no messy apologies. Just a really happy client.
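For the curious, the branching move Romulus makes can be sketched in a few lines of Python. This is a deliberately simplified model, not real Git: the branch names and the `create_branch`, `commit`, and `merge` helpers are invented for illustration, and there is no conflict handling. What it shows is that the half-finished work sits safely on its own branch while the urgent fix goes out on another.

```python
# Each branch is just an ordered list of commit messages in this toy model.
branches = {"production": ["Initial site"]}


def create_branch(name, based_on):
    # Start a new line of work from a copy of an existing branch's history.
    branches[name] = list(branches[based_on])


def commit(branch, message):
    branches[branch].append(message)


def merge(source, target):
    # Bring over commits the target doesn't have yet (no conflict handling).
    for message in branches[source]:
        if message not in branches[target]:
            branches[target].append(message)


# Romulus parks next week's work on its own branch.
create_branch("next-week-feature", based_on="production")
commit("next-week-feature", "Half-finished feature")

# The urgent change starts from production, not from the unfinished work.
create_branch("marketing-hotfix", based_on="production")
commit("marketing-hotfix", "Add marketing announcement")

# Once the client approves it, only the hotfix goes live.
merge("marketing-hotfix", "production")
```

Production now contains the announcement and nothing else, and the unfinished feature is still waiting on its own branch for next week.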

For the site refresh project, the developers would be working together on the same branch in the source code repository. This allows everyone to integrate and test their work regularly, generally every day. Incompatibilities are caught early and resolved. Remus can watch over what everyone is committing and make sure that it makes sense. It’s also incredibly easy to see the changes live on the development server without anyone having to manually copy files around.

It should be noted that the Acquia tools themselves don’t prevent security problems like our hacker’s ransomware example. However, Acquia configures and monitors its servers with security in mind, and it has a process in place to make sure Drupal security updates are tested, reviewed, and deployed quickly using the Acquia Cloud Workflow. This allows Romulus and Remus to head off a lot of problems. They can also fall back on being able to redeploy the site easily from the source repository and database backups when needed. Finally, Acquia’s tech support is only a phone call or email away should any extra assistance be required.

The Last Word

No one wants to describe a job as a horror story, not the development shop and certainly not the client. That’s why when we talk to new clients, we advocate early for source repositories and multi-environment hosting companies with integrated workflow. It may sound more complicated than developing on the live site, but in the end it makes for a much better experience all around. Just ask Romulus and Remus.
