Resiliency and Game Day Exercises at Acquia

In March of 2017 I came across the idea of “Game Day” in The DevOps Handbook by Gene Kim and others. Game Day is brilliantly advocated by Jesse Robbins in his presentation from 2011. It’s the idea that deliberately staging periodic system outages forces engineers to think about and design for resiliency in those systems. The extreme example is Chaos Monkey, which operates under only one constraint: the outages should happen during working hours. Other than that, the outages caused by Chaos Monkey can happen anywhere in the system (even production!) and at any moment within those hours.

Game Day is a step removed from Chaos Monkey, conceived of as a planned activity for engineers to resolve systemic outages. The resiliency exercises held at Acquia were yet another step away from the extreme towards the approachable. Our exercise included two activities, one geared toward non-support engineers and the other toward support engineers. The non-support engineers had to bring back up a down site, and the support engineers had to attack and compromise an insecure site. The idea was to challenge engineers to step outside their comfort zone and attempt to resolve technical challenges beyond the requirements of their everyday work.

The Team
The personalities involved in Game Day were a strong influence on the event. There’s Amin Astaneh, an Ops manager with the temperament of the proverbial town crier, faithfully and urgently supporting us in our DevOps transformation. Then there’s Apollo Clark, expert in secure systems who contributed the idea of doing a security vulnerability exercise. Finally there’s James Goin, seasoned Ops warrior relentlessly invested in the improvement of systems administration, including resiliency and disaster recovery training.

It just so happened that the idea for Game Day came two months in advance of Acquia’s annual engineering-wide event called Build Week, a truly awesome gathering of the entire team at Acquia HQ in Boston (read more on Dries’ blog!). Holding our Game Day at the same time would allow it to reach a broader audience across the company, so we requested a slot on the calendar. We ended up with 8-9pm on the Tuesday during Build Week. We had our opportunity!

Build Week imposed two constraints that had a significant and positive influence on our interpretation of Game Day. The whole event needed to fit in a single hour, and the event had to be accessible to engineers other than just the Ops subject matter experts. A Game Day exercise typically involves only the core engineering team which works directly with critical systems, and it takes however long they need to bring the systems back up. These constraints made the whole thing more approachable, and inspired the introduction of an Easy Mode and a Hard Mode.

Game Day as Exercise
The original idea was to have a troubleshooting session with an Acquia development installation of a Drupal site (managed Enterprise-grade Drupal being the chief product of Acquia). The site would have some failure that either smaller teams or the whole group would have to resolve. Since we needed to accommodate varying levels and areas of expertise in the product, we settled on two “modes”, Easy Mode and Hard Mode, that participants would opt into based on their familiarity with troubleshooting techniques. The difference between the modes would only be in the level of difficulty. Easy Mode would be for those who don’t handle troubleshooting support calls as part of their regular day job, Hard Mode for those who do.

The Identity Crisis
At this point, it hit home for me that the exercise was not going to be what I had originally intended – it wasn’t going to be a cookie-cutter Game Day. Although this seemed disappointing at the time, looking back it was a blessing in disguise, since it motivated us to create a new idea instead of copying someone else’s.

Apollo’s suggestion, which we ended up following, was to stage a Hard Mode Capture the Flag exercise instead of a site outage. Capture the Flag in a security context is an exercise where teams gain access to privileged resources in a system by leveraging security vulnerabilities. We could hide hashes – randomized strings of a fixed length – throughout the site. The winner of the competition would be the team that found all the hashes first.
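
As a rough illustration of the hashes described above, the shell sketch below generates five random fixed-length strings. The function name generate_flag, the 32-character length, and the use of /dev/urandom are assumptions made for illustration, not necessarily the tooling used for the event.

```shell
# Hypothetical helper: print one random 32-character hex string.
# (The name and the length are illustrative assumptions.)
generate_flag() {
  # Read 16 random bytes and render them as 32 lowercase hex characters.
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Print five flags, one per line, matching the five hashes each team hunted for.
i=1
while [ "$i" -le 5 ]; do
  printf 'flag %d: %s\n' "$i" "$(generate_flag)"
  i=$((i + 1))
done
```

Each string could then be placed somewhere only reachable through a vulnerability, so that finding it proves the team got in.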

The exercise would demonstrate that a site that works from a user perspective can still need work to become secure and performant. We would have Easy Mode to include some troubleshooting, which would then flow directly into the Capture the Flag exercise.

Trying It Out
We ran through the whole event a few weeks before Build Week. Easy Mode troubleshooting took up the first half hour, transitioning to Hard Mode Capture the Flag for the second half hour. Until then the event had been a pure thought experiment, and to my surprise, it worked really well.

During Easy Mode, non-Ops engineers drove the resolution, with Ops experts acting only as consultants. Once the site was back up, we switched over to Capture the Flag. For this run-through we had only one shared site for all the Hard Mode participants. One mischievous participant who found the site credentials deliberately locked out everyone else. This incident motivated much of the final setup, which isolated each team’s environment to prevent cross-site hacking.

Game Day!
Our Game Day-inspired exercise followed the flow established in our run-through, with the addition of isolated environments for Capture the Flag.

The Easy Mode troubleshooting took less time than we had allowed for, putting the start of Hard Mode right on time. The teams dove in, probing their environment – a Drupal site – for weaknesses. The narrative revolved around a fictional user submitting a question to the forum about how to enable the PHP module in Drupal, which would allow access to the bash shell on the server. The fictional admin replied that she had enabled the module for him, and reset his login to a “temporary password”. These were the credentials the participants were expected to use to hack the site. Since the user had access to the PHP module, they could also use it to gain shell access. Using this shell access to the server, they had easy access to the privileged resources and opportunities to discover the hashes.

When time ran out at 8:55, three of our twelve teams (forty participants in all) had found all five of their hashes. The first team with all five hashes won the grand prize, an invitation for morning coffee with our resident tech celebrity, Drupal founder and Acquia CTO Dries Buytaert. As an aside, when I thanked Dries for agreeing to have coffee with our winners, he graciously replied, “No, thank you – now I get to have coffee!”

The decision to pivot from the established Game Day resulted in a new kind of learning in the spirit of Game Day. This learning was more accessible for our engineers and bridged the gap between where we are and where we are headed. While this isn’t the end of the story, I think it’s a fantastic start. Game Day, Day 2, here we come …

From Zero to DevOps through Minecraft

I am blessed with a pretty awesome family, and as part of that awesomeness it just so happens that my sister’s two kids have a passion for Minecraft.  Her oldest, Cyrus, is 9 years old and had on his own already gotten into the more technical aspects available in the PC version of Minecraft.  When I saw what he was doing with the basic functionality, I figured it would be fun for him to take it to the next level and add some DevOps-themed wizardry to his Minecraft chops.

Minecraft is Already a Learning Playground

It’s worth mentioning that I myself am passionate about Minecraft, both for my own enjoyment and for its success as a teaching platform. Minecraft gives kids a context from which they end up almost accidentally developing a massive variety of skills, many of them technically oriented. Minecraft encourages resourcefulness, initiative, curiosity … it’s discovery through play, the very best kind of learning.

Here are just a few examples of what my sister’s kids learned on their own through Minecraft without any adult intervention:

The six year old:

  • How to recognize words – Minecraft’s crafting heads up display includes the name of the tool above its picture
  • How to be constructive – when she first started playing, all she wanted to do was tear apart what her brother had built.  Eventually destruction became boring so she started her own creations
  • Fair play – you learn really quickly that what you can do to others they can – and will – do back to you

The nine year old:

  • Resourcefulness and Teaching Ability – When I first started playing Minecraft, it was not obvious to me why you had to seek out recipes for crafting objects.  To make a pickaxe, for example, you create sticks and combine them in a pattern with cobblestone.  I thought that this barrier would turn off beginners who wouldn’t want to have to figure out how to make things.  Then I realized that discovery of new crafting recipes is the very point of the game. 

    When Cyrus wants to build something in Minecraft, he searches the Minecraft wiki, looks for similar projects on YouTube, and reads relevant content. By doing this, not only has he learned to educate himself, he has also learned to educate others effectively. When he wants to show me how to do something in Minecraft, he starts with a simple example and then builds on that with increasingly complicated iterations. I’ve seen professionals giving demos on stage who haven’t learned this yet.

  • Rudimentary backup/source control – Cyrus discovered that by using the Minecraft console (accessible in the PC version with the slash “/” character), he could quickly copy any structure he created to a different set of coordinates. This gave him a way to “save” his work at a certain stage of development. He would build something complex, such as a castle, in a large rectangle until he was moderately satisfied. Then he would use a console command to copy what he had onto a separate area. That gave him the freedom to experiment with the structure, and if he didn’t end up liking it, he could restore the earlier version. Clearly not yet actual source control, but the fundamental idea is already there.
  • Programming logic gates – Minecraft has command blocks which can be placed next to each other in order to chain together their output. This, in addition to the redstone wiring component, is advanced enough that whole functioning computers can be built within the virtual world of Minecraft. I still don’t really understand how to use these things – despite Cyrus patiently explaining them to me …

Taking it to the Next Level

I decided we needed a Minecraft server of our own so that the three of us could join up in our own private Minecraft world.  Prior to this Cyrus was playing on his own Minecraft world in single player mode, which does not allow for collaboration.  I’ll describe the steps I took to set up the server so that Cyrus could start administering it.

I started by renting a droplet from DigitalOcean and enabling it for Docker. DigitalOcean does a good job of making this easy – all I had to do was check the box and it appeared. By default you get a password to connect to the droplet over SSH, so the first thing I did was set up passwordless (key-based) SSH from my PC to the remote server. I generated a default SSH key pair with ssh-keygen, then copied the public key into the authorized_keys file in my Linux user’s .ssh directory on the droplet, making sure the permissions were correct. Then I logged out and back in, confirming that I could SSH in without typing a password.
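
The server-side half of that setup can be sketched as a small shell function. This is a minimal sketch under assumptions: the function name install_public_key and its two arguments are made up for illustration, and the permissions shown are the usual ones OpenSSH expects rather than anything specific to DigitalOcean.

```shell
# install_public_key PUBKEY_FILE HOME_DIR  (hypothetical helper)
# Appends a public key to HOME_DIR/.ssh/authorized_keys and applies
# the restrictive permissions that sshd normally requires.
install_public_key() {
  pubkey_file=$1
  home_dir=$2

  # The .ssh directory must be accessible only by its owner.
  mkdir -p "$home_dir/.ssh"
  chmod 700 "$home_dir/.ssh"

  # Append rather than overwrite, so any existing keys keep working.
  cat "$pubkey_file" >> "$home_dir/.ssh/authorized_keys"
  chmod 600 "$home_dir/.ssh/authorized_keys"
}
```

On the PC, ssh-keygen creates the key pair, and the public half (the .pub file) is what gets handed to a function like this on the server. The ssh-copy-id utility that ships with OpenSSH automates the same steps in one command.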

Next up was finding a decent Docker image for Minecraft.  Now I could have messed around with installing Java and fetching the Minecraft server jar to run straight from the machine, but Docker makes it ridiculously easy to fetch pre-configured environments … so why worry?

I pulled up Docker Hub and searched for Minecraft. There were a few choices there, but also a clear winner – itzg/minecraft-server with over 500k pulls. The thorough documentation on this image is a first-class effort on the part of the developer – a role model for the rest of us.

I went ahead and executed a docker run as described in the excellent instructions, and within a little over a minute had a fully functional Minecraft server.  I went back and tweaked the command a bit, ending up with this:

docker run -e EULA=TRUE -e MODE=creative -e 'JVM_OPTS=-Xmx1024M -Xms1024M' -v /home/sara/minecraft/data:/data -d -p 25565:25565 --name creative itzg/minecraft-server

I found the JVM_OPTS setting, which fixes the Java heap size at 1024 MB, to be a definite necessity, as we experienced laggy play without it.

To make things easy, I also registered a free DuckDNS hostname so we could avoid having to type in the IP address all the time.

The Making of an Admin

At this point we were able to play multiplayer Minecraft by entering the hostname as the server address in the Minecraft client. Success! However, the whole point of this was for Cyrus to learn some fundamentals of how to manage a Minecraft server. His first step was installing Cygwin on his Windows laptop. Next he added his PC’s public key to the remote server, following the same procedure I had gone through at the beginning. Once that was set up, he was able to SSH in and run the commands “docker start creative” and “docker stop creative” to start and stop the server.

He also learned the basics of vim, enough to edit the server’s configuration file and configure the Minecraft world any way he wanted. This was quite an achievement, since vim usage is one of the more esoteric arts in the world of Unix. His first act as super administrator was to enable PvP mode, or player vs player, in which your in-game avatar can injure or kill other avatars. This of course was the ultimate satisfaction – what better reward for your time and hard work than finding a new way to troll your younger sibling …

Containers to the Rescue

Cyrus wanted to understand the commands he was typing in, so I gave him an analogy for the technical explanation of what was happening.  I explained that we were using a Minecraft server in a Docker container, which I likened to a Twinkie in a plastic wrapper.  You could bake up a Twinkie yourself, or you could find one pre-made and packaged. Setting up a Minecraft server is like baking a Twinkie, while running a Docker container is getting the pre-packaged one. The analogy breaks down a bit since with this Twinkie, you never remove it from its plastic wrapper, but instead enjoy the cream filling from the outside without ever breaking open the box.

One thing I didn’t explain to Cyrus directly but that he did learn to appreciate in a practical way is Docker mapping.  The cream filling in this case is a volume and a port – the local directory in which to store the Minecraft world and the port on the DigitalOcean droplet that is accessible to the outside world (and us). By mapping a Docker data volume to a directory on the host machine and a Docker port to the host port, we were able to benefit from all the functionality of the application running inside the container from the outside (read more here).

At one point we had to destroy the Minecraft server so we could re-create it running in a different configuration.  Would Cyrus lose all the work he had put into the world? The answer was a big No! He found out that we could – and should – treat the server and the data it generated as separate components. Minecraft does a good job of maintaining this separation on its own, since the server writes the worlds each to its own folder. Docker gave us the ability to delete and recreate the server easily whenever we wanted, and by running the container mapped to a local volume, it wouldn’t affect his creations.
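
The destroy-and-recreate cycle might look like the sketch below. The docker run flags and the host path are taken from the command shown earlier; the function name recreate_server and its arguments are hypothetical, and passing a different MODE value (such as survival) assumes the image accepts it.

```shell
# Host directory that holds the world data (from the original docker run command).
DATA_DIR=/home/sara/minecraft/data

# recreate_server NAME MODE  (hypothetical helper)
# Removes the container called NAME and starts a fresh one in MODE,
# mounting the same host directory so the existing worlds are reused.
recreate_server() {
  name=$1
  mode=$2

  docker stop "$name"   # stop the running server
  docker rm "$name"     # delete the container; DATA_DIR is left untouched

  docker run -e EULA=TRUE -e MODE="$mode" \
    -e 'JVM_OPTS=-Xmx1024M -Xms1024M' \
    -v "$DATA_DIR:/data" -d -p 25565:25565 \
    --name "$name" itzg/minecraft-server
}
```

Calling recreate_server creative survival would replace the container while everything under /home/sara/minecraft/data stays on the host, which is why the worlds survive the swap.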

How Much DevOps?

In a short amount of time, Cyrus learned the fundamentals of maintaining his very own server. Minecraft gave him an incentive and a long-term reward in exchange for this investment of effort. Let’s ask the question, though: how much DevOps did he actually learn?

The particularly DevOps-themed concepts here revolved around accessing a remote server on a cloud infrastructure, and making use of an application, complete with its own special snowflake environment, that some other person already dealt with for us. We didn’t have to spend much time at all finding a host for the server or setting up the server itself. Instead, we went straight to the fun part, playing Minecraft and watching him kill his sister (in-game, of course) …

As a DevOps professional and enthusiast, it’s thrilling to me to have a member of the family who is so eager to investigate new technical skills and concepts.  I am also deeply impressed and in awe of the magnitude of his parents’ success in raising such inquisitive, bright, and motivated young people.  Finally, I have to hand it to Markus Persson (better known as Notch, creator of Minecraft) for conceiving and implementing a game that is incredibly fun to play, all while quietly preparing kids to be successful in the real world.  

Javascript: Finding the First Monday of the Month

Inspired by a PHP version found here, this JavaScript function takes the zero-based number of the month (0-11) and the year and returns a Date object for the first Monday in that month. For example, firstMonday(10, 2011) returns Monday, November 7, 2011. I find it useful for determining weeks in the month.

// Get the first Monday of the month; useful for determining week durations.
// @param {number} month - zero-based month (0-11)
// @param {number} year - four-digit year
// @returns {Date} the first Monday of that month, at local midnight
function firstMonday(month, year) {
  var d = new Date(year, month, 1, 0, 0, 0, 0);

  if (d.getDay() === 0) {
    // The 1st is a Sunday, so the first Monday is the 2nd.
    d.setDate(2);
  } else if (d.getDay() !== 1) {
    // The 1st is Tuesday through Saturday: advance to the following Monday.
    d.setDate(9 - d.getDay());
  }
  // Otherwise the 1st is already a Monday.

  return d;
}

Please let me know if you find it useful.  Enjoy!
