Category: devops

How to play Pokemon in VR Minecraft

Post author By sarajarjoura
Post date April 11, 2026

My young daughter is enjoying Minecraft, but somehow the vanilla game doesn’t cut it anymore. I am a gamer and a techie, which motivates me to find ways to spice up my own Minecraft experience while joining her in a game.

For her, the answer (on Java Minecraft at least) is to catch and train up Pokemon in the context of Minecraft. This makes the game way more fun for her, and makes it a more interesting challenge for me. Particularly since my client of choice is a Meta Quest 3, and I play Minecraft in VR via QuestCraft.

Not only that, but we had been utilizing Aternos for free servers, and while they are a solid choice for trying out Minecraft in multiplayer, they decidedly aren’t the right fit for a long term server. So I invested some time in finding a free solution that would be reliable and easy to maintain.

I resolved all of my pain points with the following architecture.

Server side:

A Kubernetes cluster on the Oracle free tier, with the itzg minecraft chart installed and modded with Fabric and Cobblemon, all applied using Terraform

Client side:

PC-side – Modrinth client for my daughter (playing Java edition Minecraft on an Ubuntu PC) also modded with Fabric and Cobblemon

Meta Quest (VR) – Sideloaded Questcraft with the patched Fabric installer

And for good measure, I also discovered how to battle wild Pokemon in VR with this mod https://modrinth.com/mod/quest-rebound (which needs this mod https://modrinth.com/mod/owo-lib as a dependency).

Want to try your hand at this yourself? Check out this three part tutorial series!

Create a Kubernetes Cluster on Oracle Cloud for free using Terraform

Create a Modded Minecraft Server on a hosted Kubernetes Cluster

Connecting to a Cobblemon Minecraft Java Server in VR

Create a Kubernetes Cluster on Oracle Cloud for free using Terraform

Post author By sarajarjoura
Post date April 11, 2026

Credit: I forked this repository https://github.com/nce/oci-free-cloud-k8s in order to create this infrastructure on Oracle Cloud. Thank you to https://github.com/nce for the awesome Terraform bootstrap!

Parent Article: How to play Pokemon in VR Minecraft

If you are starting from scratch with Terraform on Oracle, I recommend you check out their tutorial series https://developer.hashicorp.com/terraform/tutorials/oci-get-started to get familiar with the concepts.

Then again, if you are already familiar with the concept of Infrastructure as Code you may be able to dive right in. Up to you!

Disclaimer: This infrastructure utilizes a Pay-As-You-Go account setup. Only follow this guide if you are ok with the possibility of incurring costs on Oracle. This setup is completely free, but I won’t be held responsible if you end up with a bill when something went sideways. I had to pay about a dollar to Oracle while exploring and learning how this worked.

Check out my repository https://github.com/saranicole/minecraft-oci-k8s with git. You will be making use of the infra directory in this guide, so change into that in your terminal:

git clone https://github.com/saranicole/minecraft-oci-k8s.git
cd oci-free-cloud-k8s/terraform/infra

If you haven’t registered for a Pay as You Go account on Oracle, do that now. Sign up at https://signup.cloud.oracle.com and add a payment method to convert it to Pay as You Go. Without doing this payment method step, your resources will never get created (the terraform will hang). Make sure you have the oci command line tool installed and authenticated as well – see https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm .

Manually create a single bucket in Oracle Cloud to hold your Terraform state – see https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/managingbuckets_topic-To_create_a_bucket.htm . Name it “terraform-states” or similar – this name will need to match the “bucket” parameter in backend.conf . Note the “Namespace” in the bucket details – you will need that for you backend.conf file.

You will need to create a few files to make this work.

In the infra directory:

backend.conf

general.auto.tfvars

Example backend.conf ( replace <<Storage Namespace>> with the namespace you retrieved when you created the bucket ):

namespace = "<<Storage Namespace>>"
bucket    = "terraform-states"
key       = "infra/cluster.tfstate"

Example general.auto.tfvars

compartment_id = << Tenancy OCID >>
region = << Region >>
ssh_public_key = << SSH Public Key >>
kubernetes_version = "v1.35.0"
user_ocid = << User OCID >>
fingerprint = << API Key fingerprint >>
private_key_path = << Path to API Key Secret File >>
bucket_namespace = << Storage Namespace >>
enable_minecraft_port = true

Parameters Explanation:
compartment_id – this is your tenancy ID. Retrieve it by clicking the profile dropdown, selecting “Tenancy”, then copying the first “OCID” value.

region – the region where you provisioned your account, this should be reasonably closest to you. See https://docs.oracle.com/en-us/iaas/Content/General/Concepts/regions.htm for a list of available choices. Try to avoid going with Ashburn since it is the most popular and has a chance of not being able to provision resources

ssh_public_key – This is the public (non-secret) component of an API Key. See https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm for an explanation of API Key authentication. Create one at https://cloud.oracle.com/identity/domains/my-profile/auth-tokens and make sure to save the private key so that you can authenticate to Terraform using the oci command line tool

kubernetes_version – This is the Kubernetes Cluster version. See https://kubernetes.io/releases/ for the available versions. v1.35.0 was the latest at the time of writing.

user_ocid – This is the OCID under “User Settings”. Copy the value from https://cloud.oracle.com/identity/domains/my-profile/details

fingerprint – This is the fingerprint given to you when you created the API Key

private_key_path – this is the file path to the location where you saved the private key part of the API Key. Something like $HOME/my_secret_key.pem . Note this is not an SSH private key

bucket_namespace – this is the Storage Namespace again

enable_minecraft_port – set to true if you plan on using this as a Minecraft server. This will open up port 25565. If you want this for some other purpose, keep this turned off using false .

Run the init first to confirm that Terraform is working and make sure to inspect everything it will create.

terraform init -backend-config=backend.conf

Then run the plan

terraform plan

It should output something like this:

data.oci_containerengine_node_pool_option.node_pool_options: Reading...
module.vcn.data.oci_core_services.all_oci_services[0]: Reading...
data.oci_identity_availability_domains.ads: Reading...
oci_identity_tag_namespace.k8s_node_pool: Refreshing state...

...

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # oci_core_security_list.private_subnet_sl will be created

... yadda yadda

If there are any errors due to missing information, make sure your files are all filled out correctly.

Once the plan succeeds with no errors, you can apply the Terraform.

terraform apply

Enter “yes” when it prompts you.

It should finish like this:

Apply complete! Resources: xx added, 0 changed, 0 destroyed.

It will also create a .kube.config file that you will need for further provisioning and to access the cluster with kubectl.

After everything is running for a day, make sure to visit https://cloud.oracle.com/account-management/cost-analysis to make sure everything is truly running without any costs, or that any cost you did get is within your expectation.

Congratulations! You now have a Kubernetes cluster running on Oracle Cloud. Check out the next post in this series to install Minecraft on this cluster.

Create a Modded Minecraft Server on a hosted Kubernetes Cluster

devops minecraft

Create a Modded Minecraft Server on a hosted Kubernetes Cluster

Post author By sarajarjoura
Post date April 11, 2026

Previous Post: Create a Kubernetes Cluster on Oracle Cloud for free using Terraform

Credit: This guy is the man, https://github.com/itzg . He maintains https://github.com/itzg/minecraft-server-charts/tree/master/charts/minecraft which is just an epic Minecraft Helm chart.

If you already have a Kubernetes cluster that’s perfect, if not you can follow the previous post to get one for free on Oracle Cloud.

If you are not using Oracle cloud you will need to modify the minecraft/_terraform.tf file slightly to use whatever storage backend you want for the terraform state file.

So change:

backend "oci" {}

to something like

backend "s3" {}

Or whatever your preference might be. See https://developer.hashicorp.com/terraform/language/backend if you need guidance on how to choose and what it means.

Check out the repository https://github.com/saranicole/minecraft-oci-k8s with git. We will be using the minecraft directory.

git clone https://github.com/saranicole/minecraft-oci-k8s.git
cd oci-free-cloud-k8s/terraform/minecraft

You will need to create a backend.conf file, a general.auto.tfvars, a minecraft.auto.tfvars file and possibly an oracle.auto.tfvars file. Depending on how you want this to end up you will put different contents into these tfvars files.

backend.conf

namespace = "<<Storage Namespace>>"
bucket    = "terraform-states"
key       = "infra/minecraft.tfstate"

general.auto.tfvars

chart_version = "5.1.2"

If you are using Oracle Cloud, create the oracle.auto.tfvars file.

oracle.auto.tfars

compartment_id = << Tenancy OCID >>
region = << Region >>
ssh_public_key = << SSH Public Key >>
user_ocid = << User OCID >>
fingerprint = << API Key fingerprint >>
private_key_path = << Path to API Key Secret File >>
bucket_namespace = << Storage Namespace >>

If you want a vanilla Minecraft server, use the structure below and tweak as needed (making sure to change accept_eula to TRUE):

# Change this to TRUE
accept_eula = "FALSE"
world_version = "26.1.1"
server_type = "VANILLA"
fabric_loader_version = "0.18.6"
difficulty = "easy"
whitelist = "<< YOUR MINECRAFT USERNAME >>,<< SOME OTHER MINECRAFT USERNAME >>"
ops = "<< YOUR MINECRAFT USERNAME >>,<< SOME OTHER MINECRAFT USERNAME >>"
motd = "Welcome to Minecraft on Kubernetes!"
mod_urls = [
  "<< MODRINTH JAR DIRECT DOWNLOAD URL >>",
  "<< SOME OTHER JAR DIRECT DOWNLOAD URL >>"
]
download_world_url = "<< DIRECT DOWNLOAD LINK OF ZIP FILE CONTAINING WORLD DATA >>"
rclone_dest_dir = "<< OBJECT STORAGE BUCKET SUCH AS S3 OR OCI BUCKET NAME >>/<< OBJECT STORAGE BUCKET PREFIX SUCH AS 'worlds' >>"
enable_oci_backup_bucket = false
enable_oci_load_balancer = false
rcon_password = "<< SUPER SECRET PASSWORD >>"

If you want Cobblemon, particularly cobblemon on Meta Quest, you will need to use contents like the following:

minecraft.auto.tfvars

# Change this to TRUE
accept_eula = "FALSE"
world_version = "1.21.1"
server_type = "FABRIC"
fabric_loader_version = "0.18.6"
difficulty = "easy"
whitelist = "<< YOUR MINECRAFT USERNAME >>,<< SOME OTHER MINECRAFT USERNAME >>"
ops = "<< YOUR MINECRAFT USERNAME >>,<< SOME OTHER MINECRAFT USERNAME >>"
motd = "Welcome to Cobblemon on Kubernetes!"
mod_urls = [
"https://cdn.modrinth.com/data/MdwFAVRL/versions/Ygf8KJFC/Cobblemon-fabric-1.7.0%2B1.21.1.jar",
"https://cdn.modrinth.com/data/P7dR8mSH/versions/yGAe1owa/fabric-api-0.116.9%2B1.21.1.jar"
]
download_world_url = "<< DIRECT DOWNLOAD LINK OF ZIP FILE CONTAINING WORLD DATA >>"
rclone_dest_dir = "<< OBJECT STORAGE BUCKET SUCH AS S3 OR OCI BUCKET NAME >>/<< OBJECT STORAGE BUCKET PREFIX SUCH AS 'worlds' >>"
enable_oci_backup_bucket = false
enable_oci_load_balancer = false
rcon_password = "<< SUPER SECRET PASSWORD >>"

If you are using Oracle, you will want to change both enable_oci_backup_bucket and enable_oci_load_balancer to true.

In the case of Oracle the rclone_dest_dir will be “minecraft-backups/worlds”.

Run the init first to confirm that Terraform is working and make sure to inspect everything it will create.

terraform init -backend-config=backend.conf

Then run the plan

terraform plan

If there are any errors due to missing information, make sure your files are all filled out correctly.

Once the plan succeeds with no errors, you can apply the Terraform.

terraform apply

Enter “yes” when it prompts you.

It should finish like this:

Apply complete! Resources: xx added, 0 changed, 0 destroyed.

At the end you will see:

Outputs:
  + my_minecraft_server_ip = "xxx.xxx.xxx.xxx"

This is your Minecraft server ip address! Connect to port 25565 with this directly, or create a friendly DNS name in your preferred way.

If connecting with the ip address you can type in these numbers in this format:

xxx.xxx.xxx.xxx:25565

I recommend https://www.duckdns.org/ as a convenient way to give your Minecraft server a friendly url.

Way to go! You now have a Minecraft server in your preferred configuration set up on your Kubernetes cluster.

Interested in more? Take a look at the last blog post in this series:

Connecting to a Cobblemon Minecraft Java Server in VR

devops

Celebrating 10 Years in Cloud Engineering

Post author By sarajarjoura
Post date January 15, 2024

The year was 2014 and I was working at Internet of Things startup Axeda (later acquired by PTC). I was building prototypes of Internet of Things solutions using the Axeda platform, and this particular one needed a UI, customer facing component. My manager suggested using an EC2 instance on AWS, which I had never tried before. I went through the flow of choosing an AMI, tweaking the knobs, and launching it. The invisible computer appeared!

That very year, the first annual State of DevOps report was released Nicole Forsgren, Gene Kim, Jez Humble and others. After the acquisition of Axeda, I applied for my first “DevOps Engineer” job and worked at Teradata automating the provisioning of flavors of Hadoop clusters across various clouds. Although SRE would later become known as the “implementation of class DevOps”, I always preferred to consider myself working in DevOps as that embodied to me the principle of bridging gaps between siloed developers and ops engineers.

At Acquia, I was able to mature my engineering experience to encompass Enterprise-grade practices. I worked with compliance concerns, went on call for the first time, and used Puppet to automate complex tasks. However at that time they did not offer a DevOps career track, so I transitioned into a full time DevOps role at Tomorrow.io (formerly ClimaCell).

It was at Tomorrow.io that I was able to find my sweet spot of working with Kubernetes on a daily basis, adopting Terraform and Helm as orchestration go-to tools. I used GitOps to ensure that the state of the deployment always matched the code, and hooked it all up to a Gitlabs pipeline for on demand releases.

The basis for my current work at Sensible Weather relies on the foundation laid by my combination of Enterprise and Start-up experience. I have the depth to understand how mature companies need to work, and the breadth to implement solutions tailored to the particular challenges of a growing org.

Ultimately, I am extremely grateful for the opportunity to enjoy meaningful work that remains fresh and interesting to this day.

Here’s to another ten years!

Tags devops

devops

Eliminating NaNs in PromQL Histogram Quantile

Post author By sarajarjoura
Post date June 23, 2023

Working with Prometheus and PromQL can be tricky. I found it challenging to define an SLO using sloth-sli for a service whose histogram had a lot of NaN (Not a Number) values. I searched around for an out of the box solution and didn’t find what I was looking for.

The key to the solution was the knowledge that in PromQL an NaN doesn’t equal itself: NaN != NaN.

With this in mind I could use the “unless” operator to filter out any value that didn’t equal itself.

p50 Latency Error Query

histogram_quantile(0.5, sum(rate(http_request_duration_seconds_bucket{job="$job", namespace="$namespace"}[{{.window}}])) by (le)) > 5 unless (histogram_quantile(0.5, sum(rate(http_request_duration_seconds_bucket{job="$job", namespace="$namespace"}[{{.window}}])) by (le)) != histogram_quantile(0.5, sum(rate(http_request_duration_seconds_bucket{job="$job", namespace="$namespace"}[{{.window}}])) by (le))) OR on() vector(0)

This query returns the 0.5 bucket values that are over 5, and if the values that it evaluates do not equal themselves (i.e., are NaNs), it turns those into nulls. The nulls are evaluated to false (while the NaNs are evaluated to true) and then OR’ed with the vector 0 to create a consistent graph with no gaps.

Hope this tidbit helps someone, SLOs are hard enough as it is without NaNs in your life!

Tags devops

devops

Startup Life with DevOps at Sensible Weather

Post author By sarajarjoura
Post date May 10, 2022

During my time so far at Sensible Weather, I’ve built an infrastructure pipeline which moves code from Github to an EKS cluster via Argo CD, a project lovingly named Margaritaville, since once you’re done with it you can go sip margaritas … 🙂 This project leveraged Terraform in Github Actions to deploy Helm applications onto the cluster. The only gap here was the management of secrets, which I initially deployed using Velero as a stopgap. Velero is a powerful tool which allows you to back up Kubernetes resources from a cluster and then restore them either to the same or a different cluster. Using it for secrets management didn’t make sense in the long term however, so as soon as I was able I migrated us to the External Secrets project, which enables the syncing of Kubernetes secrets to an external secrets provider. In this case we used AWS Parameter Store since we didn’t need cross-region or cross-account secrets sync ( and hey, free ).

Once this was all set up we needed centralized metrics, logging, and tracing, so I made use of the excellent Grafana open source LGTM stack . I used Pomerium to provide SSO for these tools which I covered in a post featured on Sensible Weather’s official blog.

Next steps – setting up an official on call with integrations into our comms. #StartupLife !

devops

DevOps at ClimaCell

Post author By sarajarjoura
Post date January 24, 2021

I joined ClimaCell as a Senior DevOps Engineer in January. I love it! I am working in Terraform spooling up Kubernetes infrastructure as well as using Helm to bootstrap applications.

JAJ is learning her letters and walking around opening doors wherever she can. Super proud of her.

devops scrum

Scrum Master Class

Post author By sarajarjoura
Post date December 8, 2017

I have a new mission at Acquia – I am taking over as Scrum Master for my team. Bwahahahaha …

I’ve been doing Scrum ever since I started in tech full time at Axeda. Stand-ups and sprints are all old hat, but for the first time thanks to Acquia, I have received formal training in Scrum and a certification from Scrum Inc in Boston.

Effectiveness
How does Scrum help? Why do it?

You use Scrum because you want to improve team velocity without increasing team resources.

I liked the fact that the Scrum Master class itself was organized as a Sprint using Scrum. This let us “live the example” and see first-hand the effectiveness of the techniques.

History
What is Scrum? What is its relationship to Lean and Agile?

Scrum is different from Lean and Agile but it derives from both. It came about as an evolution in workflow process based on techniques pioneered at Toyota. Scrum is an adaptation of manufacturing floor process to software engineering.
Lean

Eliminate waste
Understand Value Stream Analysis
Implement Single Piece Continuous Flow

Agile competition – rapid prototyping and fail fast mentality, permit the customer to determine what the product will be jointly with the producer

Requirements
What do you have to have and do in order to be doing Scrum authentically?

In order to do authentic Scrum, you must satisfy these requirements – these particular 3 artifacts, 5 events, and 3 roles as part of the team’s process.

3 Artifacts

Product Backlog – vision, priorities
Sprint Backlog – known work, capacity
Product Increment – scrum board, burndown, velocity

5 Events

Backlog Refinement
Sprint Planning
Daily Scrum
Sprint Review
Retrospective

3 Roles

Product Owner
Scrum Master
Team

Values
What do these workflow philosophies consider worthwhile?

Scrum inherits values from Agile. The five values of scrum are focus, courage, openness, commitment and respect.
Agile Manifesto

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

Happiness is important because it is a precursor to great team performance and high velocity. Team velocity will predictably go down after happiness does. Scrum practice works to improve happiness by relying on intrinsic motivators over external ones – purpose, mastery, and autonomy over money, power and status. The workplace should ensure the external rewards are present to the extent that they are no longer preoccupying thought, but the flow of the team should emphasize those internal motivators over the external ones.

Best Practices
What are the best practices included in Scrum that help the teams that adopt them?

There were numerous best practices discussed during the class. The first one that I’ll be introducing to my team is the observance of strict scrum rules during our standup.

During our standup, we will kick it off with a short inspirational music clip and physically stand up to report our answers to “What did you do yesterday”, “What will you do today” and “What are your impediments if any.”

Since we are a remote team doing our standup over a call, we will determine the order of statuses according to the order that the team members join the call. Since not everyone joins right at 10:30, the standup will start at 10:35, and the first five minutes will be used for team sync and/or a single ticket triage. At 10:35 the alarm goes off, and the team gets up for standup! Post scrums optional, anyone with a post scrum says who they need and everyone else should drop off.

One further thing I learned that I will mention is about measuring velocity. While you can’t compare velocity across teams, you can compare acceleration. Subtle but important.

Terminology
What Scrum terminology provides useful ideas?

Kaizen (evolution) and Kaikaku ( revolution).Kaikaku means the philosophy of challenging entrenched dogma, refusing to accept waste no matter how it is disguised

Forms of waste

Muda – work in progress but not finished
Mura – inconsistency
Muri – unreasonable demands

Types of Waste and their Mnemonic
DOWNTIME – Defects, Overproduction, Waiting, Not utilizing talent, Transportation, Inventory, Movement, Extra processing

“Don’t take my word for it” – any change we make should have a follow up to ensure that the data proves the change is helping team velocity

JJ Sutherland, Alex Sheive, Sara Jarjoura

All in all, I had a fantastic time and learned a ton. I am stoked to bring this to the team and see how we do.

Sara Jarjoura-ScrumAlliance_CSM_Certificate — Certified Scrum Master

code devops geek

Resiliency and Game Day Exercises at Acquia

Post author By sarajarjoura
Post date May 30, 2017

In March of 2017 I came across the idea of “Game Day” in the DevOps Handbook by Gene Kim and others. Game Day is brilliantly advocated by Jesse Robbins in his presentation from 2011. It’s the idea that deliberately staging periodic system outages forces engineers to think about and design for resiliency in those systems. The extreme programming example is Chaos Monkey, which operates under only the one constraint that the outages should happen during working hours. Other than that, the outages caused by Chaos Monkey can happen anywhere in the system (even production!) and at any time.

Game Day is a step removed from Chaos Monkey, conceived of as a planned activity for engineers to resolve systemic outages. The resiliency exercises held at Acquia were yet another step away from the extreme towards the approachable. Our exercise included two activities, one geared for non-support engineers and the other for support. The non-support engineers had to bring back up a down site, and the support engineers had to attack and compromise an insecure site. The idea was to challenge engineers to step outside their comfort zone, and attempt to resolve technical challenges beyond the requirements of their every-day work.

The Team
The personalities involved in Game Day were a strong influence on the event. There’s Amin Astaneh, an Ops manager with the temperament of the proverbial town crier, faithfully and urgently supporting us in our DevOps transformation. Then there’s Apollo Clark, expert in secure systems who contributed the idea of doing a security vulnerability exercise. Finally there’s James Goin, seasoned Ops warrior relentlessly invested in the improvement of systems administration, including resiliency and disaster recovery training.

Constraints
It just so happened that the idea for Game Day came two months in advance of Acquia’s annual engineering-wide event called Build Week, a truly awesome gathering of the entire team at Acquia HQ in Boston (read more on Dries’ blog!). Holding our Game Day at the same time would allow it to reach a broader audience across the company, so we requested a slot on the calendar. We ended up with 8-9pm on the Tuesday during Build Week. We had our opportunity!

Build Week imposed two constraints that had a significant and positive influence on our interpretation of Game Day. The whole event needed to fit in a single hour, and the event had to be accessible to engineers other than just the Ops subject matter experts. A Game Day exercise typically involves only the core engineering team which works directly with critical systems, and it takes however long they need to bring the systems back up. These constraints made the whole thing more approachable, and inspired the introduction of an Easy Mode and a Hard Mode.

Game Day as Exercise
The original idea was to have a trouble-shooting session with an Acquia development installation of a Drupal site (managed Enterprise-grade Drupal being the chief product of Acquia). The site would have some failure that either smaller teams or the whole group would have to resolve. Since we needed to accommodate varying levels and areas of expertise in the product, we settled on two “modes”, Easy Mode and Hard Mode, that participants would opt into based on their familiarity with troubleshooting techniques. The difference between the modes would only be in the level of difficulty. Easy Mode would be for those who don’t handle troubleshooting support calls as part of their regular day-job, Hard Mode for those who do.

The Identity Crisis
At this point, it hit home for me that the exercise was not going to be what I had originally intended – it wasn’t going to be a cookie-cutter Game Day. Although this seemed disappointing at the time, looking back it was a blessing in disguise, since it motivated us to create a new idea instead of copying someone else’s.

Apollo’s suggestion which we ended up following was to stage a Hard Mode Capture the Flag exercise instead of a site outage. Capture the Flag in a security context is an exercise where teams gain access to privileged resources in a system by leveraging security vulnerabilities. We could hide hashes – randomized strings of a fixed length – throughout the site. The winner of the competition would be the team that found all the hashes first.

The exercise would demonstrate that a site that works from a user perspective can still need work to become secure and performant. We would have Easy Mode to include some troubleshooting, which would then flow directly into the Capture the Flag exercise.

Trying It Out
We ran through the whole event a few weeks before Build Week. Easy Mode troubleshooting took up the first half hour, transitioning to Hard Mode Capture the Flag for the second half hour. This was pure thought experiment at this stage, and shockingly for me, it worked really really well.

During Easy Mode, non-Ops engineers drove the resolution with Ops experts only acting as consultants. Once the site was back up, we switched over to Capture the Flag. For this run through we only had one shared site for all the Hard Mode participants. One mischievous participant who found the site credentials deliberately locked out everyone else. This incident motivated much of the end-game setup for prevention of cross-site hacking.

Game Day!
Our Game Day-inspired exercise followed the flow established in our run through, with the addition of the isolated environments for Capture the Flag.

The Easy Mode troubleshooting took less time than we had allowed for, putting the start of Hard Mode right on time. The teams dove in, probing their environment – a Drupal site – for weaknesses. The narrative revolved around a fictional user submitting a question to the forum about how to enable the PHP module in Drupal, which would allow access to the bash shell on the server. The fictional admin replied that she had enabled the module for him, and reset his login to a “temporary password”. These were the credentials the participants were expected to use to hack the site. Since the user had access to the PHP module, they could also use it to gain shell access. Using this shell access to the server, they had easy access to the privileged resources and opportunities to discover the hashes.

When time ran out at 8:55, three of our twelve teams and forty participants had found all five of their hashes. The first team with all five hashes won the grand prize, an invitation for morning coffee with our resident tech celebrity, Drupal founder and Acquia CTO Dries Buytaert. As an aside, when I thanked Dries for agreeing to have coffee with our winners, he graciously replied, “No, thank you – now I get to have coffee!”

Epilogue
The decision to pivot from the established Game Day resulted in a new kind of learning in the spirit of Game Day. This learning was more accessible for our engineers and bridged the gap between where we are and where we are headed. While this isn’t the end of the story, I think it’s a fantastic start. Game Day, Day 2, here we come …