Categories
devops

Celebrating 10 Years in Cloud Engineering

The year was 2014 and I was working at Internet of Things startup Axeda (later acquired by PTC). I was building prototypes of Internet of Things solutions using the Axeda platform, and this particular one needed a UI, customer facing component. My manager suggested using an EC2 instance on AWS, which I had never tried before. I went through the flow of choosing an AMI, tweaking the knobs, and launching it. The invisible computer appeared!

That very year, the first annual State of DevOps report was released Nicole Forsgren, Gene Kim, Jez Humble and others. After the acquisition of Axeda, I applied for my first “DevOps Engineer” job and worked at Teradata automating the provisioning of flavors of Hadoop clusters across various clouds. Although SRE would later become known as the “implementation of class DevOps”, I always preferred to consider myself working in DevOps as that embodied to me the principle of bridging gaps between siloed developers and ops engineers.

At Acquia, I was able to mature my engineering experience to encompass Enterprise-grade practices. I worked with compliance concerns, went on call for the first time, and used Puppet to automate complex tasks. However at that time they did not offer a DevOps career track, so I transitioned into a full time DevOps role at Tomorrow.io (formerly ClimaCell).

It was at Tomorrow.io that I was able to find my sweet spot of working with Kubernetes on a daily basis, adopting Terraform and Helm as orchestration go-to tools. I used GitOps to ensure that the state of the deployment always matched the code, and hooked it all up to a Gitlabs pipeline for on demand releases.

The basis for my current work at Sensible Weather relies on the foundation laid by my combination of Enterprise and Start-up experience. I have the depth to understand how mature companies need to work, and the breadth to implement solutions tailored to the particular challenges of a growing org.

Ultimately, I am extremely grateful for the opportunity to enjoy meaningful work that remains fresh and interesting to this day.

Here’s to another ten years!

Categories
devops

Eliminating NaNs in PromQL Histogram Quantile

Working with Prometheus and PromQL can be tricky. I found it challenging to define an SLO using sloth-sli for a service whose histogram had a lot of NaN (Not a Number) values. I searched around for an out of the box solution and didn’t find what I was looking for.

The key to the solution was the knowledge that in PromQL an NaN doesn’t equal itself: NaN != NaN.

With this in mind I could use the “unless” operator to filter out any value that didn’t equal itself.

p50 Latency Error Query

histogram_quantile(0.5, sum(rate(http_request_duration_seconds_bucket{job="$job", namespace="$namespace"}[{{.window}}])) by (le)) > 5 unless (histogram_quantile(0.5, sum(rate(http_request_duration_seconds_bucket{job="$job", namespace="$namespace"}[{{.window}}])) by (le)) != histogram_quantile(0.5, sum(rate(http_request_duration_seconds_bucket{job="$job", namespace="$namespace"}[{{.window}}])) by (le))) OR on() vector(0)



This query returns the 0.5 bucket values that are over 5, and if the values that it evaluates do not equal themselves (i.e., are NaNs), it turns those into nulls. The nulls are evaluated to false (while the NaNs are evaluated to true) and then OR’ed with the vector 0 to create a consistent graph with no gaps.

Hope this tidbit helps someone, SLOs are hard enough as it is without NaNs in your life!