Hi everyone, it’s currently 11:49 PM and I am writing this in the hope that it finds its way to the person looking for a change. I wanted to share the initial set of resources that I used to transform my career from an SDET to an SRE. There is significantly more that goes into the personal journey of someone’s career, and I feel that an entire book could be written on the topic. I am not destined to write that book, so these resources should suffice and serve as a reference for anyone willing to put in the time.

What is an SDET or SRE? Aren’t we all just engineers?

The SDET title stands for Software Design Engineer in Test and is sometimes used, but lately I have seen more job posts for QA Engineer, Test Automation Engineer, etc. The titles might vary, but the high-level responsibility likely belongs to someone who can write some code and champions the testing efforts across teams. There’s a great analogy in the book Explore It! by Elisabeth Hendrickson that shows a net and the holes that automation can’t cover. I like to picture SDETs spending time on exploratory testing in addition to writing tests in order to address the gaps in coverage.

The SRE title stands for Site Reliability Engineer. You can sometimes stumble across companies unintentionally mixing up the SRE and DevOps roles. The high-level SRE responsibility may consist of addressing infrastructure and operations problems as code, reducing toil, and sharing ownership of production issues with the product development team. Of course, this depends on the organization and I have learned that a title can quickly become an enigma.

However, I promise you that Liz Fong-Jones and Seth Vargo discuss this role in more detail, and better, than I ever could. Please watch their video!

class SRE implements DevOps <3

Now that we have the titles and brief role descriptions out of the way, we can discuss whether changing roles interests you. Since I am assuming that you’re already in the realm of testing, we can make some further assumptions:

  • You are a curious individual and question everything
  • Perhaps you break things on purpose
  • You can take on the roles of users and apply them as needed
  • Context is a guiding light

How would we apply some of these traits to SRE related work?

  • Curiosity – Questioning why you’re doing the same task over and over (toil), e.g. creating users
  • Fault injection – Intentionally removing a node in a cluster
  • Personas – Taking on the role of an attacker and safeguarding the infrastructure against exposed secrets
  • Context – You understand that applying Kubernetes to every piece of infrastructure may not be cost effective
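To make the fault injection idea a little more concrete, here is a minimal sketch in Python. It is only a toy: `ToyCluster`, `survives_losing`, and the node names are hypothetical and stand in for real infrastructure, where you would remove an actual node and watch your monitoring instead.

```python
class ToyCluster:
    """In-memory stand-in for a cluster (hypothetical, for illustration only)."""

    def __init__(self, nodes):
        self.nodes = set(nodes)

    def remove_node(self, name):
        # The injected fault: the node simply disappears.
        self.nodes.discard(name)

    def serve(self, request_id):
        # Spread requests over the remaining nodes in a deterministic order.
        if not self.nodes:
            raise RuntimeError("no healthy nodes left")
        ordered = sorted(self.nodes)
        return ordered[request_id % len(ordered)]


def survives_losing(cluster, victim, requests=10):
    """Remove one node, then check that requests are still served by the rest."""
    cluster.remove_node(victim)
    try:
        served_by = {cluster.serve(i) for i in range(requests)}
    except RuntimeError:
        return False
    return victim not in served_by
```

The same question a tester asks of an application ("what happens if this dependency goes away?") is being asked of the infrastructure: a three-node cluster should pass this check, while a single-node one should not.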

As a personal example, I once took on the persona of an attacker and used Nmap to scan a newly exposed service for vulnerabilities.
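If you have never looked at what a port scan does underneath, the rough idea can be sketched with Python's standard `socket` module. This is not how Nmap is implemented and it does far less than Nmap does; it only attempts a plain TCP connection to each port. The function name `open_ports` and its parameters are my own for illustration, and you should only point it at systems you are authorized to test.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the ports on `host` that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found
```

Calling `open_ports("127.0.0.1", [22, 80, 443])` returns whichever of those ports accepted a connection on your own machine.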

Another time, we were writing some Infrastructure as Code, specifically Terraform, and my first instinct was to see if we could write tests for it. We eventually settled on Terratest despite the learning curve of using a new language.

The last example is when a sensitive, tightly coupled service was deployed without a load balancer. I used Bug Magnet to poke around for some basic vulnerabilities and found an endpoint that responded to a suite of my HTTP POSTs. Fast forward several minutes, and our observability platform was showing performance degradation across multiple services.

These scenarios led to multiple process changes and to our best efforts to pull security and testing into the inception of projects rather than leaving them until the end of development.

Quality engineering resources to help you learn

If you want to dive deeper into determining if the role is right for you, read Krishelle Hardson-Hurley’s post So You Want to Be an SRE. After you’ve read that, check out Alice Goldfuss’s thoughts on How to Get into SRE.

After diving a bit deeper into what’s involved in the SRE role, you can check out some of the resources that helped me, in the order shown below. You might find that some of these books, such as The Goal, are too abstract, and that’s perfectly okay. Skip them and move on to what makes sense for you.

I also want to respect that everyone has a different amount of time that they can dedicate to learning. Using audiobooks, specifically Audible, has had a really positive impact on me during my commute and allowed me to cram more learning into my day. Using audiobooks might work for you too.

Books:

  • The DevOps Handbook, Gene Kim, Jez Humble, Patrick Debois, and John Willis
  • A Practical Guide to Testing in DevOps, Katrina Clokie
  • Accelerate, Dr. Nicole Forsgren, Jez Humble, Gene Kim (editor’s note: Dr. Forsgren reads the Audible version herself, highly recommended!)
  • The Goal, Eliyahu M. Goldratt
  • The Phoenix Project, Gene Kim, Kevin Behr, and George Spafford
  • Learning the bash Shell, Cameron Newham
  • Foundations of Information Security: A Straightforward Introduction, Jason Andress
  • Continuous Delivery, Jez Humble and David Farley
  • Site Reliability Engineering, Betsy Beyer, Jennifer Petoff, Niall Richard Murphy, and Chris Jones
  • Practical Monitoring: Effective Strategies for the Real World, Mike Julian
  • Infrastructure as Code, Kief Morris
  • Cloud Native DevOps with Kubernetes, Justin Domingus and John Arundel
  • Kubernetes in Action, Marko Lukša
  • Terraform: Up & Running, Yevgeniy Brikman
  • Cybersecurity Ops with bash: Attack, Defend, and Analyze from the Command Line, Paul Troncone and Carl Albing, Ph.D.
  • Istio: Up & Running, Lee Calcote and Zack Butcher
  • Wizard Zines, Julia Evans

Podcasts:

  • Kubernetes Podcast from Google
  • The Podlets
  • Test Guild Performance
  • Test Guild Security
  • On Call Nightmares
  • Test & Code
  • O11y Cast
  • HashiCast

Interactive learning:

  • YouTube (start here and navigate to freeCodeCamp; these are free and you can find incredibly helpful videos)
  • Linux Academy (now owned by A Cloud Guru)
  • KodeKloud (the design is reminiscent of CodeSchool circa 2015, very nostalgic for me)

Community:

  • Check out your local Meetups and find some positive people to learn with
  • Slack groups
  • Conferences (Shout-out to DevOpsDays conferences)

Public Cloud support tiers:

When you move into an Ops/SRE role, try to get approval for a paid support tier on AWS, GCP, etc. If you find yourself on a small team and hit a blocker, this becomes an invaluable resource.

Beyond resources:

Don’t stop learning, but don’t get stuck in tutorial hell. If you want to change roles, you don’t have to leave everything behind. I believe in you.

Feel free to reach out to me on Twitter if you find yourself stuck or want to share some awesome resources.

*This post was originally published on March 4, 2020 on the DevTestOps Community site.

Author Biography

Evan is an engineer with a wide range of test automation development, security, software quality, DevOps, and Salesforce.com experience. He organizes the Austin Automation Pros Meetup for everyone interested in learning. You can find him on Twitter and LinkedIn, and learn more from his blog.