Slight Reliability
Learning SRE, one day at a time.
Episodes
91 episodes
Slight Reliability Episode 89 - Blameless Post-mortems with Karanveer Anand
This week I'm joined by Karanveer Anand, SRE Technical Program Manager at Google, to discuss blameless post-mortems. We cover: 🦅 The recent CrowdStrike outage and their public post-mortem 🚑 When do we do a blameless post-mortem? 😕 H...
•
Season 2
•
Episode 89
•
26:06
Slight Reliability Episode 88 - OpenTelemetry Revisited with Zach Michel
This week Zach Michel from https://middleware.io/ and I discuss the state of OpenTelemetry and what it means to adopt it. We cover: 🌩️ Achieving observability in a SaaS world 🥫 Context propagation ...
•
Season 2
•
Episode 88
•
26:51
Slight Reliability Episode 87 - Measuring the value of SRE with Artem Yakimenko
In Episode 80, Niall Murphy talked about the need for SREs to be better at articulating the value of our work. In this episode I'm joined by Artem Yakimenko, ex-Googler and Engineering Director (SRE) at Culture Amp, to discuss how we might achieve this...
•
Season 2
•
Episode 87
•
35:33
Slight Reliability Episode 86 - Evolving SLOs with Dom Finn
In the world of SRE we constantly talk about defining SLOs, but what about evolving them over time? This week I chat with SRE Tech Lead Dom Finn about just that. We cover the relationship between reliability and user analytics, latency classes ...
•
Season 2
•
Episode 86
•
25:57
Slight Reliability Episode 85 - Feeling SaaSsy
This week I talk about the impact of SaaS-first technology strategies on the work of an SRE. I pose questions about observability, ownership, on-call, and how much control we have over reliability. You can find the Bleeding Tech blog on ...
•
Season 2
•
Episode 85
•
11:08
Slight Reliability Episode 84 - Clinical Troubleshooting with Dan Slimmon
This week I chat with Dan Slimmon about applying the approach doctors use to treat patient symptoms during incident response. You can find Dan's blog at https://blog.danslimmon.com/ or connect wi...
•
Season 2
•
Episode 84
•
27:40
Slight Reliability Episode 83 - An Unfulfilled Promise with Itiel Shwartz
This week I hear about all things Kubernetes from Komodor CTO and co-founder Itiel Shwartz. We chat about the promise that was made when Kubernetes first entered the industry, the challenge of getting developers engaged and capable of working i...
•
Season 2
•
Episode 83
•
30:32
Slight Reliability Episode 82 - CI/CD with Amin Astaneh
This week I sit down and have a discussion with Amin Astaneh (from Certo Modo) about CI/CD. We cover the power of the standard change as a way to navigate ITIL while still implementing DevOps practices, what to monitor to make your CI/CD observ...
•
Season 3
•
Episode 2
•
25:47
Slight Reliability Episode 81 - Incident Management in Non-Prod Environments
"Environment issues are just incidents that happened to occur in a non-production environment"... so why do we treat them so differently? In this first episode of the 2024 season I reflect on how we handle incidents in non-prod environme...
•
Season 3
•
Episode 1
•
10:09
Slight Reliability Episode 80 - What's Been Bugging Niall Murphy
This week I speak with Niall Murphy, renowned speaker and co-author of the original SRE book and the SRE workbook. We chat about the state of SRE in the current macro-economic climate and how we're not yet doing a very good job at articula...
•
Season 2
•
Episode 80
•
36:45
Slight Reliability Episode 76 - Sampling Distributed Traces with Paige Cruz
Paige Cruz (from Chronosphere) is back. This week we discuss sampling. What is sampling? Why do it? What kinds of sampling are there? You can check out Chronosphere's cloud native observability platform here:
•
Season 2
•
Episode 76
•
45:27
Slight Reliability Episode 79 - Incident Story Time with Valeska Victoria
This week Valeska Victoria returns to share some of her experiences working as an SRE at eBay. We look at the cascading effect of production issues in complex integrated environments (how there's often no single root cause), developer li...
•
Season 2
•
Episode 79
•
37:51
Slight Reliability Episode 78 - Developer Experience with Ankit Jain
This week I chat with Ankit Jain from aviator.co about developer experience. We define developer experience and developer productivity, and how this applies to SRE. We discuss the growing expectation on developers and how this leads to f...
•
Season 2
•
Episode 78
•
32:21
December 2023 Update
A brief mid-week update on my changing circumstances and the future of the podcast.
•
Season 2
•
5:07
Slight Reliability Episode 77 - SRE to DevRel with Liz Fong-Jones
This week I had the privilege of interviewing Liz Fong-Jones from honeycomb.io about DevRel, Developer Advocacy, and how that applies to SRE. We discuss the difference between Developer Relations (DevRel) and Developer Advocacy, how Liz ...
•
Season 2
•
Episode 77
•
31:53
Slight Reliability Episode 75 - Enterprise SRE with Steve McGhee
This week I had the honour of chatting with Steve McGhee (former Google SRE, current Google Reliability Advocate, and co-author of Enterprise Roadmap to SRE). We discuss the evolution of SRE from where it began at Google and how it is be...
•
Season 2
•
Episode 75
•
39:00
Slight Reliability Episode 74 - The Hidden Side of Vendor Lock-In
This week on Slight Reliability, Stephen discusses observability vendor lock-in. What is it? What does OpenTelemetry do to help? What areas are yet to be solved? You can find the official Slight Reliability podcast website at:
•
Season 2
•
Episode 74
•
8:55
Slight Reliability Episode 73 - Enterprise SLOs with Brian Singer
This week we sit down and talk about SLOs with Brian Singer, CPO and co-founder of Nobl9. We talk about the importance of reviewing operational effectiveness, getting buy-in from leadership, using SLOs to reduce noise, how to implement SL...
•
Season 2
•
Episode 73
•
32:18
Slight Reliability Episode 72 - Rapid Incident Response with Valeska Victoria
This week Stephen chats with Valeska Victoria about her time working as an SRE at eBay. Valeska shares her data-driven approach to SRE, having a voice as a less experienced engineer, handling incidents under high pressure, leveraging lar...
•
Season 2
•
Episode 72
•
42:19
Slight Reliability Episode 71 - Implementing SRE with Dr. Vlad Ukis
This week Stephen chats with Dr. Vlad Ukis about his journey discovering and then implementing SRE practices at Siemens Healthineers (which led to him writing a book). They discuss how the evolution of infrastructure necessitates a shi...
•
Season 2
•
Episode 71
•
29:25
Slight Reliability Episode 70 - Meta SRE with Amin Astaneh
Amin Astaneh (from Certo Modo) is back to discuss his experience working as a production engineer (SRE equivalent) at Meta. Stephen and Amin discuss what it's like interviewing for big tech, "you build it, you own it", different SRE enga...
•
Season 2
•
Episode 70
•
42:24
Slight Reliability Episode 69 - Developer to SRE with Praveen Kasam
This week Stephen talks to Praveen Kasam from Diconium Digital Solutions about how he led SRE transformations. Praveen shares his experience transitioning from development to SRE and how leveraging automation and bringing application know...
•
Season 2
•
Episode 69
•
30:10
Slight Reliability Episode 68 - Dashboards and Modern Observability with Eric Schabell
This week Stephen asks Eric Schabell (Director of Technical Marketing & Evangelism @ Chronosphere) about how dashboards fit into modern observability. They discuss how untamed observability can lead to unexpectedly high cloud bills, ...
•
Season 2
•
Episode 68
•
32:31
Slight Reliability Episode 67 - Single Pane of Glass with Jamie Allen and Adam Kinniburgh
This week Stephen chats with Jamie Allen (Chief Technologist AWS & SRE @ EPAM Systems) and Adam Kinniburgh (VP Innovation @ SquaredUp) about the concept of a single pane of glass (SPOG) for SRE. Is it performance art or something acti...
•
Season 2
•
Episode 67
•
34:36
Slight Reliability Episode 66 - Building Digital Assistants for SRE with Kyle Forster
This week Stephen brings back Kyle Forster from RunWhen to talk about the purple elephant in the room… “AI”. What makes it GenAI, LLM, Advanced Statistics, or ML? Kyle shares his experience building AI-powered search engines...
•
Season 2
•
Episode 66
•
29:51