Observability Engineering
Achieving Production Excellence
2nd Edition
Charity Majors, Liz Fong-Jones & George Miranda with Austin Parker
ISBN: 979-8-341-60841-2
DEVOPS / SYSTEMS ENGINEERING

Production is not what happens after development ends; it’s a stage of development, and the most important one. The only way to move swiftly with confidence is by building observability into every step, at every stage, and by validating your changes along the way. As AI continues to radically reshape how software gets built, that need has never been more urgent.

In this second edition, which is twice as long as the original and mostly new or rewritten, authors Charity Majors, Liz Fong-Jones, George Miranda, and Austin Parker make the case for observability as a foundational practice of software development. This is a book for builders and technical decision-makers. Ten new chapters for staff+ engineers, architects, and technical leaders cover the strategic and organizational side. Four more chapters tackle LLMs and AI agents, frontend observability, and open source tooling.

• Learn how to develop with observability, whether you’re using AI or not
• Implement modern observability practices across your organization
• Make the business case and navigate build versus buy decisions
• Maximize the cost-effectiveness of your observability tooling
• Find answers fast when reliability is on the line

Charity Majors is the cofounder and CTO of honeycomb.io and writes about software, systems, and management at charity.wtf.

Liz Fong-Jones is a technical fellow at honeycomb.io, where she’s an advocate for the SRE and observability communities.

George Miranda is VP of marketing at InsightFinder AI. He has a strong background in systems engineering, developer relations, and marketing.

Austin Parker is director of AI strategy at honeycomb.io and serves on the OpenTelemetry Governance Committee.

“This is as close to a guide to the software future we’re rushing toward as you’ll get in a book without ‘AI’ in the title. It’s authentic, authoritative, and literally covers everything—past, present, and future. Recommended.”
Niall Murphy, CEO, Site Reliability Engineering Limited

“As AI accelerates software creation, production becomes the only place where intent meets reality. This book shows how to build the feedback loops that make that reality legible.”
Chad Fowler, general partner & CTO, BlueYard Capital
Praise for Observability Engineering, Second Edition

Most enterprises still mistake observability for a monitoring dashboard. This book proves it is a production intelligence layer. By tying telemetry to business outcomes, it explains why the dashboards stay green while the mysteries become folklore, and helps those who sign the checks understand exactly what they’re paying for.
—Rick Clark, Global Head of Cloud Advisory, UST

The best engineers understand what’s actually happening in production, not just the code they wrote. This matters more than ever in an AI-driven world, and this book is a treasure trove of guidance on how.
—Darragh Curran, CTO, Fin

Observability gives our systems a voice, and as we hand the building over to agents, that signal from production becomes the truest context for making sense of our creations. This book teaches it completely and humanely, from first principles up.
—Annie Vella, Distinguished Engineer, Westpac New Zealand

This book! I feel seen. Observability Engineering is packed with battle-tested tools and methodologies for building and implementing an effective observability program. If it stopped there, it would be a great reference tome, but the last section—focusing on the socio-technical barriers to implementing a clear-eyed observability program in existing organizations—is pure gold. Any SRE or Principal Engineer suffering in a broken, siloed, or observability-starved organization will recognize the patterns Charity et al. describe. When you are living it, it can be difficult to understand what is going on. This book names the problem and gives you practical advice on how to face it together!
—Todd Beckett, Principal Software Engineer, Microsoft
Reliability of complex sociotechnical systems is as much a cultural movement as it is a systemic one. This book captures an accurate picture of the path to building solid systems, with teams that own the accountability.
—Alex Ewerlöf, CTO, Service Level Toolbox

Observability Engineering is audacious: it tackles the entire observability space. No matter what part of the tech stack you focus on, or where in the org chart you sit, you’ll find value here.
—Lorin Hochstein, Staff Software Engineer—Reliability

This book was already a great reference for what observability is and how to implement it. The second edition adds new chapters that dig deeper into the business case for observability, and the organizational changes that will allow you to be successful. Love it!
—Sarah Wells, technical engineering leader and author of Enabling Microservice Success

“The rise of LLMs has been polarizing: some (loudly!) proclaim the End of Software Engineering, while others dismiss them as slop generators. The truth lies somewhere in the middle. LLMs are an extraordinarily powerful tool, but one that must be wielded carefully. That care requires observability of our software systems. This timely second edition provides an expanded, updated foundation and also shows how LLMs can be leveraged to enhance our systems for observability. The future is weirder than ever, but with the guidance in this book, it needn’t be opaque!”
—Bryan Cantrill, cofounder and CTO, Oxide Computer Company
Charity Majors, Liz Fong-Jones, and George Miranda with Austin Parker Observability Engineering Achieving Production Excellence SECOND EDITION
979-8-341-60841-2
[LSI]

Observability Engineering
by Charity Majors, Liz Fong-Jones, and George Miranda, with Austin Parker

Copyright © 2026 Hound Technology, Inc. All rights reserved.

Published by O’Reilly Media, Inc., 141 Stony Circle, Suite 195, Santa Rosa, CA 95401.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Louise Corrigan
Development Editor: Rita Fernando
Production Editor: Beth Kelly
Copyeditor: Piper Content Partners
Proofreader: Sonia Saruba
Indexer: nSight, Inc.
Cover Designer: Susan Brown
Cover Illustrator: Karen Montgomery
Interior Designer: David Futato
Interior Illustrator: Kate Dullea

May 2022: First Edition
June 2026: Second Edition

Revision History for the Second Edition
2026-06-16: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098179922 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Observability Engineering, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

This work is part of a collaboration between O’Reilly and Honeycomb. See our statement of editorial independence.
Table of Contents

Foreword xxiii
Preface xxv

Part I. Introduction to Observability

1. What Is Observability? 3
  The Origins of Observability 4
  Applying Observability to Software Systems 4
  Properties of Software Dependability 5
  Observability Is a Property of Dependable Software 6
  How Observable Is Your Software? 7
  Two Competing Models for Telemetry: Three Pillars Versus Unified Data 9
  The Three Pillars Model 9
  The Unified Storage Model 12
  Observability Is the Validation of Developer Intent 16
  The Agentic Incursion Has Just Begun 16
  Guardrails Are Having a Moment 17
  Conclusion 18

2. How Code Crosses Over: Validating Developer Intent in Production 19
  What Makes Software Good? 20
  Good Software Serves Its Purpose 20
  Good Software Delivers Efficiently Over Time 20
  The Maintenance Horizon: Disposable Code Versus Durable Code 21
  Durable Code Is a Model for Mutating and Maintaining Code in Place 21
  Disposable Code Is a Model for Avoiding Maintenance Costs 22
  The Maintenance Horizon Is Not Always Knowable Up Front 22
  Production Quality Code Is a Function of Dependability 23
  Identifying the Critical Path 23
  The Closer You Get to Persisting Data, the More Cautious You Should Be 24
  Development to Production: Tools for Crossing Over 25
  Practice 0: Give Yourself the Gift of Rich Data and Precision Tooling 25
  Practice 1: Build a Feedback Loop Between Developers and Production 26
  Practice 2: Test Your Code Before You Deploy It 28
  Practice 3: Instrument Your Code and Validate in Production 29
  Practice 4: Decouple Deploys from Releases Using Feature Flags 30
  Practice 5: Invest in Progressive Delivery, Canaries, Automated Rollbacks 32
  Bonus Practices: Traffic Splitters, Capture/Replay, Strangler Figs 35
  Observability Is the Feedback Loop of Feedback Loops 35
  The Work of Development Is Not Done Until It’s Working in Production 36
  Conclusion 36

3. The Origins of Observability in Software 39
  An Introduction from Charity 39
  The Dominant Model of “Observability” Is Just Monitoring, Rebranded 40
  We Lost the Fight to Define Observability 41
  Production Has Become Too Complex for Us to Debug Via Intuition 42
  The Facebook Experience That Showed Us What’s Possible 43
  The Control Theory Definition Made It All Click 44
  The Modern Observability Landscape Is Confusing and Continues to Expand 45
  Costs Are Driving the Need to Change, and AI Is Enabling That Change 46
  Distilling the Lessons That Matter 47
  Conclusion 48

Part II. Instrumentation Fundamentals

4. Getting Started with Instrumentation 51
  Instrumentation Basics 52
  The Vocabulary of Telemetry: Logs, Metrics, and Traces 52
  OpenTelemetry: The Universal Language for Observability 54
  Automatic Instrumentation Versus Custom Instrumentation 55
  Ownership of Instrumentation 56
  Building a Custom Instrumentation Strategy 57
  Cost Considerations, Volume Management, and Processing 59
  Processing and Pipelines 59
  Cost Considerations and Volume 59
  Essential Concepts for Sampling 60
  Conclusion 62

5. Structured Events Are the Building Blocks of Observability 63
  What Is a Structured Event? 64
  The Limitations of Metrics 65
  What Is a Metric? 65
  Metrics Are Typically Aggregated 66
  The Inner Workings of Logs 68
  Turning Traditional Logs Into Structured Logs 70
  Is a Structured Log the Same as a Structured Event? 71
  Tracking a Single Operation 71
  The Inner Workings of Distributed Traces 72
  A Brief Introduction to Distributed Tracing 73
  The Components of Tracing 74
  Turning Logs Into Distributed Traces 76
  Traces Are Collections of Spans 79
  Trace Context 79
  Dynamically Generating the Right System Views 81
  Conclusion 83

6. Making Structured Events Arbitrarily Wide 85
  Service and Code Context 86
  Service Metadata 86
  Build Information 89
  Feature Flags 91
  Versions of Important Things 92
  Request and Execution Flow 92
  HTTP Information 93
  Route Information 96
  Timings 97
  Async Request Summaries 98
  Errors 99
  User and Business Context 101
  User and Customer Information 101
  Rate Limits 103
  Caching 105
  Localization Information 105
  Operational Information 105
  Uptime 106
  Metrics 107
  A Convention to Filter Out Everything Else 110
  Attributes Important to Your Specific Application 110
  Conclusion 111

7. Instrumenting Your Code with OpenTelemetry 113
  What It Means to Use OpenTelemetry 114
  Effective Instrumentation With OpenTelemetry 115
  Trace-First Telemetry 115
  Traces for Common Architectural Patterns 118
  Metrics, Spans, Logs, Events—Oh My? 125
  Using AI Agents to Instrument Your Code 127
  What Is an Agent, Anyway? 128
  Instrumentation with Agents 128
  Useful Strategies for Agents 130
  Conclusion 131

Part III. Analysis Workflows

8. Getting Started with Observability Analysis 135
  Debugging from Known Conditions 136
  Debugging from First Principles 138
  Using the Core Analysis Loop 139
  Automating the Brute-Force Portion of the Core Analysis Loop 141
  Automating Analysis with Generative AI 144
  Agentic AI Personas 145
  Using Agentic AI for Observability in Practice 146
  Conclusion 147

9. Observability-Driven Development 149
  Test-Driven Development 149
  Observability in the Development Cycle 150
  Determining Where to Debug 151
  Debugging in the Time of Microservices 152
  How Instrumentation Drives Modern Observability 153
  Shifting Observability Left 155
  Using Observability to Speed Up Software Delivery 156
  Observability-Driven Development with AI 157
  Conclusion 159

10. The Role of AI Agents for Observability 161
  What Is an AI Agent for Observability? 162
  The Pitfalls of Querying Without Context 164
  Proven Use Cases for Observability Agents 165
  Incident Response 166
  Explaining Errors and Patterns in Telemetry Data 166
  Improving Instrumentation Quality 166
  Observability-Adjacent Use Cases for Agents 167
  The Production Problem: The Mental Model That’s Disappearing 168
  The Need for Context 171
  Conclusion 171

11. Using Service Level Objectives for Reliability 173
  Threshold-Based Alerting Creates Alert Fatigue 174
  Threshold Alerting Is Only for Known-Unknowns 175
  User Experience Is a North Star 178
  Reliable Alerting with Service Level Objectives 179
  Case Study: Changing Culture Toward SLO-Based Alerts 181
  Accelerating SLO Adoption with Generative AI 184
  From Targets to Implementation: Drafting a Service Level Indicator 186
  Encoding Best Practices in Your Prompts 189
  Conclusion 191

Part IV. Observability Technical Deep Dives

12. Acting On and Debugging SLO-Based Alerts 195
  Alerting Before Your Error Budget Is Empty 195
  Framing Time as a Sliding Window 197
  Forecasting to Create a Predictive Burn Alert 198
  Threshold-Crossing Alerts 199
  Relative Burn Alerts 200
  Predictive Burn Alerts 201
  The Baseline Window 209
  Acting on SLO Burn Alerts 210
  Using Structured Event Data for SLOs Versus Time-Series Data 212
  Conclusion 214

13. Efficient Data Storage with Retriever 215
  The Functional Requirements for Observability 216
  Time-Series Databases Are Inadequate for Unified Observability 217
  Other Possible Datastores 219
  Data Storage Strategies 221
  Case Study: The Implementation of Retriever 224
  Partitioning Data by Time 224
  Storing Data by Column Within Segments 226
  Performing Query Workloads 228
  Queries on Parts of Fields and Aggregated Fields 230
  Querying for Traces 230
  Joins 231
  Querying Data in Real Time 232
  Making It Affordable with Tiering 232
  Making It Fast with Parallelism 233
  Dealing with High Cardinality 234
  Scaling and Durability Strategies 234
  Notes on Building Your Own Efficient Datastore 237
  Conclusion 238

14. Efficient Data Storage with ClickHouse 239
  ClickHouse Core Concepts 240
  MergeTree Fundamentals 241
  Query Execution 243
  Collecting Data 248
  The OpenTelemetry Collector and ClickHouse Exporter 248
  Alternate Ingestion Approaches 249
  Querying Data 249
  Simple Queries 250
  Complex Joins 252
  Scaling Vertically 254
  Profiling Your Queries 254
  Practical Data Modeling and Optimization 255
  Data Lifecycle Management 261
  Scaling Horizontally 263
  Replication 263
  Sharding 267
  Single-Region Observability Cluster 268
  Multiregion Observability Cluster 269
  SharedMergeTree 270
  Visualizing Your Data 272
  Using ClickHouse for Observability Workloads 272
  Conclusion 273

15. Cheap and Accurate Enough Sampling 275
  Sampling to Refine Your Data Collection 275
  Using Different Approaches to Sampling 277
  Constant-Probability Sampling 277
  Sampling on Recent Traffic Volume 278
  Sampling Based on Event Content (Keys) 278
  Combining per Key and Historical Methods 279
  Choosing Dynamic Sampling Options 279
  When to Make a Sampling Decision for Traces 279
  Translating Sampling Strategies into Code 280
  The Base Case 281
  Fixed-Rate Sampling 281
  Recording the Sample Rate 281
  Consistent Sampling 283
  Target-Rate Sampling 284
  Having More Than One Static Sample Rate 286
  Sampling by Key and Target Rate 287
  Sampling with Dynamic Rates on Arbitrarily Many Keys 288
  Putting It All Together: Head and Tail per Key Target-Rate Sampling 290
  Conclusion 291

16. Telemetry Management with Pipelines 293
  An Introduction to Telemetry Pipelines 294
  The Telemetry Pipeline Solution 294
  Why This Matters Now 295
  Core Functions of a Modern Telemetry Pipeline 295
  Collect 296
  Normalize and Secure 296
  Enrich 297
  Reduce 297
  Route 298
  Data Resilience 298
  Pipeline Control and Observability 299
  What You Should Remember About Core Functions 300
  Pipeline Architecture in Practice 300
  Core Components 300
  Deployment Patterns 301
  Scaling and Performance in Practice 302
  Build-Versus-Buy Considerations 303
  What You Should Remember About Pipeline Architecture 304
  Adoption and Migration 304
  A Phased Path to Adoption 304
  What You Should Remember About Adoption and Migration 306
  Use Cases: Collect and Reduce, Combined 307
  Business Case 308
  Cost Control 308
  Vendor Neutrality 308
  Telemetry as a Strategic Asset 309
  The Role of Telemetry Pipelines 309
  Conclusion 310

17. Ontologies as a Shared Language for Humans and AI 311
  Ontologies and Their Role in Observability 312
  Design the Ontology 313
  Define the Core Entities (the Nouns) 314
  Define the Invariants (the Rules) 314
  Visualize the Semantic Grammar of the Domain 315
  Glue the Schema Through Metadata 316
  Schematize Intent with the ActionPlan 317
  Validate the Contract: Continuous Integration 318
  Layer Three Gates for Defense 318
  Establish the Team Workflow 319
  Create Signal Parity in Production: Continuous Deployment 319
  Understand the Hierarchy of Signals 320
  Create Shared Instrumentation: The Universal Payload 320
  Implement the AI Sandwich Architecture 321
  Close the Loop: Production Driving Tests 322
  Putting Ontologies into Practice 323
  Conclusion 324

Part V. Observability Use Cases

18. Observability for CI/CD Pipelines 327
  Why Reliable and Fast CI/CD Matters 328
  Build Observability Has the Best ROI of All Observability Applications 328
  CI/CD Through the Lens of Observability 329
  The Ontology of Continuous Integration and Deployment 329
  Instrumentation Basics 330
  Defining Service Level Indicators and Achieving Predictability 331
  From Jobs, to Directed Acyclic Graphs and Traces 331
  Making Improvements and Measuring Them 332
  Understanding Performance: Treating Continuous Integration Like Production 333
  Predictability: Real-World Trade-Offs 334
  (The Lack of) Incrementality: The Bane of Continuous Integration’s Existence 335
  Keeping Your Build Performance Tight and Your Developers Happy 336
  Case Study: The Importance of Quick Build Times at Honeycomb 336
  History of Improving Build Times at Honeycomb 337
  Applying These Lessons to Your Build System 339
  Conclusion 341

19. Observability for Mobile and Frontend 343
  Status Quo for Mobile and Frontend 344
  Instrumentation Limitations 346
  Overcoming Domain Challenges 348
  Recognizing the Problem 348
  Instrumentation Difficulties 350
  Applying Local Storage and Real-Time Control to Mobile Observability 354
  What to Observe and Why 358
  Opportunities for Improvement 359
  Making Observability User-Focused 359
  Quantifying User Experience 361
  Adapting Existing Approaches 362
  Applying User-Focused Observability 363
  Iterative Analysis Process 363
  Example: Food Delivery App 363
  Mobile and Frontend Applications Deserve Observability 366
  Conclusion 367

20. Performance Engineering with Observability 369
  The Case for Performance Engineering 369
  Building a Performance Engineering Practice 372
  Optimizing Cost Without Modifying Application Code 373
  Infrastructure Purchasing Models 373
  Fleet-Wide Optimization 375
  Cost Optimizing Kubernetes 376
  Cost Optimizing Serverless 380
  Observing the Costs 381
  How Application Observability Reduces Cost 382
  Using CPU Profiling Tools 383
  Using the Correct Observability Signals, Together 385
  Conclusion 388

21. Observability for Large Language Models 391
  Why Observability Matters for LLMs 392
  Using Evaluations for LLM Reliability 393
  Designing Telemetry for LLM Applications 395
  Analyzing Telemetry for AI Applications 396
  Feeding Observability Data Into AI Application Development 398
  Using Evaluations and Observability Together 400
  Conclusion 402

22. Fin’s Case Study in Modern Engineering 403
  Increasing Resolution Rate 404
  Speeding Up Without Losing Efficiency 405
  And Thus, “Time to First Token” Was Born 406
  Speeding Fin Up 409
  Finance Enters the Chat 411
  Empathy 413
  Conclusion 415

Part VI. Observability Governance

23. Organizational Learning Speed Is Now Your Biggest Constraint: An Open Letter to CTOs 419
  An Open Letter to CTOs 420
  The Sociotechnical Debts That Will Hold You Back 421
  When Developers Live In a World of Tests, Not Reality 423
  When Telemetry Gets Treated Like Infrastructure, Not Product 424
  When Developer Tools Were Never Designed as Products 425
  These Debts Will Sabotage Your Adoption of AI 426
  Turning the Ship Around 428
  Measuring Value 428
  Build Good Feedback Loops 428
  Change Actions That Drive This Strategy 429
  A Letter from a CTO: How Fin Engineering Optimizes for Learning 432
  Conclusion 434

24. Systems Thinking for Software Delivery 435
  Sociotechnical Systems 435
  You Must Optimize Both the Social and Technical at the Same Time 436
  Changing Information Flows 437
  How Feedback Loops Drive Change 438
  Amplifying Feedback Loops Accelerate Change 439
  Balancing Feedback Loops Create Stability and Equilibrium 439
  Small Shifts Can Trigger Massive Changes 440
  The Difference Between a Virtuous Cycle and a Death Spiral Is the Ability to Self-Correct 441
  Feedback Loops in Software Delivery Systems 441
  Observability in Amplifying Loops 442
  Observability in Balancing Loops 444
  Partial Feedback Creates Systemic Distortions 446
  Leverage Points in Sociotechnical Systems 448
  Leverage Points and the Limits of Observability 448
  Push in the Right Direction 449
  Conclusion 450

25. The Observability Landscape Through a Systems Lens 451
  The Landscape Feels Noisy Because the Labels Are Noisy 451
  The Loops Most Organizations Run Today (and What Is Missing) 452
  Development Feedback Loops 452
  Operational Feedback Loops 453
  The Missing Loop 455
  The Feedback Loop for Value Creation 456
  Shipping Is Your Heartbeat 456
  How to Build the Loop and Close the Gap 458
  Why Closing the Gap Is Rare: The Economics of Cognitive Load 459
  Two Observability Models, Two Feedback Loops 459
  Three Pillars Model (Built for Operational Outcomes) 460
  Unified Storage Model (Built for Developer Learning) 460
  AI Changes the Game and Opens New Interaction Models 461
  Both Feedback Loops Matter: Lead with the Right One 462
  Align on the Outcomes You’re Building Toward 463
  Conclusion 464

26. The Business Case for Observability 465
  Identifying Your Priorities 465
  Complementary Roles of the Two Feedback Loops 466
  Read the Fire Codes 466
  The Business Case for Operational Loops 467
  Mapping Observability Models to Operational Outcomes 467
  The Operational Mandate 468
  The Business Case for Developer Learning Loops 469
  Estimating the Value of Developer Learning 470
  The Developer Mandate Has Two Halves 471
  Strategic Investment or Cost Center Optimization? 475
  Observability as Organizational Investment 476
  Conclusion 477

27. Diagnosing Your Observability Investment 479
  Don’t Pay Observability Prices for Monitoring Outcomes 480
  The Firefighting Trap 480
  An Investment Posture Mismatch 481
  Activities Versus Learning 481
  How to Know If Your Investment Is Working (or Not) 482
  Advice for Observability Investments 483
  Conclusion 484

28. The Organizational Shift 487
  Recognizing Legacy Masquerading as Modern Observability 488
  The Ownership Test 488
  The Two-or-Three People Test 489
  The Mystery Test 489
  The Arbitrary Question Test 490
  The Deployment Confidence Test 490
  After the Tests 491
  Understanding the Resistance 491
  The Organizational Immune Response 492
  Why Legacy Vendors Fight 492
  Why Existing Teams Resist 492
  Why Leadership Is Skeptical 493
  Why You Cannot Fight It Alone 494
  Building the Coalition 494
  Find the People Whose Pain Is Immediate 494
  Honor the Heroes Who Held It Together 495
  What Makes a Good Ally 496
  Building the Case Together 496
  Securing the Mandate 496
  Sponsorship Versus Authority 497
  Finding the Right Executive 497
  The Pitch That Works 497
  The AI Forcing Function 498
  The Roadmap 499
  Start Small: One Domain, Full Depth 499
  Pave the Path: Make the New Way Easier 499
  Platform Engineering Principles 500
  Demonstrate Wins: Rerun the Tests 500
  Expand Deliberately 501
  Building the Team 502
  What the Observability Team Owns 502
  Capabilities of the Observability Team 502
  Bringing the Existing Team Along 503
  Breaking the Vendor Identity Trap 504
  What Not to Promise 505
  Conclusion 505