Overview
Discover how AIOps is transforming the observability landscape for cloud-native and traditional systems. Learn how to build, monitor, and operate resilient services using AI-driven dynamic insights for smarter, more scalable operations.

Key Features
Practical Integration of AI and Observability in Modern Engineering Workflows
Real-World Use Cases Grounded in Industry Experience
Tailored for Modern Engineering Roles and Organizations

Book Description
Observability is mandatory for building and operating cloud-native distributed systems. Tools like OpenTelemetry have standardized how observability data is sourced, and AI now transforms how we extract value from the vast amounts of observability data generated by modern systems. This book guides you in implementing scalable observability, improving engineering efficiency with AI, and integrating observability throughout the Software Development Lifecycle (SDLC) via modern self-service internal developer platforms. You'll start with observability basics and learn how AIOps enhances signal correlation, anomaly detection, and root-cause analysis. Using real-world examples, the book demonstrates how to implement AIOps, build proactive detection pipelines, and automate diagnostics and remediation. You'll explore best practices for expanding observability using OpenTelemetry, Prometheus, Grafana, Dynatrace, Datadog, and New Relic alongside machine learning models, ensuring your systems are accurate, efficient, and secure. You'll also learn how to benchmark, measure, and secure your AIOps implementation, and gain a practical understanding of software compliance and how it applies to your systems.
By the end of this book, you'll be ready to design and deliver AIOps-enabled observability solutions that make cloud-native systems more resilient, efficient, and secure.

What you will learn
Build observability pipelines for logs, metrics, traces, and events
Implement standards such as OpenTelemetry and Prometheus
Correlate signals from multiple sources for better incident triage
Apply AI/ML for anomaly detection and root cause analysis
Design scalable architectures for intelligent monitoring
Automate resiliency through self-healing and remediation agents

Who this book is for
This book is for software engineers and engineering leaders on teams with operational responsibilities, such as platform engineering, site reliability engineering (SRE), DevOps, or application development, who want to integrate AIOps capabilities into their workflows. If your team is responsible for building and running high-performing, resilient software systems, this book is for you.

Full Product Details
Author: Hilliary Lipsig, Andreas Grabner, Robert Rati, Max Körbächer
Publisher: Packt Publishing Limited
Imprint: Packt Publishing Limited
ISBN: 9781806389599
ISBN 10: 1806389592
Pages: 420
Publication Date: 13 March 2026
Audience: Professional and scholarly, Professional & Vocational
Format: Paperback
Publisher's Status: Active
Availability: In Print
This item will be ordered in for you from one of our suppliers. Upon receipt, we will promptly dispatch it to you. For in-store availability, please contact us.

Table of Contents
Observability: The Art of Turning Data into Insights
The Elephant in the Room: Artificial Intelligence
From Observability to AIOps and the Use Cases it Solves Today
ACME Financial Services: Implementing AIOps
Democratizing Observability: A Primer to Self-Service Platforms
The Observability Agent: Real-Life Use Cases
ACME Financial Services: How to Move from AIOps to Agentic Platforms
Evolving Operations: Proactive > Preventive > Self-Driven Architecture
No Future Without Challenges
ACME Financial Services: How Will the AI Future Shape Our Company?

Author Information
Hilliary Lipsig is an autodidact and start-up veteran who has frequently learned and applied technologies to get a job done. She's had her hand in every part of the application delivery process, honing her skills originally as a quality engineer. Hilliary is an IT polyglot, able to talk the lingo of both the Operations and Development teams. She's currently a senior principal site reliability engineer at Red Hat Inc., working on Kubernetes-based platforms. She's passionate about GitOps, continuous integration, scalable processes, consistency in tooling, and good developer documentation. Her open source activities include contributions to the CNCF Glossary, and she's a member of the Code of Conduct Committee for the Cloud Native Computing Foundation (CNCF).

Andreas Grabner is a technical advocate for making distributed systems observable and making automated data-driven decisions across the software development lifecycle. In his capacity as a CNCF ambassador and a DevRel at Dynatrace, he connects and educates global software engineering communities on building and continuously validating digital services for resiliency, high availability, and security. Since his early days, he has been passionate about software quality and performance engineering, as it results in building excellent digital products.
Andi uses his advocacy platforms to share best practices on topics such as observability, progressive delivery, DevOps, site reliability engineering, platform engineering, and digital business operations.

Robert Rati is a veteran software and platform engineer who has worked at small, medium, and large corporations in regulated industries ranging from wireless communications to the financial sector. He is passionate about reducing noise and enabling teams to focus on creating business value. He emphasizes maintainability, consistency, user friendliness, and productivity when planning and implementing projects. He is currently an engineering manager with Second Front.

Countries Available: All regions