In the fast-paced world of media and entertainment, performance is everything. Make users wait, make their game buffer, or delay their transactions, and in less than a heartbeat they’ve jumped to a competitor’s site. Whether your business is news, video, gaming or gambling, it’s all about delivery and responsiveness — and that depends on high-performing applications.

Case Study: Resolving a Catastrophic Database Performance Issue in a Leading Internet Company’s Advertisement Management System

Introduction

A major player in the internet industry, referred to here as “a leading internet company,” faced a critical database challenge that severely impacted their operations. This case study explores how our team at Enteros utilized advanced Enteros UpBeat techniques to swiftly resolve an issue that had stumped other experts.

The Challenge

  • Severe Database Hangs and Timeouts: Thousands of application database user sessions in the client’s advertisement management system hung and then timed out en masse. The resulting connection storm fragmented database memory, forcing database restarts.
  • High-Level Attention: The severity of the issue escalated to the point where the COO was demanding updates every 20 minutes.
  • Previous Attempts at Resolution: Despite two weeks of effort by Oracle experts using standard tools like AWR (Automatic Workload Repository), ASH (Active Session History), and OS Watcher, the root cause remained elusive.

Enteros UpBeat Approach

  • Comprehensive Instrumentation: We instrumented three key layers of their IT infrastructure: the storage area network, server layer (Red Hat Linux), and database layer (multiple Oracle nodes in Oracle Real Application Clusters).
  • Data Collection and Analysis: Data was collected every 3 seconds, and upon occurrence of the catastrophic event, intensive spike analysis and statistical analysis were conducted, particularly focusing on the database and storage area network data.
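
The spike analysis described above can be illustrated with a minimal sketch. It is not the UpBeat implementation, whose internals the case study does not describe. It assumes a metric sampled every 3 seconds and flags any sample that sits far above a trailing baseline. The function name `find_spikes` and the `window` and `threshold` values are hypothetical choices made for illustration.

```python
from statistics import mean, stdev

SAMPLE_INTERVAL_S = 3  # sampling interval stated in the case study


def find_spikes(samples, window=20, threshold=4.0):
    """Flag samples that sit far above their trailing baseline.

    Hypothetical helper: `window` and `threshold` are illustrative
    defaults, not values taken from the case study.
    Returns a list of (index, value, z_score) tuples.
    """
    spikes = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        # Guard against a perfectly flat baseline (standard deviation of 0).
        sigma = stdev(baseline) or 1e-9
        z = (samples[i] - mu) / sigma
        if z > threshold:
            spikes.append((i, samples[i], z))
    return spikes


# Synthetic latency series: steady around 10 ms with one outlier at index 30.
latencies_ms = [10.0, 11.0] * 15 + [95.0] + [10.0, 11.0] * 5
for index, value, z in find_spikes(latencies_ms):
    print(f"t+{index * SAMPLE_INTERVAL_S}s value={value} z={z:.1f}")
```

Running the same detector over metrics from each layer (storage, server, database) and comparing the flagged timestamps is one simple way to see which events cluster around a failure.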

Discovery and Resolution

  • Identification of Cache Flush Event: We discovered a critical cache flush in the storage area network occurring around the time of the failures, triggered by a specific data read/write pattern within a window of less than 6 seconds.
  • Vendor Collaboration: After consulting with the storage vendor, NetApp, we understood the mechanism causing the cache flush.
  • Correlation with Database Activities: We correlated this finding with database activity and identified the trigger: a large transaction log switch combined with the shipment of that log to the standby destination.
  • Root Cause Analysis: Further analysis revealed a specific SQL statement that flushed the database buffer cache, which forced thousands of database user sessions into direct data file reads and formed part of the chain of events.
  • Remediation: Stop automatic transaction log shipment to the standby, and instead use a script that waits one minute before shipping each log file to the standby.
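
The delayed-shipment idea in the remediation step could be sketched roughly as follows. The case study does not show the actual script, so everything here is an assumption: the log files are treated as ordinary files in an archive directory, a plain file copy stands in for whatever transport the standby really uses, and the names `logs_ready_to_ship` and `ship_ready_logs` are hypothetical.

```python
import shutil
import time
from pathlib import Path

SHIP_DELAY_S = 60  # "wait for a minute" from the remediation step


def logs_ready_to_ship(archive_dir, shipped, delay_s=SHIP_DELAY_S, now=None):
    """Return unshipped log files last modified at least delay_s seconds ago.

    Hypothetical helper: file modification time is used as a stand-in
    for "the log switch has completed".
    """
    now = time.time() if now is None else now
    ready = []
    for path in sorted(Path(archive_dir).iterdir()):
        if not path.is_file() or path.name in shipped:
            continue
        if now - path.stat().st_mtime >= delay_s:
            ready.append(path)
    return ready


def ship_ready_logs(archive_dir, standby_dir, shipped, delay_s=SHIP_DELAY_S, now=None):
    """Copy each eligible log to standby_dir and record it as shipped.

    Assumption: a local file copy replaces the real shipping mechanism.
    """
    for path in logs_ready_to_ship(archive_dir, shipped, delay_s, now):
        shutil.copy2(path, Path(standby_dir) / path.name)
        shipped.add(path.name)
```

A scheduler could call `ship_ready_logs` periodically with a persistent `shipped` set. The point of the delay is that the log transfer no longer coincides with the log switch itself, which is the overlap the analysis identified as the trigger.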

Results

  • Rapid Problem Identification: We were able to identify the multiple root causes of the catastrophic event in just a few hours, a task that had eluded Oracle experts for over two weeks.
  • System Stability Restored: Our findings enabled the company to make necessary adjustments, stabilizing their database environment.
  • Operational Efficiency Regained: The resolution of these issues allowed thousands of user sessions to operate smoothly, enhancing overall productivity and user experience.

Conclusion

This case study demonstrates the power of Enteros’ UpBeat Platform in resolving complex performance issues. Its ability to instrument and analyze data at multiple layers, combined with advanced statistical analysis, allowed the customer to quickly identify and resolve a multifaceted problem that was critically impacting a major internet company. Our approach not only restored operational efficiency but also safeguarded key revenue streams for the client, demonstrating our platform’s value in high-stakes environments.
