Title: Capturing, indexing, and retrieving system history
Abstract: Complex networked systems are widely deployed today and support many popular services such as Google and eBay.com. Due to their size and complexity, these systems tend to behave in ways that are difficult for operators to understand. In addition, frequent changes such as hardware and software upgrades mean that insights into system behavior can be invalidated at any time. When these complex systems exhibit problems, administrators must often analyze millions of metrics collected about system state, the vast majority of which are irrelevant to any particular problem. Furthermore, systematic methods for reusing previous diagnostic efforts to aid problem resolution are lacking.
This dissertation describes our approach to automatically extracting indexable descriptions, or signatures, that distill the system information most closely associated with a problem and can be formally manipulated to facilitate automated clustering and similarity-based search. We argue that our technique helps operators manage problems better, both by making fuller use of past diagnostic efforts and by automatically identifying relevant system information.
The first half of this thesis details how signatures can be used to aid system problem diagnosis and the methodology for evaluating their effectiveness. We also present a specific signature construction method based on statistical machine learning and show that signatures generated in this manner have significantly better clustering and retrieval properties compared to naive approaches. We validated our techniques on a testbed system with injected problems, as well as a production system serving real customers.
The latter half of this thesis focuses on two challenges we faced. First, because system behavior is often highly dynamic, we introduce a technique that employs an ensemble of models to capture changes in behavior. Second, problem symptoms often depend on how normal system behavior is defined. We present a method that uses multiple models of normality to make signatures robust to variations in normal system behavior.
We believe our signature-based approach offers a promising framework for leveraging statistical and information retrieval techniques to address the challenges posed by the complexity of today's and tomorrow's systems.
Publication Year: 2007
Publication Date: 2007-01-01
Language: en
Type: book