OXFORD UNIVERSITY COMPUTING LABORATORY

Scalable Problem Localization for Distributed Systems: Principles and Practices

Rui Zhang, Bruno C. d. S. Oliveira, Alan Bivens, Steve McKeever

abstract

Problem localization is a critical part of providing crucial system management capabilities to modern distributed environments. One key open challenge is for problem localization solutions to scale for systems containing hundreds or even thousands of nodes, whilst still remaining fast enough to respond to rapid environment changes and sufficiently cost-effective to avoid overloading any management or application component. This paper meets the challenge by introducing two scalable frameworks applicable to a wide range of existing problem localization solutions: one based on a summary-driven, narrow-down procedure, the other through decomposing and decentralizing the problem localization process. Both frameworks, at their best, are able to achieve O(log N) problem localization time and O(1) per-node communication load. The contrasting natures of the two frameworks provide them with complementary strengths that make them suitable for different scenarios in practice. We demonstrate our approaches in simulation settings and two real-world environments and show promising scalability benefits that can make a difference in system management operations.

info

book title

ACM International Conference Proceedings of the Second International Conference on Scalable Information Systems (Infoscale'07)

institution

ACM

location

Suzhou, China

month

June

year

2007
