Carnegie Mellon University, Pittsburgh, United States
This dissertation proposes a fundamentally different way of monitoring persistent storage. It introduces a monitoring platform based on the modern reality of software-defined storage, which enables the decoupling of policy from mechanism. The proposed platform is both agentless, meaning it operates external to and independent of the entities it monitors, and scalable, meaning it is designed to address many systems at once with a mixture of operating systems and applications. Concretely, this dissertation focuses on virtualized clouds, but the proposed monitoring platform generalizes to any form of persistent storage. The core mechanism this dissertation introduces is called Distributed Streaming Virtual Machine Introspection (DS-VMI). It leverages two properties of modern clouds: virtualized servers managed by hypervisors, which enable efficient introspection, and file-level duplication of data within cloud instances. We explore a new class of agentless monitoring applications via three interfaces with two different consistency models: cloud-inotify (strong consistency), cloud (eventual consistency), and cloud-history (strong consistency). cloud-inotify is a publish-subscribe interface to cloud-wide file-level updates, and it supports event-based monitoring applications. cloud is designed to support batch-based and legacy monitoring applications by providing a file system interface to cloud-wide file-level state. cloud-history is designed to support efficient search and management of historic virtual disk state. It leverages new fast-to-access archival storage systems, and achieves tractable indexing of file-level history via whole-file deduplication using a novel application of an incremental hashing construction.
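The abstract does not say which incremental hashing construction cloud-history uses. As a hedged illustration of the general idea only, the sketch below uses an additive scheme in the style of AdHash (Bellare and Micciancio): a whole-file hash is the modular sum of per-block hashes, so overwriting one block updates the hash without rereading the rest of the file. The block size, modulus, and all function names are assumptions made for this example and are not taken from the dissertation. The parameters are chosen for brevity, not cryptographic strength.

```python
import hashlib
from typing import List, Sequence

BLOCK_SIZE = 4096    # assumption: files are hashed in fixed-size blocks
MODULUS = 2 ** 256   # assumption: illustrative modulus, not a secure choice


def split_blocks(data: bytes) -> List[bytes]:
    """Split file contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def block_term(index: int, block: bytes) -> int:
    """Hash one (block index, block contents) pair to an integer."""
    digest = hashlib.sha256(index.to_bytes(8, "big") + block).digest()
    return int.from_bytes(digest, "big")


def file_hash(blocks: Sequence[bytes]) -> int:
    """Whole-file hash: the modular sum of all per-block terms."""
    total = 0
    for index, block in enumerate(blocks):
        total = (total + block_term(index, block)) % MODULUS
    return total


def update_hash(current: int, index: int, old_block: bytes, new_block: bytes) -> int:
    """Update the whole-file hash after one block is overwritten.

    Only the old and new contents of the changed block are needed.
    """
    old_term = block_term(index, old_block)
    new_term = block_term(index, new_block)
    return (current - old_term + new_term) % MODULUS
```

Under these assumptions, a monitor that observes a single block write can call `update_hash` with the previous and new block contents and obtain the same value that `file_hash` would compute over the entire updated file. Two files with equal whole-file hashes can then be treated as candidates for deduplication in a history index.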