Efficient Video Similarity Measurement and Search
CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
The amount of information on the World Wide Web has grown enormously since its creation in 1990. Duplication of content is inevitable because there is no central management on the web. Studies have shown that many similar versions of the same text documents can be found throughout the web. This redundancy problem is more severe for multimedia content such as web video sequences, as they are often stored in multiple locations and different formats to facilitate downloading and streaming. Similar versions of the same video can also be found, unknown to content creators, when web users modify and republish original content using video editing tools. Identifying similar content can benefit many web applications and content owners. For example, it will reduce the number of similar answers to a web search and identify inappropriate use of copyrighted content. In this dissertation, we present a system architecture and corresponding algorithms to efficiently measure, search, and organize similar video sequences found in any large database such as the web. We first introduce a class of randomized algorithms, called ViSig, to estimate video similarity. The basic idea is to summarize each video sequence into a small set of video frames, or a signature, consisting of the frames that are most similar to a set of predefined random images. Theoretical and experimental results show that video similarity can be reliably estimated by the ViSig method. Even though a small signature is sufficient to estimate similarity, each frame in the signature is represented by a high-dimensional vector. Similarity search on a large database of high-dimensional vectors is a challenging problem from a computational viewpoint. To solve this problem, we propose a novel non-linear feature extraction technique that can be used in a fast similarity search system.
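The signature idea described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes that each frame is already represented as a feature vector, that the "random images" are random vectors in the same feature space, that frames are compared by Euclidean distance, and that similarity is estimated as the fraction of seeds whose selected frames in the two videos lie within a threshold `epsilon`. The function names (`make_seeds`, `signature`, `estimate_similarity`) and the threshold are hypothetical and chosen only for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_seeds(num_seeds, dim, rng):
    """Draw the predefined random 'images' as random feature vectors.

    Assumption: seeds are uniform in [0, 1)^dim and shared by all videos.
    """
    return [[rng.random() for _ in range(dim)] for _ in range(num_seeds)]


def signature(frames, seeds):
    """For each seed, keep the frame of the video that is closest to it."""
    return [min(frames, key=lambda f: euclidean(f, s)) for s in seeds]


def estimate_similarity(sig_a, sig_b, epsilon):
    """Fraction of seeds whose two signature frames are within epsilon.

    Assumption: both signatures were built from the same seed list,
    so position i in each signature corresponds to seed i.
    """
    matches = sum(
        1 for fa, fb in zip(sig_a, sig_b) if euclidean(fa, fb) <= epsilon
    )
    return matches / len(sig_a)


if __name__ == "__main__":
    rng = random.Random(0)
    seeds = make_seeds(num_seeds=8, dim=2, rng=rng)

    video_a = [[0.1, 0.2], [0.8, 0.9], [0.5, 0.5]]
    video_b = [[0.1, 0.2], [0.8, 0.9], [0.5, 0.5]]  # same frames as video_a
    video_c = [[5.0, 5.0], [6.0, 6.0]]              # unrelated frames

    sig_a = signature(video_a, seeds)
    sig_b = signature(video_b, seeds)
    sig_c = signature(video_c, seeds)

    print(estimate_similarity(sig_a, sig_b, epsilon=0.05))
    print(estimate_similarity(sig_a, sig_c, epsilon=0.05))
```

In this sketch the signature size equals the number of seeds, independent of video length, which matches the point that a small signature can stand in for the whole sequence. Each signature entry is still a full feature vector, which is why searching a large collection of signatures remains a high-dimensional similarity search problem.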
- Computer Systems
- Radio Communications