In this thesis, a new framework is presented that supports the efficient representation, indexing, and retrieval of multimedia data by content. Raw multimedia data is assumed to exist in the form of programs that typically consist of a combination of media types such as visual, audio, and text. We partition each such media stream into smaller units based on actual physical events. These physical events within each media stream can then be effectively indexed for retrieval.
Research in this area over the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we focus on using the associated audio and image information for video analysis. This thesis is dedicated to two of the most important media types: images and videos. Novel approaches to feature analysis, content representation, indexing, and retrieval are presented.
The main contributions of this thesis are: (ⅰ) reliable and robust feature extraction and representation techniques for images and videos; (ⅱ) the introduction of spatial relationship techniques to image retrieval, which greatly improves retrieval performance and alleviates the user’s query formulation burden; (ⅲ) the introduction of several reduction rules for spatial relationships, which reduce query processing time and save disk space; and (ⅳ) the introduction of new audio feature extraction and analysis techniques for video. Extensive experimental results on large data sets have validated the proposed approaches.