Ying Li - Video Content Analysis Using Multimodal Information: For Movie Content Extraction, Indexing and Representation
9781402074905 (ISBN-13), English, 1402074905 (ISBN-10)

Video Content Analysis Using Multimodal Information: For Movie Content Extraction, Indexing and Representation covers content-based multimedia analysis, indexing, representation and applications, with a focus on feature films. It presents state-of-the-art techniques in the video content analysis domain, as well as many novel ideas and algorithms for movie content analysis based on the use of multimodal information. The authors employ multiple media cues such as audio, visual and face information to bridge the gap between low-level audiovisual features and high-level video semantics. Based on sophisticated audio and visual content processing such as video segmentation and audio classification, the original video is re-represented as a set of semantic video scenes or events, where an event is further classified as a 2-speaker dialog, a multiple-speaker dialog, or a hybrid event. Moreover, desired speakers are simultaneously identified from the video stream based on either a supervised or an adaptive speaker identification scheme. All this information is then integrated to build the video's ToC (table of contents) as well as its index table (a sketch of one possible organization is given at the end of this preview). Finally, a video abstraction system, which can generate either a scene-based summary or an event-based skim, is presented by exploiting knowledge of both video semantics and video production rules. This monograph will be of great interest to research scientists and graduate-level students working in the area of content-based multimedia analysis, indexing, representation and applications, as well as in related fields.

With the fast growth of multimedia information, content-based video analysis, indexing and representation have attracted increasing attention in recent years. Many applications have emerged in these areas, such as video-on-demand, distributed multimedia systems, digital video libraries, distance learning/education, entertainment, surveillance and geographical information systems. The need for content-based video indexing and retrieval was also recognized by ISO/MPEG, and a new international standard called "Multimedia Content Description Interface" (or, in short, MPEG-7) was initiated in 1998 and finalized in September 2001. In this context, a systematic and thorough review of existing approaches as well as the state-of-the-art techniques in the video content analysis, indexing and representation areas is presented in this book. In addition, we specifically elaborate on a system that analyzes, indexes and abstracts movie content based on the integration of multiple media modalities. The content of each part of this book is briefly previewed below.

In the first part, we segment a video sequence into a set of cascaded shots, where a shot consists of one or more continuously recorded image frames. Both raw and compressed video data are investigated. Moreover, considering that real TV programs always contain non-story units such as commercials, a novel commercial break detection/extraction scheme is developed which exploits both audio and visual cues to achieve robust results. Specifically, we first employ visual cues such as the video data statistics, the camera cut frequency, and the existence of delimiting black frames between commercials and programs, to obtain coarse-level detection results.
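For illustration only, here is a minimal Python sketch of the kind of coarse, visual-cue-only screening described above, combining black-frame delimiters with cut frequency. The function names, thresholds, and the assumption that frames arrive as NumPy arrays with 8-bit pixel values are illustrative choices, not the book's actual algorithm.

import numpy as np

def is_black_frame(frame, luma_threshold=20, coverage=0.98):
    """A frame counts as 'black' if nearly all of its pixels fall below a luma threshold."""
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame
    return float((gray < luma_threshold).mean()) >= coverage

def is_cut(prev_frame, frame, bins=32, diff_threshold=0.4):
    """A crude shot-cut test: large change between normalized gray-level histograms."""
    h1 = np.histogram(prev_frame, bins=bins, range=(0, 255))[0] / prev_frame.size
    h2 = np.histogram(frame, bins=bins, range=(0, 255))[0] / frame.size
    return 0.5 * np.abs(h1 - h2).sum() > diff_threshold

def coarse_commercial_flags(frames, fps, window_s=30, cuts_per_window=15):
    """
    Mark each frame as a coarse commercial candidate if it belongs to a
    delimiting black-frame run or lies in a window with unusually many cuts.
    """
    n = len(frames)
    black = np.array([is_black_frame(f) for f in frames])
    cuts = np.zeros(n, dtype=bool)
    for i in range(1, n):
        cuts[i] = is_cut(frames[i - 1], frames[i])

    half = int(window_s * fps) // 2
    flags = np.zeros(n, dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        flags[i] = black[i] or cuts[lo:hi].sum() >= cuts_per_window
    return flags

In practice, coarse flags of this kind would then be refined with the audio cues that the scheme described above also exploits.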
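Returning to the ToC and index table mentioned in the description, the following is a minimal, hypothetical Python sketch of how scenes, classified events (2-speaker dialog, multiple-speaker dialog, hybrid) and identified speakers might be organized. The class and field names are assumptions for illustration, not the book's actual data structures.

from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

class EventType(Enum):
    TWO_SPEAKER_DIALOG = "2-speaker dialog"
    MULTI_SPEAKER_DIALOG = "multiple-speaker dialog"
    HYBRID = "hybrid"

@dataclass
class Event:
    event_type: EventType
    start_s: float                                       # event start time in seconds
    end_s: float                                         # event end time in seconds
    speakers: List[str] = field(default_factory=list)    # identified speaker labels

@dataclass
class Scene:
    scene_id: int
    events: List[Event] = field(default_factory=list)

@dataclass
class VideoToC:
    """Scene/event table of contents plus an inverted speaker index."""
    scenes: List[Scene] = field(default_factory=list)

    def speaker_index(self) -> Dict[str, List[Tuple[float, float]]]:
        """Index table: map each speaker to the event spans in which they appear."""
        index: Dict[str, List[Tuple[float, float]]] = {}
        for scene in self.scenes:
            for ev in scene.events:
                for spk in ev.speakers:
                    index.setdefault(spk, []).append((ev.start_s, ev.end_s))
        return index

A structure like this supports both browsing (scenes and their events as a table of contents) and retrieval (looking up every span where a given speaker talks), which mirrors the ToC-plus-index-table organization the description mentions.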