Advances in Visual Information Management : Visual Database Systems. IFIP TC2 WG2.6 Fifth Working Conference on Visual Database Systems May 10-12, 2000, Fukuoka, Japan

Book by Hiroshi Arisawa

Video segmentation is the most fundamental process for appropriate indexing and retrieval of video intervals. In general, video streams are composed of shots delimited by physical shot boundaries. Substantial work has been done on how to detect such shot boundaries automatically (Arman et al., 1993) (Zhang et al., 1993) (Zhang et al., 1995) (Kobla et al., 1997). Through the integration of technologies such as image processing, speech/character recognition and natural language understanding, keywords can be extracted and associated with these shots for indexing (Wactlar et al., 1996). A single shot, however, rarely carries enough information to be meaningful by itself. Usually, it is a semantically meaningful interval that most users are interested in retrieving. Generally, such meaningful intervals span several consecutive shots. There hardly exists any efficient and reliable technique, either automatic or manual, to identify all semantically meaningful intervals within a video stream. Works by (Smith and Davenport, 1992) (Oomoto and Tanaka, 1993) (Weiss et al., 1995) (Hjelsvold et al., 1996) suggest manually defining all such intervals in the database in advance. However, even an hour-long video may have an indefinite number of meaningful intervals. Moreover, video data is multi-interpretative. Therefore, given a query, what is a meaningful interval to an annotator may not be meaningful to the user who issues the query. In practice, manual indexing of meaningful intervals is labour intensive and inadequate.