DagSemProc.08091.10.pdf
Verbal statements and vision are a rich source of information in human-machine interaction scenarios. For this reason, Situated Computer Vision aims to incorporate knowledge about the communicative situation in which it takes place. This paper presents three approaches to obtaining scene models of such scenarios by combining different modalities. The first treats (planar) scenes as configurations of parts, leading to a probabilistic model based on Bayesian networks that relates spoken utterances to the results of an object recognition step. In the second approach, parallel datasets form the basis for analyzing the statistical dependencies between modalities: a statistical translation model is learned that maps between the datasets (here, words in a text and boundary fragments extracted from 2D images). The third approach deals with complex indoor scenes from which 3D data is acquired. Planar structures fitted to the 3D points, together with statistics computed on these planar patches, describe the coarse spatial layout of different indoor room types in a way that supports a holistic classification scheme.
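To make the second approach concrete, the abstract's "statistical translation model" between parallel datasets can be illustrated with a minimal EM training loop in the style of IBM Model 1. This is a hedged sketch, not the paper's actual implementation: the function name, the toy word pairs, and the use of plain word tokens in place of boundary-fragment descriptors are illustrative assumptions; only the general technique (learning co-occurrence-based translation probabilities from aligned pairs) comes from the text.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """EM training of IBM-Model-1-style translation probabilities t(f | e).

    `pairs` is a list of (source_tokens, target_tokens) tuples. In the
    setting sketched in the abstract, source tokens could be words from a
    caption and target tokens labels of boundary fragments from an image
    (here replaced by toy word tokens for illustration).
    """
    # Uniform initialisation over the target vocabulary.
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts c(f, e)
        total = defaultdict(float)  # normaliser per source token e
        for src, tgt in pairs:
            for f in tgt:
                # Posterior over which source token e generated f.
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t
```

On toy parallel data such as `[("the house", "das haus"), ("the book", "das buch")]`, repeated EM iterations concentrate probability mass on consistently co-occurring pairs, so `t[("das", "the")]` grows at the expense of `t[("haus", "the")]`; the same mechanism would associate words with recurring boundary fragments.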