Identifying and inferring objects from textual descriptions of scenes from books

Author Andrew Cropper


Cite As

Andrew Cropper. Identifying and inferring objects from textual descriptions of scenes from books. In 2014 Imperial College Computing Student Workshop. Open Access Series in Informatics (OASIcs), Volume 43, pp. 19-26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Abstract

Fiction authors rarely provide detailed descriptions of scenes, preferring the reader to fill in the details using their imagination. Therefore, to perform detailed text-to-scene conversion from books, we need to not only identify explicit objects but also infer implicit objects. In this paper, we describe an approach to inferring objects using Wikipedia and WordNet. In our experiments, we are able to infer implicit objects such as monitor and computer by identifying explicit objects such as keyboard.

Keywords

  • Text-to-Scene Conversion
  • Natural Language Processing
  • Artificial Intelligence
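
The abstract does not spell out how implicit objects are inferred, so the following is only an illustrative sketch of one plausible reading, not the paper's method. It assumes that each reference article has already been reduced to a list of object nouns, and it proposes as implicit any noun that co-occurs with an explicit object in enough articles. The data in `TOY_ARTICLES`, the function name `infer_implicit_objects`, and the `min_support` threshold are all hypothetical; the hand-written lists merely stand in for what Wikipedia and WordNet would supply.

```python
from collections import Counter

# Hypothetical toy data: each entry stands in for the object nouns
# extracted from one reference article. A real system would derive
# these from Wikipedia text filtered through WordNet.
TOY_ARTICLES = {
    "keyboard": ["keyboard", "computer", "monitor", "mouse"],
    "computer": ["computer", "monitor", "keyboard", "desk"],
    "kitchen": ["oven", "sink", "kettle"],
}


def infer_implicit_objects(explicit, articles, min_support=2):
    """Return nouns that co-occur with the explicit objects.

    A noun receives one vote for every article that mentions it
    together with at least one explicit object. Nouns with at least
    `min_support` votes are returned in alphabetical order.
    """
    explicit_set = set(explicit)
    votes = Counter()
    for nouns in articles.values():
        present = set(nouns)
        if present & explicit_set:
            # Vote only for nouns that were not stated explicitly.
            for noun in present - explicit_set:
                votes[noun] += 1
    return sorted(noun for noun, count in votes.items() if count >= min_support)


if __name__ == "__main__":
    print(infer_implicit_objects(["keyboard"], TOY_ARTICLES))
```

With this toy data, the explicit object keyboard yields computer and monitor, matching the example in the abstract. Lowering `min_support` to 1 admits weaker candidates such as mouse and desk, which shows how the threshold trades recall against precision.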


