Natural Transmission of Information Extraction Results to End-Users - A Proof-of-Concept Using Data-to-Text

Authors: José Casimiro Pereira, António J. S. Teixeira, Mário Rodrigues, Pedro Miguel, Joaquim Sousa Pinto





Cite As

José Casimiro Pereira, António J. S. Teixeira, Mário Rodrigues, Pedro Miguel, and Joaquim Sousa Pinto. Natural Transmission of Information Extraction Results to End-Users - A Proof-of-Concept Using Data-to-Text. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 20:1-20:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/OASIcs.SLATE.2017.20

Abstract

Information Extraction from natural texts has great potential in areas such as Tourism and can be of great assistance in transforming customers' comments into valuable information for Tourism operators, governments and customers. After extraction, the information needs to be transmitted to end-users efficiently and in a natural way. Systems should not, in general, send extracted information directly to end-users, such as hotel managers, as it can be difficult to read. Humans naturally encode and transmit information using natural languages, such as Portuguese. The problem arising from the need for efficient and natural transmission of information to end-users is how to encode it. Natural language generation (NLG) is a possible solution, producing sentences and, with them, texts. In this paper we address this problem with a data-to-text system, a derivation of formal NLG systems that use data as input. The proposed system uses an aligned corpus, which was defined, collected and processed in approximately 3 weeks of work. Three different in-domain and out-of-domain corpora were used to build the language model. The effects of this approach were evaluated, and the results are presented. Automatic metrics, BLEU and Meteor, were used to evaluate the different systems, and their values were compared with those of similar systems. Results show that expanding the corpus has a major positive effect on BLEU and Meteor scores, and that the use of additional corpora (in-domain and out-of-domain) in training the language model does not result in significantly different performance. The scores obtained, combined with their comparison with the performance of other systems and an informal human evaluation of the sentences produced, give additional support for the capabilities of the translation-based approach for fast development of data-to-text systems for new domains.
Keywords
  • Data-to-Text
  • Natural Language Generation
  • Automatic Translation
  • opinions
  • Tourism
  • Portuguese
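
The abstract reports BLEU scores for the generated sentences. As a rough illustration of what that metric measures, the following is a minimal sketch of sentence-level BLEU against a single reference, assuming whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing. The function names and the Portuguese example sentences are hypothetical and do not come from the paper; the authors' actual evaluation setup is not described on this page and may differ.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU of one candidate sentence against one reference.

    Assumption: both sentences are tokenized by splitting on whitespace.
    Returns 0.0 when any n-gram order has no match, since there is no smoothing.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(ref) / len(cand))

    # Geometric mean of the modified n-gram precisions, uniform weights.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical generated sentence and human reference.
    generated = "o hotel tem um bom pequeno almoço"
    human = "o hotel tem um excelente pequeno almoço"
    print(f"BLEU = {sentence_bleu(generated, human):.3f}")
```

Reported BLEU figures are normally computed over a whole test corpus rather than per sentence, so a sketch like this is useful for intuition only, not for reproducing published scores.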

