Information Extraction for Event Ranking

Authors José Devezas, Sérgio Nunes




Cite As

José Devezas and Sérgio Nunes. Information Extraction for Event Ranking. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 18:1-18:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/OASIcs.SLATE.2017.18

Abstract

Search engines are evolving towards richer semantic approaches, focusing on entity-oriented tasks where knowledge bases have become fundamental. To support semantic search, search engines increasingly rely on robust information extraction systems. In fact, most modern search engines already depend heavily on a well-curated knowledge base. Nevertheless, they still lack the ability to effectively and automatically take advantage of multiple heterogeneous data sources. Central tasks include harnessing the information locked within textual content by linking mentioned entities to a knowledge base, and integrating multiple knowledge bases to answer natural language questions. Combining text and knowledge bases is frequently used to improve search results, but it can also be used for the query-independent ranking of entities such as events. In this work, we present a complete information extraction pipeline for the Portuguese language, covering all stages from data acquisition to knowledge base population. We also describe a practical application of the automatically extracted information: supporting the ranking of upcoming events displayed on the landing page of an institutional search engine, where space is limited to only three relevant events. We manually annotate a dataset of news covering event announcements from multiple faculties and organic units of the institution, and use it to train and evaluate the named entity recognition module of the pipeline. We rank events using the identified entities and partOf relations to compute two scores: an entity popularity score, and an entity click score based on implicit feedback from clicks in the institutional search engine. We then combine these two scores with the number of days until the event to obtain the final ranking of the three most relevant upcoming events.
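
The ranking step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the two entity scores and the number of days to the event are combined, so everything here is an assumption made for illustration: the `Event` type, the averaging of per-entity scores, the equal weights `w_pop` and `w_click`, and the `1 / (1 + days)` decay are all hypothetical, and the propagation of scores through partOf relations is omitted.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    """Hypothetical event record: a title, a start date, and linked entities."""
    title: str
    start: date
    entities: list


def mean_score(entities, scores):
    """Average the per-entity scores; entities without a score count as 0."""
    if not entities:
        return 0.0
    return sum(scores.get(e, 0.0) for e in entities) / len(entities)


def rank_events(events, popularity, clicks, today, k=3, w_pop=0.5, w_click=0.5):
    """Return the top-k upcoming events.

    Assumed combination (not taken from the paper): a weighted sum of the
    mean entity popularity and mean entity click scores, divided by
    (1 + days until the event) so that closer events rank higher.
    """
    scored = []
    for ev in events:
        days = (ev.start - today).days
        if days < 0:
            continue  # the event has already started; it is not upcoming
        entity_score = (w_pop * mean_score(ev.entities, popularity)
                        + w_click * mean_score(ev.entities, clicks))
        scored.append((entity_score / (1 + days), ev))
    # Python's sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ev for _, ev in scored[:k]]
```

Calling `rank_events(events, popularity, clicks, today=date.today())` with two dictionaries that map entity names to scores returns at most three events, matching the three slots available on the landing page. The multiplicative decay is one of several plausible choices; an additive penalty or a cutoff on the number of days would fit the abstract's description equally well.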

Keywords
  • Named Entity Recognition
  • Relation Extraction
  • Knowledge Base Population
  • Entity-Based Ranking
  • Academic Events
