Information Extraction for Event Ranking

Devezas, José; Nunes, Sérgio

doi:10.4230/OASIcs.SLATE.2017.18

Cite As

José Devezas and Sérgio Nunes. Information Extraction for Event Ranking. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 18:1-18:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/OASIcs.SLATE.2017.18

Abstract

Search engines are evolving towards richer and stronger semantic approaches, focusing on entity-oriented tasks where knowledge bases have become fundamental. To support semantic search, they increasingly rely on robust information extraction systems; in fact, most modern search engines already depend heavily on a well-curated knowledge base. Nevertheless, they still lack the ability to take advantage of multiple heterogeneous data sources effectively and automatically. Central tasks include harnessing the information locked within textual content by linking mentioned entities to a knowledge base, and integrating multiple knowledge bases to answer natural language questions. Combining text and knowledge bases is frequently used to improve search results, but it can also be used for the query-independent ranking of entities such as events. In this work, we present a complete information extraction pipeline for the Portuguese language, covering all stages from data acquisition to knowledge base population. We also describe a practical application of the automatically extracted information: supporting the ranking of upcoming events displayed on the landing page of an institutional search engine, where space is limited to only three relevant events. We manually annotate a dataset of news, covering event announcements from multiple faculties and organic units of the institution, and use it to train and evaluate the named entity recognition module of the pipeline. We rank events by using the identified entities and partOf relations to compute an entity popularity score, together with an entity click score based on implicit click feedback from the institutional search engine. We then combine these two scores with the number of days to the event, obtaining a final ranking for the three most relevant upcoming events.
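
The final ranking step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the two entity scores and the number of days to the event are combined, so everything here is an assumption made for illustration: the weighted sum, the weights `w_pop`, `w_click` and `w_time`, the `1 / (1 + days)` proximity term, averaging scores over an event's entities, and the names `Event` and `rank_events`. The popularity and click scores are taken as precomputed inputs normalized to [0, 1]; how they are derived from entities and partOf relations is not reproduced.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Event:
    title: str
    start: date
    entities: List[str]  # entities identified in the event announcement


def rank_events(events: List[Event],
                popularity: Dict[str, float],
                clicks: Dict[str, float],
                today: date,
                top_k: int = 3,
                w_pop: float = 0.4,
                w_click: float = 0.4,
                w_time: float = 0.2) -> List[Event]:
    """Return the top_k upcoming events by a weighted score.

    The weighted sum and its weights are hypothetical, not the
    formula used in the paper.
    """
    def mean_score(entities: List[str], table: Dict[str, float]) -> float:
        # Average per-entity score; entities missing from the table count as 0.
        if not entities:
            return 0.0
        return sum(table.get(e, 0.0) for e in entities) / len(entities)

    scored = []
    for ev in events:
        days = (ev.start - today).days
        if days < 0:
            continue  # only upcoming events are ranked
        proximity = 1.0 / (1.0 + days)  # assumed decay: sooner scores higher
        score = (w_pop * mean_score(ev.entities, popularity)
                 + w_click * mean_score(ev.entities, clicks)
                 + w_time * proximity)
        scored.append((score, ev))

    # Sort by score only, highest first, and keep the top_k events.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ev for _, ev in scored[:top_k]]
```

Calling `rank_events(events, popularity, clicks, today=date.today())` returns at most three events, matching the space limit of the landing page. A weighted sum is only one of several plausible combinations; a product of the three terms or a learned ranking function would fit the description equally well.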

Keywords

Named Entity Recognition
Relation Extraction
Knowledge Base Population
Entity-Based Ranking
Academic Events
