Graph-of-Entity: A Model for Combined Data Representation and Retrieval

Authors: José Devezas, Carla Lopes, Sérgio Nunes



Author Details

José Devezas
  • INESC TEC, Porto, Portugal
  • Faculty of Engineering, University of Porto, Portugal
Carla Lopes
  • INESC TEC, Porto, Portugal
  • Faculty of Engineering, University of Porto, Portugal
Sérgio Nunes
  • INESC TEC, Porto, Portugal
  • Faculty of Engineering, University of Porto, Portugal

Cite As

José Devezas, Carla Lopes, and Sérgio Nunes. Graph-of-Entity: A Model for Combined Data Representation and Retrieval. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 1:1-1:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/OASIcs.SLATE.2019.1

Abstract

Managing large volumes of digital documents along with the information they contain, or are associated with, can be challenging. As systems become more intelligent, it increasingly makes sense to power retrieval through all available data, where every lead makes it easier to reach relevant documents or entities. Modern search is heavily powered by structured knowledge, but users still query using keywords or, at best, telegraphic natural language. As search becomes increasingly dependent on the integration of text and knowledge, novel approaches for a unified representation of combined data present the opportunity to unlock new ranking strategies. We tackle entity-oriented search using graph-based approaches for representation and retrieval. In particular, we propose the graph-of-entity, a novel approach for indexing combined data, where terms, entities and their relations are jointly represented. We compare the graph-of-entity with the graph-of-word, a text-only model, and find that, overall, it does not yet achieve better performance, despite obtaining higher precision. Our assessment was based on a small subset of the INEX 2009 Wikipedia Collection, created from a sample of 10 topics and their respective judged documents. The offline evaluation presented here complements its online counterpart from the TREC 2017 OpenSearch track, where, during our participation, we assessed the graph-of-entity through team-draft interleaving.

Subject Classification

ACM Subject Classification
  • Information systems → Document representation
  • Information systems → Retrieval models and ranking
  • Mathematics of computing → Graph theory
Keywords
  • Entity-oriented search
  • graph-based models
  • collection-based graph
