Search Facets and Ranking in Geospatial Dataset Search

Hervey, Thomas; Lafia, Sara; Kuhn, Werner

doi:10.4230/LIPIcs.GIScience.2021.I.5

Abstract

This study surveys the state of search on open geospatial data portals. We seek to understand 1) what users are able to control when searching for geospatial data, 2) how these portals process and interpret a user’s query, and 3) whether and how user query reformulations alter search results. We find that most users initiate a search using a text input and several pre-created facets (such as a filter for tags or format). Some portals supply a map view of data or topic explorers. To process and interpret queries, most portals use a vertical full-text search engine like Apache Solr to query data from a content-management system like CKAN. When processing queries, most portals first filter results and then rank the remaining results using a common keyword-frequency relevance metric (e.g., TF-IDF). Some portals use query expansion. We identify and discuss several recurring usability constraints across portals. For example, users are typically given only text lists to interact with search results. Furthermore, ranking rarely extends beyond syntactic comparison of keyword similarity. We discuss several avenues for improving search for geospatial data, including alternative interfaces and query processing pipelines.
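
The filter-then-rank behavior described in the abstract can be illustrated with a minimal sketch. It is not the pipeline of any surveyed portal, nor the scoring that Apache Solr actually applies. The catalog records, the function names (filter_by_facet, rank_tfidf, search), and the smoothed TF-IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical catalog records, invented for illustration only.
DATASETS = [
    {"title": "City bike lanes",
     "description": "bike lanes and bike paths in the city",
     "format": "geojson"},
    {"title": "Bike rack locations",
     "description": "public bike rack points",
     "format": "csv"},
    {"title": "Street trees",
     "description": "tree inventory along city streets",
     "format": "geojson"},
    {"title": "Bus routes",
     "description": "transit bus routes in the city",
     "format": "geojson"},
]


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop words)."""
    return text.lower().split()


def filter_by_facet(records, facet, value):
    """Step 1: keep only the records whose facet field equals the chosen value."""
    return [r for r in records if r.get(facet) == value]


def rank_tfidf(records, query):
    """Step 2: score the remaining records by summed TF-IDF of the query terms.

    Assumed formula: tf = term count / document length,
    idf = ln((1 + N) / (1 + df)) + 1 (a common smoothed variant).
    Records that match no query term are dropped.
    """
    docs = [tokenize(r["title"] + " " + r["description"]) for r in records]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scored = []
    for record, doc in zip(records, docs):
        counts = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            tf = counts[term] / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            score += tf * idf
        if score > 0:
            scored.append((score, record))
    # Highest score first; sorted() is stable, so ties keep catalog order.
    scored.sort(key=lambda pair: -pair[0])
    return [record for _, record in scored]


def search(query, facet, value, records=DATASETS):
    """Filter by a facet first, then rank what is left by keyword relevance."""
    return rank_tfidf(filter_by_facet(records, facet, value), query)


if __name__ == "__main__":
    for hit in search("bike", "format", "geojson"):
        print(hit["title"])
```

In this sketch, a search for "bike" with the format facet set to "geojson" returns only "City bike lanes": the bike rack dataset matches the keyword but is removed by the facet filter before ranking. The score compares tokens purely syntactically, which is the limitation the abstract notes.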

