Geotagging Location Information Extracted from Unstructured Data (Short Paper)

Min, Kyunghyun; Lee, Jungseok; Yu, Kiyun; Kim, Jiyoung

doi:10.4230/LIPIcs.GISCIENCE.2018.49

Document Identifiers

DOI: 10.4230/LIPIcs.GISCIENCE.2018.49
URN: urn:nbn:de:0030-drops-93778

Author Details

Kyunghyun Min

Department of Civil and Environmental Engineering, Seoul National University 35-209, Gwanak-gu, Seoul, Republic of Korea

Jungseok Lee

Department of Civil and Environmental Engineering, Seoul National University 35-209, Gwanak-gu, Seoul, Republic of Korea

Kiyun Yu

Department of Civil and Environmental Engineering, Seoul National University 35-209, Gwanak-gu, Seoul, Republic of Korea

Jiyoung Kim

Institute of Construction and Environmental Engineering, Seoul National University 35-215, Gwanak-gu, Seoul, Republic of Korea

Cite As

Kyunghyun Min, Jungseok Lee, Kiyun Yu, and Jiyoung Kim. Geotagging Location Information Extracted from Unstructured Data (Short Paper). In 10th International Conference on Geographic Information Science (GIScience 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 114, pp. 49:1-49:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.GISCIENCE.2018.49

Abstract

Location information is an essential element of location-based services and is used in various ways. Unstructured data contain many types of location information, but coordinate values are required to pinpoint an exact location. On Twitter, a typical social network service (SNS) platform and source of unstructured data, few tweets are geotagged. If the location of a text can be estimated by geotagging large amounts of unstructured data, the location of an event can be estimated in real time. This is a preliminary study on extracting location information with the named entity recognizer provided by the Exobrain API and applying geotagging to unstructured data written in Hangul (Korean). Instead of tweets, we used Chosun news articles, which are grammatically correct and well organized, and extracted entities in three location-related categories: "location," "organization," and "artifact." We ran the named entity recognizer on each sentence and geotagged it using combinations of the fields in each category. Of the 800 test sentences, 61% contained no location-related information and therefore could not be geotagged. In 11.75% of the test sentences, geotagging was possible using only the location information extracted by the named entity recognizer. The remaining 27.25% contained more than two locations from the same subcategories and hence required location estimation from candidate locations. In future research, we plan to use these results to develop a location estimation algorithm that works on location-related entities extracted from purely unstructured data such as SNS posts.
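The three-way split of test sentences described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not call the Exobrain API: the `Entity` record, the subcategory labels, the outcome names, and the `max_per_subcategory` threshold are all hypothetical, and the entities are assumed to have been produced already by some named entity recognizer. The last function only converts the reported percentages of the 800 test sentences into sentence counts.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three location-related categories named in the abstract.
LOCATION_CATEGORIES = {"location", "organization", "artifact"}


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one recognized entity (not the Exobrain schema)."""
    text: str
    category: str      # e.g. "location", "organization", "artifact", "person"
    subcategory: str   # assumed finer-grained label, e.g. "city"


def classify_sentence(entities, max_per_subcategory=1):
    """Assign a sentence to one of three outcomes.

    max_per_subcategory is an assumed threshold: a sentence with more
    distinct entities than this in one subcategory is treated as ambiguous.
    """
    relevant = [e for e in entities if e.category in LOCATION_CATEGORIES]
    if not relevant:
        return "no_location"           # nothing to geotag
    distinct = defaultdict(set)
    for e in relevant:
        distinct[(e.category, e.subcategory)].add(e.text)
    if any(len(texts) > max_per_subcategory for texts in distinct.values()):
        return "needs_estimation"      # several candidate locations
    return "geotaggable"               # extracted entities suffice


def shares_to_counts(total, shares_percent):
    """Convert percentage shares of a test set into sentence counts."""
    return {name: round(total * pct / 100) for name, pct in shares_percent.items()}


if __name__ == "__main__":
    sentence = [
        Entity("서울", "location", "city"),
        Entity("부산", "location", "city"),
    ]
    print(classify_sentence(sentence))
    print(shares_to_counts(800, {
        "no_location": 61,
        "geotaggable": 11.75,
        "needs_estimation": 27.25,
    }))
```

Under these assumptions, the reported shares correspond to 488, 94, and 218 of the 800 test sentences, which sum to the full test set.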

Subject Classification

ACM Subject Classification

Information systems → Content analysis and feature selection

Keywords

Location Estimation
Information Extraction
Geo-Tagging
Location Information
Unstructured Data
