Talking Wikidata: Communication Patterns and Their Impact on Community Engagement in Collaborative Knowledge Graphs

Authors: Elisavet Koutsiana, Ioannis Reklos, Kholoud Saad Alghamdi, Nitisha Jain, Albert Meroño-Peñuela, Elena Simperl



Author Details

Elisavet Koutsiana
  • King’s College London, Bush House, Strand, London, UK
Ioannis Reklos
  • King’s College London, Bush House, Strand, London, UK
Kholoud Saad Alghamdi
  • University of Jeddah, Saudi Arabia
Nitisha Jain
  • King’s College London, Bush House, Strand, London, UK
Albert Meroño-Peñuela
  • King’s College London, Bush House, Strand, London, UK
Elena Simperl
  • King’s College London, Bush House, Strand, London, UK

Cite As

Elisavet Koutsiana, Ioannis Reklos, Kholoud Saad Alghamdi, Nitisha Jain, Albert Meroño-Peñuela, and Elena Simperl. Talking Wikidata: Communication Patterns and Their Impact on Community Engagement in Collaborative Knowledge Graphs. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 1, pp. 2:1-2:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/TGDK.3.1.2

Abstract

We study collaboration patterns of Wikidata, one of the world's largest open-source collaborative knowledge graph (KG) communities. Collaborative KG communities play a key role in structuring machine-readable knowledge to support AI systems such as conversational agents. However, these communities face challenges related to long-term member engagement, as a small subset of contributors is often responsible for the majority of contributions and decision-making. While prior research has explored contributors' roles and lifespans, discussions within collaborative KG communities remain understudied. To fill this gap, we investigated the behavioural patterns of contributors and the factors affecting their communication and participation. We analysed all the discussions on Wikidata using a mixed-methods approach, including statistical tests, network analysis, and text and graph embedding representations. Our findings reveal that the interactions between Wikidata editors form a small-world network, resilient to dropouts and inclusive, where both the network topology and discussion content influence the continuity of conversations. Furthermore, the account age of Wikidata members and their conversations are significant factors in their long-term engagement with the project. Our observations and recommendations can benefit the Wikidata and semantic web communities, providing guidance on how to improve collaborative environments for sustainability, growth, and quality.
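
As an illustration of the small-world property mentioned in the abstract, the sketch below shows one common way to quantify it: compare a graph's average clustering coefficient C and average shortest-path length L against random-graph baselines, giving sigma = (C / C_rand) / (L / L_rand). This is not taken from the paper and is not claimed to be the authors' procedure. The toy ring-lattice graph, the function names, and the analytic baselines (C_rand ≈ mean degree / (n - 1), L_rand ≈ ln n / ln mean degree) are all assumptions chosen to keep the example deterministic and dependency-free.

```python
from collections import deque
from math import log


def ring_lattice(n, k):
    """Hypothetical toy graph: n nodes on a ring, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_clustering(adj):
    """Mean over nodes of the fraction of neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs, via BFS from every node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_sigma(adj):
    """sigma = (C / C_rand) / (L / L_rand), using analytic random-graph approximations (an assumption)."""
    n = len(adj)
    mean_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    c_rand = mean_degree / (n - 1)        # expected clustering of a random graph with the same density
    l_rand = log(n) / log(mean_degree)    # rough estimate of path length in such a random graph
    return (average_clustering(adj) / c_rand) / (average_path_length(adj) / l_rand)


toy_graph = ring_lattice(20, 4)
sigma = small_world_sigma(toy_graph)
```

A value of sigma above 1 is usually read as a hint of small-world structure. On a real discussion network, the baselines would more typically be estimated from sampled random graphs than from these closed-form approximations.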

Subject Classification

ACM Subject Classification
  • Information systems → Data mining
  • Human-centered computing → Wikis
  • Human-centered computing → Social networks
  • Human-centered computing → Empirical studies in HCI
Keywords
  • collaborative knowledge graph
  • network analysis
  • graph embeddings
  • text embeddings
