Predicting Math Success in an Online Tutoring System Using Language Data and Click-Stream Variables: A Longitudinal Analysis

Authors Scott Crossley, Shamya Karumbaiah, Jaclyn Ocumpaugh, Matthew J. Labrum, Ryan S. Baker


Author Details

Scott Crossley
  • Georgia State University, Applied Linguistics/ESL, Atlanta, GA, USA
Shamya Karumbaiah
  • The University of Pennsylvania, Philadelphia, PA, USA
Jaclyn Ocumpaugh
  • The University of Pennsylvania, Philadelphia, PA, USA
Matthew J. Labrum
  • Imagine Learning, Provo, UT, USA
Ryan S. Baker
  • The University of Pennsylvania, Philadelphia, PA, USA

Cite As

Scott Crossley, Shamya Karumbaiah, Jaclyn Ocumpaugh, Matthew J. Labrum, and Ryan S. Baker. Predicting Math Success in an Online Tutoring System Using Language Data and Click-Stream Variables: A Longitudinal Analysis. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 25:1-25:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Abstract

Previous studies have demonstrated strong links between students' linguistic knowledge, their affective language patterns, and their success in math. Other studies have shown that demographic and click-stream variables in online learning environments are important predictors of math success. This study builds on this research in two ways. First, it combines linguistic and click-stream variables with demographic information to improve the prediction of math success. Second, it examines how random variance, as found in repeated participant data, can explain math success beyond linguistic, demographic, and click-stream variables. The findings indicate that linguistic, demographic, and click-stream factors explained about 14% of the variance in math scores. Combined with random factors, these variables explained about 44% of the variance.
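The two variance figures above can be read as the usual distinction in mixed-effects modeling between marginal R² (variance explained by fixed effects alone) and conditional R² (variance explained by fixed and random effects together). The sketch below illustrates that distinction only; it is not the authors' code. The function name `r_squared_mixed` and the variance components are hypothetical, chosen solely so that the output mirrors the reported 14% and 44%.

```python
from typing import Tuple


def r_squared_mixed(var_fixed: float, var_random: float,
                    var_residual: float) -> Tuple[float, float]:
    """Return (marginal, conditional) R^2 from variance components.

    marginal    = fixed / (fixed + random + residual)
    conditional = (fixed + random) / (fixed + random + residual)
    """
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    marginal = var_fixed / total
    conditional = (var_fixed + var_random) / total
    return marginal, conditional


# Hypothetical variance components (NOT taken from the paper's data):
# fixed effects 14, random effects 30, residual 56, on an arbitrary scale.
marginal, conditional = r_squared_mixed(14.0, 30.0, 56.0)
print(f"marginal R^2 = {marginal:.2f}, conditional R^2 = {conditional:.2f}")
```

Under these assumed components, the gap between the two values (0.44 versus 0.14) is the share of variance attributed to the random factors, such as repeated measurements from the same participant.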

Subject Classification

ACM Subject Classification
  • Applied computing → Computer-assisted instruction
  • Applied computing → Mathematics and statistics
  • Computing methodologies → Natural language processing
Keywords
  • Natural language processing
  • math education
  • online tutoring systems
  • text analytics
  • click-stream variables



