
A Study on Educational Text Analysis Using Topic Modeling and Generalized Linear Models

Yi-Hsiung Lin*|Postdoctoral Research Fellow, Office of Sustainability and Strategic Development, National Cheng Kung University, Tainan City, Taiwan|11408046@gs.ncku.edu.tw
Chien-Chih Lai|Project Manager, Center for Teaching and Learning, National University of Kaohsiung, Kaohsiung City, Taiwan|jjlai@nuk.edu.tw
Ren-Yu Huang|Project Assistant, Center for Teaching and Learning, National University of Kaohsiung, Kaohsiung City, Taiwan|ksu0001@nuk.edu.tw
Chih-Hung Chang|Distinguished Professor, Department of Applied Mathematics, National University of Kaohsiung, Kaohsiung City, Taiwan|chchang@nuk.edu.tw
*Corresponding author

▌Abstract

Topic modeling for educational text analysis is an important application area of natural language processing. As educational resources are digitized, organizing, retrieving, and analyzing large volumes of educational text has become a significant challenge for researchers. Topic modeling is a statistical method that automatically extracts key concepts from texts, helping educators and administrators make data-driven decisions. With recent advances in artificial intelligence and machine learning, topic modeling has attracted growing attention in education, particularly as digital learning platforms have spread and online educational resources have come to be widely evaluated. This study uses a web crawler to collect college students' evaluations of their learning from online discussion platforms. Analyzing these texts automatically with topic modeling reduces the time cost of manual annotation and improves data-processing efficiency. Latent Dirichlet Allocation (LDA), a widely used probabilistic topic model, has been applied extensively in text classification, sentiment analysis, and knowledge management. This study explores the use of LDA in educational text analysis by building topic models that identify key themes in student feedback, course evaluations, and subject content. LDA can also be combined with generalized linear models (GLMs) to examine the relationship between the extracted topics and quantified student learning outcomes. The findings show that LDA effectively extracts the core content of educational texts and that, when combined with GLMs, it provides useful insights that help educational decision-makers make more informed, data-driven decisions.
The appendix of this paper provides the Python source code used for topic modeling of the Chinese and English text data and the R source code used for the generalized linear model analysis, so that interested readers can reuse them.

Keywords: Topic Modeling, Educational Texts, Latent Dirichlet Allocation, Generalized Linear Model
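The abstract pairs LDA topic extraction with a GLM but does not say which libraries the appendix code relies on. As an illustration only (not the authors' appendix code), the sketch below uses nothing beyond the Python standard library. It fits LDA by collapsed Gibbs sampling, one common inference method among several (variational Bayes is another), and then uses each document's share of topic 0 as the covariate in a binomial GLM with a logit link, fitted by Newton's method (equivalent to IRLS for this canonical link). The toy corpus, the pass/enrolment counts, and the function names `lda_gibbs` and `fit_binomial_glm` are hypothetical. The texts are assumed to be already tokenized. Chinese text would first need word segmentation.

```python
import math
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=300, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                         # tokens assigned to each topic
    z = []                                      # topic assignment of every token

    def bump(d, k, v, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    for d, doc in enumerate(docs):              # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            bump(d, k, wid[w], 1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = wid[w]
                bump(d, z[d][i], v, -1)         # remove this token's assignment
                # full conditional p(z = t | all other assignments)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                bump(d, z[d][i], v, 1)
    # posterior-mean estimate of theta: each row sums to 1
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for d, doc in enumerate(docs)]


def fit_binomial_glm(x, successes, trials, iters=50, tol=1e-10):
    """Binomial GLM with logit link, logit(p) = b0 + b1*x, fitted by Newton/IRLS."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, si, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)              # IRLS weight
            r = si - ni * p                     # residual on the count scale
            g0 += r                             # score vector X'(s - n*p)
            g1 += r * xi
            h00 += w                            # information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det        # (X'WX)^{-1} * score via 2x2 inverse
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical, already-tokenized feedback: one document per course.
docs = [
    "exam grade homework exam deadline grade".split(),
    "homework deadline exam grade homework".split(),
    "grade exam exam deadline homework grade".split(),
    "lecture teacher clear example lecture".split(),
    "teacher example clear lecture teacher example".split(),
    "clear lecture example teacher clear".split(),
]
passed = [30, 28, 33, 20, 18, 22]    # hypothetical outcome: students who passed
enrolled = [40] * 6                  # hypothetical class sizes

theta = lda_gibbs(docs, n_topics=2)
x = [row[0] for row in theta]        # share of topic 0 in each course's feedback
b0, b1 = fit_binomial_glm(x, passed, enrolled)
print(f"logit(pass rate) = {b0:.3f} + {b1:.3f} * topic0_share")
```

Because every hypothetical course has both passing and failing students, the data cannot be perfectly separated, so the GLM estimates stay finite. Topic labels are arbitrary in LDA, so the sign of `b1` depends on which cluster the sampler calls topic 0. In R, the same model would typically be fitted with `glm(cbind(passed, enrolled - passed) ~ x, family = binomial)`.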