Dataset and Development of Learning Analytic Tool to Extract Manifestations of Students’ Agency from Texts of Comments from MOOCs

Keywords: MOOC, learning analytics, student agency, sentiment analysis, unigrams, bigrams, topic modeling


This study addresses the automatic identification of manifestations of the components and sources of student agency in the texts of MOOC reviews, as well as descriptions of the internal and external transformation that students undergo while taking MOOCs. To extract descriptions corresponding to the individual, relational, and contextual sources of student agency, a dataset of 3445 English-language comments on the most popular mathematics courses on the Udemy platform was compiled. An additional 1787 comments on practice-oriented and entrepreneurship MOOCs were collected to study descriptions of internal and external transformation among MOOC learners. The paper proposes a methodological approach that uses natural language processing methods (topic modeling, sentiment analysis, and N-gram frequency analysis) to extract keywords and keyword combinations from MOOC comment texts. These keywords describe three kinds of manifestation: components of the individual source of student agency, namely self-efficacy, an increased sense of confidence in solving problems, and motivation; components of the relational source, namely the tutor’s support and guidance throughout the online course through quick answers and well-structured educational material; and components of the contextual source, namely the ability to make decisions when choosing among alternative online courses. The approach also captures descriptions of students’ internal transformation, expressed as a transition from internal struggle (overcoming fears, uncertainty, and difficulties in perceiving MOOC content) to an understanding of the purpose of learning, and of external transformation, expressed in the comments as the creation of a new product, startup, or business, or a change in the structure of an existing one, brought about by a change in thinking.
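As an illustration of the N-gram frequency step named in the abstract, the following is a minimal sketch using only the Python standard library. It is not the author's tool: the stop-word list, the tokenizer, the function names, and the three example comments are all assumptions invented for this example, and none of the comments come from the dataset.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it", "i", "my", "this", "me"}


def tokenize(text):
    """Lowercase a comment and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def ngram_counts(comments, n):
    """Count n-grams (tuples of n consecutive tokens) across all comments."""
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        # zip over n shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Invented example comments, not taken from the Udemy dataset.
comments = [
    "This course gave me confidence in solving problems.",
    "The instructor gives quick answers and the material is well structured.",
    "I gained confidence in solving problems after this course.",
]

unigrams = ngram_counts(comments, 1)
bigrams = ngram_counts(comments, 2)

# Show the most frequent bigrams, e.g. candidate markers of self-efficacy.
for gram, freq in bigrams.most_common(3):
    print(" ".join(gram), freq)
```

In this toy input, recurring bigrams such as "confidence solving" hint at the individual source of agency, while "quick answers" hints at the relational source. A real analysis would need a fuller stop-word list and, most likely, lemmatization.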





How to Cite
Dyulicheva, Yulia Yu. 2024. “Dataset and Development of Learning Analytic Tool to Extract Manifestations of Students’ Agency from Texts of Comments from MOOCs”. Voprosy Obrazovaniya / Educational Studies Moscow, no. 1 (April).
Datasets in Education