The Role of Context in Scenario-Based Tasks for Measuring Universal Skills: The Use of Generalizability Theory
Abstract
In education, much attention is paid to the development and evaluation of universal skills in schoolchildren. At the same time, assessing universal skills requires new test formats based on the student's observed actions in a digital environment, and scenario-based contextual tasks are a promising format. However, the contextual diversity of such tasks can make it difficult to compare results obtained from different scenario tasks. This article analyzes the role of scenario context in measuring two universal skills: critical thinking and communication. The study applies the methods of Generalizability Theory, which make it possible to examine the extent to which results can be generalized across scenario contexts and how satisfactory measurement reliability can be achieved by varying the number of indicators or scenario contexts. The study is based on data from fourth-grade students who completed several scenario-based tasks of the “4K” instrument. The analysis showed that test-takers behave differently in scenarios with different contexts, while the difficulty of the contexts is almost the same. To achieve satisfactory reliability, it is recommended to use at least two scenarios with different contexts, and using three or more scenarios with different contexts allows the number of indicators to be reduced without loss of reliability. The study also evaluated the role of context when alternative forms of scenario-based tasks were used. The alternative forms shared the same core problem and plot but differed in topic (content). Changing only the content of the scenario makes it possible to generalize results across scenario forms, meaning that alternative forms can be used interchangeably. This study demonstrates how Generalizability Theory can be used to optimize task development while meeting measurement reliability requirements.
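The trade-off described in the abstract (fewer indicators per scenario once more scenario contexts are included) is what a decision (D) study in Generalizability Theory quantifies: variance components estimated in the G-study are divided by the projected numbers of scenarios and indicators to forecast reliability. The sketch below illustrates this arithmetic for a fully crossed persons × scenarios × indicators design; the variance components and function names are hypothetical placeholders for illustration, not estimates from the “4K” data.

```python
# Hypothetical D-study for a fully crossed persons x scenarios x indicators (p x s x i) design.
# Variance components below are illustrative placeholders, not values reported in the article.

VAR = {
    "p": 0.30,      # persons (universe-score variance)
    "s": 0.01,      # scenario contexts (main effect)
    "i": 0.02,      # indicators (main effect)
    "ps": 0.08,     # person x scenario interaction
    "pi": 0.05,     # person x indicator interaction
    "si": 0.01,     # scenario x indicator interaction
    "psi_e": 0.20,  # residual: person x scenario x indicator, confounded with error
}

def g_coefficient(n_s: int, n_i: int, v: dict = VAR) -> float:
    """Generalizability coefficient E(rho^2) for relative decisions."""
    rel_error = v["ps"] / n_s + v["pi"] / n_i + v["psi_e"] / (n_s * n_i)
    return v["p"] / (v["p"] + rel_error)

def phi_coefficient(n_s: int, n_i: int, v: dict = VAR) -> float:
    """Dependability coefficient Phi for absolute decisions."""
    abs_error = (v["s"] / n_s + v["i"] / n_i
                 + v["ps"] / n_s + v["pi"] / n_i
                 + (v["si"] + v["psi_e"]) / (n_s * n_i))
    return v["p"] / (v["p"] + abs_error)

if __name__ == "__main__":
    # Project reliability for alternative numbers of scenario contexts and indicators.
    for n_s in (1, 2, 3):
        for n_i in (5, 10, 15):
            print(f"scenarios={n_s} indicators={n_i:2d}  "
                  f"E(rho^2)={g_coefficient(n_s, n_i):.2f}  "
                  f"Phi={phi_coefficient(n_s, n_i):.2f}")
```

With components of this kind, the projected coefficients rise sharply when moving from one to two scenario contexts, and with three contexts a comparable level of reliability is reached with fewer indicators per scenario, which mirrors the recommendation stated in the abstract.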