The Role of Context in Scenario-Based Tasks for Measuring Universal Skills: The Use of Generalizability Theory
Abstract
In education, much attention is paid to the development and evaluation of universal skills in schoolchildren. At the same time, assessing universal skills requires new test formats based on the student's observed actions in a digital environment, and scenario-based contextual tasks are a promising format. However, the contextual diversity of such tasks can make it difficult to compare results obtained from different scenario tasks. This article analyzes the role of scenario context in measuring two universal skills: critical thinking and communication. The study applies the methods of Generalizability Theory, which make it possible to examine the extent to which results can be generalized across scenario contexts and how satisfactory measurement reliability can be achieved by varying the number of indicators or scenario contexts. The study is based on data from fourth-grade students who completed several scenario-based tasks of the “4K” instrument. The analysis showed that test-takers behave differently in scenarios with different contexts, while the difficulty of the contexts is almost the same. To achieve satisfactory reliability, it is recommended to use at least two scenarios with different contexts, and using three or more scenarios with different contexts allows the number of indicators to be reduced without loss of reliability. The study also evaluated the role of context when alternative forms of scenario-based tasks were used. The alternative forms shared the same core problem and plot but differed in topic (content). Changing only the content of the scenario makes it possible to generalize results across scenario forms, meaning that alternative forms can be used interchangeably. This study demonstrates how Generalizability Theory can be used to optimize task development while meeting measurement reliability requirements.
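The trade-off described in the abstract (fewer indicators per scenario once more scenario contexts are included) is what a decision (D) study in Generalizability Theory quantifies: variance components estimated in the G-study are divided by the projected numbers of scenarios and indicators to forecast reliability. The sketch below illustrates this arithmetic for a fully crossed persons × scenarios × indicators design; the variance components and function names are hypothetical placeholders for illustration, not estimates from the “4K” data.

```python
# Hypothetical D-study for a fully crossed persons x scenarios x indicators (p x s x i) design.
# Variance components below are illustrative placeholders, not values reported in the article.

VAR = {
    "p": 0.30,      # persons (universe-score variance)
    "s": 0.01,      # scenario contexts (main effect)
    "i": 0.02,      # indicators (main effect)
    "ps": 0.08,     # person x scenario interaction
    "pi": 0.05,     # person x indicator interaction
    "si": 0.01,     # scenario x indicator interaction
    "psi_e": 0.20,  # residual: person x scenario x indicator, confounded with error
}

def g_coefficient(n_s: int, n_i: int, v: dict = VAR) -> float:
    """Generalizability coefficient E(rho^2) for relative decisions."""
    rel_error = v["ps"] / n_s + v["pi"] / n_i + v["psi_e"] / (n_s * n_i)
    return v["p"] / (v["p"] + rel_error)

def phi_coefficient(n_s: int, n_i: int, v: dict = VAR) -> float:
    """Dependability coefficient Phi for absolute decisions."""
    abs_error = (v["s"] / n_s + v["i"] / n_i
                 + v["ps"] / n_s + v["pi"] / n_i
                 + (v["si"] + v["psi_e"]) / (n_s * n_i))
    return v["p"] / (v["p"] + abs_error)

if __name__ == "__main__":
    # Project reliability for alternative numbers of scenario contexts and indicators.
    for n_s in (1, 2, 3):
        for n_i in (5, 10, 15):
            print(f"scenarios={n_s} indicators={n_i:2d}  "
                  f"E(rho^2)={g_coefficient(n_s, n_i):.2f}  "
                  f"Phi={phi_coefficient(n_s, n_i):.2f}")
```

With components of this kind, the projected coefficients rise sharply when moving from one to two scenario contexts, and with three contexts a comparable level of reliability is reached with fewer indicators per scenario, which mirrors the recommendation stated in the abstract.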