Is Psychometrics So Useful for Academic Psychology?
Abstract
Psychological theories of ability and personality traits often rely on the results of psychometric modelling. Such modelling is assumed to link responses to test items to an unobserved 'construct' (a trait or ability), which is 'modelled' from the test data. But does agreement between the data and the model indicate that the model represents a psychological construct? To what extent is 'psychometric modelling' modelling in the general scientific sense of the term? The validity of using modelling results to understand psychological phenomena depends on the answers to these questions. The article analyses the logic of psychometric modelling in comparison with modelling in other sciences and argues that psychological phenomena, as the subject of modelling, are involved neither in the construction nor in the correction of models. It raises the problem of unjustified interpretations of modelling results in psychology and their undesirable consequences for psychological theory. At the same time, the use of psychometric modelling for human resource decision-making has yet to be evaluated.