Is Psychometrics So Useful for Academic Psychology?

Keywords: psychometric modelling, latent construct modelling, psychological construct, psychological theory, test


Psychological theories of ability and personality traits often rely on the results of psychometric modelling. Such modelling is assumed to link responses to test items to an unobserved 'construct' (a trait or an ability), which is 'modelled' from the test data. However, does agreement between the data and the model indicate that the model represents a psychological construct? To what extent is 'psychometric modelling' modelling in the general scientific sense of the term? The validity of using modelling data to understand psychological phenomena depends on the answers to these questions. The article analyses the logic of psychometric modelling in comparison with modelling in other sciences and argues that psychological phenomena, as the subject of modelling, are involved neither in the construction nor in the correction of models. The article also raises the problem of unjustified interpretations of modelling results in psychology and their undesirable consequences for psychological theory. At the same time, the use of psychometric modelling for human resource decision-making has yet to be evaluated.





How to Cite
Tyumeneva, Yulia A. 2023. “Is Psychometrics So Useful for Academic Psychology?”. Voprosy Obrazovaniya / Educational Studies Moscow, no. 3 (November).
SI Psychometrics