Merging views on assessment

Lambert Schuwirth, Cees van der Vleuten

Medical Education 2004; 38 (12): 1208–1210. First published: 25 November 2004. https://doi.org/10.1111/j.1365-2929.2004.02055.x

This is really an exciting time to be working in the field of assessment in medical education. Our views on how to assess medical competence and performance are changing dramatically. The good thing about this is that it is not so much an issue of replacing old theories with new ones; rather, it seems as if we are combining old theories with new ones, which makes our perspective so much wider. Even more interesting is that, although these two sets of theories represent seemingly opposing views, they in fact have considerable commonalities.

For several decades our views on assessment have been dominated by psychometric theories. This has had a major and very beneficial impact on the quality of our assessment. It has made us aware of factors such as reproducibility or reliability, (construct) validity and efficiency as indicators of the quality of assessment instruments. In addition, it has shown us that many of the traditional common beliefs concerning assessment were untrue, or at least not supported by empirical evidence. Typical examples of such unsustainable beliefs include the notion that open-ended questions are superior to multiple-choice questions, or that the format of the assessment determines which aspect of competence is measured. Psychometric theories have made people aware that critical testing of beliefs, quality control and research are essential for the development of high-quality assessment (and high-quality education).

But, as with any view or theory, the psychometric view has its limitations. We think the major limitation originates from the fact that the test theory is largely derived from psychological personality tests. Typically, such personality tests do not require each item to be intrinsically meaningful, as long as the total scores (or, in many tests, the scale scores) are meaningful. A typical item from the MMPI, such as 'At times I feel like swearing', in itself does not tell us much about a person's personality; it is the large number of items taken together that becomes meaningful.

The whole view of assessment as psychological measurement has had several implications for the way we think about the defensibility and applicability of assessment. First, it creates an atmosphere in which the selective purposes of the test dominate the educational purposes, or, in other words, the punitive aspects dominate the rewarding aspects. Second, it leads people to overstress the value of numerical data in the process of quality control of assessment.
Some schools, for example, have a policy of eliminating from a test all items that show poor item statistics, regardless of their content. But what if the content, wording and relevance of the item are beyond doubt? Eliminating the item may improve the item statistics, and perhaps Cronbach's α will go up, but in fact the school has simply eliminated a valid element of the content universe. By doing this, it is probable that the reproducibility and the validity of the test will actually have diminished, although the numbers will indicate otherwise. Third, it implies that the value of the assessment can be researched and described only in numbers. Fourth, it defines the development and research agenda in assessment as one that tries to find the single superior instrument for each trait. So, although this view has led to obvious advances in the quality of assessment, it also has its limitations.

The other view mainly comes down to perceiving any form of assessment as an integral part of education, where the main question is not 'pass or fail?'. Rather, assessment is supposed to determine the strengths and weaknesses of candidates in order to optimise their individual learning pathways. This means that assessment 'results' should remain broad and highly informative, showing all the nuances. This view also has serious implications. First, each of the elements (items) of the assessment has a specific and intrinsic meaning. This means that each individual item can and must have consequences, not so much in terms of passing and failing but more in terms of adaptation of the individual learning pathway. Second, it implies that quality control comes from judgements about the items; it makes a strong argument for test review committees. Third, it calls for more research that illuminates the role of assessment within its educational context: for example, into how assessment actually drives student learning, how teachers process large bodies of rich information, and so on. Fourth, it implies that no single instrument is superior, but that the whole picture can only be drawn from a variety of well-used methods of information gathering. The mechanistic decision based on a standard of performance on a single assessment method is replaced by a professional judgement based on accumulated and triangulated information from multiple sources.

Both of these views aim to improve the quality of assessment, but their basic concepts are different. The 'psychometric' view aims at reducing the outcome by throwing away information in the most efficient way. For example, the answers a candidate gives on a 100-item multiple-choice test contain much information: which incorrect answers were chosen, which patterns of incorrect answers exist, which correct answers were given, and so on. The information is all in there, but we recalculate it into a summated score, which is then further dichotomised into a pass or fail decision. This is not meant as a criticism of this view; quite the opposite: somewhere down the line decisions have to be made as to whether the candidate is competent. But the underlying reason why all this information can be thrown away is that the individual item is not meaningful for the assessment, and is only there to contribute (to load) to one common trait or factor. The other view struggles to maintain the richness of the assessment information while keeping it manageable.
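To make both points concrete, here is a minimal sketch with entirely invented data (none of it comes from the studies discussed here): it shows how dropping a statistically 'misbehaving' but content-valid item can raise Cronbach's α while silently narrowing the sampled content universe, and how rich item-level responses are finally reduced to a single pass/fail decision. The response matrix, item behaviour and cut score are all hypothetical.

```python
# Invented data: nine items that track a latent trait, plus one item whose
# content is perfectly valid but whose responses are unrelated to the trait
# (i.e. it will show poor item statistics).
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a candidates x items matrix of item scores."""
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(0)
ability = rng.normal(size=200)                      # latent trait, one per candidate
trait_items = rng.random((200, 9)) < 1 / (1 + np.exp(-ability[:, None]))
odd_item = rng.random((200, 1)) < 0.5               # valid content, poor statistics
test = np.hstack([trait_items, odd_item]).astype(int)

print(f"alpha, all 10 items: {cronbach_alpha(test):.2f}")
print(f"alpha, item dropped: {cronbach_alpha(test[:, :9]):.2f}")  # goes up

# The final reduction described above: sum the items, then dichotomise.
passed = test.sum(axis=1) >= 6                      # arbitrary cut score
print(f"pass rate: {passed.mean():.0%}")
```

The α statistic duly improves when the odd item is dropped, yet the test now samples one fewer valid element of the domain: exactly the trade-off the numbers alone do not show.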
So far this has proven to be far from easy, because we are so used to combining results numerically. In most examination programmes, the preferred approach is to combine information from various assessment instruments by converting all outcomes to numerical values and then combining them arithmetically (test 1 counts for 30%, test 2 counts for 50%, etc.). We often tend to believe that this is the only possible way to combine information from various sources, but there are other ways. A good and familiar example is the patient chart. This is a document in which numerical values are mostly converted into qualitative values: a 'sodium level of 134 mmol/L' is described as a 'low normal value'. This has to be done in order to be able to combine the information about the sodium level (and all the other biochemical and haematological parameters) with 'three weeks of cramping upper abdominal pain' and the 'fear of having cancer' from the history. Despite the qualitative nature of the information, doctors are able to combine everything in this document to make important decisions about the specific treatment of the patient, as well as the final (dichotomous) decision: to discharge the patient or not. A small sketch of this numeric-to-qualitative conversion appears after the concluding paragraph below.

That is all very nice in theory, but where do we actually see these two visions coexist? We think a good example can be found in this issue of Medical Education. Three papers describe research that tries to evaluate the quality of assessment. Subhashis Basu and colleagues describe the quality of a newly developed test of musculoskeletal knowledge from a more traditional perspective, using Cronbach's α, the increase in scores between subjects at different levels of training, and feasibility.1 But that is not all; they further describe the standard-setting procedure and how the test is used to stimulate student learning in a specific educational setting. Their work focuses not only on the instrument itself but also on its use. This is a matter that Charlotte Ringsted and colleagues investigate in more depth.2 They have focused on the effects of in-training assessment on students' perceived level of confidence. If assessment is to steer student learning, it would be helpful if the assessment led to more confidence that the competency has been mastered, so the possible negative and positive factors influencing this effect are important. Hester Daelmans et al. have studied the overall usefulness of a complete assessment programme.3 They used methods based on the psychometric view (generalisability analysis, etc.), but they looked at the value of the assessment as a programme and not as one specific instrument.

We think these three studies are exemplary in demonstrating the way in which we are currently merging the two views of assessment. It is moving from thinking of assessment as trying to capture the whole picture with one single best instrument towards seeing assessment as a programme which runs parallel to, or, even better, is completely intertwined with, education.
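As promised above, here is a minimal sketch of the patient-chart idea: a numerical result is converted into a qualitative label so that it can be weighed alongside narrative findings. The cut-offs and the `qualify` helper are invented for illustration only and are not clinical guidance.

```python
# Assumed reference bands for serum sodium: (upper bound in mmol/L, label).
SODIUM_BANDS = [
    (133, "low"),
    (136, "low normal"),
    (143, "normal"),
    (146, "high normal"),
    (float("inf"), "high"),
]

def qualify(value: float, bands: list[tuple[float, str]]) -> str:
    """Return the label of the first band whose upper bound exceeds the value."""
    for upper, label in bands:
        if value < upper:
            return label
    raise ValueError("bands must end with an open upper bound")

# The converted laboratory value can now sit next to qualitative findings from
# the history; a dichotomous decision (discharge or not) is still made later,
# but on the combined, still-readable record.
findings = {
    "sodium": qualify(134.0, SODIUM_BANDS),   # -> "low normal"
    "history": "three weeks of cramping upper abdominal pain",
    "concern": "fear of having cancer",
}
print(findings)
```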
References

1 Basu S, Roberts C, Newble DI, Snaith M. Competence in the musculoskeletal system: assessing the progression of knowledge through an undergraduate medical course. Med Educ 2004; 38: 1253–60.

2 Ringsted C, Pallisgaard J, Østergaard D, Scherpbier A. The effect of in-training assessment on clinical confidence in postgraduate education. Med Educ 2004; 38: 1261–9.

3 Daelmans HEM, Van der Hem-Stokroos HH, Hoogenboom RJI, Scherpbier AJJA, Stehouwer CDA, Van der Vleuten CPM. Feasibility and reliability of an in-training assessment programme in an undergraduate clerkship. Med Educ 2004; 38: 1270–7.