Evaluating cognitive ability, knowledge tests and situational judgement tests for postgraduate selection

Anna Koczwara,1 Fiona Patterson,1,2 Lara Zibarras,1,3 Maire Kerrin,1 Bill Irish4 & Martin Wilkinson5

OBJECTIVES This study aimed to evaluate the validity and utility of, and candidate reactions towards, cognitive ability tests and current selection methods, including a clinical problem-solving test (CPST) and a situational judgement test (SJT), for postgraduate selection.

METHODS This was an exploratory, longitudinal study to evaluate the validities of two cognitive ability tests (measuring general intelligence) compared with current selection tests, including a CPST and an SJT, in predicting performance at a subsequent selection centre (SC). Candidate reactions were evaluated immediately after test administration to examine face validity. Data were collected from candidates applying for entry into training in UK general practice (GP) during the 2009 recruitment process. Participants were junior doctors (n = 260). The mean age of participants was 30.9 years and 53.1% were female. Outcome measures were participants' scores on three job simulation exercises at the SC.

RESULTS Findings indicate that all tests measure overlapping constructs. Both the CPST and SJT independently predicted more variance than the cognitive ability test measuring non-verbal mental ability. The other cognitive ability test (measuring verbal, numerical and diagrammatic reasoning) had a predictive value similar to that of the CPST and added significant incremental validity in predicting performance on job simulations in an SC. The best single predictor of performance at the SC was the SJT. Candidate reactions were more positive towards the CPST and SJT than the cognitive ability tests.

CONCLUSIONS In terms of operational validity and candidate acceptance, the combination of the current CPST and SJT proved to be the most effective administration of tests in predicting selection outcomes. In terms of construct validity, the SJT measures procedural knowledge in addition to aspects of declarative knowledge and fluid abilities and is the best single predictor of performance in the SC. Further research should consider the validity of the tests in this study in predicting subsequent performance in training.

Medical Education 2012: 46: 399–408


doi:10.1111/j.1365-2923.2011.04195.x

Read this article online at www.mededuc.com

Discuss ideas arising from this article at www.mededuc.com

1 Work Psychology Group Limited, Nottingham, UK
2 Department of Social and Developmental Psychology, University of Cambridge, Cambridge, UK
3 Department of Psychology, City University, London, UK
4 School of Primary Care, Severn Deanery, Bristol, UK
5 NHS Midlands and East, West Midlands GP Education, Birmingham, UK

Correspondence: Professor Fiona Patterson, Department of Social and Developmental Psychology, University of Cambridge, Free School Lane, Cambridge CB2 3RQ, UK. Tel: 00 44 7847 600630; E-mail: [email protected]

© Blackwell Publishing Ltd 2012. MEDICAL EDUCATION 2012; 46: 399–408
A Koczwara et al

INTRODUCTION

Large-scale meta-analytic studies show that general cognitive ability tests are good predictors of job performance across a broad range of professions and occupational settings.1–4 Cognitive ability tests assess general intelligence (IQ) and have been used for selection in high-stakes contexts such as military and pilot selection.5,6 Cognitive ability tests have been used and validated for medical school admissions,7 but not yet for selection into postgraduate medical training. Previous research focused on medical school admissions; the use of cognitive ability tests in medical selection remains controversial.7–9 This paper presents an evaluation of the validity and utility of cognitive ability tests as a selection methodology for entry into training in UK general practice (GP).

The selection methodology currently used for entry into GP training demonstrates good evidence of reliability, and face and criterion-related validity.10–13 The selection methodology comprises three stages: eligibility checks are succeeded by shortlisting via the completion of two invigilated, machine-marked tests, a clinical problem-solving test (CPST) and a situational judgement test (SJT).10,11 The CPST requires candidates to apply clinical knowledge to solve problems involving a diagnostic process or a management strategy for a patient. The SJT focuses on a variety of non-cognitive professional attributes (empathy, integrity, resilience) and presents candidates with work-related scenarios in which they are required to choose an appropriate response from a list. The final stage of selection comprises a previously validated selection centre (SC) using three high-fidelity job simulations which assess candidates over a 90-minute period, involving three separate assessors and an actor. These are: (i) a group discussion exercise referring to a work-related issue; (ii) a simulated patient consultation in which the candidate plays the role of doctor and an actor plays the patient, and (iii) a written exercise in which the candidate prioritises a list of work-related issues and justifies his or her choices.10,14

In this study, two cognitive ability tests were piloted alongside the live selection process for 2009 in order to explore their potential for use in future postgraduate selection. Cognitive ability tests were considered here because if either cognitive ability test were to demonstrate improved validity over and above that of the existing CPST and SJT short-listing tests, this might indicate potential for significant gains in the effectiveness and efficiency of the current process. For example, cognitive ability tests might demonstrate improved utility because they are shorter (and thus take less candidate time) and are not specifically designed for GP selection and would therefore not require clinicians' time during development phases.

The first cognitive ability test was the Ravens Advanced Progressive Matrices.15 This is a power test focusing on non-verbal mental ability (referred to as NVMA in the present study) and measures general fluid intelligence, including observation skills, clear thinking ability, intellectual capacity and intellectual efficiency. The test takes 40 minutes to complete. The NVMA score indicates the candidate's potential for success in high-level positions that require clear, accurate thinking, problem identification and evaluation of solutions.15 These are abilities shown to be relevant to GP training.16 The second cognitive ability test was a speed test designated the Swift Analysis Aptitude Test.17 It is a cognitive ability test battery (referred to as CATB in the present study) and consists of three short sub-tests measuring verbal, numerical and diagrammatic analysis abilities. This test allows three specific cognitive abilities to be measured in a relatively short amount of time, giving an overall indication of general cognitive ability. The whole CATB takes 18 minutes and therefore might offer practical utility compared with other IQ test batteries in which single test subsets typically take 20–30 minutes.17 Both the NVMA and CATB have good internal reliability and have been validated for selection purposes in general professional occupations,15,17 but not in selection for postgraduate training in medicine.

This present study examines the validity of two forms of cognitive ability test and the CPST and SJT selection tests in predicting performance in the subsequent SC. Table 1 outlines each of the tests evaluated in the present study. Example items are given in Table S1 (online). Theoretically, the NVMA is a measure of fluid intelligence; it measures the ability to reason logically and solve problems in new situations, independent of learned knowledge.18 Similarly, the CATB measures fluid intelligence, but also measures elements of crystallised intelligence (experience-based ability in which knowledge and skills are acquired through educational and cultural experience)17 because a level of procedural knowledge regarding verbal and numerical reasoning is required to understand individual items. The CPST measures crystallised intelligence, especially declarative knowledge, and is designed as a test of attainment examining learned knowledge gained through previous medical training. Finally, the SJT is a measure designed to test non-cognitive professional attributes.


Table 1 Description of the tests and outcome measures used in the study (Table S1 [online] gives examples of items in the tests)

Ravens Advanced Progressive Matrices
  Theoretical underpinning: power test measuring fluid intelligence; power tests generally do not impose a time limit on completion; the non-verbal format reduces culture bias
  Description: the advanced matrices differentiate between candidates at the high end of ability; the test has 23 items, each with 8 response options, so a maximum of 23 points is available (1 point per correct answer)

Swift Analysis Aptitude
  Theoretical underpinning: speed test measuring fluid intelligence and some aspects of crystallised intelligence; speed tests focus on the number of questions answered correctly within a specific timeframe
  Description: three sub-tests, each with 8 items with 3–5 response options, so a maximum of 24 points is available (1 point per correct answer)

Clinical problem-solving test (CPST)
  Theoretical underpinning: clinical problem-solving test measuring crystallised abilities, especially declarative knowledge; it is designed as a power test
  Description: the CPST has 100 items to be completed within 90 minutes

Situational judgement test (SJT)
  Theoretical underpinning: test designed to measure non-cognitive professional attributes beyond clinical knowledge; it is designed as a power test
  Description: the SJT has 50 items to be completed in 90 minutes; there are two different types of items: ranking and choice

Selection centre
  Theoretical underpinning: multitrait–multimethod assessment centre in which candidates are rated on their observed behaviours in three exercises
  Description: (i) a group exercise, involving a group discussion referring to a work-related issue; (ii) a simulated patient consultation in which the candidate plays the role of the doctor and an actor plays the patient; (iii) a written exercise in which candidates prioritise a list of work-related issues and justify their choices
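The scoring used across these measures is additive throughout. For the SC in particular, as described in the Methods, each exercise total is the sum of four competency ratings on a 4-point scale, and the three exercise totals are summed into the overall SC score, giving a possible range of 12–48. A minimal sketch of that aggregation, with hypothetical ratings (the function names and example values are illustrative, not taken from the study):

```python
# Illustrative SC scoring: four competency ratings per exercise on a
# 1-4 scale are summed, then the three exercise totals are summed.
def exercise_score(ratings):
    assert len(ratings) == 4 and all(1 <= r <= 4 for r in ratings)
    return sum(ratings)

def sc_overall_score(exercises):
    # group discussion, simulated consultation, written exercise
    assert len(exercises) == 3
    return sum(exercise_score(r) for r in exercises)

# A hypothetical candidate's ratings
candidate = [[3, 4, 3, 3], [4, 4, 3, 4], [3, 3, 2, 3]]
print(sc_overall_score(candidate))  # → 39
```

Note that the observed SC overall range in Table 2 (16–48) sits inside this theoretical 12–48 range.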

In making judgements about response options, candidates need to apply relevant knowledge, skills and abilities to resolve the issues they are presented with.19,20

We would anticipate some relationship among all four tests as they measure overlapping constructs. Because the two cognitive ability tests both measure similar constructs, we might expect a reasonably strong correlation between the two. The CPST, although essentially a measure of crystallised intelligence, is likely to also entail an element of fluid intelligence as verbal reasoning is necessary to understand the situations presented in the question items. Thus, we would expect the CPST to be positively related to the cognitive ability tests. The construct validity of SJTs is less well known; research suggests that they may relate to both cognitive ability21 and learned job knowledge.22 Indeed, a meta-analysis showed that SJTs had an average correlation of r = 0.46 with cognitive ability.20 In the present context, although the SJT was designed to measure non-cognitive domains, we may expect a positive association between the SJT and the two cognitive ability tests. This is because theory suggests that intelligent people may learn more quickly about the non-cognitive traits that are more effective in the work-related situations described in the SJT.23

The present study aimed to evaluate the construct, predictive, incremental and face validity of two cognitive ability tests in comparison with the present CPST and SJT selection tests. In examining predictive and incremental validity, we used a previously validated approach11 with overall performance at the SC as an outcome measure. Performance at the SC has been linked to subsequent training performance14 and predicts supervisor ratings 12 months into training.24 We therefore posed the following four research questions:


1 Construct validity: what are the inter-correlations among the two cognitive ability tests, the CPST and the SJT?
2 Predictive validity: do scores on each of the tests independently predict subsequent performance at the SC?
3 Incremental validity: compared with the current CPST and SJT, do the cognitive ability tests (NVMA and CATB) each account for additional variance in performance at the SC?
4 Face validity: do candidates perceive the tests to be fair and appropriate?

METHODS

Design and procedure

Data were collected during the 2009 recruitment process for GP training in the West Midlands region in the UK. Candidates were invited to participate on a voluntary basis and gave consent for their scores at the SC to be accessed. It was emphasised that all data would be used for evaluation purposes only. Candidates successful at shortlisting were invited to the SC, where their performance on job simulations forms the basis for job offers. For each of the three SC exercises, assessors rated candidates' performance around four of the following competencies: Empathy and Sensitivity; Communication Skills; Coping with Pressure, and Problem Solving and Professional Integrity. These competencies were derived from the previous job analysis16 and assessors were provided with a 4-point rating scale (1 = poor, 4 = excellent) with which to rate the candidate and behavioural anchors to assist in determining the rating. For each of the exercises, the sum of the ratings for the four competencies was calculated to create a total score. Exercise scores were summed to create the overall SC score.

Associations among all variables in the study were examined using Pearson correlation coefficients; note that none of the correlations reported in the results have been corrected for attenuation. To investigate the relative predictive abilities of the four tests in the study (two cognitive ability tests and the current selection methods, CPST and SJT), two types of regression analyses were conducted using overall performance at the SC as the dependent variable. Firstly, we used hierarchical regression analysis. With this method, the predictor variables are added into the regression equation in an order determined by the researcher; thus in the present context, the CPST and SJT predictors would be added in the first step, and then the cognitive ability test(s) would be added in the second step to determine the additional variance in the SC score predicted by the cognitive ability test. Secondly, we used stepwise regression analysis. With this method, the order of relevance of predictor variables is identified by the statistical package (SPSS version 17.0; SPSS, Inc., Chicago, IL, USA) to establish which predictors independently predict the most variance in SC scores.

Sampling

A total of 260 candidates agreed to participate in the study. Of these, 53.1% were female. Their mean age was 30.9 years (range: 24–54 years). Participants reported the following ethnic origins: White (38%); Asian (43%); Black (12%); Chinese (2%); Mixed (2%), and Other (3%). The final sample of candidates for whom NVMA data were available numbered 219 because 26 candidates did not consent to the matching of their pilot data with live selection data and a further 15 candidates were not invited to the SC. The final sample of candidates for whom CATB data were available numbered 188 because, of the 215 candidates who initially completed the CATB, 13 consented to participate in the pilot but did not want their pilot data matched to their live selection data, and a further 14 candidates did not pass the shortlisting stage and so were not invited to the SC. There were no significant demographic differences between the two samples and the overall demographics of both were similar to those of the live 2009 candidate cohort.

RESULTS

Table 2 provides the descriptive statistics for all tests for the full pilot sample (n = 260). The means and ranges for both cognitive ability tests were similar to those typically found in their respective comparison norm groups, which comprised managers, professional and graduate-level employees; thus the sample's cognitive ability test scores were comparable with those of the general population.15,17 The CPST and SJT scores, the two cognitive ability tests and the SC data (including all exercise scores and overall scores) were normally distributed.

What are the inter-correlations among the two cognitive ability tests, the CPST and the SJT?

Significant positive correlations were found between the NVMA and the CATB (r = 0.46), CPST (r = 0.36)


Table 2 Descriptive statistics and correlations among study variables

                      n    Mean    SD     Range    1      2      3      4      5      6      7      8
NVMA                  234  11.12   4.07   1–21     (0.85)
CATB                  202  12.38   4.23   2–24     0.46†  (0.76)
CPST                  202  254.62  39.49  99–326   0.36†  0.41†  (0.86)
SJT                   202  254.23  39.20  110–331  0.34†  0.47†  0.44†  (0.85)
Group exercise        188  12.68   2.67   4–16     0.19†  0.28†  0.28†  0.39†  (0.90)
Written exercise      188  12.69   2.43   5–16     0.15*  0.23†  0.19†  0.28†  0.30†  (0.89)
Simulation exercise   188  12.84   2.92   5–16     0.29†  0.32†  0.32†  0.40†  0.28†  0.20†  (0.92)
SC overall score      188  38.21   5.72   16–48    0.30†  0.39†  0.38†  0.50†  0.74†  0.67†  0.73†  (0.87)

* p < 0.05; † p < 0.01 (two-tailed)

Correlations between variables were uncorrected for restriction of range. Numbers in parentheses are the reliabilities for the selection methods for the overall 2009 recruitment round. For the NVMA and CATB, these reliabilities are those reported in the respective manuals.
SD = standard deviation; NVMA = non-verbal mental ability test; CATB = cognitive ability test battery; CPST = clinical problem-solving test; SJT = situational judgement test; SC = selection centre
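The note that these correlations are uncorrected for attenuation can be made concrete with the standard disattenuation formula, r_corrected = r_xy / √(r_xx · r_yy), which divides an observed correlation by the square root of the product of the two measures' reliabilities. A short sketch applying it to the SJT–SC overall values from Table 2 (the paper does not report corrected values; this calculation is purely illustrative):

```python
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Correct an observed correlation for measurement unreliability
    in both variables (the classic attenuation formula)."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed SJT-SC overall correlation, with the reliabilities shown
# in parentheses on the diagonal of Table 2
r_observed = 0.50
rel_sjt, rel_sc = 0.85, 0.87
print(round(disattenuate(r_observed, rel_sjt, rel_sc), 2))  # → 0.58
```

With perfectly reliable measures (reliabilities of 1.0) the formula leaves the correlation unchanged.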

Table 3 Regression analyses

NVMA dataset, n = 219
                          B      SE B   β
Hierarchical regression analysis
  Step 1, R² = 0.31
    Constant              13.77  2.51
    SJT                   0.07   0.01   0.49*
    CPST                  0.03   0.01   0.18†
  Step 2, ΔR² = 0.01
    NVMA                  0.15   0.09   0.10
Stepwise regression analysis
  Step 1, R² = 0.29
    Constant              17.33  2.19
    SJT                   0.08   0.01   0.54*
  Step 2, ΔR² = 0.01
    CPST                  0.03   0.01   0.18†

CATB dataset, n = 188
                          B      SE B   β
Hierarchical regression analysis
  Step 1, R² = 0.29
    Constant              13.63  2.97
    SJT                   0.07   0.01   0.42*
    CPST                  0.03   0.01   0.20†
  Step 2, ΔR² = 0.02
    CATB                  0.21   0.10   0.15‡
Stepwise regression analysis
  Step 1, R² = 0.26
    Constant              18.26  2.53
    SJT                   0.08   0.01   0.51*
  Step 2, ΔR² = 0.03
    CPST                  0.03   0.01   0.20†
  Step 3, ΔR² = 0.02
    CATB                  0.21   0.10   0.15‡

* p < 0.001; † p < 0.01; ‡ p < 0.05

SE = standard error; NVMA = non-verbal mental ability test; CATB = cognitive ability test battery; CPST = clinical problem-solving test; SJT = situational judgement test
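The hierarchical analyses in Table 3 rest on comparing nested ordinary-least-squares models: the R² of a first step containing the current short-listing tests, then the ΔR² gained by adding a cognitive ability test. A self-contained sketch of that two-step logic on synthetic data (pure Python rather than SPSS; the variable names and simulated effect sizes are invented for illustration, not the study's data):

```python
import random

def ols_r2(X, y):
    """Fit y = Xb by ordinary least squares (normal equations solved
    with Gaussian elimination) and return the model R-squared."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]                                    # X'X
    c = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]  # X'y
    for col in range(k):                   # forward elimination
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for q in range(col, k):
                A[r][q] -= f * A[col][q]
            c[r] -= f * c[col]
    b = [0.0] * k                          # back substitution
    for r in range(k - 1, -1, -1):
        b[r] = (c[r] - sum(A[r][q] * b[q] for q in range(r + 1, k))) / A[r][r]
    y_hat = [sum(X[i][q] * b[q] for q in range(k)) for i in range(n)]
    y_bar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

random.seed(1)
n = 200
# Synthetic standardised test scores with built-in overlap
sjt = [random.gauss(0, 1) for _ in range(n)]
cpst = [0.5 * s + random.gauss(0, 1) for s in sjt]
catb = [0.4 * s + random.gauss(0, 1) for s in sjt]
sc = [0.5 * s + 0.2 * c + 0.1 * g + random.gauss(0, 1)
      for s, c, g in zip(sjt, cpst, catb)]

# Step 1: current short-listing tests (intercept + SJT + CPST)
r2_step1 = ols_r2([[1.0, s, c] for s, c in zip(sjt, cpst)], sc)
# Step 2: add the cognitive ability test; Delta R-squared is the
# incremental variance explained, as reported in Table 3
r2_step2 = ols_r2([[1.0, s, c, g] for s, c, g in zip(sjt, cpst, catb)], sc)
delta_r2 = r2_step2 - r2_step1
print(round(r2_step1, 3), round(delta_r2, 3))
```

Stepwise regression differs only in that the software, rather than the researcher, chooses the order of entry according to incremental fit.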


and SJT (r = 0.34), and also between the CATB and CPST (r = 0.41) and SJT (r = 0.47) (all p < 0.01) (Table 2). Thus, the cognitive ability tests have both common and independent variance with the CPST and SJT and to some extent measure overlapping constructs.

Do scores on each of the tests independently predict subsequent performance at the SC?

The analyses in Table 2 showed a positive correlation between NVMA scores and all SC exercises (group, r = 0.19; simulation, r = 0.29 [both p < 0.01]; written, r = 0.15 [p < 0.05]). However, both the CPST and SJT had substantially higher correlations with the three SC exercises (CPST, r = 0.19–0.32; SJT, r = 0.28–0.40 [all p < 0.01]). Furthermore, both the CPST and SJT had higher correlations with overall SC scores (r = 0.38 and r = 0.50, respectively) compared with the NVMA (r = 0.30) (all p < 0.01).

Results show a positive correlation between the CATB and the SC exercises (group, r = 0.28; simulation, r = 0.32; written, r = 0.23 [all p < 0.01]). The CPST correlated with the group and simulation exercises to the same extent as the CATB (r = 0.28 and r = 0.32, respectively [both p < 0.01]), but had a lower correlation with the written exercise (r = 0.19 [p < 0.01]) (Table 2). The SJT had higher correlations with all exercises (group, r = 0.39; written, r = 0.28; simulation, r = 0.40 [all p < 0.01]). Further, the CPST had a lower correlation with overall SC scores compared with the CATB (r = 0.38 and r = 0.39, respectively [both p < 0.01]), but the SJT had a higher correlation (r = 0.50 [p < 0.01]).

Thus, overall findings indicate that, of all the tests, the SJT had the highest correlations with performance at the SC. The SJT and CPST were more effective predictors of subsequent performance than the NVMA; the SJT was a better predictor of performance at the SC than the CATB, and the CPST had a similar predictive value to the CATB.

Compared with the current CPST and SJT, do the cognitive ability tests (NVMA and CATB) each account for additional variance in performance at the SC?

We established the extent to which the cognitive ability tests each accounted for additional variance above the current CPST and SJT. To examine the extent to which NVMA scores predicted overall SC scores over and above the CPST and SJT scores combined, a hierarchical multiple regression was performed (Table 3). Scores on the CPST and SJT were entered in the first step, which explained 31.3% of the variance in SC overall score (R² = 0.31, F(2,216) = 49.24; p < 0.001); however, adding the NVMA in the second step offered no unique variance over the CPST and SJT (ΔR² = 0.01, F(2,215) = 2.85; p = 0.09). A stepwise regression was also performed (Table 3) to establish which tests independently predicted the most variance in SC scores. Scores on the SJT were entered into the first step, indicating that the SJT explains the most variance (28.9%) of all the tests (R² = 0.29, F(1,217) = 88.02; p < 0.001). Scores on the CPST were entered into the second and final step, explaining an additional 2.5% of the variance (ΔR² = 0.03, F(2,216) = 7.73; p = 0.01). The NVMA was not entered into the model at all, confirming its lack of incremental validity over the two current short-listing assessments.

To establish the extent to which the CATB predicted SC performance over and above both the CPST and SJT, a hierarchical multiple regression was performed (Table 3), repeating the method described above. Scores on the SJT and CPST were entered into the first step and explained 28.6% of the variance (R² = 0.29, F(2,185) = 36.99; p < 0.001); entering the CATB into the second step explained an additional 1.7% of the variance in overall SC performance (ΔR² = 0.02, F(1,184) = 4.44; p = 0.04). A stepwise regression was also performed, entering SJT scores into the first step. This explained the most variance in SC performance (25.5%) of all the tests (R² = 0.26, F(1,186) = 63.53; p < 0.001). Scores on the CPST were entered into the second step, explaining an additional 3.1% of the variance (ΔR² = 0.03, F(1,185) = 8.05; p = 0.005). Finally, scores on the CATB were entered into the final step, explaining an additional 1.7% of the variance (ΔR² = 0.02, F(1,184) = 4.44; p = 0.04). Overall, findings indicate that the CATB does add some incremental validity in predicting SC performance.

A stepwise regression was also performed to establish which of the four tests (SJT, CPST, NVMA, CATB) independently predicted the most variance in SC scores. Scores on the SJT were entered into the first step, indicating that the SJT explains the most variance (25.5%) of all the tests (R² = 0.26, F(1,186) = 63.53; p < 0.001). Scores on the CPST were entered into the second step, explaining an additional 3.1% of the variance (ΔR² = 0.03, F(1,185) = 8.05; p = 0.005). Finally, CATB scores were entered into the third step, explaining an additional 1.7% of the variance (ΔR² = 0.02, F(1,184) = 4.44;

p = 0.04). The NVMA was not entered into the model at all and thus lacked incremental validity over the other tests. Overall, findings indicate that the NVMA does not add incremental validity in predicting SC performance, but the CATB does add some incremental validity.

Do candidates perceive the tests as fair and appropriate?

All participants were asked to complete a previously validated candidate evaluation questionnaire,12,25 based on procedural justice theory, regarding their perceptions of the tests. A total of 249 candidates completed the questionnaire (96% response rate), in which they indicated their level of agreement with several statements regarding the content of each test. These results are shown in Table 4, along with feedback on the SJT and CPST from the live 2009 selection process. Overall results show that the CPST received the most favourable feedback, followed by the SJT. Candidates did not react favourably to the NVMA, mainly as a result of perceptions of low job relevance and insufficient opportunity to demonstrate ability. The CATB was also relatively negatively received; feedback was slightly better than for the NVMA but still markedly worse than for the CPST and SJT. This notably reflected perceptions of the CATB as providing insufficient opportunity to demonstrate ability and as not helping selectors to differentiate among candidates.

DISCUSSION

Two cognitive ability tests were evaluated as potential selection methods for use in postgraduate selection and were compared with the current selection tests, the CPST and SJT. Results show positive and significant relationships among the CPST and SJT and cognitive ability tests, indicating that they measure overlapping constructs. For both the CPST and the SJT, the correlation with the CATB was higher than with the NVMA. This is probably because the CATB, CPST and SJT are all presented in a verbal format, whereas the NVMA has no verbal element and is presented in a non-verbal format.

Implications for operational validity

Considering both the predictive validation analyses and candidate reactions, the CPST and SJT (measuring clinical knowledge and non-cognitive professional attributes, respectively) have been shown to be

Table 4 Procedural justice reactions to the tests

Percentages of candidates responding in each band:

                                          NVMA (n = 249)      CATB (n = 195)      CPST (n = 2947)     SJT (n = 2947)
                                          SD/D  N     A/SA    SD/D  N     A/SA    SD/D  N     A/SA    SD/D  N     A/SA
The content of the test was clearly
relevant to GP training                   62    23    16      47    29.2  24      3     7     89      13    22    63
The content of the test seemed
appropriate for the entry level I am
applying for                              40    33    26      32    35    33      4     9     85      9     22    68
The content of the test appeared to
be fair                                   33    31    37      30    29    41      4     10    85      19    27    53
The test gave me sufficient opportunity
to indicate my ability for GP training    66    23    11      58    26    15      9     18    72      36    28.9  34
The test would help selectors to
differentiate between candidates          57    21    20      50    25    24      10    20    67      35    29.1  34

NVMA = non-verbal mental ability test; CATB = cognitive ability test battery; CPST = clinical problem-solving test; SJT = situational judgement test; SD/D = strongly disagree or disagree; N = neither agree nor disagree; A/SA = agree or strongly agree; GP = general practice
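The percentages in Table 4 are simple response-category proportions: five-point agreement ratings collapsed into three bands (SD/D, N, A/SA). A minimal sketch of that collapsing step, with hypothetical response data (the 1–5 coding and the counts are invented for illustration):

```python
from collections import Counter

# Hypothetical 5-point Likert responses to one statement
# (1 = strongly disagree ... 5 = strongly agree)
responses = [1, 2, 2, 3, 4, 4, 5, 4, 2, 3, 5, 4, 1, 4, 3, 5, 2, 4, 4, 3]

BANDS = {1: "SD/D", 2: "SD/D", 3: "N", 4: "A/SA", 5: "A/SA"}

counts = Counter(BANDS[r] for r in responses)
percentages = {band: round(100 * n / len(responses))
               for band, n in counts.items()}
print(percentages)
```

The same aggregation, run per statement and per test, would reproduce the layout of Table 4.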


a better combination in predicting SC performance compared with the cognitive ability tests. The NVMA results showed a moderate correlation with SC performance, but no incremental validity over the CPST and SJT. Furthermore, the NVMA was negatively received by candidates. The CATB was moderately correlated with SC performance and showed incremental validity over and above the CPST and SJT; however, the CPST and SJT in combination demonstrated significantly more variance in SC performance than when either test was combined with the CATB.

The results show that the test measuring non-cognitive professional attributes (the SJT) is the best single predictor of subsequent performance at the SC. For operational validity, the best combination of tests in terms of explaining the greatest amount of variance in SC performance included the CPST, SJT and CATB. However, the increase in variance explained by the CATB is not large and has to be weighed against the cost and time implications of increasing the amount of test-taking time per candidate.

Theoretical implications

As the SJT was the best single predictor of SC performance, it could be argued that the constructs that best predict subsequent performance in job simulations include a combination of crystallised and fluid intelligence, along with 'non-cognitive' professional attributes measured by the SJT. The construct validity of SJTs has been a subject for debate amongst researchers and we argue that the results presented here provide further insights. Motowidlo and Beier23 suggest that the procedural knowledge measured by an SJT includes implicit trait policies, which are implicit beliefs regarding the costs and benefits of how personality is expressed and its effectiveness in specific jobs (which is likely to relate to the way in which professional attributes are expressed in a work context). Results suggest that the SJT broadens the constructs being measured (beyond declarative knowledge and fluid intelligence) and therefore the SJT demonstrates the highest criterion-related validity in predicting performance in high-fidelity job simulations in which a range of different work-related behaviours are measured (beyond knowledge and cognitive abilities). However, in order to build and extend the current research and to test the ideas presented by Motowidlo and Beier,23 we recommend that future research exploring the construct validity of the SJT should also include a measure of personality and implicit trait policies to test this possible association. We therefore present Fig. 1, a diagram illustrating potential pathways among variables that could be considered in future research.

[Figure 1 Selection measures and their hypothesised construct validity. Constructs shown: intelligence (fluid abilities) for the NVMA and CATB; procedural knowledge, and personality and implicit trait policies, for the SJT; declarative knowledge for the CPST. NVMA = non-verbal mental ability test; CATB = cognitive ability test battery; SJT = situational judgement test; CPST = clinical problem-solving test]

Practical implications

Although the CATB may appear to offer some advantages relating to cost savings in terms of administration and test development, there are several reasons why replacing the CPST with the CATB might have negative implications in this context. The first reason relates to patient safety: using the general cognitive ability tests alone would not allow the detection of insufficient clinical knowledge. A test of attainment (e.g. the CPST) appears particularly important in this setting. Secondly, candidate perceptions of fairness are not favourable towards generic cognitive ability tests with regard to job relevance. This finding supports research on applicant reactions in other occupations in which cognitive ability tests were not positively received by candidates26,27 and were perceived to lack relevance to any given job role.28 Such negative perceptions among candidates can result in undesirable outcomes, such as the loss of competent candidates from the selection process,29 which has a subsequent negative effect on the utility of the selection process.30 Furthermore, extreme reactions may lead to an increased propensity for legal case initiation by candidates.31 By contrast, the CPST received the most approving feedback from candidates compared with all the other tests (NVMA, CATB and SJT) and its immediate job relevance and fairness were favourably perceived. In this context, rejecting applicants to specialty training on the basis of generalised cognitive ability tests alone may be particularly sensitive

406 ª Blackwell Publishing Ltd 2012. MEDICAL EDUCATION 2012; 46: 399–408
Evaluating tests for selection

cognitive ability test scores may be at odds with their previous achievement of high academic success that enabled their initial selection into medical school. Our findings may suggest that there is a trade-off between costs and time constraints, positive candidate perceptions and using tests that identify clinical knowledge and clinically relevant attributes. This represents a dilemma in terms of balancing the need to reduce costs and administrative and development time, while ensuring that the most appropriate knowledge, skills and attributes are assessed during selection in a manner that is also acceptable to candidates.

Finally, results showed that the two current selection tests (the CPST and SJT) assess cognitive ability to an extent, but that they also assess other constructs likely to relate more closely to behavioural domains that are important in the selection of future general practitioners, such as empathy, integrity and clinical expertise.16 There is an added security risk associated with using off-the-shelf cognitive ability tests because they can be accessed directly from the test publishers and are susceptible to coaching effects. By contrast, there is a reduced risk with the CPST and SJT as they both adopt an item bank approach and access is more closely regulated.

This study demonstrates that the CPST and SJT are more effective tests than the NVMA and CATB for the purposes of selection into GP training. In combination, these tests provide improved predictive validity compared with the NVMA and CATB, but they are also perceived as having greater relevance to the target job role. It should be noted that the present paper represents a first step towards demonstrating the predictive validity of the cognitive ability tests, as the outcome measure used was SC performance. The ultimate goal in demonstrating good criterion-related validity is to predict performance in postgraduate training; therefore, future research might investigate the predictive validity of the NVMA and CATB by exploring subsequent performance in the job role. The methods currently used in the GP selection process have been shown to predict subsequent training performance14 and supervisor ratings 12 months into training,24 and these outcome criteria may be useful in future research.

Contributors: FP and MK conceived of the original study. AK and FP designed the study, analysed and interpreted the data, and contributed to the writing of the paper. BI and MW contributed to the overall study design and organised data collection. LZ contributed to the interpretation of data and drafted the initial manuscript. All authors contributed to the critical revision of the paper and approved the final manuscript for publication.

Acknowledgements: Phillipa Coan, Work Psychology Group Ltd, is acknowledged for her contribution to data analysis. Gai Evans is acknowledged for her significant contribution to data collection via the General Practice National Recruitment Office, UK. Paul Yarker, Pearson Education, London, UK, and Dr Rainer Kurz, Saville Consulting, Surrey, UK, are acknowledged for their help in accessing test materials for research purposes.

Funding: this work was funded by the Department of Health in the UK.

Conflicts of interest: AK, FP and MK have provided advice to the UK Department of Health in selection methodology through the Work Psychology Group Ltd.

Ethical approval: this study was approved by the Psychology Research Ethics Committee, City University, London.

REFERENCES

1 Hunter JE. Cognitive ability, cognitive aptitudes, job knowledge, and job performance. J Vocat Behav 1986;29(3):340–62.
2 Schmidt FL, Hunter JE. The validity and utility of selection methods in personnel psychology: practical and theoretical implications of 85 years of research findings. Psychol Bull 1998;124(2):262–74.
3 Robertson IT, Smith M. Personnel selection. J Occup Organ Psychol 2001;74(4):441–72.
4 Schmidt FL. The role of general cognitive ability and job performance: why there cannot be a debate. Hum Perform 2002;15(1/2):187–210.
5 Campbell JP. An overview of the army selection and classification project (Project A). Pers Psychol 1990;43(2):232–9.
6 Martinussen M. Psychological measures as predictors of pilot performance: a meta-analysis. Int J Aviat Psychol 1996;6(1):1–20.
7 Ferguson E, James D, Madeley L. Factors associated with success in medical school: systematic review of the literature. BMJ 2002;324(7343):952–7.
8 James D, Yates J, Nicholson S. Comparison of A level and UKCAT performance in students applying to UK medical and dental schools in 2006: cohort study. BMJ 2010;340:478–85.
9 Lynch B, MacKenzie R, Dowell J, Cleland J, Prescott G. Does the UKCAT predict Year 1 performance in medical school? Med Educ 2009;43(12):1203–9.
10 Patterson F, Baron H, Carr V, Plint S, Lane P. Evaluation of three short-listing methodologies for selection into postgraduate training in general practice. Med Educ 2009;43(1):50–7.
11 Patterson F, Carr V, Zibarras L, Burr B, Berkin L, Plint S, Irish B, Gregory S. New machine-marked tests for selection into core medical training: evidence from two validation studies. Clin Med 2009;9(5):417–20.

12 Patterson F, Zibarras LD, Carr V, Irish B, Gregory S. Evaluating candidate reactions to selection practices using organisational justice theory. Med Educ 2011;45(3):289–97.
13 Zibarras LD, Patterson F. Applicant reactions, perceived fairness and ethnic differences in a live, high-stakes selection context. Division of Occupational Psychology Conference, 14–16 January 2009, Blackpool, UK.
14 Patterson F, Ferguson E, Norfolk T, Lane P. A new selection system to recruit general practice registrars: preliminary findings from a validation study. BMJ 2005;330:711–4.
15 Raven J, Raven JC, Court JH. Manual for Raven's Progressive Matrices and Vocabulary Scales. The Advanced Progressive Matrices. Oxford: Oxford Psychologists Press 1998.
16 Patterson F, Ferguson E, Lane P, Farrell K, Martlew J, Wells A. A competency model for general practice: implications for selection, training, and development. Br J Gen Pract 2000;50:188–93.
17 Hopton T, MacIver R, Saville P, Kurz R. Analysis Aptitude Range Handbook. St Helier: Saville Consulting Group 2010.
18 Cattell RB. Intelligence: Its Structure, Growth, and Action. New York, NY: Elsevier Science Publishing 1987.
19 Lievens F, Peeters H, Schollaert E. Situational judgement tests: a review of recent research. Pers Rev 2008;37(4):426–41.
20 Christian MS, Edwards BD, Bradley JC. Situational judgement tests: constructs assessed and a meta-analysis of their criterion-related validities. Pers Psychol 2010;63:83–117.
21 McDaniel MA, Morgeson FP, Finnegan EB, Campion MA, Braverman EP. Use of situational judgement tests to predict job performance: a clarification of the literature. J Appl Psychol 2001;86(4):730–40.
22 Weekley JA, Jones C. Video-based situational testing. Pers Psychol 1997;50(1):25–49.
23 Motowidlo SJ, Beier ME. Differentiating specific job knowledge from implicit trait policies in procedural knowledge measured by a situational judgement test. J Appl Psychol 2010;95(2):321–33.
24 Lievens F, Patterson F. The validity and incremental validity of knowledge tests, low-fidelity simulations, and high-fidelity simulations for predicting job performance in advanced level high-stakes selection. J Appl Psychol 2011;96(5):927–40.
25 Bauer TN, Truxillo DM, Sanchez RJ, Craig JM, Ferrara P, Campion MA. Applicant reactions to selection: development of the selection procedural justice scale (SPJS). Pers Psychol 2001;54(2):387–419.
26 Anderson N, Witvliet C. Fairness reactions to personnel selection methods: an international comparison between the Netherlands, the United States, France, Spain, Portugal, and Singapore. Int J Select Assess 2008;16(1):1–13.
27 Nikolaou I, Judge TA. Fairness reactions to personnel selection techniques in Greece: the role of core self-evaluations. Int J Select Assess 2007;15(2):206–19.
28 Smither JW, Reilly RR, Millsap RE, Pearlman K, Stoffey RW. Applicant reactions to selection procedures. Pers Psychol 1993;46:49–76.
29 Schmit MJ, Ryan AM. Applicant withdrawal: the role of test-taking attitudes and racial differences. Pers Psychol 1997;50(4):855–76.
30 Murphy KR. When your top choice turns you down: effect of rejected offers on the utility of selection tests. Psychol Bull 1986;99:133–8.
31 Anderson N. Perceived job discrimination (PJD): toward a model of applicant propensity to case initiation in selection. Int J Select Assess 2011;19:229–44.

SUPPORTING INFORMATION

Additional supporting information may be found in the online version of this article. Available at: http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2923.2011.04195.x/suppinfo

Table S1. Description of the tests and outcome measures used in the study, with examples.

Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than for missing material) should be directed to the corresponding author for the article.

Received 3 June 2011; editorial comments to authors 16 August 2011, 12 October 2011; accepted for publication 10 November 2011

© Blackwell Publishing Ltd 2012. MEDICAL EDUCATION 2012; 46: 399–408