OBJECTIVES This study aimed to evaluate the validity and utility of, and candidate reactions towards, cognitive ability tests and current selection methods, including a clinical problem-solving test (CPST) and a situational judgement test (SJT), for postgraduate selection.

METHODS This was an exploratory, longitudinal study to evaluate the validities of two cognitive ability tests (measuring general intelligence) compared with current selection tests, including a CPST and an SJT, in predicting performance at a subsequent selection centre (SC). Candidate reactions were evaluated immediately after test administration to examine face validity. Data were collected from candidates applying for entry into training in UK general practice (GP) during the 2009 recruitment process. Participants were junior doctors (n = 260). The mean age of participants was 30.9 years and 53.1% were female. Outcome measures were participants’ scores on three job simulation exercises at the SC.

RESULTS Findings indicate that all tests measure overlapping constructs. Both the CPST and SJT independently predicted more variance than the cognitive ability test measuring non-verbal mental ability. The other cognitive ability test (measuring verbal, numerical and diagrammatic reasoning) had a predictive value similar to that of the CPST and added significant incremental validity in predicting performance on job simulations in an SC. The best single predictor of performance at the SC was the SJT. Candidate reactions were more positive towards the CPST and SJT than the cognitive ability tests.

CONCLUSIONS In terms of operational validity and candidate acceptance, the combination of the current CPST and SJT proved to be the most effective administration of tests in predicting selection outcomes. In terms of construct validity, the SJT measures procedural knowledge in addition to aspects of declarative knowledge and fluid abilities and is the best single predictor of performance in the SC. Further research should consider the validity of the tests in this study in predicting subsequent performance in training.

Correspondence: Professor Fiona Patterson, Department of Social and
Developmental Psychology, University of Cambridge, Free School
Lane, CB2 3RQ, Cambridge, UK. Tel: 00 44 7847 600630;
E-mail: [email protected]
School of Primary Care, Severn Deanery, Bristol, UK
NHS Midlands and East, West Midlands GP Education,
Birmingham, UK

Table 1 Description of the tests and outcome measures used in the study (Table S1 [online] gives examples of items in the tests)

Raven's Advanced Progressive Matrices: Power test measuring fluid intelligence. Power tests generally do not impose a time limit on completion. The non-verbal format reduces culture bias. The advanced matrices differentiate between candidates at the high end of ability. The test has 23 items, each with 8 response options; thus a maximum of 23 points are available (1 point per correct answer).

Swift Analysis Aptitude: Speed test measuring fluid intelligence and some aspects of crystallised intelligence. Speed tests focus on the number of questions answered correctly within a specific timeframe. Three subsets, each with 8 items with 3–5 response options; thus a maximum of 24 points are available (1 point per correct answer).

Clinical problem-solving test (CPST): Clinical problem-solving test measuring crystallised abilities, especially declarative knowledge. It is designed as a power test. The CPST has 100 items to be completed within 90 minutes.

Situational judgement test (SJT): Test designed to measure non-cognitive professional attributes beyond clinical knowledge. It is designed as a power test. The SJT has 50 items to be completed in 90 minutes. There are two different types of items: ranking and choice.

Selection centre: Multitrait–multimethod assessment centre in which candidates are rated on their observed behaviours in three exercises: (i) a group exercise, involving a group discussion referring to a work-related issue; (ii) a simulated patient consultation in which the candidate plays the role of the doctor and an actor plays the patient; (iii) a written exercise in which candidates prioritise a list of work-related issues and justify their choices.

In making judgements about response options, candidates need to apply relevant knowledge, skills and abilities to resolve the issues they are presented with.19,20

We would anticipate some relationship among all four tests as they measure overlapping constructs. Because the two cognitive ability tests both measure similar constructs, we might expect a reasonably strong correlation between the two. The CPST, although essentially a measure of crystallised intelligence, is likely to also entail an element of fluid intelligence as verbal reasoning is necessary to understand the situations presented in the question items. Thus, we would expect the CPST to be positively related to the cognitive ability tests. The construct validity of SJTs is less well known; research suggests that they may relate to both cognitive ability21 and learned job knowledge.22 Indeed, a meta-analysis showed that SJTs had an average correlation of r = 0.46 with cognitive ability.20 In the present context, although the SJT was designed to measure non-cognitive domains, we may expect a positive association between the SJT and the two cognitive ability tests. This is because theory suggests that intelligent people may learn more quickly about the non-cognitive traits that are more effective in the work-related situations described in the SJT.23

The present study aimed to evaluate the construct, predictive, incremental and face validity of two cognitive ability tests in comparison with the present CPST and SJT selection tests. In examining predictive and incremental validity, we used a previously validated approach11 with overall performance at the SC as an outcome measure. Performance at the SC has been linked to subsequent training performance14 and predicts supervisor ratings 12 months into training.24 We therefore posed the following four research questions:

1 Construct validity: what are the inter-correlations among the two cognitive ability tests, the CPST and the SJT?
2 Predictive validity: do scores on each of the tests independently predict subsequent performance at the SC?
3 Incremental validity: compared with the current CPST and SJT, do the cognitive ability tests (NVMA and CATB) each account for additional variance in performance at the SC?
4 Face validity: do candidates perceive the tests to be fair and appropriate?

and SJT predictors would be added in the first step, and then the cognitive ability test(s) would be added in the second step to determine the additional variance in the SC score predicted by the cognitive ability test. Secondly, we used stepwise regression analysis. With this method, the order of relevance of predictor variables is identified by the statistical package (SPSS version 17.0; SPSS, Inc., Chicago, IL, USA) to establish which predictors independently predict the most variance in SC scores.

Sampling

and SJT (r = 0.34), and also between the CATB and CPST (r = 0.41) and SJT (r = 0.47) (all p < 0.01) (Table 3). Thus, the cognitive ability tests have both common and independent variance with the CPST and SJT and to some extent measure overlapping constructs.

Do scores on each of the tests independently predict subsequent performance at the SC?

The analyses in Table 2 showed a positive correlation between NVMA scores and all SC exercises (group, r = 0.19; simulation, r = 0.29 [both p < 0.01]; written, r = 0.15 [p < 0.05]). However, both the CPST and SJT had substantially higher correlations with the three SC exercises than the NVMA (CPST, r = 0.19–0.32; SJT, r = 0.28–0.40 [all p < 0.01]). Furthermore, both the CPST and SJT had higher correlations with overall SC scores (r = 0.38 and r = 0.50, respectively) compared with the NVMA (r = 0.30) (all p < 0.01).

Results show a positive correlation between the CATB and the SC exercises (group, r = 0.28; simulation, r = 0.32; written, r = 0.23 [all p < 0.01]). The CPST correlated with the group and simulation exercises to the same extent as the CATB (r = 0.28 and r = 0.32, respectively [both p < 0.01]), but had a lower correlation with the written exercise (r = 0.19 [p < 0.01]) (Table 2). The SJT had higher correlations with all exercises (group, r = 0.39; written, r = 0.28; simulation, r = 0.40 [all p < 0.01]). Further, the CPST had a lower correlation with overall SC scores compared with the CATB (r = 0.38 and r = 0.39, respectively [both p < 0.01]), but the SJT had a higher correlation (r = 0.50 [p < 0.01]).

Thus, overall findings indicate that, of all the tests, the SJT had the highest correlations with performance at the SC. The SJT and CPST were more effective predictors of subsequent performance than the NVMA; the SJT was a better predictor of performance at the SC than the CATB, and the CPST had a similar predictive value to the CATB.

Compared with the current CPST and SJT, do the cognitive ability tests (NVMA and CATB) each account for additional variance in performance at the SC?

We established the extent to which the cognitive ability tests each accounted for additional variance above the current CPST and SJT. To examine the extent to which NVMA scores predicted overall SC scores over and above the CPST and SJT scores combined, a hierarchical multiple regression was performed (Table 3). Scores on the CPST and SJT were entered in the first step, which explained 31.3% of the variance in SC overall score (R² = 0.31, F(2,216) = 49.24; p < 0.001); however, adding the NVMA in the second step offered no unique variance over the CPST and SJT (ΔR² = 0.01, F(2,215) = 2.85; p = 0.09). A stepwise regression was also performed (Table 3) to establish which tests independently predicted the most variance in SC scores. Scores on the SJT were entered into the first step, indicating that the SJT explains the most variance (28.9%) of all the tests (R² = 0.29, F(1,217) = 88.02; p < 0.001). Scores on the CPST were entered into the second and final step, explaining an additional 2.5% of the variance (ΔR² = 0.03, F(2,216) = 7.73; p = 0.01). The NVMA was not entered into the model at all, confirming its lack of incremental validity over the two current short-listing assessments.

To establish the extent to which the CATB predicted SC performance over and above both the CPST and SJT, a hierarchical multiple regression was performed (Table 3), repeating the method described above. Scores on the SJT and CPST were entered into the first step and explained 28.6% of the variance (R² = 0.29, F(2,185) = 36.99; p < 0.001); entering the CATB into the second step explained an additional 1.7% of the variance in overall SC performance (ΔR² = 0.02, F(1,184) = 4.44; p = 0.04). A stepwise regression was also performed, entering SJT scores into the first step. This explained the most variance in SC performance (25.5%) of all the tests (R² = 0.26, F(1,186) = 63.53; p < 0.001). Scores on the CPST were entered into the second step, explaining an additional 3.1% of the variance (ΔR² = 0.03, F(1,185) = 8.05; p = 0.005). Finally, scores on the CATB were entered into the final step, explaining an additional 1.7% of the variance (ΔR² = 0.02, F(1,184) = 4.44; p = 0.04). Overall, findings indicate that the CATB does add some incremental validity in predicting SC performance.

A stepwise regression was also performed to establish which of the four tests (SJT, CPST, NVMA, CATB) independently predicted the most variance in SC scores. Scores on the SJT were entered into the first step, indicating that the SJT explains the most variance (25.5%) of all the tests (R² = 0.26, F(1,186) = 63.53; p < 0.001). Scores on the CPST were entered into the second step, explaining an additional 3.1% of the variance (ΔR² = 0.03, F(1,185) = 8.05; p = 0.005). Finally, CATB scores were entered into the third step, explaining an additional 1.7% of the variance (ΔR² = 0.02, F(1,184) = 4.44;
p = 0.04). The NVMA was not entered into the model at all and thus lacked incremental validity over the other tests. Overall, findings indicate that the NVMA does not add incremental validity in predicting SC performance, but the CATB does add some incremental validity.

Do candidates perceive the tests as fair and appropriate?

All participants were asked to complete a previously validated candidate evaluation questionnaire,12,25 based on procedural justice theory, regarding their perceptions of the tests. A total of 249 candidates completed the questionnaire (96% response rate), in which they indicated their level of agreement with several statements regarding the content of each test. These results are shown in Table 4, along with feedback on the SJT and CPST from the live 2009 selection process. Overall results show that the CPST received the most favourable feedback, followed by the SJT. Candidates did not react favourably to the NVMA, mainly as a result of perceptions of low job relevance and insufficient opportunity to demonstrate ability. The CATB was also relatively negatively received; feedback was slightly better than for the NVMA but still markedly worse than for the CPST and SJT. This notably reflected perceptions of the CATB as providing insufficient opportunity to demonstrate ability and as not helping selectors to differentiate among candidates.

Two cognitive ability tests were evaluated as potential selection methods for use in postgraduate selection and were compared with the current selection tests, the CPST and SJT. Results show positive and significant relationships among the CPST and SJT and cognitive ability tests, indicating that they measure overlapping constructs. For both the CPST and the SJT, the correlation with the CATB was higher than with the NVMA. This is probably because the CATB, CPST and SJT are all presented in a verbal format, whereas the NVMA has no verbal element and is presented in a non-verbal format.

Implications for operational validity

Considering both the predictive validation analyses and candidate reactions, the CPST and SJT (measuring clinical knowledge and non-cognitive professional attributes, respectively) have been shown to be
NVMA = non-verbal mental ability test; CATB = cognitive ability test battery; CPST = clinical problem-solving test; SJT = situational
judgement test; SD ⁄ D = strongly disagree or disagree; N = neither agree nor disagree; A ⁄ SA = agree or strongly agree; GP = general practice
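
The incremental validity figures reported in the results (an R² for the first regression step and a ΔR² for each added test) can be related to their F statistics with the standard F-change formula for nested least-squares models. The following is a minimal sketch, not the authors' SPSS procedure: the function name `f_change` and its parameters are hypothetical, and the inputs are the rounded percentages quoted in the text, so the outputs only approximate the published F values.

```python
def f_change(r2_reduced, r2_full, n_added, df_resid):
    """F statistic for the R-squared change between two nested OLS models.

    r2_reduced: R-squared before the new predictor(s) are entered
    r2_full:    R-squared after they are entered
    n_added:    number of predictors added in this step
    df_resid:   residual degrees of freedom of the full model
    """
    delta_r2 = r2_full - r2_reduced
    return (delta_r2 / n_added) / ((1.0 - r2_full) / df_resid)


# Hierarchical model: SJT + CPST explained 28.6% of the variance; adding the
# CATB explained a further 1.7%, reported in the text with F(1,184) = 4.44.
f_catb = f_change(r2_reduced=0.286, r2_full=0.286 + 0.017, n_added=1, df_resid=184)

# Stepwise model: the SJT alone explained 25.5%; adding the CPST explained a
# further 3.1%, reported in the text with F(1,185) = 8.05.
f_cpst = f_change(r2_reduced=0.255, r2_full=0.255 + 0.031, n_added=1, df_resid=185)

print(round(f_catb, 2), round(f_cpst, 2))
```

With these rounded inputs the sketch gives roughly 4.49 and 8.03, close to the reported 4.44 and 8.05; the small gaps are consistent with rounding in the published percentages.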

As the SJT was the best single predictor of SC performance, it could be argued that the constructs that best predict subsequent performance in job simulations include a combination of crystallised and fluid intelligence, along with ‘non-cognitive’ professional attributes measured by the SJT. The construct validity of SJTs has been a subject for debate amongst researchers and we argue that the results presented here provide further insights. Motowidlo and Beier23 suggest that the procedural knowledge measured by an SJT includes implicit trait policies, which are implicit beliefs regarding the costs and benefits of how personality is expressed and its effectiveness in specific jobs (which is likely to relate to the way in which professional attributes are expressed in a work context). Results suggest that the SJT broadens the constructs being measured (beyond declarative knowledge and fluid intelligence) and therefore the SJT demonstrates the highest criterion-related validity in predicting performance in high-fidelity job simulations in which a range of different work-related behaviours are measured (beyond knowledge and cognitive abilities). However, in order to build and extend the current research and to test the ideas presented by Motowidlo and Beier,23 we recommend that future research exploring the construct validity of the SJT should also include a measure of personality and implicit trait policies to test this possible association. We therefore present Fig. 1, a diagram illustrating potential pathways among variables that could be considered in future research.

Although the CATB may appear to offer some advantages relating to cost savings in terms of administration and test development, there are several reasons why replacing the CPST with the CATB might have negative implications in this context. The first reason relates to patient safety: using the general cognitive ability tests alone would not allow the detection of insufficient clinical knowledge. A test of attainment (e.g. the CPST) appears particularly important in this setting. Secondly, candidate perceptions of fairness are not favourable towards generic cognitive ability tests with regard to job relevance. This finding supports research on applicant reactions in other occupations in which cognitive ability tests were not positively received by candidates26,27 and were perceived to lack relevance to any given job role.28 Such negative perceptions among candidates can result in undesirable outcomes, such as the loss of competent candidates from the selection process,29 which has a subsequent negative effect on the utility of the selection process.30 Furthermore, extreme reactions may lead to an increased propensity for legal case initiation by candidates.31 By contrast, the CPST received the most approving feedback from candidates compared with all the other tests (NVMA, CATB and SJT) and its immediate job relevance and fairness were favourably perceived. In this context, rejecting applicants to specialty training on the basis of generalised cognitive ability tests alone may be particularly sensitive because the non-selection of candidates based on
cognitive ability test scores may be at odds with their previous achievement of high academic success that enabled their initial selection into medical school.

Our findings may suggest that there is a trade-off between costs and time constraints, positive candidate perceptions and using tests that identify clinical knowledge and clinically relevant attributes. This represents a dilemma in terms of balancing the need to reduce costs and administrative and development time, while ensuring that the most appropriate knowledge, skills and attributes are assessed during selection in a manner that is also acceptable to candidates.

Finally, results showed that the two current selection tests (the CPST and SJT) assess cognitive ability to an extent, but that they also assess other constructs likely to relate more closely to behavioural domains that are important in the selection of future general practi-

and drafted the initial manuscript. All authors contributed to the critical revision of the paper and approved the final manuscript for publication.
Acknowledgements: Phillipa Coan, Work Psychology Group Ltd, is acknowledged for her contribution to data analysis. Gai Evans is acknowledged for her significant contribution to data collection via the General Practice National Recruitment Office, UK. Paul Yarker, Pearson Education, London, UK, and Dr Rainer Kurz, Saville Consulting, Surrey, UK, are acknowledged for their help in accessing test materials for research purposes.
Funding: this work was funded by the Department of Health in the UK.
Conflicts of interest: AK, FP and MK have provided advice to the UK Department of Health in selection methodology through the Work Psychology Group Ltd.
Ethical approval: this study was approved by the Psychology Research Ethics Committee, City University, London.
A Koczwara et al
