Sample Test Specification

The document describes the development of an achievement test to measure student progress after a 3-month English reading course. It details the test specifications, including content, structure, timing, scoring procedures, and trials conducted during development.

Stages of test development

• a description of the test, giving details of sections, timings, etc. (which may include a version of the specifications);
• sample items (or a complete sample test);
• advice on preparing for taking the test;
• an explanation of how test scores are to be interpreted;
• training materials (for interviewers, raters, etc.);
• details of test administration.

The handbooks should be made available in print and/or online.

10. Training staff


Using the handbook and other materials, all staff who will be
involved in the test process should be trained. This may include
interviewers, raters, scorers, computer operators and invigilators
(proctors).

11. Test maintenance


If a test is to be used repeatedly over time, statistical and qualitative
analysis should be carried out regularly in order to identify any
problems that may have crept in. At some point, alternative versions
are likely to become necessary, as word spreads of the original
test’s content. In this case, the development process will have to be
repeated, beginning with the writing of items (assuming there is no
perceived need to change the speci cations).

Two examples of test development follow.

EXAMPLE OF TEST DEVELOPMENT 1: AN ACHIEVEMENT TEST


Statement of the problem
There is a need for an achievement test to be administered at the end of a
pre-sessional course of training in the reading of academic texts in the social
sciences and business studies (the students are graduates who are about
to follow postgraduate courses in English-medium universities). The teaching
institution concerned (as well as the sponsors of the students) wants to know
just what progress is being made during the three-month course. The test must
therefore be sufficiently sensitive to measure gain over that relatively short
period. While there is no call for diagnostic information on individuals, it would
be useful to know, for groups, where the greatest difficulties remain at the end
of the course, so that future courses may give more attention to these areas.
Backwash is considered important; the test should encourage the practice of
the reading skills that the students will need in their university studies. This is,
in fact, intended to be only one of a battery of tests, and a maximum of two
hours can be allowed for it. It will not be possible at the outset to write separate
tests for different subject areas (social sciences and business studies).

https://doi.org/10.1017/9781009024723.007 Published online by Cambridge University Press

Specifications

Content
Operations These are based on the stated objectives of the course, and
include expeditious and slower, careful reading.

Expeditious reading: Skim for main ideas; search read for information; scan to
find specific items in lists, indexes, etc.

Slower, careful reading: Construe the meaning of complex, closely argued
passages.

Underlying skills that are given particular attention in the course:


• Guessing the meaning of unfamiliar words from context;
• Identifying referents of pronouns, etc., often some distance removed in the text.

Types of text The texts should be authentic, academic (taken from textbooks
and journal articles).

Addressees Academics at postgraduate level and beyond.

Lengths of texts Expeditious: c. 3,000 words. Careful: c. 800 words.

Topics The subject areas will have to be as ‘neutral’ as possible, since the
students are from a variety of social science and business disciplines
(economics, sociology, management etc.).

Readability Not specified.

Structural range Unlimited.

Vocabulary range General academic, not specialist technical.

Dialect and style Standard American or British English dialect. Formal, academic
style.

Speed of processing Expeditious: 300 words per minute (not reading all words).
Careful: 100 words per minute.

Structure, timing, medium and techniques


Test structure Two sections: expeditious reading; careful reading.

Number of items 30 expeditious; 20 careful. Total: 50 items.

Number of passages 3 expeditious; 2 careful.

Timing Expeditious: 15 minutes per passage (each passage collected after 15
minutes).
Careful: 30 minutes (passage only handed out after 45 minutes, when
expeditious reading has been completed).
TOTAL: 75 minutes.
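The timing figures above can be checked arithmetically. A minimal sketch (the passage counts, section times, text lengths and reading speeds are all taken from the specifications; the snippet itself is purely illustrative):

```python
# Totals implied by the specifications: 3 expeditious passages at 15 minutes
# each, plus 30 minutes of careful reading.
EXPEDITIOUS_PASSAGES = 3
MINUTES_PER_EXPEDITIOUS = 15
CAREFUL_MINUTES = 30

total_minutes = EXPEDITIOUS_PASSAGES * MINUTES_PER_EXPEDITIOUS + CAREFUL_MINUTES

# Reading time implied by the stated speeds: c. 3,000 words at 300 wpm for
# expeditious reading, c. 800 words at 100 wpm for careful reading. The
# difference between these and the allotted times is what remains for
# answering the items.
expeditious_reading_minutes = 3000 / 300
careful_reading_minutes = 800 / 100
```

This confirms the 75-minute total and shows that each expeditious passage leaves roughly 5 minutes for item-answering.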

Medium Paper-and-pencil. Each passage in a separate booklet.

Techniques Short answer and gap-filling for both sections.

Examples:

a) For inferring meaning from context:


For each of the following, find a single word in the text with an
equivalent meaning. Note: the word in the text may have an ending
such as -ing, -s, etc.
highest point (lines 20–35)

b) For identifying referents:


What does each of the following refer to in the text? Be very precise.
the former (line 43)

Criterial levels of performance


Satisfactory performance is represented by 80 percent accuracy in each of the
two sections.
The number of students reaching this level will be the number who have
succeeded in terms of the course’s objectives.
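The 80 per cent criterial level translates directly into raw-score cut-offs on the item counts given above. A minimal sketch (item numbers are from the specifications; the rounding-up choice is an assumption, as the text does not say how fractional cut-offs are handled):

```python
import math

# Items per section, from the test structure above.
ITEMS = {"expeditious": 30, "careful": 20}
CRITERIAL_LEVEL = 0.80

def passes(section_scores):
    """True if the candidate reaches 80% accuracy in *each* section.

    section_scores: dict mapping section name to raw score.
    """
    return all(
        section_scores[section] >= math.ceil(CRITERIAL_LEVEL * n_items)
        for section, n_items in ITEMS.items()
    )
```

With these figures the cut-offs come out at 24/30 for expeditious reading and 16/20 for careful reading; a candidate must reach both, not merely the combined total.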

Scoring procedures
There will be independent double scoring. Scorers will be trained to ignore
irrelevant (for example, grammatical) inaccuracy in responses.
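Independent double scoring can be operationalised very simply: both scorers mark every response, and any disagreements are flagged for adjudication rather than silently resolved. A sketch (the flag-and-adjudicate policy is an assumption; the text specifies only that scoring is double and independent):

```python
def flag_disagreements(scorer_a_marks, scorer_b_marks):
    """Return the item indices where two independent scorers disagree.

    Each argument is a list of 0/1 marks, one per item, in item order.
    Flagged items would then go to a third scorer or adjudicator.
    """
    return [
        i for i, (a, b) in enumerate(zip(scorer_a_marks, scorer_b_marks))
        if a != b
    ]
```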

Sampling
Texts will be chosen from as wide a range of topics and types of writing as
is compatible with the speci cations. Draft items will only be written after the
suitability of the texts has been agreed.

Item writing and moderation


Items will be based on a consideration of what a competent non-specialist
reader should be able to obtain from the texts. Considerable time will be set
aside for moderation and rewriting of items.

Informal trialling
This will be carried out on 20 expert speaker postgraduate students in the
university.

Trialling and analysis


Trialling of texts and items sufficient for at least two versions will be carried out
with students currently taking the course, with full qualitative and statistical
analysis. An overall reliability coefficient of 0.90 and a percent agreement (see
Chapter 5) of 0.85 are required.
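The two statistics named above can be sketched as follows. Note the assumptions: the text does not say which reliability coefficient is intended, so Cronbach's alpha is used here as one common internal-consistency choice, and percent agreement is computed between two pass/fail classifications of the same candidates:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for dichotomously scored items.

    item_scores: list of candidates, each a list of 0/1 item scores.
    """
    n_items = len(item_scores[0])
    totals = [sum(candidate) for candidate in item_scores]

    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    item_variances = [
        variance([candidate[i] for candidate in item_scores])
        for i in range(n_items)
    ]
    return (n_items / (n_items - 1)) * (
        1 - sum(item_variances) / variance(totals)
    )

def percent_agreement(decisions_a, decisions_b):
    """Proportion of candidates classified the same way on two occasions."""
    agreements = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return agreements / len(decisions_a)
```

Under the specification, alpha must reach 0.90 and the pass/fail agreement 0.85 before the test is accepted.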

Validation
There will be immediate content validation carried out by staff experienced in
teaching and testing.
Concurrent validation will be against tutors’ ratings of the students.
Predictive validation will be against subject supervisors’ ratings one month after
the students begin their postgraduate studies.

Handbooks
One handbook will be written for the students, their sponsors, and their future
supervisors.
Another handbook will be written for internal use.

EXAMPLE OF TEST DEVELOPMENT 2: A PLACEMENT TEST


Statement of the problem
A commercial English language teaching organisation (which has a number
of schools) needs a placement test. Its purpose will be to assign new
students to classes at five levels: false beginners; lower intermediate; middle
intermediate; upper intermediate; advanced. Course objectives at all levels
are expressed in rather general ‘communicative’ terms, with no one skill being
given greater attention than any other. As well as information on overall ability
in the language, some indication of oral ability would be useful. Sufficient
accuracy is required for there to be little need for changes of class once
teaching is under way. Backwash is not a serious consideration. More than two
thousand new students enrol within a matter of days. The test must be brief
(not more than 45 minutes in length), quick and easy to administer, score and
interpret. Scoring by clerical staff should be possible. The organisation has
previously conducted interviews but the number of students now entering the
school is making this impossible.

Specifications

Content
Operations Ability to predict missing words (based on the notion of ‘reduced
redundancy’5).

Length of text One turn (of a maximum of about 20 words) per person.

Types of text Constructed ‘spoken’ exchanges involving two people. It is hoped


that the spoken nature of the texts will, however indirectly, draw on students’
oral abilities.

5.
See Chapter 14 for a discussion of reduced redundancy.

Topics ‘Everyday’. Those found in the textbooks used by the organisation.



Structural range All those found in the textbooks (listed in the specifications but
omitted here to save space).

Vocabulary range As found in the textbooks, plus any other common lexis.

Dialect and style Standard English English. Mostly informal style, some formal.

Structure, timing, medium and techniques


Test structure No separate sections.

Number of items 100 (though this will be reduced if the test is shown to do its
job well with fewer items).

Timing 30 minutes (Note: this seems very little time, but the more advanced
students will find the early passages extremely easy, and will take very little time. It
does not matter whether lower-level students reach the later passages.)

Medium Pencil-and-paper.

Technique All items will be gap-filling. One word per gap. Contractions count as
one word. Gaps will relate to vocabulary as well as structure (not always possible
to distinguish what is being tested).

Examples: A: Whose book ________ that?
B: It's mine.
A: How did you learn French?
B: I just picked it ________ as I went along.

Criterial levels of performance


These will only be decided when comparison is made between performance
on the test and (a) the current assignment of students by the interview and
(b) the teachers’ view of each student’s suitability to the class they have been
assigned to by the interview.

Scoring procedures
Responses will be on a separate response sheet. A template with a key will be
constructed so that scoring can be done rapidly by clerical staff.
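Template scoring by clerical staff amounts to matching each response sheet against a fixed key. A sketch (the key entries here are invented for illustration, not taken from the actual test; one word per gap and contractions counting as one word follow the specifications):

```python
# Hypothetical key: acceptable answers per gap number. A real key would be
# built from the moderated items; multiple acceptable answers per gap are
# allowed for.
KEY = {1: {"is"}, 2: {"up"}}

def score_sheet(responses):
    """Score one response sheet against the key.

    responses: dict mapping gap number to the single word the candidate
    wrote. Matching ignores case and surrounding whitespace, so clerical
    scorers need no linguistic judgement.
    """
    return sum(
        1 for gap, acceptable in KEY.items()
        if responses.get(gap, "").strip().lower() in acceptable
    )
```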

Informal trialling
This will be carried out on 20 first-year expert speaker undergraduate students.

Trialling and analysis


Many more items will be constructed than will finally be used. All of them (in as
many as three different test forms, with linking anchor items) will be trialled on
current students at all levels in the organisation. Problems in administration and
scoring will be noted.
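The anchor items are what allow scores from the different trial forms to be placed on a common scale. The text names anchor items but not the linking method, so the mean-sigma linear equating sketched here is an assumption, chosen as one of the simplest common-item methods:

```python
def mean_sigma_link(anchor_mean_a, anchor_sd_a, anchor_mean_b, anchor_sd_b):
    """Return a function mapping form-B scores onto the form-A scale.

    The arguments are each group's mean and standard deviation on the
    shared anchor items; differences on the anchors are taken to reflect
    group ability, and the linear transformation removes them.
    """
    slope = anchor_sd_a / anchor_sd_b
    intercept = anchor_mean_a - slope * anchor_mean_b
    return lambda score_b: slope * score_b + intercept
```

For example, if group A averages 50 (sd 10) on the anchors and group B averages 40 (sd 8), a form-B score of 48 maps to 60 on the form-A scale.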

After statistical and qualitative analysis, one test form made up of the ‘best’ items
will be constructed and trialled on a different set of current students. The total
score for each of the students will then be compared with his or her level in the
institution, and decisions as to criterial levels of performance made.
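Comparing total scores with institutional levels yields one cut score per boundary between adjacent classes. The decision rule is not given in the text; placing each boundary midway between the mean scores of adjacent levels, as sketched here, is one simple assumed choice:

```python
def cut_scores(mean_score_by_level):
    """Derive cut scores from current students' mean test scores.

    mean_score_by_level: mean score of the students at each of the five
    levels, ordered from lowest class to highest. Each boundary is placed
    midway between adjacent means.
    """
    return [
        (lower + upper) / 2
        for lower, upper in zip(mean_score_by_level, mean_score_by_level[1:])
    ]
```

Five levels therefore give four cut scores; a new student's total is simply compared against these to assign a class.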

Validation
The final version of the test will be checked against the list of structures in the
specifications. If one is honest, however, one must say that at this stage content
validity will be only a matter of academic interest. What will matter is whether the
test does the job it is intended for. Thus the most important form of validation will be
criterion-related, the criterion being placement of students in appropriate classes,
as judged by their teachers (and possibly by the students themselves). The smaller
the proportion of misplacements, the more valid the test.
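The validity index described above is just the proportion of students whose teachers judge them to be in the wrong class, sketched directly:

```python
def misplacement_rate(assigned_classes, judged_classes):
    """Proportion of students judged misplaced.

    assigned_classes: the class each student was placed in by the test.
    judged_classes: the class their teacher thinks they belong in.
    The lower the rate, the more valid the placement test.
    """
    misplaced = sum(a != j for a, j in zip(assigned_classes, judged_classes))
    return misplaced / len(assigned_classes)
```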

Handbook
A handbook will be written for distribution by the organisation to its various schools.

READER ACTIVITIES
On the basis of experience or intuition, try to write a specification for a
test designed to measure the level of language proficiency of students
applying to study an academic subject in the medium of a foreign
language at an overseas university. Compare your specification with those
of tests that have actually been constructed for that purpose.

FURTHER READING

Test development process


O’Sullivan (2012b) presents an outline of the test development process.
Davidson and Fulcher (2012) offer advice on the development of test
specifications. Specifications for a test designed to assess the level
of English of students wishing to study at tertiary level in the UK, the
Test of English for Educational Purposes (TEEP), are to be found in Weir
(1988, 1990).
For other models of test development see Alderson et al. (1995) and
Bachman and Palmer (1996). The model used by Bachman and Palmer is
highly detailed and complex but their book gives information on ten test
development projects.
Alderson and Buck (1993) report on the test development procedures of
certain British testing bodies.

Common European Framework


Language Testing 22, 3 (2005) includes a number of articles about the
use of the Common European Framework (see Online resources, below) in
language testing.
