K-3 Reading Assessment
INTRODUCTION
Anyone who has worked as an elementary school teacher or principal understands the value of reliable and valid assessments of early reading progress. Timely, reliable assessments indicate which children are falling behind in critical reading skills so teachers can help them make greater progress in learning to read. Reliable and valid assessments also help monitor the effectiveness of instruction for all children; without regularly assessing children's progress in learning to read, we cannot know which children need more help and which are likely to make good progress without extra help. Because scientific studies have repeatedly demonstrated the value of regularly assessing reading progress,1,2 a comprehensive assessment plan is a critical element of an effective school-level plan for preventing reading difficulties. The general principles outlined in this guide, such as the early identification of students who are struggling in learning to read,3 are all based on scientific findings, but the detailed recommendations for implementation derive from practical experiences in helping many school leaders implement successful plans.
Kindergarten
Kindergarten students require sensitive assessments of their growth in phonemic awareness, phonics skills (knowledge of letters and beginning phonemic decoding ability), and vocabulary. Their reading skills are rarely sufficiently developed to usefully assess text reading fluency and reading comprehension. Sometimes listening comprehension is assessed instead of reading comprehension to identify students whose language processing skills place them at risk for difficulties comprehending text once they can read words fluently and accurately.
Grade 1
It is important to continue monitoring students' development of phonemic awareness in first grade because struggling students may continue to have difficulty in this area. The development of accurate and fluent phonemic decoding skills should also be monitored in first grade, since these foundational skills for reading accuracy undergo major development in this period. As soon as students can begin to read connected text with reasonable accuracy, their development of oral reading fluency should be monitored. Oral measures of young children's reading fluency are much more reliable5 than measures of silent reading fluency. Oral reading fluency's importance as an index of reading growth extends from first through third grade. Continued growth in vocabulary should also be assessed, and reading comprehension can be reliably assessed in most students by the end of first grade.
Grade 2
Second graders may need continued monitoring of their phonemic decoding ability, especially for multi-syllable words. This is particularly true in schools with high proportions of poor and minority students, who have traditionally been at risk for difficulties with the early mastery of these skills. Continued monitoring of reading fluency is critical through second grade, since students must make strong growth in this skill to maintain grade-level reading proficiency. A comprehensive assessment plan should also measure second graders' vocabulary and reading comprehension.
Grade 3
The primary dimensions of reading growth that should be monitored in third grade are reading fluency, vocabulary, and reading comprehension.
Screening Tests
Briefly administered, screening tests provide an initial indication of which students are entering the school year at risk for reading difficulties because they are lagging in the development of critical reading skills. Valid and reliable screening tests can help teachers differentiate their instruction based on what students already know and can do.

Informal Reading Inventories

Informal reading inventories are often used to gain a level of detail about students' specific skills and knowledge that is not typically provided by formal screening, progress monitoring, or diagnostic assessments. For example, some informal inventories provide information about the specific letter-sounds a student may have mastered, the types of words he or she can accurately decode using phonemic decoding strategies, or the types of errors students make most frequently when reading orally. However, information about test reliability and validity is not usually provided for informal reading inventories. School leaders and teachers should examine the specific information each element of their comprehensive assessment plan provides and determine the most efficient way to gain the information necessary for planning classroom instruction and making decisions about allocating school-level resources. The goal is to gain enough information about student progress to make effective decisions while minimizing the time spent administering assessments. With a fully implemented comprehensive assessment plan such as the one described here, there may be less need for informal reading inventories than in the past. Much of the information these inventories provide can be gathered through careful observation during instruction. Informal reading inventories might be used much as formal diagnostic tests are: only when there is a well-defined need for additional information that will be directly helpful in making instructional decisions.
The second type of progress monitoring test has a shorter history of use in American schools. Sometimes referred to as general or external progress monitoring tests, these measures assess critical reading skills such as phonemic awareness, phonics, fluency, vocabulary, or comprehension, but are not tied to any specific reading curriculum.7 Rather, through extensive development research, these tests establish performance targets, or benchmarks, for different points in the school year (i.e., beginning, middle, and end) that predict success in meeting grade-level reading standards by the end of the year. When administered at the end of the school year, these tests also identify students who will likely have trouble meeting grade-level standards at the end of the next school year unless they receive extra help. For example, a general progress monitoring test might establish an oral reading fluency target, or benchmark, of 69 correct words per minute by February of second grade, a target associated with a high probability of meeting the end-of-year grade-level standard on a measure of reading comprehension. Another example would be a benchmark of being able to blend three-phoneme words by the end of kindergarten in order to be prepared for success in learning phonemic decoding skills during first grade. General progress monitoring tests provide performance targets teachers can aim for in order to ensure that their students are on track for meeting grade-level reading standards by the end of the school year. Examples of widely used general progress monitoring tests are the Dynamic Indicators of Basic Early Literacy Skills (DIBELS),8 the Texas Primary Reading Inventory (TPRI),9 and the Phonological Awareness Literacy Screening (PALS)10 tests. Curriculum-based measurement is frequently used as an umbrella term for general progress monitoring tests. The National Center on Student Progress Monitoring (http://www.studentprogress.org/) provides extensive information about the use of progress monitoring assessments to guide instruction. The Center has also conducted evaluative reviews of various progress monitoring tests and made them available on its website.

Formative and Summative Assessment

The term formative assessment has been widely used to describe assessments that serve essentially the same purpose as the progress monitoring tests described here. Basically, both serve to provide information about student progress in order to make mid-course corrections or improvements to instruction. Typically, formative assessment is contrasted with summative assessment, or the assessment of a final outcome or product. Summative assessment is synonymous with the term outcome assessment used in this guide.
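To make the benchmark idea discussed above concrete, the short Python sketch below shows how a school might flag students whose mid-year oral reading fluency falls below a target such as the 69 correct words per minute cited earlier. The grade 1 and grade 3 values and all student names and scores are invented for illustration; actual benchmarks come from a test publisher's development research.

```python
# Illustrative sketch only. The grade 2 value reflects the 69 correct-words-per-minute
# example cited in the text; the other benchmarks and all student scores are hypothetical.
MIDYEAR_ORF_BENCHMARKS = {1: 20, 2: 69, 3: 92}  # grade -> correct words per minute

def flag_at_risk(students, grade):
    """Return the names of students scoring below the mid-year benchmark for their grade."""
    benchmark = MIDYEAR_ORF_BENCHMARKS[grade]
    return [name for name, wcpm in students if wcpm < benchmark]

if __name__ == "__main__":
    grade2_scores = [("Ana", 74), ("Ben", 55), ("Cal", 69), ("Dee", 40)]
    print(flag_at_risk(grade2_scores, grade=2))  # ['Ben', 'Dee']
```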
Diagnostic Tests
Relatively lengthy, diagnostic tests provide an in-depth, reliable assessment of important component skills in reading. Their major purpose in the early elementary grades is to provide information for planning more effective instruction. Diagnostic tests should be given when there is a clear expectation that they will offer new, or more reliable, information about a child's reading difficulties that can be used to help plan more powerful instruction.
Diagnostic Tests and Diagnostic Information
It is important to distinguish between diagnostic tests and diagnostic information. Diagnostic information is any knowledge about a child's skills and abilities that is useful in planning instruction. It can come from student work, teacher observations, or other tests, as well as from diagnostic tests. For example, if a child performs poorly on a test of reading comprehension at the end of second grade, it would be useful to know whether he or she is impaired in reading fluency or accuracy, knowledge of word meanings, general background knowledge, or use of efficient comprehension strategies. Any information gathered about the child's knowledge and skill in the components of reading comprehension is diagnostic information that could be used to direct instructional interventions. In another example, if a child were struggling to acquire fluent and efficient phonemic decoding skills (phonics), it would be useful to have reliable information about his or her level of phonemic awareness and letter-sound knowledge, since both are required to understand and use the alphabetic principle in reading. If the child were relatively strong in phonemic awareness but had a poorly developed knowledge of letter-sound relationships, this information could be used to focus intervention work. Diagnostic tests are one important way to obtain diagnostic information that can help guide interventions for students who are experiencing difficulty learning to read. However, reliable and valid diagnostic information can come from sources other than formal diagnostic tests.11
If schools are implementing screening, external progress monitoring, and outcome assessments in a reliable and valid way, the need for additional testing using formal diagnostic instruments should be reduced. For example, reliable and valid screening measures are available in K-3 for phonemic awareness, phonics, reading fluency, and vocabulary. There are also reliable and valid measures to monitor progress throughout the year in phonemic awareness, letter knowledge, phonics, and reading fluency. If these components are reliably assessed at the beginning of the year and several times during the year with screening and progress monitoring instruments, the resulting diagnostic information may eliminate the need for additional assessment with formal diagnostic tests. For example, if a school used reliable and valid screening tests for phonemic awareness, phonics, and vocabulary at the beginning of first grade, a certain percentage of children would be identified as at risk because of low performance on these measures. The question becomes: should these at-risk students be given an additional diagnostic test to learn more about a broader range of components than were tested on the screening measures? The answer is: only if this information could be used to plan additional instruction for the at-risk students. The screening measure would already provide information for three major components of reading that can be measured reliably at the beginning of first grade. Based on diagnostic information from the screening measures, interventions in critical components of reading could begin immediately, rather than waiting for additional information generated by diagnostic tests. The argument against additional diagnostic testing in this case is that it would not likely add any critical information for planning effective interventions, and it might delay the start of necessary interventions for these at-risk students.
Using Diagnostic Tests with At-Risk Students
Whether an additional diagnostic measure should be given after a student has been identified as at risk by a screening or progress monitoring measure depends on two things. First, the reliability with which each critical reading component has been assessed is key: If there is some question about whether the child performed poorly because the test was improperly administered, or the child was having a bad day, a diagnostic test could be used to confirm
the finding about the need for additional instruction. (Less expensively, a different form of the screening or progress monitoring measure could be readministered.) Second, if the screening or progress monitoring battery did not assess all the dimensions of reading or language skill relevant to planning an effective intervention, a diagnostic assessment could help fill any remaining gaps in understanding the child's knowledge and skill. A number of situations might arise in which knowledge beyond that provided by a screening or external progress monitoring measure would be useful in planning instruction. For example, in some instructional programs, a program-specific placement test is used to help place the child at exactly the right spot in the program's instructional sequence. Further, the child's teacher might find it useful to know precisely which letter-sound correspondences a child knows, or in which sight words he or she is fluent. However, neither type of information is typically provided by standardized diagnostic tests. Rather, this information is gained through a program-specific placement test or less formal teacher-administered tests. In summary, the screening, progress monitoring, and outcome elements of a comprehensive assessment plan often provide valid and reliable diagnostic information about a child's instructional needs. Because they are time-consuming and expensive, complete diagnostic reading tests should be administered far less frequently than the other assessments, although specific subtests from diagnostic instruments might be used to provide information in areas not assessed by screening, progress monitoring, or outcome assessments. For example, if progress monitoring measures are not reliably assessing vocabulary, and a child is still struggling with reading comprehension at mid-year, the teacher might seek a mid-year diagnostic assessment of vocabulary to assess the child's skills on this key component of reading comprehension. School leaders should continually ask whether the value to teachers of the information from formal diagnostic tests in planning instruction merits the time spent administering such tests.
Outcome Tests

As part of a comprehensive plan, outcome tests should be administered at the end of every year from kindergarten through third grade, although the kindergarten tests may differ greatly from those administered at the end of first, second, and third grades, once children have begun to acquire skills in reading comprehension. Longitudinal studies of reading have shown that students are much more likely to meet grade-level standards in reading at the end of third grade if they have met those standards in each preceding year (grades K-2).12 Thus, outcome tests at the end of grades K-2 help school leaders ensure that instruction in each grade is sufficiently powerful to keep most students on track for successful performance when they take important reading accountability measures at the end of third grade.
Curriculum-embedded progress monitoring tests should also be given whenever the teacher needs guidance on how well students have mastered the content or skill in the current unit of instruction. The time between assessments may vary depending on the curriculum being used or the topics being covered.

Diagnostic tests are administered only when specific questions arise about instruction for individual students that cannot be answered from teacher observations, student work, and other forms of assessment (i.e., screening, progress monitoring, or outcome assessments). They should only be given when there is a clear expectation that they will provide information useful in planning more powerful instruction. Diagnostic tests are also sometimes required when evaluating students for placement in special education programs.

Reading outcome tests are administered as close to the end of the year as practical to allow information from them to help make decisions about students for the coming year, and they should be given to all students for whom the test format is appropriate. Obviously, students with severe physical or mental disabilities or who are English language learners may need some form of alternate assessment, but the percentage of students excluded from the standard outcome assessment should be very small. Even though students with some forms of disability may not be expected to perform as highly as students without disabilities, they should still be expected to show reasonable progress on outcome assessments from year to year.
If tests will be used to make important decisions about individual students, the tests should meet reasonable standards of reliability and validity. For example, if students are assigned to receive intensive interventions on the basis of their performance on a screening or progress monitoring test, it is important that these tests reliably measure critical reading skills. Further, if information from the tests is to be used to help plan instruction within the interventions, then the tests used to assign students to particular groups should provide valid measurement of critical skills. Part of the process of selecting tests for use within a comprehensive assessment plan should always include examining the test manuals for information about each test's reliability and validity for the way it will be used within the overall assessment plan. The major scientific considerations in selecting tests for a K-3 assessment plan are the reading skill each test measures and the test's reliability and validity. However, other considerations may also play a role, such as the initial cost of the test, the cost of individual test forms, and the amount of training required to administer the test. Best practice is to choose tests with sufficient evidence of reliability and validity that can also be administered and interpreted in a reliable and valid way by those who will administer and use the test data for making instructional decisions.
that the scores are entered into the data management system so that they are available to teachers. An advantage of an assessment team is that the tests are likely to be administered more consistently across all classes. A schoolwide assessment team also disrupts instruction less than having teachers administer the tests. For example, if a progress monitoring assessment requires 10 minutes per student, a teacher with a class of about 20 students would need to spend slightly more than three hours doing nothing but administering the tests. Another advantage of the assessment team approach is that fewer people need to be trained to administer the tests. Some schools blend approaches, using teachers to administer the tests to some of their students while the school-level team assesses the rest.
Diagnostic tests are usually administered by an educational diagnostician or school psychologist or by a teacher or reading coach with extensive training in their administration and interpretation.13 Some diagnostic tests require that the person administering them have credentials in school or clinical psychology. The diagnostic tests that are most useful in planning instruction assess various dimensions of reading and language skill, and can usually be administered by a wider range of personnel than intelligence tests.
Group-administered, year-end outcome tests are usually administered by classroom teachers, often with proctoring help from someone outside the classroom.
All teachers or members of the school-level assessment team need to receive adequate training in administering the tests. It is important to remember that teachers may not be used to administering tests according to standard guidelines, yet these standard guidelines make test data interpretable across students and across testing intervals.

One person needs to be designated to do the necessary follow-up and coordination to ensure that the testing is accomplished by all teachers, or across all students, during the time periods specified in the master testing schedule.

A plan for scoring all tests must be developed and executed.

A plan for entering and summarizing test data is necessary; typically, individual student scores will need to be transferred to a classroom, grade-level, or school file.
support or professional development in a given area?), more than one person (i.e., teacher, grade-level team leader, reading coach, assistant principal, principal) will need access to student data and reports. Some decisions can be based on individual student data, but others may require summaries of data at the classroom or grade level. Investing in an efficient data management tool is critical to the long-term success of a comprehensive assessment plan.
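As a sketch of the kind of summary an efficient data management tool should make routine, the short Python fragment below rolls individual progress monitoring scores up to the classroom and grade level and reports the percentage of students at or above a benchmark. The record layout, student names, and scores are hypothetical; only the 69 words-per-minute figure echoes the example cited earlier.

```python
from collections import defaultdict

# Hypothetical progress monitoring records: (grade, classroom, student, correct words per minute).
records = [
    (2, "Smith", "Ana", 74),
    (2, "Smith", "Ben", 55),
    (2, "Jones", "Cal", 69),
    (2, "Jones", "Dee", 40),
]

BENCHMARK_WCPM = 69  # mid-year grade 2 oral reading fluency target cited in the text

def summarize(records, by="classroom"):
    """Summarize scores by classroom or by grade: count, mean score, and percent at benchmark."""
    groups = defaultdict(list)
    for grade, classroom, student, score in records:
        key = classroom if by == "classroom" else grade
        groups[key].append(score)
    return {
        key: {
            "n": len(scores),
            "mean_wcpm": round(sum(scores) / len(scores), 1),
            "pct_at_benchmark": round(100 * sum(s >= BENCHMARK_WCPM for s in scores) / len(scores)),
        }
        for key, scores in groups.items()
    }

print(summarize(records, by="classroom"))  # per-classroom summary for the teacher or reading coach
print(summarize(records, by="grade"))      # grade-level summary for the principal
```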
Reliability

A test's reliability is the degree to which it provides a dependable, consistent measurement of some trait or ability.17 A reliable test is likely to produce similar estimates of a student's ability no matter who gives the test (assuming they are well trained) or when it is administered (assuming testing is conducted at a reasonable time of day). A test's reliability is expressed as a number between 0 and 1, with .80 falling at the lower margin of acceptability and .90 being the most desirable standard.18,19 A test's reliability can be calculated in a number of ways; internal consistency reliability typically produces the highest estimates, while test-retest reliability often produces slightly lower estimates.
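To illustrate what such a coefficient looks like in practice, here is a brief sketch, using hypothetical scores, that estimates test-retest reliability as the Pearson correlation between two administrations of the same measure. Real reliability studies use much larger samples, and internal consistency is usually reported with statistics such as coefficient alpha.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores: the same fluency measure given twice, two weeks apart, to six students.
first_administration = [35, 48, 52, 60, 71, 80]
second_administration = [38, 45, 55, 58, 75, 78]

print(round(pearson_r(first_administration, second_administration), 2))  # ~0.98, a highly reliable result
```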
Validity
In the simplest terms, tests are said to be valid if they measure the trait or ability they claim to measure. Unfortunately, it is easier to define validity than to demonstrate conclusively that a given test is valid, or to describe the level of validity with a single number, as in the case of test reliability. This is because a test's validity depends on the purpose for which it is used. In discussing a test's validity, it is always important to keep its purpose in mind. Most current textbooks on educational testing17,18,19 describe three important types of validity: 1) content description; 2) criterion prediction; and 3) construct identification. Content description validity simply refers to the extent and consistency with which the test items cover a representative sample of the knowledge or ability being measured. This type of validity is usually established by expert judgment and by statistical analyses showing that the items are consistent in the way they measure the knowledge or skill being tested.
Criterion prediction validity is usually established by determining whether performance on the test in question predicts outcomes in the way it should. For example, a screening test for phonemic awareness and letter knowledge at the beginning of kindergarten should predict a student's ability to decode words phonemically at the end of the year. By the same token, a test of phonemic decoding ability in the middle of first grade should predict oral reading fluency by the end of the year. If these predictive relationships cannot be demonstrated, then something is wrong either with the theory of reading development on which the tests are based or with the tests themselves (i.e., perhaps they do not measure the ability with sufficient reliability to predict later development). The authors of screening and progress monitoring tests, in particular, should provide evidence that performance on these tests is usefully related to important outcome measures in reading.

Construct identification validity is the most complex form of validity and is usually demonstrated by a convergence of evidence from several sources. For example, based on current theories of reading development, scores from a valid test of oral reading fluency should show: 1) regular development with age; 2) differences among groups of students that traditionally show different patterns of development in reading (e.g., differences across socio-economic levels, or differences between students who are, or are not, classified as learning disabled); 3) responsiveness to intensive interventions that have been shown to affect reading fluency; and 4) appropriate relationships with other reading skills (i.e., a significant relationship with reading comprehension). Many types of evidence are usually assembled to demonstrate a test's construct identification validity.
REFERENCES
1. Fuchs, L., & Fuchs, D. (1999). Monitoring student progress toward the development of reading competence: A review of three forms of classroom-based assessment. School Psychology Review, 28, 659-671.
2. Shinn, M. (1998). Advanced applications of curriculum-based measurement. New York: Guilford.
3. Torgesen, J. K. (2004). Avoiding the devastating downward spiral: The evidence that early intervention prevents reading failure. American Educator, 28, 6-19.
4. National Reading Panel (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Washington, D.C.: National Institute of Child Health and Human Development.
5. Fuchs, L. S., Fuchs, D., Hosp, M. K., & Jenkins, J. R. (2001). Oral reading fluency as an indicator of reading competence: A theoretical, empirical, and historical analysis. Scientific Studies of Reading, 5, 239-256.
6. Diamond, L. (2005). Assessment-driven instruction: A systems approach. Perspectives, Fall, 33-37.
7. Deno, S. L. (2003). Developments in curriculum-based measurement. The Journal of Special Education, 37, 184-192.
8. Official DIBELS home page: http://dibels.uoregon.edu/
9. Official TPRI home page: http://www.tpri.org/
10. Official PALS home page: http://pals.virginia.edu/
11. Nitko, A. J. (2001). Educational assessment of students (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall/Merrill Education.
12. Juel, C. (1988). Learning to read and write: A longitudinal study of 54 children from first through fourth grades. Journal of Educational Psychology, 80, 437-447.
13. Fuchs, L., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading comprehension measures. Remedial and Special Education, 9, 20-28.
14. Shinn, M. (2002). Best practices in using curriculum-based measurement in a problem-solving model. In A. Thomas & J. Grimes (Eds.), Best Practices in School Psychology IV. Bethesda, MD: National Association of School Psychologists.
15. American Psychological Association (1999). The Standards for Educational and Psychological Testing. Washington, D.C.: American Psychological Association.
16. Chartdog: http://www.interventioncentral.org/htmdocs/tools/chartdog/chartdog.shtml
17. Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice-Hall.
18. Aiken, L. R. (1994). Psychological testing and assessment. Needham Heights, MA: Allyn & Bacon.
19. Salvia, J., & Ysseldyke, J. E. (1998). Assessment (7th ed.). Boston: Houghton Mifflin.