Badrun Kartowagiran*, Eka Ary Wibawa, Fitri Alfarisa, and Dian Normalitasari Purnama
Universitas Negeri Yogyakarta
*e-mail: [email protected]

Abstract: Observation is regarded as the most suitable technique for assessing the implementation
of authentic assessment. Unfortunately, it requires a great amount of time and money, so an
alternative is needed. Therefore, the purpose of this study was to develop an instrument in the
form of a student assessment sheet on the implementation of authentic assessment in Mathematics
subjects. This is development research that uses standard procedures for developing
instruments. The analysis with Aiken's formula showed that every item of the instrument was in
the good category. The analyses using Exploratory Factor Analysis (EFA), Confirmatory Factor
Analysis (CFA), and Multitrait-Multimethod (MTMM) showed that the instrument had good construct
validity. The reliability estimate using Cronbach Alpha (α) also showed that the instrument was
in the reliable category. Thus, it can be concluded that the instrument in the form of student
assessment sheets for assessing the implementation of authentic assessment in junior high school
Mathematics learning is highly valid and reliable, which means that the developed instrument can
replace the equivalent observation sheet.

Keywords: assessment sheet development, authentic assessment, mathematics



Abstrak: Observasi dianggap sebagai teknik yang paling tepat untuk menilai implementasi
asesmen autentik. Sayangnya, teknik ini memerlukan waktu dan biaya yang banyak, sehingga
perlu dicarikan alternatifnya. Oleh karena itu, tujuan penelitian ini adalah mengembangkan
instrumen yang berbentuk lembar penilaian siswa terhadap pelaksanaan asesmen autentik
pada mata pelajaran Matematika. Penelitian ini merupakan penelitian pengembangan yang
menggunakan prosedur baku pengembangan instrumen. Hasil analisis dengan formula Aiken
menunjukkan bahwa semua butir yang ada pada instrumen termasuk kategori baik. Hasil uji
analisis menggunakan Exploratory Factor Analysis (EFA), Confirmatory Factor Analysis
(CFA), dan Multitrait-Multimethod (MTMM) menunjukkan bahwa validitas konstruk
instrumen termasuk kategori baik. Hasil estimasi reliabilitas menggunakan Cronbach Alpha
(α) juga menunjukkan bahwa instrumen tergolong reliabel. Penelitian ini menyimpulkan
bahwa instrumen yang berbentuk lembar penilaian siswa terhadap pelaksanaan penilaian
autentik di SMP dalam pembelajaran matematika memiliki validitas dan reliabilitas tinggi,
yang berarti instrumen yang dikembangkan dapat menggantikan lembar observasi.

Kata Kunci: pengembangan lembar penilaian, asesmen autentik, matematika

INTRODUCTION

The quality of education defines the quality of a nation. Better education makes a better nation. Today, many indicators show that the level of education, particularly in Indonesia, is still far from ideal. The Government and every part of the community, especially teachers, have to put more effort into improving education quality. Teachers should stand on the frontline in the effort to make education better. They are essential factors in such an effort. Barber & Mourshed (2012) state that high-performing teachers and headmasters are the starting point of high-achieving students. Furthermore, Barber and Mourshed state that "student placed with high performing teachers will progress three times as fast as those placed with low performing teachers".

Sallis (2002:150) writes that there are ten indicators that define schools' quality, and the following is the weight of each indicator: (1) access 5%, (2) available services for customers 5%, (3) leadership 15%, (4) physical environment and resources 5%, (5) teaching-learning process 20%, (6) students 15%, (7) staff 15%, (8) external connection 5%, (9) organization 5%, and (10) standards 10%. High-performing human resources working with adequate resources and following correct processes give a high-performing result. But high-performing human resources following an incorrect process, even with abundant resources, will not be able to give an optimum result (Massy, 1997: 249). This means that improving the learning process of a school is an essential part of the effort to improve the school's quality; a better learning process means a better school.

In the effort to improve the quality of the learning process, teachers have many options. One of them is to develop better assessment of learning quality. This is only natural. Diranna, Osmundson, Topps, Barakos, Gearhart, Cerwin, …, Strang (2008) state that instructional goals, teaching models and assessment techniques are linked to one another. For example, if producing graduates with strong characters is the instructional goal, the teaching process shall include trainings and activities that build students' characters, and the assessment shall include assessment and description of students' characters.

In line with the above description, Reeves (2010) states that assessment processes and material mastery, which are included in teaching strategies, are two substantial components of teaching processes. Furthermore, Reeves (2010) states that in order to improve the quality of teaching through assessment, teachers must: (1) identify the essential components of the syllabus, (2) develop the performance assessment system (including formulating essays) with rubrics, (3) conduct examinations with essay-based content, (4) evaluate the results of the examinations using previously prepared rubrics, and (5) review the results of the examinations upon evaluating them, including reviewing the competencies that have not been mastered by the students. In the next step, those competencies will serve as the basis for formulating the remedial program. In such a manner, the students have a second opportunity to master those competencies.

There are two types of assessment: (1) assessment as a means of improving the capability of teachers in delivering lessons, or assessment for learning (AfL), and (2) assessment as a means of improving the capability of students in receiving lessons, or assessment as learning (AaL). Both types are preparatory steps before conducting assessment of the result of the study, or assessment of learning (AoL) (Arends & Kilcher, 2010). In principle, assessment must be able to drive teachers to deliver lessons better and also to encourage students to put more effort into their study.

Authentic assessment is the only assessment model that fulfills the above-mentioned principles. It uses triangulation of techniques and triangulation of sources of information and covers all phases

of teaching (input, processing, and result). In line with the above argument, the Indonesian Regulation of the Minister of Education and Culture Number 66 Year 2013 on Assessment Standards states that authentic assessment is a comprehensive assessment method that assesses every teaching phase: input, processing, and output. This method of assessment is deemed to be comprehensive because it covers assessment in the areas of knowledge, skills, and spiritual and social attitude. Frey & Schmitt (2007) argue that authentic assessment aims at measuring the capability of responding to given tasks or tests that are formulated based on everyday real-life problems. Gulikers, Bastiaens, & Kirschner (2004) add that authentic tasks incorporate knowledge, skills, and attitude aspects.

Still in connection with authentic assessment, Tombari & Borich (1999) state that authentic learning and authentic assessment are processes of identifying individuals' knowledge, ideas, problem-solving capabilities, social skills and attitude in their daily interaction in their communities, workplaces and advanced courses. An authentic process assesses every material taught and practiced in the classroom and requires students to apply their skills, knowledge and ability to process things as they are practiced by adults in the workplace, presented in classroom activities and workbooks, and required in real life. Moreover, Tombari & Borich (1999) mention some characteristics of authentic assessment as follows. (1) It assesses materials taught and practiced in the classroom. (2) It provides real-life-based tasks as a part of the assessment process. (3) It is done continuously. (4) It has standards or criteria. (5) Its assessment conditions are the same as real-world conditions. (6) It directly assesses students' performance when they are following training or in the process of solving problems.

Authentic teaching and authentic assessment are designed to produce a better experience for students, so that they have better performance. Students perform better when they are offered opportunities to demonstrate what they do, and every opportunity they get will be followed by a specific performance improvement. Typical performance exists in the situation of assessment when students are provided with an opportunity to demonstrate the result of their study, with the assumption that they give their best.

Vu & Alba (2014) state that conventionally, assessment is considered authentic when the tasks are real-to-life or have real-life value. Wiggins (1998) states that in order to be authentic, assessment has to be realistic; it requires judgment and innovation, and "asks the student to "do" the subject, that is, to go through the procedures that are typical to the discipline under study"; is conducted in a context mirroring situations in which the skills are best performed; requires students to demonstrate various skills related to complex problems, including decision-making situations; and provides feedback, training and a second opportunity to solve the problem at hand. Some elements of authentic assessment aim not only at assessing competencies, but also at helping students prepare themselves to handle the professional world in the future (Raymond, Homer, Smith, & Gray, 2012).

In line with the arguments of most experts in the field, this research defines authentic assessment as real, unpretentious assessment that continuously and sustainably assesses teaching input, process and output, covering assessment of knowledge, skills, and spiritual and social attitude. This means that full development of the students is only possible when the teaching process includes assessment that is authentic. Furthermore, information concerning the implementation of authentic assessment in learning processes can be collected through observation. Kartowagiran & Jaedun (2016) showed that observation regarding authentic assessment was replaceable with evaluation on

authentic assessment implementation by students.

The challenge is the availability of an instrument in the form of student assessment sheets as a medium for students to assess the implementation of authentic assessment. That is exactly why this research was initiated. Therefore, the purpose of this study was to develop an instrument in the form of a student assessment sheet on the implementation of authentic assessment in Mathematics subjects. This instrument is expected to replace an equivalent observation sheet.

METHOD

This is development research that uses the standard procedures for developing instruments published by AERA (2014). The procedures are: (1) reviewing theories, (2) developing an outline, (3) writing the instrument items, (4) conducting theoretical analysis of the instrument items and revising them, (5) testing the instrument's content validity using expert judgment and then measuring the content validity index (V) using Aiken's formula (Aiken, 1985), (6) conducting an instrument readability test and revision, (7) conducting the first trial and then testing the instrument's construct validity using the Exploratory Factor Analysis (EFA) technique, (8) conducting the second trial and then confirming the instrument's construct validity using the Confirmatory Factor Analysis (CFA) technique, (9) estimating reliability using the Cronbach Alpha technique, and (10) determining the construct validity using multitrait-multimethod by correlating the data on the evaluation by students and the researcher observation data on the implementation of authentic assessment in the teaching of junior high school mathematics.

The assessment grid and items were written by the first author and analysed theoretically by the three co-authors. The result of the theoretical analysis was used to revise the instrument. Furthermore, the validation of the content of the instrument was conducted by having five experts (three experts in educational research and evaluation and two experts in mathematics education) assess the suitability of the instrument items to the indicators. Items that were least suitable to the indicators were given a score of one, and the most suitable were given a score of five. The data obtained from the five experts were analysed using the Aiken Formula to determine the V value. The items with a V value lower than the critical V value according to the Aiken Table had to be omitted.

The instrument which already had good content validity was tried out with five students who were going to use the instrument, in order to get information about its readability and to identify the statements which could not be understood by the users. After the instrument was revised based on the result of the readability test, the first trial was conducted with 90 junior high school students. The data from the first trial were analysed using EFA to show the construct validity of the instrument; items with a factor loading of less than 0.3 had to be omitted (Hair, Ringle, Hult, & Sarstedt, 2014). Furthermore, the second trial was conducted with 150 junior high school students, and the data obtained were analysed using the CFA technique in order to confirm the construct validity of the instrument. The final stage was to measure the construct validity of the instrument using the MTMM technique, by correlating the student evaluation data and the researcher observation data on the implementation of authentic assessment in the teaching of junior high school mathematics. When the correlation coefficient was higher than 0.8, the student evaluation sheet could replace the observation sheet (Grewal, Cote, & Baumgartner, 2004).
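The replacement criterion in the final stage can be illustrated with a short computation. The following is a minimal sketch, assuming the coefficient is a Pearson product-moment correlation computed on per-item mean scores from the two sheets (the text does not name the coefficient). The function names and the scores are hypothetical and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def can_replace(student_scores, observer_scores, threshold=0.8):
    """Decision rule stated in the text: correlation higher than 0.8."""
    return pearson_r(student_scores, observer_scores) > threshold


# Hypothetical per-item mean scores (illustration only, not the study's data)
student = [2.1, 3.4, 3.0, 3.6, 3.8, 2.0]
observer = [2.3, 3.9, 3.5, 4.0, 3.9, 2.2]
print(round(pearson_r(student, observer), 3), can_replace(student, observer))
```

The sketch does not handle the case where one list has zero variance, in which the coefficient is undefined.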

FINDINGS AND DISCUSSION

Findings

The result of this research is an instrument in the form of student assessment sheets on the implementation of authentic assessment in the mathematics subject. The instrument was developed based on four components: attitude, skill, knowledge, and teachers' discipline in implementing authentic assessment principles. The discipline in implementing authentic assessment in this research consists of three principles: the assessment has to be realistic, it has to assess Higher Order Thinking Skills (HOTS), and it has to be sustainable. The instrument grid can be seen in Table 1.

Table 1. Student Assessment Sheet on the Implementation of Authentic Assessment

No.  Indicators                          Item Number
1.   Attitude assessment                 1, 2, 3
2.   Skill assessment                    4, 5, 6
3.   Knowledge                           7, 8, 9, 10
4.   Discipline in authentic assessment  11, 12, 13, 14, 15

Initially, the instrument in the form of student assessment sheets to evaluate the implementation of authentic assessment in mathematics teaching consisted of 15 items. The next steps were the readability test and revision, followed by review from the experts. The result of the review was then computed using the Aiken Formula. According to Aiken (1985), when the number of raters is five and the number of rating choices is also five, the minimum V value is 0.80. The result of the analysis using the Aiken Formula showed that items 3, 7, and 13 were in the poor category with a content validity index (V) of 0.73, and the rest of the items were in the good category. The good items were items 2 and 5 with a V value of 0.80; items 8, 14, and 15 with a V value of 0.87; and items 1, 4, 6, 9, 10, 11, and 12 with a V value of 0.93. The distribution of the valid items resulting from the calculation using the Aiken Formula can be seen in Table 2.

Table 2 shows that after the omission of the items which were not valid, the instrument consisted of only 12 items. Later, the instrument was tried out at the first stage with 90 grade 11 students of 15 junior high schools in Yogyakarta Special Region, who took Mathematics. The data from the first trial were analysed using the EFA technique, and the result showed that the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) was 0.743. Every item had an anti-image coefficient greater than 0.5, which means that the data satisfied the requirement for factor analysis.

Table 2. Calculation Result of Aiken V Index

Factor                        Number of Item  Index of Aiken V  Information  Number of New Items
Discipline in Implementing    Item 1          0.93              Valid        1
Authentic Assessment          Item 2          0.80              Valid        2
                              Item 3          0.73              Not Valid    -
                              Item 4          0.93              Valid        3
Knowledge                     Item 5          0.80              Valid        4
                              Item 6          0.93              Valid        5
                              Item 7          0.73              Not Valid    -
                              Item 8          0.87              Valid        6
Attitude                      Item 9          0.93              Valid        7
                              Item 10         0.93              Valid        8
                              Item 11         0.93              Valid        9
Skill                         Item 12         0.93              Valid        10
                              Item 13         0.73              Not Valid    -
                              Item 14         0.87              Valid        11
                              Item 15         0.87              Valid        12
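The content validity index reported in Table 2 follows Aiken's (1985) formula, V = Σ(r − lo) / (n(c − 1)), where r is a rater's score, lo is the lowest scale value, n is the number of raters, and c is the number of rating categories. The following is a minimal sketch of that calculation, assuming a 1 to 5 integer scale and the 0.80 critical value mentioned in the text. The function names and the ratings are hypothetical and are not the study's data.

```python
def aiken_v(ratings, lowest=1, highest=5):
    """Aiken's V for one item: sum(r - lowest) / (n * (c - 1)).

    On an integer scale, c - 1 equals highest - lowest.
    """
    n = len(ratings)
    if n == 0:
        raise ValueError("at least one rating is required")
    if any(r < lowest or r > highest for r in ratings):
        raise ValueError("rating outside the scale")
    return sum(r - lowest for r in ratings) / (n * (highest - lowest))


def classify(ratings, critical_v=0.80):
    """Return V and a label; items below the critical value are flagged."""
    v = aiken_v(ratings)
    return v, ("Valid" if v >= critical_v else "Not Valid")


# Hypothetical ratings from five experts (illustration only)
expert_ratings = {
    "Item A": [5, 5, 4, 5, 4],
    "Item B": [4, 3, 4, 4, 3],
}
for name, ratings in expert_ratings.items():
    v, label = classify(ratings)
    print(name, round(v, 2), label)
```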

The result of the first trial showed that 12 items had a factor loading greater than 0.7, which means that they were valid. Since the implementation of authentic assessment had four components, the proportion of variance in authentic assessment implementation that could be explained by these four components was 65.845%. These components were: attitude, knowledge, skills and discipline in the implementation of authentic assessment. This is in line with the research of Frey, Schmitt, & Allen (2012), who describe context assessment as one of the dimensions of authentic assessment, consisting of three aspects: realistic or context activity, performance-based task, and cognitively complex task.

In the second trial, the instrument was administered to 150 students and the analysis used the CFA technique. The goal of the second trial was to confirm the analysis result of the first trial. This is in line with Cramer (2003), who argues that EFA explores theories and CFA tests theories. The result of the confirmatory factor analysis is shown in Figure 1.

Figure 1 shows that the instrument construct of the implementation of authentic assessment is fit for its purpose. This means that the data supported the concept of student assessment sheets in the evaluation of authentic assessment implementation in junior high schools; in short, the instrument is valid. Moreover, the reliability of the assessment sheets was estimated using Cronbach Alpha, and the result was 0.810, which, according to Feldt & Brennan (1989), means the instrument can be categorized as reliable.

In addition to the factor analysis technique, the construct validity was also verified with the multitrait-multimethod technique. Campbell & Fiske (1959) introduced this technique and state that it aims at verifying the construct validity of an instrument by measuring the same traits with two or more different methods. Instruments with good construct validity show a high degree of correlation among the measurement results of the same traits obtained with different methods (Azwar, 2013). In accordance with this, Mardapi (2017) states that when using multitrait-multimethod validity to measure more than one trait, we need to apply more than one method.
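The Cronbach Alpha estimate reported above is computed from the item variances and the variance of the total scores: α = k/(k − 1) × (1 − Σ s²ᵢ / s²ₜ), where k is the number of items. The following is a minimal sketch of that formula, assuming complete data (no missing responses) and sample variances. The function name and the scores are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    if len(scores) < 2:
        raise ValueError("at least two respondents are required")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    # Sample variance of each item (columns) and of the total score (row sums)
    item_var_sum = sum(variance(col) for col in zip(*scores))
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses: 5 students x 4 items (illustration only)
responses = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 4, 3, 4],
]
print(round(cronbach_alpha(responses), 3))
```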

Figure 1. The Result of The Second Trial Analysis With CFA

In this research, to measure the construct validity with multitrait-multimethod, we correlated the result of the assessment from the students with the result of the observation on the implementation of authentic assessment. This step is important because both instruments measure the quality of the authentic assessment; the first instrument was the observation sheets used by the researchers and the second one was the assessment sheets used by the students. In this research, the assessment sheets are deemed to have construct validity when the correlation coefficient between the assessment results is at least 0.8 (Grewal, et al., 2004). In this research, the correlation between the result of the assessment and the result of the observation was 0.965. This shows that there is a very strong correlation between the scores of the students' assessment of the authentic assessment implementation and the results of the researchers' observations of authentic assessment. Thus the two methods proved empirically to measure the same trait, namely authentic assessment implementation. This also means that the student assessment sheets can replace the observation sheet.

Figure 2 indicates that the scores of each instrument item, both from the students' assessment sheets and the researchers' observation sheets in the evaluation of the implementation of authentic assessment, are consistent. Most of the items have almost the same scores, so the correlation between them is quite strong. Figure 2 shows that two items (Item 1 and Item 6) score poorly (below standard). Item 1 represents teachers' discipline in assessing students' attitude during the teaching and learning process. Basically, in varying degrees, all teachers have done this job, but most of them do not regularly record students' attitude in their journal or notebook. This is in line with the research by Kartowagiran & Jaedun (2016), which found that 47% of their sample teachers did attitude evaluation. The teachers' reasons for not doing attitude evaluation were: (1) that they could not make the instrument for measuring attitude competence, (2) that the class size was very big, and (3) that measuring attitude was the Counselling and Civic Education teachers' responsibility. Item 6 represents teachers' intensity in asking questions to students during classes.

Figure 2. The Correlation between the Result of the Assessment from the Students and the Result of the Researchers' Observation

Discussions

The above-mentioned result shows that the instrument developed in this research has good content validity, construct validity, and reliability. Concerning the capability of the teachers in conducting the attitude assessment, we have to say that it does not look quite good. The low intensity of the assessment of students' attitude is the problem. This is due to the teachers' lack of understanding of how what is written in the Lesson Plan, taught and demonstrated by the teachers in classes affects students' attitude. This is in line with the research of Kartowagiran & Maddini (2015), which reported that attitudinal competence developed in classes and demonstrated by the teachers had effects on students' attitude. Besides, in the assessment of the students, teachers have to pay more attention to the manner in which they communicate with students, so that they can improve it. This is also in line with the research of Retnawati, Kartowagiran,

Arlinwibowo, & Sulistyaningsih (2017) that showed how the lack of teacher-student communication arose as one of the factors that held students back from achieving the best result in their study.

Item 6 represents teachers' intensity in asking questions to students during classes. Figure 2 shows that the teachers asked questions only once or twice in a meeting. This means that the intensity was considered low and the teachers did not practice the ability to ask questions. This is in line with the research by Ermasari, Subagia, & Sudria (2014), which found that there were four factors that hindered the teachers in asking questions: the lack of understanding of the types of questions, the lack of planning in formulating and asking questions, the lack of training relevant to formulating and asking questions, and the lack of awareness of the challenges the teachers had to deal with. The teachers need to improve their skills in asking questions and increase the intensity of the practice of asking questions. In this way, the students have a chance to develop the ability to think critically.

Additionally, there were still unrealistic and/or irrelevant questions; the questions made sense mathematically, but not realistically. Items of this type are not authentic items (content). In line with Frey, et al. (2012), authentic items (content) have to be composed of realistic and/or relevant questions. Let us return to Figure 2 for a moment. For Items 2, 3 and 4, there are wide gaps between the scores of the students' assessment sheets and those of the researchers' observation sheets. The scores of the researchers' observation sheets are significantly higher. This is reasonable since the observation was only conducted three times in one whole semester and the evaluation from the students was conducted in every class of the semester. Figure 3 shows the students' assessment of the implementation of authentic assessment by mathematics teachers.

In Figure 3, based on the students' evaluation of the implementation of authentic assessment in the subject of Mathematics, there are two items (Items 1 and 6) that do not give optimum results. Item 1 indicates the low level of discipline in conducting assessment of students' attitude during the process of learning. In connection with the issue of discipline, the teachers confronted a number of challenges that hindered them from performing authentic assessment optimally. One of them was that the authentic assessment technique required a great deal of time (cf. Mintah, 2003). Furthermore, Mintah adds that the implementation of authentic assessment with a high degree of discipline will deliver a positive impact not only on students' development, but also on students' concept of self-development and motivation. Consequently, it is mandatory for teachers to improve the degree of discipline they put into the assessment of students' attitude.

Figure 3. The Students' Evaluation on the Implementation of Authentic Assessment in the Subject of Mathematics

Furthermore, the teachers' questions are only at the second (understand) and third (apply) levels of Bloom's Taxonomy, and they are not yet at the fourth level (analyse). The learning materials which are tested are not realistic; mathematically, the questions are correct, but they are not applied in the students' everyday life. For example, Budi lifted a 50-kilogram ball and carried it running for 500 m, and so on.

Authentic assessment is basically a complex concept. This means that attempting to put authentic assessment into practice might be an exhausting task for teachers. It is easy to fall into confusion in the discussion of this concept. It is clear that the concept of authenticity in the description of authentic assessment is significantly deeper than mere realism (or being realistic). Most of the publications that we have reviewed focus on class assessment. But some other experts, especially in their early publications, attempted to explore the characteristics of inauthenticity in most large-scale standardized tests.

Typically, only performance-based assessments or assessments with cognitively complex tasks (that do not take the value of the tasks outside the classroom into consideration) are categorized as authentic assessment. Authenticity may also be defined based on whether students' arguments, students' team work, or students' involvement in defining scoring criteria are required.

On top of that, relevance to real-world tasks is also a commonly mentioned component of authenticity. Many real-world tasks or works are cognitively complex and come with clear and widely understood criteria of success. It is impossible to think of a real-world task that is not performance-based. Obviously, it is improper to assume that the authenticity aspects that are not focused on in the definitions from the publications are not included in the real conceptualization of the experts.

Another concept that potentially adds to the teachers' confusion is the description from Frey, et al. (2012). Frey, et al. (2012) state that Oosterhof, Mertler and Popham argue that authentic assessment is a part of performance assessment. On the other hand, they also state that Kubiszyn and Borich, Taylor and Bobbit-Nolen and Airasian argue that performance assessment is a part of authentic assessment. This research stands with the latter concept: performance assessment is a part of authentic assessment. Performance assessment only focuses on specific competences, but authentic assessment does not focus on a single competency only, which is to say it has a broader scope. In authentic assessment, teachers can use journals (teachers' notes), whereas in performance assessment teachers need not use journals. The instrument used for evaluating performance is merely an observation sheet and/or evaluation sheet, while that for doing authentic assessment is an observation sheet and/or evaluation sheet which must be accompanied by a journal (teachers' notes on students' behavior).

The implementation of authentic assessment in mathematics teaching needs to be done because it has many advantages. This is in line with Nitko & Brookhart (2011), who write that there are some advantages of authentic assessment. It possesses the ability to show students' development based on goals holistically and to assess skills to "do" in the area of knowledge and skills; it provides more meaningful assessment of students for students (Whitelock & Cross, 2012); it encourages students to improve their interest and skills (Svinicki, 2004; Gulikers, Kester, Kirschner, & Bastiaens, 2008); it improves students' confidence, knowledge and skills (Raymond et al., 2012); and it enhances the integration of what students know and how they act with who they are becoming (Vu & Alba, 2014). Moreover, authentic assessment also gives students chances to learn by doing and supports teachers in their effort to develop their teaching quality based on students' performance, resulting in more accurate assessment results (Linh, 2016). With these advantages in mind, the implementation of authentic assessment is beneficial to both students and teachers.

Meanwhile, Hargreaves, Earl & Schmidt (2002) state that authentic assessment encourages students to be more responsible for their study, to treat assessment as an integral part of the learning process, to be more creative and to implement,

and not only memorize, what they have learned. Furthermore, Hargreaves, et al. (2002) found that: (1) teachers were more comfortable with authentic assessment because they did not need to test examination content first; (2) authentic assessment was effective in building common collaborative understanding among teachers, students and parents because authentic assessment assessed all of the students' activities and involved parents on many occasions; and (3) authentic assessment provided better feedback for teachers.

The advantages of applying authentic assessment in teaching are so many that it is logical that Indonesian teachers using the Curriculum 2013, for example, are obliged to apply authentic assessment. Nevertheless, it must be noted that there are still many teachers who cannot apply authentic assessment well. Such teachers need to be trained to improve their ability to apply authentic assessment. In order to make them serious in applying authentic assessment, evaluation needs to be done. The evaluation is done by the school principal, who is helped by students using the developed assessment sheet.

CONCLUSION

The students' assessment sheet on the implementation of authentic assessment was developed with the following procedures: (1) reviewing theories, (2) developing an outline and writing down the points of the instrument, (3) analyzing the points of the instrument and conducting revision, (4) conducting trials and defining the characteristics of the instrument, (5) finalizing the instrument, (6) conducting an instrument readability test and revision, (7) conducting the first trial and then testing the instrument's construct validity using the EFA technique, (8) conducting the second trial and then confirming the instrument's construct validity using the CFA technique, (9) estimating the instrument's reliability using the Cronbach Alpha formula, and (10) confirming the construct validity using multitrait-multimethod. It can be concluded that the students' assessment sheet as an instrument for assessing the implementation of authentic assessment in junior high school Mathematics teaching has a high degree of validity and reliability, which means that the developed instrument can replace an equivalent observation sheet.

ACKNOWLEDGEMENT

The authors would like to thank the Ministry of Research, Technology, and Higher Education of Indonesia for their financial support of this research.

REFERENCES

Aiken, L. R. (1985). Three coefficients for analyzing the reliability and validity of ratings. Educational and Psychological Measurement, 45(1), 131-142. doi:10.1177/0013164485451012

Arends, R. I., & Kilcher, A. (2010). Teaching for student learning becoming an accomplished teacher. New York, NY: Routledge.

Azwar, S. (2013). Reliabilitas dan validitas. Yogyakarta: Pustaka Pelajar.

Barber, M. & Mourshed, M. (2012). Profesional development international. New York, NY: Pearson.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81-105. doi:10.1037/h0046016

Cramer, D. (2003). Advanced quantitative data analysis. London: McGraw-Hill Education.

Diranna, K., Osmundson, E., Topps, J., Barakos, L., Gearhart, M.,

Cerwin, K., …, Strang, C. (2008). Assessment-centered teaching (A reflective practice). London: Sage.

Ermasari, G., Subagia, I. W., & Sudria, I. B. N. (2014). Kemampuan bertanya guru IPA dalam pengelolaan pembelajaran. Jurnal Pendidikan dan Pembelajaran IPA Indonesia, 4(1), 1-12. Retrieved from http://oldpasca.undiksha.ac.id/e-journal/index.php/jurnal_ipa/article/view/1111

Feldt, L. S., & Brennan, R. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York, NY: Macmillan.

Frey, B. B., & Schmitt, V. L. (2007). Coming to terms with classroom assessment. Journal of Advanced Academics, 18(3), 402-423. doi:10.4219/jaa-2007-495

Frey, B. B., Schmitt, V. L., & Allen, J. P. (2012). Defining authentic classroom assessment. Practical Assessment, Research & Evaluation, 17(2), 1-18. Retrieved from https://pareonline.net/pdf/v17n2.pdf

Grewal, R., Cote, J. A., & Baumgartner, H. (2004). Multicollinearity and measurement error in structural equation models: Implications for theory testing. Marketing Science, 23(4), 519-529.

Gulikers, J. T., Bastiaens, T. J., & Kirschner, P. A. (2004). A five-dimensional framework for authentic assessment. Educational Technology Research and Development, 52(3), 67-86. Retrieved from

Gulikers, J. T., Kester, L., Kirschner, P. A., & Bastiaens, T. J. (2008). The effect of practical experience on perceptions of assessment authenticity, study approach, and learning outcomes. Learning and Instruction, 18(2), 172-186. doi:10.1016/j.learninstruc.2007.02.012

Hair, J. F., Ringle, C. M., Hult, T., & Sarstedt, M. (2014). A primer on Partial Least Squares Structural Equation Modeling (PLS-SEM). Thousand Oaks: Sage.

Hargreaves, A., Earl, L., & Schmidt, M. (2002). Perspectives on alternative assessment reform. American Educational Research Journal, 39(1). doi:10.3102/00028312039001069

Kartowagiran, B., & Maddini, H. (2015). Evaluation model for Islamic education learning in junior high school and its significance to students' behaviours. American Journal of Educational Research, 3(8), 990-995. doi:10.12691/education-3-8-7

Kartowagiran, B., & Jaedun, A. (2016). Model asesmen autentik untuk menilai hasil belajar siswa Sekolah Menengah Pertama (SMP): Implementasi asesmen autentik di SMP. Jurnal Penelitian dan Evaluasi Pendidikan, 20(2), 131-141. doi:10.21831/pep.v20i2.10063

Linh, N. N. (2016, August). Authentic assessment: A case study of its implementation in a lecturer's classes in Vietnam. Paper presented at the International Conference on Education and Social Integration, Ho Chi Minh City, Vietnam.

Mardapi, D. (2017). Pengukuran, penilaian, dan evaluasi pendidikan,
edisi kedua. Yogyakarta: Parama Publishing.

Massy, W. (1997). Teaching and learning quality-process review: The Hong Kong programme. Quality in Higher Education, 3(3), 249-262. doi:10.1080/1353832970030305

Mintah, J. K. (2003). Authentic assessment in physical education: Prevalence of use and perceived impact on students' self-concept, motivation, and skill achievement. Measurement in Physical Education and Exercise Science, 7(3), 161-174. doi:10.1207/S15327841MPEE0703_03

Nitko, A. J., & Brookhart, S. M. (2011). Educational assessment of students. Boston, MA: Pearson.

Permendikbud 2013 No. 66, Standar Penilaian.

Raymond, J. E., Homer, C. S. E., Smith, R., & Gray, J. E. (2012). Learning through authentic assessment: An evaluation of a new development in the undergraduate midwifery curriculum. Nurse Education in Practice, 13(5), 471-476. doi:10.1016/j.nepr.2012.10.006

Reeves, D. B. (2010). Transforming professional development into student results. Alexandria: ASCD.

Retnawati, H., Kartowagiran, B., Arlinwibowo, J., & Sulistyaningsih, E. (2017). Why are the mathematics national examination items difficult and what is teachers' strategy to overcome it? International Journal of Instruction, 10(3), 257-276. doi:

Sallis, E. (2002). Total quality management in education. London: Routledge.

Surya, A., & Aman, A. (2016). Developing formative authentic assessments based on learning trajectory for elementary school. Research and Evaluation in Education, 2(1), 13-24. doi:10.21831/reid.v2i1.6540

Svinicki, M. D. (2004). Authentic assessment: Testing in reality. New Directions for Teaching and Learning, 100(Winter 2004), 23-29. doi:10.1002/tl.167

Tombari, M. L., & Borich, G. D. (1999). Authentic assessment in the classroom (application and practice). Upper Saddle River, NJ: Prentice Hall.

Vu, T. T., & Alba, G. D. (2014). Authentic assessment for student learning: An ontological conceptualisation. Educational Philosophy and Theory, 46(7), 778-791. doi:10.1080/00131857.2013.795110

Whitelock, D., & Cross, S. (2012). Authentic assessment: What does it mean and how is it instantiated by a group of distance learning academics? International Journal of e-Assessment, 2(1), article 9. Retrieved from http://journals.sfu.ca/ijea/index.php/journal/article/view/31

Wiggins, G. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco, CA: Jossey-Bass.

