TOWARDS DEEP EVALUATION
© Francois Victor Tochon
Assessment, as it is lived in American schools, has changed its role. The humane dimension of teacher know-how in assessment has been replaced by quantitative performance assessments that play a political function in a number of countries. This essay discusses that shift from a critical perspective, considering research on the validity of standardized performance tests. Teachers’ field assessments tend to integrate the human factor as a major consideration, in contrast to standardized measures that reshape teaching to match the demands of lawmakers. The current totalitarian movement in the field of school assessment pushes instrumental reason to its extreme, as if machines and bureaucratic criteria could replace human agents. Because it enforces assimilation, it may contribute to the genocide of linguistic minorities. Assessment is always a relative enterprise, as it depends upon a conceptual interpretation and the context of its application. A more humane and deeper approach is proposed.
Two decades of research on teachers’ professional philosophies has indicated the importance of recognizing the intricacies of their decisions regarding assessment and the humaneness of their evaluative abilities (Tochon, 2000). Despite this considerable evidence, educational assessment has been used as a weapon by aggressive politics seeking to replace teachers in the arena of educational evaluation in certain Anglo-Saxon countries, as well as in countries that have adopted their model. Assessment entails forms and educational content, and its automatization without theoretical or practical knowledge has proved itself to be dangerous. In several nations, quantitative testing is mandated; the argument that the validity of standardized tests is more robust obscures other arguments, specifically economic arguments. Through this mechanism, the evaluative capacity of teachers is pushed into the background.
Practical prudence and theoretical wisdom in evaluative knowledge
The instruments of evaluation used in an academic context must not separate the practical knowledge and the theoretical conceptualization that serve as their foundation; doing so uproots both theory and practice (Tochon, 2001a). Aristotle (350 BC/2004) already made this case: technique (techne) only has meaning if it is guided by practical knowledge and prudence (phronesis) toward a goal of theoretical wisdom (sophia). Kant (1784) reexamined it; he deplored the absence of theoretical Reason behind technical goals that used instrumental logic, an argument taken up by Habermas (2003): instrumental thought only has meaning if it is tied to practical expertise and theoretical knowledge. Together, they permit the social critique necessary for democracy. This argument is corroborated by critical systems theory (Fuenmayor, 2006), which gives priority to human thought, contextualized in processes imposed by instrumental logic. In educational assessment, for example, this means privileging the relationship between student and teacher in an iterative evaluative process rather than ceding power to instruments applied independently of the context, as in the case of standardized quantitative tests.
Bain (1987), in a study of time in schools, noted that in middle school (11-15 years), 300 hours were allocated each year for evaluation; those hours were unavailable for teaching and learning. Today, these hours have multiplied due to the imposition, for political reasons, of a form of accounting influenced by economic modeling. Most notably in the US, politicians have imposed on schools a type of evaluation influenced by New Public Management, which promotes standardized testing as the core of institutional accountability based on four principles: 1) educational attainment is measurable; 2) it must be measured through scientifically valid instruments; 3) participants who are allowed autonomy must be accountable for their actions; 4) schools should be regulated by the results. From this point, the logic of evaluation and the ensuing construction of tests are disconnected from the reality in the schools and the reality of the human beings who are the target of evaluation. Standardized testing, which draws on a model of industrial engineering, acquires a political standing in macrosystemic regulation that is hardly compatible with the organic microprocesses involved in teaching and learning. However, no clear empirical consensus has emerged regarding the value of these reforms (Mons, 2009). Additionally, the negative impact of standardization on the educational performance of students from low-income, ethnic minority, and immigrant groups, as well as students with disabilities, can be observed.
New Public Management, the dominant paradigm in the United States, particularly in the analysis of institutional management, responds to political and economic exigencies, which are often in contradiction with the human relationships necessary in education. The connection between praxis, instrumentality, and wisdom is broken once teachers are no longer the evaluators. The forms of evaluation introduced in classrooms under the pretext of No Child Left Behind reveal their toxic limits when these evaluations are associated with measures that create anxiety in students, school staff, and parents. These academic tests, imposed in service of administrative goals, have the effect of modifying teaching so that it is stripped of its pedagogical dimension, taking considerable time away from learning. In short, it should be asked whether the philosophy of evaluation, centered on expected performance, should be reexamined to ensure that evaluation does not create abuses, especially those Edgar Morin (1982) labeled the terrorism of the state, when instrumental logic becomes totalitarian.
Standardized tests of performance: positionality of teachers
Research on the evaluation practices of teachers has largely been led by research groups separated from the field. There has been little participant action research, through autoethnography or collaborative research. The social nature of testing has been analyzed from a perspective of power relations. Brookhart (2004), for example, studied the tensions arising from the application of evaluation principles by student teachers, who had been engaged in social justice and supportive learning. Sarrazin, Tessier, and Trouilloud (2006) studied the positive climate created by teachers and the ways in which they encourage student motivation. Black & Wiliam (2006) established modalities appropriate for the application of evaluation; McMillan (2007) attempted to circumscribe the concept of equity in evaluation. Harris & Brown (2008) examined accountability from the point of view of the teacher and the student; this study was rare in that it used introspection to study what teachers thought about accountability. Simon, Chitpin, & Yahya (2010) noted how the underlying theories of evaluation processes are influenced by implications that are theoretical and technical, as well as psychological and social, to the extent that the effects cannot be examined if the people who are most directly concerned (the teachers) are not consulted. In the majority of this research, the effects of these new forms of standardized evaluation on classroom life and the perceptions of students and teachers are neglected.
In her article “The end(s) of Testing”, which sought to revalorize the evaluative knowledge of teachers, Eva Baker, then the president of the American Educational Research Association, noted in her 2007 presidential address that the emphasis placed on testing created a disturbance in school life. She argued that, to recalibrate the situation according to principles of equity and justice, it was necessary to overcome the institutional logic of accountability because people need a safe place to learn. Because the use of standardized tests rejects the primary research findings on learning and motivation, many students and teachers fear and loathe the tests. What arises from this lack of a conceptual link between mandated policy and educational needs is an inconvenient truth: testing does not, in general, measure what schools want to teach. This contradicts the underlying assumption behind the push for accountability. We have very little evidence to suggest that standardized assessment achieves its goals or offers useful guidelines for decision making. “Nevertheless, we act as if tests were valid, in the face of weak or limited evidence” (Baker, 2007, p. 310). Budgetary restrictions and work schedules do not allow a qualitative interpretation of student performance, even though qualitative evaluation made by the teachers themselves has more external and ecological validity.
The pernicious effects of standardized performance tests
Regional and national standardized tests have negative side effects; for example, they modify the content actually taught and lower the overall level and interpretation of what comprises standard skills (Koretz, 2002). In the profession of teaching, they engender a hyperconsciousness of what Weinberger (2007, p. 54) called “accountabalism”, a form of “evaluative cannibalism”. Teachers come to “teach to the test”, which is then normalized (Igen-Igaenr, 2005). However, this may have, in certain cases, an educational effect (Demailly, 2001), if the evaluation is developed in a participative manner with strong involvement of teachers, on the basis of democratic objectives (as opposed to an authoritarian evaluation), and if the conceptualization of change has a direct connection with the field, strong convictions, and vitality. This is rarely the case.
Both in their content and in their methods of integration, tests limit the space for the freedom and abilities of the teacher, who is already subject to the multiple constraints of hours, programming, and verification at several levels, not to mention pressure from parents. Certain teachers express the opinion that the content is almost pushed into the background, so strong is the pressure of quantitative standardized testing: what seems to matter is that the student fits into the mold of prescribed production. Creativity in students is increasingly worn down by precise goals, defined in quantitative measurements, which assure a common calibration of acquired skills. This model reassures politicians because it seems logical to think that if a student fails in academic accomplishments, measures must be put into place to determine the possible causes, with remedies for the student, workshops for the teacher, and economic sanctions if the school achieves weak output.
But the mechanization of assessment can lead to the loss of human values. Students and teachers exhibit numerous health problems linked to high-stakes testing (Gregory and Clark, 2003). In certain countries, like the United States (although the situation may vary from one state to another), these measures can be pushed to the extreme. The interpretation of results from quantitative assessment is naturalized as an objective fact that has serious consequences, from a warning to the school in the case of low achievement on official tests, to, after a period of enforcement of measures believed to be effective for low-performing schools, closing the school if it does not improve to the minimal threshold for normal performance standards. All this so that a child is not left behind; to get to a “higher performing” school in another neighborhood or even town, the daily journey by school bus will often double in length. Mechanized evaluation in a context such as this, stripped of sense and wisdom, may create a diaspora of the poorest children. The scientific reporting on these phenomena is worn away by the insistence on objectivity that prohibits the testimony of the indignities, detrimental to a large number of students, teachers, and schools, which may be caused by such measures.
Limitations of construct validity in evaluation
The principal argument that justifies the integration of standardized tests in schools is their greater credibility relative to teacher evaluation, because these tests are believed to have more internal validity and construct validity. Whereas the internal validity of a test supports the relationship between cause and effect to the detriment of other possible variables, construct validity determines whether what is believed to be measured is well measured and, by extension, the generalizability of the results. However, Cronbach and Meehl, in 1955 (p. 297), had specified that the generalizability of tests rests on the validation of “a principle for making inferences”. Alderson and Banerjee (2001), in their review of theories of test construction, demonstrated that internal validity was not as important as ecological validity of an external nature. The validity of an evaluation rests on an interpretation subject to privileged priorities, on which generalization depends. Over the course of the last few years, the research community has become more and more conscious of the consequences of tests and their use, interpretation, and even their misuse. The question now asked is whether test developers should be held responsible for the impact of their tests on society, which would constitute a new form of validity, consequential validity (Alderson and Banerjee, 2001).
Researchers who put the spotlight on this new way of thinking about evaluation, and the use of tests in particular, such as Lissitz and Samuelsen (2007), or McNamara and Roever (2006), have come to the conclusion that construct validity has multiple facets. There is no simple answer to the question of how to know if a test measures what it is supposed to measure. As with the instruments created by teachers for their use in class, to cite one example, standardized tests must be restricted to a specific context and should not be used outside of the conceptualization and the rationale that justify their use. This reiterates that the fundamental conception of the test, the perceived sense of its utility, is more important than its construct validity. Researchers, for this reason, must investigate the interpretive modalities in context, a test’s pertinence in a particular learning situation, for example, rather than fixating on the scores. The focus then shifts to a more reflexive and conceptual use of evaluation. “As a result of this unified perspective, validation is now seen as ongoing, as the continuous monitoring and updating of relevant information, indeed as a process that is never complete” (Alderson and Banerjee, 2002, p. 79) and, for that reason, one that cannot be taken on by anyone other than the teacher. The evaluation is valid IF it is practical and IF its inferences are useful in practice. The utility of the evaluation in its context is therefore the primary criterion by which a test must be judged. The consequences of the use of the test, its authenticity or adequacy to the context, its potential for interaction and communication, and its practical dimension become the major qualities and the anticipated criteria of a test that contribute to its verifiability.
Towards a deeper and more humane conception of evaluative competence
Performance testing, as a cause célèbre at the core of neoliberal ideology, places students, teachers, and schools in an unhealthy competition that can poison the social environment of school districts by creating never-ending conflicts. The publicity given to these tests often proves to be detrimental and contrary to the goals of reform. Even if the variability of accountability models is acknowledged, systematic performance testing has unintended effects that negatively influence the process of learning. Tests cannot be justified outside of the scholastic context. They can be harmful if teachers are not involved in their conception, their administration, and their use for constructive goals. The evaluative competence of teachers, which is based on reflective practice and the wisdom of contextualized action, cannot be dismissed. No matter which plan of action is adopted, it must be accompanied by professional development, financial support for struggling schools, and support for collaborative programs between schools, for example, through a network of professional exchanges. The manner in which teachers participate in these evaluations must be reviewed in depth; the more teachers participate in the conception, the administration, and the analysis of results, the more their involvement in the process develops, as does the culture of evaluation (Mons, 2009, p. 36).
When perspectives are mechanistic and superficial and when education is not truly reflective, reflexive, and profound, distortions reveal themselves; each time, it is possible to locate the problem in the absence of consideration for the knowledge of teachers, because they are often not engaged in the reform process or its evaluation (Stobart, 2011). Confronted with policies that are often extremist, educational practitioners must become more active. They can conduct collaborative research with teacher educators and administrators to stimulate and improve continuing education (Annan, 2011). Evaluation then allows the steering of reforms, the clarifying of roles, and the envisioning of sustainable programming to support students from disadvantaged groups. Continuing education for teachers is a proven method to improve student outcomes. Participatory action research in these networks permits the analysis of student difficulties and the identification of their needs (Timperley, 2011).
Changing the nature of scholastic evaluation and valorizing the evaluative competences of teachers
The beginning of this text emphasized two contradictory forces in educational evaluation: one that supports a practice of reflexive wisdom, and one whose aims are economic and political and that focuses on accountability. The problem is that the second is carried out to the detriment of the first; in addition, it dispossesses the teacher of a crucial dimension of his or her professional competence, the dimension that allows for teaching practices which, by encouraging a dialogue with the child, respond to the student’s individual needs. This human dimension risks being excluded by the measurements of standardized testing imposed in an authoritarian manner, because these measurements change the nature of teaching, which loses, in large part, its educational dimension and is relegated to technical instruction.
Among the actors implicated in the system of evaluation that has been put in place by these reforms, administrators, teachers, learners, and parents are conditioned by the social and psychological environment created by standardized tests, which pit students, teachers, schools, and school districts against each other through comparison of quantitative test scores. Although the majority of participants in these changes do not see, a priori, much harm in generalizing what has previously been a well-accepted local practice, it can be observed that the expansion of an evaluative program removed from the control of educational practitioners engenders systematic distortion in the perception of the work of teachers and can be harmful to learning, to the climate in the school and the classroom, and to the humane dimension to which education has previously belonged. Standardized testing has become, in certain contexts, a factor in racial segregation and linguistic genocide. Every evaluation is socially marked because the preferential value placed on certain criteria diminishes the value of others in a way that is potentially discriminatory. The underlying culture of this process of valuing and devaluing is rarely discussed, as though evaluation were always objective, as opposed to a politically subjective action. It is rarely questioned what is being devalued in the process.
Standardized testing appears to have become the only educational policy in some countries. This situation, associated with several other new constraints, entails certain characteristics of bureaucratic extremism. In several countries, the ubiquity of technical measurements of quantitative evaluation tends to strip people of the power of their own actions. Their freedom to think, to believe, and to express themselves is at stake. The danger is very real; sociometric evaluation is no longer tied to an ideal of what humanity can become, but has become a method of control and manipulation of human populations for the support of economic and security imperatives. The risk born of economic evaluation—a knowledge that has nothing to do with education—is instead compensated by the recalibration of economies and their agents, in the West as in other parts of the world, who wish a bloodless negotiated peace rather than a revolution to continue their abusive practices.
Evaluation, which etymologically refers to valorizing what is unique to each person, becomes a tool to measure and homogenize in the commodification of knowledge. Rethinking evaluation in terms of valorization could lead us to reconceptualize its creative power. Production should target creation, the purpose of life, the sign of intelligence, the unique expression, the art of living, and innovative individual or collaborative research, instead of that which binds us to normative standards. Thus, in order to resolve, in part, certain of the problems mentioned above, we could move beyond an evaluation of products, which limits the learner to a single type of result, and achieve an evaluation of process, leading to varied results. This new evaluation favors difference instead of homogeneity and normalization, which would reconnect evaluation with the idea of encouraging learning and personal development.
Reference: This page presents excerpts of an article published in French in the international journal Measure and Evaluation in Education, 34(3), 133-156, 2011, title: "Le savoir-évaluer comme politique éducative : vers une évaluation plus profonde".