There is a strong trend in education to strengthen the way that teacher performance is evaluated. In most states, this means evaluating teachers on the performance of their students, which is a very difficult thing to do well. Schools have the source data: the master schedule in the student system and dozens of assessment files containing student performance scores. But how do you transform all of that raw data into a format that can measure performance, and more importantly, how do you do it fairly?
In at least half of the states, the legislature has mandated this type of teacher evaluation. In most cases, the rules behind those mandates are shallow and do a poor job of determining how a teacher is actually performing. The predominant approach, for example, is to use the state NCLB testing for this evaluation by calculating the average percentage improvement of a teacher's students on that test. While this generates interesting information, it is very inaccurate and quite unfair. Students in one teacher's class may be mainly low performers, while another teacher's class may be mainly high performers, so the same raw point gain might represent a 5% improvement for the high performers but a 50% improvement for the low performers. And how do you adjust for the teacher who has five more students, or mostly non-English speakers? These are areas that state legislatures have simply overlooked, or chosen to overlook because of the complexity of doing otherwise. However, any evaluation that is performed must be fair to be effective.
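A small arithmetic sketch makes the fairness problem concrete. The numbers and function name below are illustrative only, not from any state's formula: the same five-point gain looks very different as a percentage depending on where a student started.

```python
def percent_gain(baseline: float, current: float) -> float:
    """Percentage improvement relative to the student's starting score."""
    return (current - baseline) / baseline * 100

# The same 5-point raw gain, from two very different starting points:
low_performer = percent_gain(10, 15)     # 5 points on a baseline of 10
high_performer = percent_gain(100, 105)  # 5 points on a baseline of 100

print(low_performer)   # → 50.0
print(high_performer)  # → 5.0
```

Averaging percentage improvement across classes therefore rewards teachers who happen to start with low performers and penalizes those who start with high performers, even when the instructional impact is identical.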
I would also suggest that there is a better reason to perform in-depth teacher evaluations than simply for performance evaluation purposes. A good evaluation that uses accurate data will determine the strengths and weaknesses of a teacher. A school administrator can align staff much more effectively when he or she knows which teachers are stronger in math or reading, and which are weaker in those areas. It is my experience when evaluating these areas that there is always a bell curve in any statistically large group of teachers. It should be no surprise that every teacher has strengths and weaknesses. Some simply teach math better than reading; some teach science better. Also, some teachers simply get better overall results than others. This does not mean that some teachers in every school are underperforming. While teachers may be ranked low compared to their peers, they may still be doing an acceptable job.
In Tyler Pulse we have taken all of these issues into consideration to create what we believe is one of the most advanced teacher evaluations possible. The key to any good evaluation is to determine the incremental impact that the teacher has had on their students' performance. To do this, it is necessary to determine the historical annual improvement rates of the students in each teacher's class, and then determine the difference between those rates and the improvement in the current year. That result should then be further adjusted for outside influences such as class size, student demographics, and so on. In this way a fair evaluation can be performed.
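The incremental-impact idea can be sketched in a few lines. This is an assumption-laden illustration of the concept described above, not Tyler Pulse's actual calculation; the function name and sample numbers are hypothetical.

```python
def incremental_impact(historical_gains: list[float], current_gain: float) -> float:
    """Current-year improvement minus the class's historical
    average annual improvement rate (the teacher's incremental contribution)."""
    baseline_rate = sum(historical_gains) / len(historical_gains)
    return current_gain - baseline_rate

# A class that historically gained 3, 4, and 5 points in prior years
# (average 4.0) but gained 9 points this year:
print(incremental_impact([3, 4, 5], 9))  # → 5.0
```

The point of the subtraction is that a teacher is credited only for improvement beyond what these particular students would historically have achieved, before any further adjustments for class size or demographics.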
This type of teacher review is very difficult and requires tons of data, but this is necessary to be fair and accurate. We have taken this exact approach in Tyler Pulse. As background, in Tyler Pulse we support a Student Assessments data model that tracks all assessment results that a student has over an unlimited time frame. This model also uses this data to analyze student performance over time, but that is a subject for another posting. Pulse also has access to the master schedule in the Student Information System, so Pulse is aware of each classroom, the subject being taught, the teacher(s) assigned to the class and the students in that class. All of this data is combined into a single data set to perform quite a detailed analysis of the incremental value that teachers are contributing to their classrooms.
The following are some of the key factors that we evaluate with a research-based weighting scale. This weighting scale may also be adjusted in each school district or state. These factors include:
- Both the percentage and raw point gain of students taught by each teacher. This is the first step toward leveling the various competency levels of each classroom.
- The total percent of students taught by a teacher who have improved. It is always interesting to note that in almost every evaluation we have done, there are always students that simply do not improve during a year of instruction. These students should be marked for intervention programs.
- The classroom size. Teachers with 30 students in a room are simply stretched 5 students thinner than a teacher with 25 students in their room.
- The average age level in the room. Particularly in lower grade levels, a student in the bottom quartile of the age range in their grade will usually perform comparatively lower than students in the top quartile. A nine or ten month age difference makes a real difference in the lower grade levels.
- The average student “School Performance” rating of the students in the room. This is a ranking to determine how a student is performing in their general school activities. It includes adjustments for attendance, discipline, grades, assessments and much more. A teacher with challenged students has a more difficult task than a teacher with many apples on their desk.
- The demographics in the classroom. How many non-English speakers are there, how many are Special Ed, how many are Title 1 or gifted? Each of these factors has a distinct influence on teacher performance.
- And most importantly, what is the historical performance of the students? If a room of students historically performs at 50% of expectation, but this year they are performing at 75% of expectation, the teacher is doing an outstanding job, a 50% improvement. In most evaluations this teacher would be rated low because the students were not meeting expectation.
- Adjustments should also be made for students that did not test the appropriate number of times or enrolled in the teacher's room halfway into the school year. It would simply not be fair, or accurate, to do otherwise.
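The factors above feed a weighted composite. A minimal sketch of that idea follows, assuming each factor has already been normalized to a 0-to-1 score; the weights shown are placeholders for illustration, not the research-based scale itself, which a district or state would tune.

```python
# Placeholder weights, illustrative only; a real deployment would use
# the district's or state's research-based weighting scale.
FACTOR_WEIGHTS = {
    "incremental_gain": 0.35,        # percentage and raw point gain
    "pct_students_improved": 0.20,   # share of students who improved
    "class_size_adj": 0.10,          # larger rooms stretch teachers thinner
    "age_adj": 0.05,                 # younger-in-grade students lag
    "school_performance_adj": 0.10,  # attendance, discipline, grades, etc.
    "demographic_adj": 0.10,         # ELL, Special Ed, Title 1, gifted
    "historical_adj": 0.10,          # students' historical performance
}

def composite_score(factor_scores: dict[str, float]) -> float:
    """Weighted sum of normalized factor scores, each in [0, 1]."""
    return sum(FACTOR_WEIGHTS[name] * score
               for name, score in factor_scores.items())

# A teacher who scores perfectly on every factor earns the maximum 1.0:
perfect = composite_score({name: 1.0 for name in FACTOR_WEIGHTS})
```

Keeping the weights in one adjustable table is what lets each district or state tune the scale without changing how the underlying factors are measured.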
The result of this analysis is that a school administrator now knows the strength of each teacher in the school. Class assignments may now be made based on facts, rather than educated guessing. Students simply perform better when this occurs. Is it too much to expect better results when the best math teacher teaches math, instead of an average math teacher? Doesn't it make sense to align students with teachers based on their needs?
The purpose of ranking teachers in most state and district plans is to weed out underperforming teachers. But there is so much more to it than that. No evaluation can ever be effective if it is not fair, and most simply are not. And if the plan is fair and accurate, why couldn't it be used to support better decision making? I would suggest that the reason most of these evaluations are not used for operational planning is that the evaluators simply know their (lack of) value. We are doing our best to change that.