How to strengthen teacher evaluation systems to improve student growth

March 28, 2024 | By Taylor McCoy

Teaching is one of the most heavily scrutinized professions. In addition to professional evaluations, teachers are subject to the feedback of peers, parents, and policymakers. As current and former educators, we get it. It’s a lot of pressure to be monitored by so many interested parties.

At Eduphoria, we know that feedback is essential. We created an application to streamline teaching evaluations for school districts. However, evaluations can be less than helpful (and potentially harmful) if not handled correctly. 

If teaching evaluations aren’t working for your district, the reason is probably skewed data. Fortunately, that’s something we can fix.

Teaching evaluations aren’t having the intended effect

Our US school systems have been trying to master the art of teacher evaluations for a while now. Unfortunately, the effort isn’t turning up the best results.

Around 2009, the Department of Education launched a contest between states. This contest, called Race to the Top (RTTT), offered funding to states that increased the rigor of teaching evaluations. Teachers who made the cut would receive higher pay. The goal? Increase student performance. RTTT is largely why teaching evaluations now include measures of student growth.

We can’t argue with an end goal of improved learning. All education-centered initiatives should consider the betterment of the students. Unfortunately, fifteen years post-RTTT, teacher retention continues to drop, job satisfaction is at an all-time low, and student performance isn’t where we want it to be. 

One of the foremost studies on this phenomenon was conducted by a team at the Annenberg Institute at Brown University. Its authors found that these more rigorous teaching evaluations did not improve student learning. In a few cases, they actually lowered student performance (Bleiberg et al., 2021).

Why are these new evaluation systems not producing positive outcomes?

Some might argue that the high stakes of these new evaluations are related to their poor adoption and implementation. However, the Annenberg study indicates that the rigor of each system did not affect the result (17). 

It’s also important to note that some districts’ positive results were masked on the larger scale. So, if it is possible to effect positive change by revamping evaluation systems, what went wrong with the majority of schools?

These Brown University scholars identified a few possible reasons why evaluation reform hasn’t increased student performance:

  • Inconsistent application of federal standards
  • Increased turnover for lower-performing teachers
  • Inadequate feedback from untrained evaluators

There are other possible reasons for stagnated student growth. (Let’s not even get into how skewed our growth and assessment data will be in the aftermath of COVID-19.) A major factor, though, is each state’s varied interpretation of the standards.

The authors of the study note that “despite the widespread adoption of teacher evaluation reforms, many states designed evaluation systems that only vaguely resembled the systems most reformers envisioned” (25).

Each unique approach lacked the support system it needed to create positive change and produce reliable data. Consequently, teaching evaluation ratings under the new system remained unchanged from previous years. Per Education Week, ratings stayed positive even for teachers who didn’t meet the requirements for a positive rating.

The last 15 years of data indicate that bias has persisted in teacher evaluations. Evaluation systems aren’t being tested, calibrated, or changed in discernible ways. Thus, the data being gathered isn’t reliable or representative.

Reason number one: Biased data

“Bias” and “skew” both have negative connotations. While they do have gnarly effects on data, they don’t necessarily indicate something nefarious. Overwhelmingly positive teacher reviews are evidence of a supportive teacher community. We want our fellow educators to do well! We want them to enjoy higher pay! And we don’t want them to feel discouraged when they’re already dealing with so many challenges in their day-to-day job.

Those are good things. Good intentions. But, they skew the data.

Here are some other kinds of bias that can appear in teacher evaluation data:

  • Ageism
  • Familiarity bias
  • Previous evaluations
  • Departmental bias 
  • Racial discrimination
  • Gender identity discrimination
  • Sexual orientation discrimination
  • Disability or long-term illness discrimination
  • Caretaker discrimination
  • Religious discrimination

We could go on, but you’re seeing the point. Not only are there a myriad of ways in which a person could be biased in their evaluations, but there are outside factors that can skew the data, including:

  • Lack of objective standards
  • Lack of training 
  • Lack of accountability in implementation
  • Poor interrater reliability
  • Ratings drift

Bias and skew are less likely to exist in a system with checks and balances. The truth is that bias is a result of flawed systems. 

Reason number two: Flawed evaluation systems

We can and should talk about what went wrong, but first, let’s clear the air. We commend the teams of educators who spent many meetings over many months diving into best practices to create the best system for their state, district, or school.

Federal standards aren’t always going to account for the needs of your local community. However, you can have both. You can create a robust evaluation system that produces objective, unbiased data. You can create a system that also considers your unique needs and demographics.

First, here’s where many teaching evaluation systems fall short:

  • They don’t use best practices to create their rubrics and ratings.
  • They don’t measure a teacher’s skills based on objective, observable domains.
  • They don’t measure over a period of time; rather, they rely on one or two short interactions with the teacher.
  • They don’t use reliable measures of student growth.
  • They don’t balance student and parent feedback.
  • They only measure one kind of skill or one aspect of job performance.
  • They don’t personalize their feedback to the teacher.
  • They don’t offer professional development that is tied to the evaluation.
  • Evaluators are under-trained and underprepared to implement the new system.

If the list above sounds a lot like the system you’re used to, don’t worry. We're confident we can help you improve. We’ve used decades of research and experience to develop an evaluation management system that supports every person at every level of the teacher evaluation process.

What does a strong teacher evaluation system need?

We can’t talk about every aspect of a successful system in this article. We can tell you how to create an evidence-based system, eliminate bias, and measure performance.

Here are the basic elements of a successful teacher evaluation process:

  • Objective rubrics and evaluation forms
  • Personalized feedback
  • Professional development
  • Interrater reliability and iterative improvements

Your evaluation forms must be objective

No one likes to rank their friends and colleagues like a robot would. However, objective ratings are designed to combat bias. In a well-built evaluation system, you can support your fellow teachers with personalized feedback and still rate their skill sets objectively.

First, make sure your rubrics rely on local, state, and federal standards. Ask yourself:

  • Do our evaluations measure the teacher’s abilities across multiple skill sets, such as behavior management, subject matter expertise, use of technology, etc.?
  • Do our evaluations consider student growth through multiple objective measures of success?
  • Do our evaluations help to eliminate bias through numbered rating systems, sliding scales, or yes/no qualifiers?
  • Do our evaluations consider the best interests of the teachers and the students?

Next, consider how you can use personalized feedback to maximize the positive effect on the teacher. Ask yourself:

  • Am I using the teacher’s goals to inform my feedback?
  • Am I using effective communication methods like structured protocols?
  • Am I offering professional learning communities and resources? Are they related to the teacher’s performance and learning goals?
  • Am I offering collaborative conversations throughout the year and not just during evaluations?

Feedback has to include more than just a number system

Numbered system rankings are only effective when combined with personalized feedback. A teacher who receives quick, impersonal ratings will likely feel that the evaluator didn’t consider the person behind the rating. If evaluators don’t connect with their teachers, teachers will discount the feedback as the product of a flawed system.

This isn’t just a gut instinct that we have as people who have been on the receiving end of these ratings. Research published by Walden University shows that teachers will ignore feedback that doesn’t meet certain criteria. They’ll also have pretty strong negative feelings about it. And why shouldn’t they? 

The teachers included in the study felt disrespected by the process and did not find the feedback they received to be helpful. They thought evaluators paid excessive attention to their deficiencies rather than growth opportunities. The author of this study found that flawed evaluation systems and untrained evaluators were primarily the cause of these perceptions.

However, when the quality of the feedback improved, so did the teacher’s willingness to incorporate it.

Here are some key takeaways from the study:

  • Evaluators must be experts in their evaluation domains to provide useful feedback.
  • Evaluators need to spend sufficient time observing the teacher.
  • Evaluators should have positive working relationships with the teachers they evaluate.
  • Evaluators need to provide specific and actionable recommendations.
  • Evaluations should be accurate and foster self-reflection.
  • Evaluations should include modeling and praise.

Helping evaluators meet these criteria takes a lot of work, and training is a time-consuming process. Save our webinar to see how Lubbock Cooper ISD set up a robust evaluation system through training, calibration, and data collection. The host is Eduphoria product manager Dr. Jeremy Wagner, a former district administrator over STEM and advanced academics.

Teachers need thoughtfully crafted professional learning

Every professional needs training and learning opportunities to be highly effective. Teachers are no different. Their learning opportunities should give them the skills to design and implement a curriculum that helps students meet their goals. However, studies on teacher professional learning don’t always show a correlation between professional and student growth. This is not because professional development doesn’t lead to improved student learning. Often, the professional development being provided doesn’t meet the necessary quality standards.

That being said, even professional learning that doesn’t improve student growth still serves a function. There are other significant reasons to give a teacher learning opportunities, including:

  • Training and professional development improve a person’s confidence in their work (National Institutes of Health).
  • Improved confidence can make people more willing to use what they learn in their daily professional lives (NIH).
  • Professionals with continual learning opportunities have higher job satisfaction and a lower likelihood of leaving the profession (Healthcare).

There will always be an evidence-based reason to provide faculty with high-quality professional learning choices. However, research shows that certain kinds of professional learning will positively impact student growth.

According to an article published by the Department of Education, professional development can improve student learning if it includes:

  • Specific applications to learning needs in the student body
  • Information the teacher finds helpful
  • Research and best practices
  • Self-selected learning goals
  • Ample time to practice new learning and reflect on current practices
  • Opportunities to receive feedback from colleagues and evaluators
  • A supportive culture around professional learning

Specific, actionable, and supportive training can solve several district problems at once. With adequate support, teachers will be able to pass new skills along to their students through their teaching. They will also be more confident, more satisfied, and more integrated in their school community.

Iterative improvements can fix a skewed data problem

Everyone who has spent time in the classroom knows you won’t always get it right the first time. When your plan falls short of expectations, you return to the drawing board and draft revisions. The National Council on Teacher Quality (NCTQ) discusses how to address bias by iteratively improving your evaluation system. They suggest talking to your teachers to determine which parts of the process aren’t working for them and using tools that help you identify bias.

Talking with your staff is another method of gathering data to improve your system. Their feedback, in addition to the data you gather through data visualization tools, will help you to tackle a skewed data problem. 

Earlier, we mentioned that there are five ways outside factors can skew evaluation data. We’ve addressed how the first two, lack of objective standards and lack of training, create a lack of accountability in implementation. This can lead to:

  • Poor interrater reliability
  • Ratings drift

EdWeek’s discussion on a lack of accountability in teacher evaluations supports Brown University’s findings. Teacher evaluation systems aren’t being held to a higher standard. So, what do poor interrater reliability and ratings drift look like, and how can school and district leaders address them?

When evaluators rely on previous observations or their feelings about a person to make a professional rating, they are, in essence, creating and using an unreliable system to evaluate their teachers. When this happens, there is no interrater reliability between departments, schools, or districts. Whether ratings lean positive or negative, it isn’t fair to teachers, nor is it providing helpful information to evaluators.
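
One common way to put a number on interrater reliability is Cohen's kappa, which compares how often two evaluators agree with how often they would agree by chance. The sketch below is only an illustration of that statistic, not a description of how Strive calculates anything. The function name, the 1-4 rating scale, and the sample ratings are all hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same lessons on the same scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of lessons")
    n = len(rater_a)
    # Observed agreement: share of lessons given the same rating by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's own rating frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every lesson
    return (observed - expected) / (1 - expected)

# Hypothetical ratings (1-4 scale) from two evaluators observing the same eight lessons.
evaluator_1 = [3, 3, 4, 2, 3, 4, 1, 3]
evaluator_2 = [3, 4, 4, 2, 3, 3, 2, 3]
kappa = cohens_kappa(evaluator_1, evaluator_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 means the two evaluators apply the rubric almost identically, while a value near 0 means they agree about as often as chance would predict. A low value is a signal that calibration training may be needed.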

Poor interrater reliability produces ratings drift.

Let’s say a school was able to establish a new system, train evaluators, and make systemic changes. At the end of a few years, they stop using their new system as the lens through which they evaluate their teachers, instead becoming more severe or more lenient over time. The ratings may drift higher without a rise in skill and student growth. They may also dip lower, though the teachers’ skills haven’t worsened from the previous year. Inaccurate ratings ultimately hinder growth opportunities for teachers, students, and schools.
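
To make the idea of drift concrete, here is a minimal sketch that compares the year-over-year change in average ratings with the change in average student growth. It flags years where ratings moved noticeably while growth stayed flat. The function name, both thresholds, and the sample numbers are assumptions for illustration; a district would need to choose its own measures and cutoffs.

```python
from statistics import mean

def flag_ratings_drift(ratings_by_year, growth_by_year, rating_tol=0.3, growth_tol=1.0):
    """Return the years in which mean ratings shifted but mean growth did not.

    Both arguments map a school year to a list of scores.
    rating_tol and growth_tol are assumed thresholds, not established standards.
    """
    years = sorted(ratings_by_year)
    flagged = []
    for prev, curr in zip(years, years[1:]):
        rating_change = mean(ratings_by_year[curr]) - mean(ratings_by_year[prev])
        growth_change = mean(growth_by_year[curr]) - mean(growth_by_year[prev])
        # Possible drift: ratings move while student growth stays roughly flat.
        if abs(rating_change) > rating_tol and abs(growth_change) < growth_tol:
            flagged.append(curr)
    return flagged

# Hypothetical data: teacher ratings on a 1-4 scale and a student growth measure.
ratings = {2021: [3, 3, 2, 3], 2022: [3, 3, 3, 3], 2023: [4, 4, 3, 4]}
growth = {2021: [50, 52, 48, 50], 2022: [51, 50, 49, 50], 2023: [50, 51, 49, 50]}
drift_years = flag_ratings_drift(ratings, growth)
print(drift_years)
```

A flagged year is not proof of drift. It is a prompt to revisit how evaluators applied the rubric that year.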

If the systems are working, teachers should be improving or maintaining their previous scores. However, the research cited above indicates that the systems aren’t working. Your teachers are likely improving, but there’s no real way to tell when the data is skewed.

We must be able to trust the data we collect.

That’s why we test our evaluation systems and improve them with iterative changes. Otherwise, our systems will continue to fall short of our ultimate goal of improving student growth.

Use Strive to build and test your teacher evaluation system

If you want to build a stronger teaching evaluation system, you can do it. Whether you’re in the beginning stages of revamping your system or need to streamline an existing one, we have a software solution for you.

We built Strive to help educators grow. Using our application, you can:

  • Create and share custom evaluation forms
  • Offer personalized feedback
  • Develop professional learning opportunities
  • Track system success through data visualization

Create and share custom evaluation forms

Some states have existing evaluation forms which districts can use to inform their new approach. With Strive, you can share these evaluation forms with your staff, edit existing forms, and create your own. 

Most of Strive’s evaluation templates are completely customizable, down to the language you use to describe each step in the process. Districts can address local needs by modifying evaluation forms to add custom criteria.

In the observation document, standards are displayed to the left along with a spectrum ranging from "improvement needed" to "distinguished." Evaluators can select the bubble that best describes the teacher's proficiency in each domain.

Turn evaluations into meaningful mentorships with high-quality feedback

To combine objective observations with personalized feedback, add notes under each domain, dimension, or group. In Strive, evaluators can upload evidence to document improvement, demonstrate skills, or highlight an area that could use additional training.

To give teachers a full and comprehensive rating, evaluators can drag and drop feedback from previous observations onto the summative form. They can also drag and drop notes from their private notepad.

Both teachers and evaluators can reach out through the in-app messaging system. This is a place where mentorship conversations can live. That way, key insights and breakthrough moments are available for future reference. Through messages, teachers and administrators can improve the evaluation process together. 

The messages screen shows previous messages between appraiser and appraisee. Each message shows the date and time of the message, the person who sent the message, and has an option to view more. The message box and "send" button are at the bottom of the screen.

Offer training to teachers and evaluators

The research surrounding professional learning in school communities is quite clear: both teachers and evaluators need personalized training.

Educators often source their professional learning from several in-person and online locations. As a result, it can be difficult to plan and track these courses unless they are in a centralized location. Thankfully, Strive offers a professional learning portfolio where teachers can upload their course certificates. It also acts as a professional learning hub for districts to create and share their own professional learning.

Teachers can register for and attend professional learning asynchronously using the Strive app. They can also secure seats for local training and conferences. Better yet, they can search for professional learning opportunities that match their goals! With options like these, teachers can access specific, actionable, and supportive learning opportunities on their own terms.

Professional learning isn’t just for the teachers, though. When you implement your teaching evaluation system, you’ll need a plan to disseminate that information to evaluators. In Strive, you can create conferences and request the attendance of interested parties.

Next, host your conference and align your faculty to start getting valid, valuable data that truly impacts student growth.

The conference creation screen in Strive. Pictured are the dates of the conference, the description, tags, and contributors. Conference creators can add contributors by their email address.

Strive helps to eliminate bias in teacher evaluation systems

You shouldn’t need a math specialization to collect and validate data. Strive does all of that for you. Plus, we have several tools to help ensure you’re leaving bias out of your evaluation system.

Introducing: the detailed analysis panel. 

The detailed analysis panel. Pictured is a list of several teachers. Next to each name are colored blocks which depict the breakdown of their ratings.

From this panel, district leaders and administrators can see how their teachers perform on evaluations. They can filter teachers according to school, department, or demographic to see if any data is skewed. Say one school consistently awards “needs-improvement” ratings with no teachers rising above that level. District leaders could quickly identify which appraisers at that school need additional training and calibration.
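
For readers who want to see the logic behind this kind of check, the sketch below groups ratings by school and flags any school where a single rating accounts for nearly all evaluations. It is an illustration only and does not represent Strive's internals. The function name, the thresholds, and the school names are hypothetical.

```python
from collections import Counter, defaultdict

def schools_with_lopsided_ratings(records, max_share=0.9, min_count=5):
    """Flag schools where one rating label makes up nearly all evaluations.

    records is an iterable of (school, rating_label) pairs.
    max_share and min_count are assumed thresholds chosen for illustration.
    """
    by_school = defaultdict(Counter)
    for school, label in records:
        by_school[school][label] += 1
    flagged = {}
    for school, counts in by_school.items():
        total = sum(counts.values())
        label, top = counts.most_common(1)[0]
        # Skip schools with too few evaluations to say anything meaningful.
        if total >= min_count and top / total >= max_share:
            flagged[school] = (label, top / total)
    return flagged

# Hypothetical evaluation records for two schools.
records = (
    [("North Elementary", "needs improvement")] * 6
    + [("South Elementary", "proficient")] * 3
    + [("South Elementary", "distinguished")] * 2
    + [("South Elementary", "needs improvement")] * 1
)
flagged = schools_with_lopsided_ratings(records)
print(flagged)
```

The same grouping works for departments or demographic categories by swapping what goes in the first element of each pair.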

Next, take a look at the scatterplot.

The scatterplot graph. Pictured is a scatterplot which shows how student growth and teacher performance compare. Teachers with highest skew are selected and displayed in the "insight" section.


This graph shows teacher ratings compared with student growth measurements. In an ideal world, high-performing teachers would have high-performing students. Yet, in some cases, teachers with high ratings have low-performing students or vice versa. With such explicit visual data depictions, it’s easier to identify areas where support is needed.
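
One simple way to surface those mismatches in your own data is to put ratings and growth on a common scale and look for large gaps between them. The sketch below standardizes each measure as a z-score and flags teachers whose two scores disagree by more than a chosen threshold. This is an assumed approach for illustration and is not how Strive defines skew. The function names, threshold, and sample data are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def rating_growth_mismatches(teachers, threshold=1.5):
    """Return (name, gap) pairs where rating and growth z-scores disagree.

    teachers is a list of (name, rating, student_growth) tuples.
    A positive gap means the rating is higher than student growth would suggest.
    threshold is an assumed cutoff chosen for illustration.
    """
    names = [t[0] for t in teachers]
    rating_z = z_scores([t[1] for t in teachers])
    growth_z = z_scores([t[2] for t in teachers])
    gaps = [(name, rz - gz) for name, rz, gz in zip(names, rating_z, growth_z)]
    flagged = [(name, gap) for name, gap in gaps if abs(gap) >= threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical teachers: (name, rating on a 1-4 scale, student growth measure).
teachers = [
    ("Teacher A", 4, 60), ("Teacher B", 3, 50), ("Teacher C", 2, 40),
    ("Teacher D", 3, 50), ("Teacher E", 4, 40), ("Teacher F", 2, 60),
]
mismatches = rating_growth_mismatches(teachers)
for name, gap in mismatches:
    print(f"{name}: gap = {gap:+.2f}")
```

A large gap in either direction is a starting point for a conversation, not a verdict. It may point to an evaluator who needs calibration, a growth measure that doesn't fit the class, or a teacher who needs support.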

We can help you create the evaluation system of your dreams

Being a leader in education is a stressful job. You’ve got a lot on your shoulders, including the growth and development of hundreds, if not thousands, of students. It may feel like too big of a task to overhaul your evaluation system on your own. We never want you to feel like you’re in over your head. 

We’d love to offer you the support you need to get your schools working in the way you envision. Let our team talk to you about how Strive can support your teachers and grow your students. We offer ed tech tools that simplify and streamline processes like teacher evaluations, but we also offer the support of an incredible team.

At Eduphoria, we aren’t just talking heads. We’ve been in schools, teaching, supporting, and training. Education is our passion, and we love what we do. We want to help you realize your education-centric dreams, too.

Give Strive a test drive with our free walkthrough

Use our walkthrough, or fill out our demo request form in the navigation bar to get an inside look at Strive for teacher evaluations.
