Film Review Scores Are Fundamentally Flawed

Rotten Tomatoes and Metacritic are the first stop for most of us when we want to know whether a movie is any good. Until recently, I had no idea how each site actually arrived at its score. Once I found out, I realized I had been reading them all wrong.

Where Rotten Tomatoes and Metacritic Ratings Come From

Rotten Tomatoes and Metacritic ratings are built into everything from movie apps like Flixster to Google search results. You’ve probably seen the scores next to a movie’s title. Experienced users may even know that each site actually has two ratings: one from critics and one from regular visitors. What you may not realize is that each site calculates these numbers very differently.

To get its critics’ score, Rotten Tomatoes collects reviews from various sources, usually a couple hundred or so, depending on how high-profile the movie is. Each review is then classified as either fresh (positive) or rotten (negative). The score you see is the percentage of total reviews that count as fresh. For example, for the recent superhero matchup Batman v Superman, the site collected 327 reviews, 90 of which fell into the positive category. 90 is roughly 28% of 327, so that becomes the movie’s score.
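
Since the Tomatometer is just a percentage, the calculation is easy to reproduce. Here’s a minimal sketch in Python using the Batman v Superman numbers above (the function name is mine, not Rotten Tomatoes’):

```python
def tomatometer(fresh: int, total: int) -> int:
    """Percentage of reviews classified as fresh, rounded to a whole number."""
    return round(100 * fresh / total)

# Batman v Superman: 90 fresh reviews out of 327 collected.
print(tomatometer(90, 327))  # -> 28
```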

Metacritic, on the other hand, uses a bit more nuance in its system. The company collects online reviews and assigns each one a score between 0 and 100. Where a site uses a quantifiable metric, such as a numerical score or letter grade, Metacritic converts it to a number on that scale; where it doesn’t, Metacritic assigns the number it believes most closely matches the review. The site then takes a weighted average of all the reviews. The company doesn’t disclose how much weight it gives individual reviewers, but it explains that certain critics are given more weight in the overall score based on their stature. This system allows for a bit more nuance. In the case of Batman v Superman, Metacritic gave the film a 44, significantly higher than Rotten Tomatoes’ 28%.
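
In code, a Metascore is just a weighted mean. The weights below are invented, since Metacritic doesn’t publish how much weight any given critic carries; a rough sketch might look like this:

```python
# Hypothetical weights: Metacritic does not disclose its per-critic weighting.
def metascore(reviews: list[tuple[int, float]]) -> int:
    """Each review is (score on a 0-100 scale, critic weight)."""
    total_weight = sum(weight for _, weight in reviews)
    weighted_sum = sum(score * weight for score, weight in reviews)
    return round(weighted_sum / total_weight)

# Three invented reviews; the second critic's "stature" doubles their weight.
print(metascore([(60, 1.0), (30, 2.0), (50, 1.0)]))  # -> 42
```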

It’s worth noting that Rotten Tomatoes and Metacritic, as well as IMDb, also have separate user ratings. These work more or less the same across all three sites: users rate the movie on a scale of one to ten (technically Rotten Tomatoes uses a five-star scale, but half stars are allowed, which makes the math functionally identical). Each site then has its own way of weighting those ratings to produce the final user score.
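
The half-star equivalence is simple to see: each half star corresponds to one point on a ten-point scale.

```python
# Each half star equals one point on a ten-point scale, so 3.5 stars = 7/10.
def stars_to_ten_point(stars: float) -> float:
    assert 0 <= stars <= 5 and (stars * 2) % 1 == 0, "half-star increments only"
    return stars * 2

print(stars_to_ten_point(3.5))  # -> 7.0
```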

Rotten Tomatoes takes results to extremes

The problem with the Rotten Tomatoes method is that by boiling an entire review down to “good” or “bad,” it reduces critics’ opinions to the nuance of a coin toss. This can skew scores dramatically in either direction. Though it doesn’t draw much attention, Rotten Tomatoes lists an “average rating” for each movie right below the Tomatometer score on its site. This number averages critics’ ratings after converting them to a ten-point scale. If we look at the Batman v Superman example again, we can see that its average rating is actually 4.9, which is even slightly higher than Metacritic’s score. However, because Rotten Tomatoes treats the reviewer who thought the film was okay but flawed exactly the same as the reviewer who thought it was complete garbage, the score drops to an abysmal 28%.
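
You can see the loss of nuance with a toy example. Below are two invented sets of critic scores: one where everyone found the film mediocre, and one where critics were sharply divided. The “meh” movie has the higher average rating but a far lower Tomatometer-style percentage (the 6/10 fresh cutoff here is my assumption for illustration, not Rotten Tomatoes’ published rule):

```python
# Invented scores; the fresh cutoff of 6/10 is an assumption for illustration.
def tomatometer(scores: list[float], cutoff: float = 6.0) -> int:
    fresh = sum(1 for s in scores if s >= cutoff)
    return round(100 * fresh / len(scores))

def average(scores: list[float]) -> float:
    return round(sum(scores) / len(scores), 1)

mediocre = [5.5, 5.5, 5.5, 5.5, 6.0]   # everyone thought it was just okay
divisive = [9.0, 9.0, 1.0, 1.0, 6.0]   # critics split between love and hate

print(average(mediocre), tomatometer(mediocre))  # 5.6 20
print(average(divisive), tomatometer(divisive))  # 5.2 60
```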

This effect doesn’t only cut in the negative direction, though. We can look at another big summer superhero clash to see the opposite. Captain America: Civil War currently has a respectable 7.9 average rating on Rotten Tomatoes, but its Tomatometer score is significantly higher at 92% (126 fresh reviews out of 137). Once again, Metacritic’s method gives Civil War a 77, much closer to the Rotten Tomatoes average. In effect, the Tomatometer works something like Captain America’s super-soldier serum: the good becomes great, and the bad gets worse.

The same effect applies to Rotten Tomatoes’ user ratings, though less dramatically. Any rating of 3.5 stars or higher (7 out of 10) is counted as positive, or “fresh”; anything lower counts as negative, or “rotten.” The user score is the percentage of positive ratings. While this is still an oversimplification, the underlying data has more gradation than a subjective “good” or “bad,” and the site has a much larger pool of ratings to draw from.
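
Using the 3.5-star cutoff described above, a sketch of the user score might look like this (the ratings are invented):

```python
# The 3.5-star cutoff comes from the description above; ratings are invented.
def user_score(star_ratings: list[float]) -> int:
    fresh = sum(1 for r in star_ratings if r >= 3.5)
    return round(100 * fresh / len(star_ratings))

print(user_score([5.0, 4.0, 3.5, 3.0, 2.5]))  # -> 60
```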

Metacritic is more subtle but can be more biased

Rotten Tomatoes’ biggest problem may be that it avoids nuance, but there’s an understandable reason it might want to. Metacritic is more nuanced, yet it’s sometimes criticized for getting that nuance “wrong.” As mentioned earlier, Metacritic assigns a numeric value to each review before averaging them. Choosing those numbers, however, can be a subjective exercise.

For example, many review sites attach letter grades to their reviews, on a scale from A to F. Metacritic gives an F a 0, while a review graded B- may get a 67. Some reviewers disagree with how these scores are assigned, believing an F should be closer to 50, or a B closer to 80. Since letter grades aren’t standardized to begin with, this highlights Metacritic’s key problem: how do you put a numerical value on an opinion?
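
One plausible way to arrive at numbers like those is to space the letter grades evenly between F = 0 and A = 100. The sketch below happens to reproduce the two conversions mentioned above, but it’s my reconstruction; Metacritic’s actual conversion table is its own.

```python
# Evenly spaced grade ladder from F (0) to A (100). This is a reconstruction
# that matches the two conversions cited above, not Metacritic's official table.
GRADES = ["F", "F+", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]

def grade_to_score(grade: str) -> int:
    return round(100 * GRADES.index(grade) / (len(GRADES) - 1))

print(grade_to_score("F"), grade_to_score("B-"))  # -> 0 67
```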

Paradoxically, this gives reviewers both more and less control over their scores. A reviewer’s opinion is represented more faithfully by a numerical score than by a binary good/bad. On the other hand, there’s also more wiggle room, which means a reviewer’s views can end up represented differently than they intended. That can become a real problem as the industry increasingly relies on review scores. Of course, if Metacritic only let each review count as a 100 or a 0 (which, mathematically, is exactly what Rotten Tomatoes does), there would probably be a lot more controversy.

What really matters in a review score

No matter how “objective” we try to be about review scores, we’re still trying to convert opinions into numbers. It’s a bit like trying to turn love into fossil fuels: at first glance, the conversion doesn’t make sense. Review scores are still helpful, though. There are a lot of movies out there, and most of us don’t have the time or money to watch them all ourselves. Reviewers help us figure out which films are worth our time, and aggregate scores simplify the problem by boiling the decision down to a simple two-digit number. In my experience (and opinion!), here are the best ways to use each metric:

  • Rotten Tomatoes is a basic yes/no recommendation engine. If you want a simple answer to the question “Should I watch this movie?”, Rotten Tomatoes will probably answer it well. The score doesn’t necessarily reflect how good a movie is, but it measures enthusiasm for it fairly well. Just keep in mind that it tends to push films to extremes.
  • Metacritic tries to measure a movie’s quality based on reviewers’ opinions. Opinions are never objective, but Metacritic’s score is likely to track a film’s actual quality more closely than Rotten Tomatoes’. The downside is that the site can also inadvertently inject its own opinion.
  • User ratings across all three sites tend to reflect public opinion fairly consistently. There are minor discrepancies between the user scores on Rotten Tomatoes, Metacritic, and IMDb, but since they’re all open to the public, any of them gives a decent read on what the average moviegoer thinks. Just keep in mind that that’s exactly what you’re getting: the average moviegoer. If your tastes run outside the mainstream, you may disagree with user ratings.

Most importantly, remember that your opinion is still your own. Reviewers, however well-meaning, come from different walks of life than you do and may enjoy things you don’t. Moviegoers love to follow review scores the way fans follow sports, and while that’s all good fun, it’s important to remember that no score is ever truly objective as long as it’s measuring opinions. Use whichever metrics help you decide what to spend your time on, but don’t let the numbers tell you what you like and what you don’t.
