
A New 9-Point Rating Scale

With two three-point scales combined into one

In the movie "10," Dudley Moore's character is obsessed with Bo Derek's beauty... presumably he's rating her at the max on a 10-point scale. Most of us have seen someone whose looks we might rate a 10, and we might also award 9s and 8s to attractive people. But when beauty is less distracting, we rarely bother to rate it at all. So in practice the 10-point scale becomes a 3- or 4-point scale.

Bo Derek

Movie and restaurant reviewers often use four stars, or thumbs, or whatever. Sometimes their judgment is cramped between two values, so they tack on a half star or half thumb. Why, we have to ask, didn't they start with more stars or thumbs?

thumbs

Opinion polling, with questions like "how much do you like your job?", generally comes with a 5-point scale. Often words are attached to each value, where 1 is something like "Can't stand it" and 5 is "Love it." But if five people rate their job 1 and five give it a 5, the average is 3, or "It's a living," a rating that nobody actually gave.
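The arithmetic behind that job-poll example is easy to check. Here is a small sketch in Python; the variable names and the label for the middle value are just illustrations, not part of any real polling tool.

```python
from collections import Counter
from statistics import mean

# Assumed labels: the text gives wording only for the two ends of the
# scale, and uses "It's a living" for the middle.
LABELS = {1: "Can't stand it", 3: "It's a living", 5: "Love it"}

# Five people rate their job 1 and five rate it 5.
ratings = [1] * 5 + [5] * 5

average = mean(ratings)          # (5*1 + 5*5) / 10 = 3
counts = Counter(ratings)        # how many raters chose each value

print("average rating:", average, "->", LABELS[3])
print("raters who actually chose 3:", counts[3])
```

The average lands on a value that none of the ten raters picked, which is why a single averaged number can misrepresent a split opinion.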

Often, when a judgment is required, people tend to avoid one end of the scale. When judging beauty on a 10-point scale, the "ugly" values 1, 2, 3 are rarely used. And when asked how much it hurts, patients generally go for the bigger numbers... because, golly, it hurts.

pain rating scale

The fact is, people can make more confident judgments with a scale of only a few categories. Accurately judging something as "good" or "bad" is easier than rating it on a scale of 1-10. In addition, raters are more likely to agree about what is "good" and what is "bad" than about which number, 1-10, to apply. Yet a scale with fewer rating points often does not provide enough information (that's when half-stars are used). Movie-goers can hardly make an informed choice when all the movies playing are rated "thumbs down."

On the other hand, more scale values may give inaccurate impressions. Some teachers give only A's and B's, saving C's for low achievers and giving no D's or E's. Raters in athletic events generally squash a ten-point scale into just two or three values (9, 8, 7) and then reconstitute a finer discrimination by using fractional intervals. So what's the point of ratings six and lower?

Adding anchor points, that is, values that are reliable across many ratings, can increase a scale's reliability. The most stable anchor is usually the midpoint. Language anchors, like "most likely" and "sometimes," are useful but may be understood differently by different people.

So the choice seems to be between a reliable, but coarse, two- or three-point scale and a seemingly more informative large scale, 1-10, that is susceptible to bias and inconsistency. The ideal would be a combination of the two: a scaling method that is as easy and reliable as a three-point scale yet provides more information and allows more discrimination.

A two-phase 9-point scale is the answer. Let's call it the 3X3 Rating Scale. The 3X3 requires the rater to make two separate judgments instead of one; in other words, the rating is made in two phases, optimally a few seconds apart. Each judgment is made on a 3-point scale, which, as mentioned above, is a rather easy and reliable operation with three anchor points. Combining the two evaluations yields a 9-point scale, 1 to 9, in three groups: (1, 2, 3) (4, 5, 6) (7, 8, 9).

In the first phase of the 3X3 rating, the rater confronts the target, be it a movie or a maiden, and makes a quick, gross judgment, assigning one of three values, like "unfavorable," "neutral," "favorable." The words could be "bad," "average," "good" or "no," "maybe," "yes"; it doesn't matter. Whatever 3-point scale words are used, the initial judgment must be a gut-level reaction. The rater must not be tempted to confuse the call with any finer discrimination and must give no thought to the refinement to be made in the second phase. The first-phase judgment must be independent of, and kept apart from, any second thoughts. Once made, the judgment should not be changed for any reason.

In the second phase of 3X3, the rater again confronts the target, but this time considers his first rating and decides how well it fits. Again the judgment is made using a 3-point scale, except this time as a modifier of the first rating. If, for example, the first score was "bad," the second score clarifies how "bad": that is, "really bad," "average bad," or "barely bad." Likewise, an initial "average" score can become "less than average," "boringly average," or "better than average." The exact wording is unimportant because the two separate 3-point scores are meshed together to make a 9-point scale with relatively equal intervals and constant anchor points.

The 9-point scale has three groups of three digits (3X3): the bad group is 1-2-3, the average group is 4-5-6, and the good group is 7-8-9. The first rating sets the group; the second rating sets the digit within that group.
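The way the two judgments mesh can be written down as a little arithmetic. What follows is only a sketch in Python, assuming each phase is recorded as 1, 2, or 3 (1 being the least favorable choice in that phase); the function names combine_3x3 and split_3x3 are invented here for illustration.

```python
PHASE_VALUES = (1, 2, 3)  # assumed coding: 1 = lowest, 3 = highest


def combine_3x3(first: int, second: int) -> int:
    """Merge two 3-point judgments into one score from 1 to 9.

    `first` picks the group (1-2-3, 4-5-6, or 7-8-9);
    `second` picks the digit inside that group.
    """
    if first not in PHASE_VALUES or second not in PHASE_VALUES:
        raise ValueError("each judgment must be 1, 2, or 3")
    return 3 * (first - 1) + second


def split_3x3(score: int) -> tuple[int, int]:
    """Recover the (first, second) judgments from a 9-point score."""
    if not 1 <= score <= 9:
        raise ValueError("score must be between 1 and 9")
    group, digit = divmod(score - 1, 3)
    return group + 1, digit + 1


# "bad" (group 1) refined to the top of its group ("barely bad")
print(combine_3x3(1, 3))  # 3
# "good" (group 3) refined to the bottom of its group
print(combine_3x3(3, 1))  # 7
```

Because the first judgment only ever moves the score in steps of three, a change of heart in the second phase can never push a rating out of the group chosen by the gut-level call.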

So, for example, if you thought the movie was "bad," this would put it in the 1-2-3 group; then if you thought about it and decided it wasn't as bad as some you'd seen, you would call it "barely bad" and give it a rating of 3.

If asked about your foot injury, you might instantly say "it kind of hurts," putting it in the 4-5-6 group; then you would think about it, decide that it hurts only tolerably, and give it a rating of 4.

If you are a shallow guy and like to rate women's looks, you might react to a flight attendant with "she's a beauty" and that's in the 7-8-9 group. Then you notice your beau looking at you with piercing eyes and decide "but really not that beautiful" and give the woman a rating of 7.