Netflix doesn’t know me: How I lost faith in recommendation engines

by Rian on December 2, 2009

When Netflix first came out with their movie recommendations, I thought it was a great idea. I started rating movies I’d seen, good and bad, confident that the brain behind it all would do its magic and recommend some hidden gems that would, you know, change my life. Well, I’m still waiting for those movies. And to be honest, I’ve become a little frustrated with the whole thing.

Describing the latest example I encountered will reveal how much I liked a movie that I probably have no business liking, but I’m willing to sacrifice a little bit of my reputation in the name of science, or whatever this is…

The first problem I encountered is purely a UI issue: how Netflix displays star ratings on their pages. As an example, this is what I see for the movie August Rush in my queue:

You would assume that the customer average rating is just over the 3-mark, right? Well, on closer inspection, it turns out that Netflix shows you a rating they call “Our best guess” (3.4 in this case) instead of the customer average (4.1 in this case):

Here’s the problem. I loved this movie. I’m giving it 4 stars. But since Netflix doesn’t know that I have a soft spot for modern musicals (despite how highly I rated the movie “Once”), the “Netflix brain” didn’t think I would like this movie as much as the average customer.

This is a problem you often see on sites where the UI does not give proper user feedback about what it’s showing you. It took me a few weeks to realize they were showing me “Our best guess” in search results, and not the true customer average. Now I have to mouse over to see the true average every time. Why? Because I don’t trust the brain anymore. (By the way, this is just one example, but as I’ve looked into it more, I’ve realized it’s a systemic problem for me: Netflix’s best guess is rarely in line with my tastes.)

Incidentally, on Amazon.com, the average user rating for August Rush is 4.5 out of 5 stars. Pretty good. So this is the problem: there is such a wide range of tastes out there that it’s hard to know whom to trust. This is the problem Netflix is trying to solve: look at “users like you” and show you their average instead of the overall average. You’re therefore initially more inclined to believe the “best guess” rating provided by Netflix than the consensus of all users. It’s a good idea, but the implementation doesn’t seem to be there yet. (Whether 5-star ratings are valid at all is a separate and very interesting discussion.)
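The “users like you” idea can be sketched as a small user-based collaborative filter. This is a minimal sketch under stated assumptions, not Netflix’s actual algorithm (which the post doesn’t describe): the ratings, user names, and helper functions are all hypothetical, and similarity is measured as 1 / (1 + mean absolute difference) over the movies two users have both rated, which is just one simple choice among many.

```python
# Hypothetical toy ratings: user -> {movie title: stars on a 1-5 scale}.
# Not real Netflix data.
RATINGS = {
    "me":    {"Once": 5, "Event Horizon": 4, "Alien 51": 1},
    "alice": {"Once": 5, "Event Horizon": 4, "August Rush": 5},
    "bob":   {"Once": 1, "Event Horizon": 2, "August Rush": 2},
    "carol": {"Once": 4, "Alien 51": 2, "August Rush": 4},
}


def overall_average(ratings, movie):
    """Plain customer average: every rater of `movie` counts equally."""
    scores = [r[movie] for r in ratings.values() if movie in r]
    return sum(scores) / len(scores) if scores else None


def similarity(a, b):
    """1 / (1 + mean absolute difference) over movies both users rated.

    Returns 1.0 for identical ratings on shared movies, smaller values as
    tastes diverge, and 0.0 if the two users have no movies in common.
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    mad = sum(abs(a[m] - b[m]) for m in common) / len(common)
    return 1.0 / (1.0 + mad)


def best_guess(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    me = ratings[user]
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue  # skip myself and anyone who hasn't rated the movie
        w = similarity(me, theirs)
        num += w * theirs[movie]
        den += w
    return num / den if den > 0 else None


if __name__ == "__main__":
    print("customer average:", round(overall_average(RATINGS, "August Rush"), 2))
    print("best guess for me:", round(best_guess(RATINGS, "me", "August Rush"), 2))
```

With this toy data the plain customer average for August Rush is about 3.67, while the weighted guess for “me” is about 4.29, because “alice”, whose shared ratings match exactly, gets the most weight. With different neighbors the guess could just as easily land below the average, as it did in the August Rush example above.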

I say all this to make a simple point: it appears that the collective wisdom of all users does a better job of predicting whether I will like a movie than the recommendation engine provided by Netflix. The question is whether it would ever be possible for recommendation engines to get to know you well enough from your movie preferences alone. Maybe if they took into account not only your movie interests, but also music, books, online activity, etc.? Yes, it sounds creepy, but how else would Netflix know how much I like strange modern musicals?

10 comments

-johnbodine December 2, 2009 at 12:06 pm

Nice post, Rian. While I partly agree, I think more weight should be given to the fact that this is still relatively nascent technology, and that as more data points become available, that “best guess” will likely become highly accurate. The key here is in the long tail of information about you: capturing the nuances.

Second, how did they do on other recommendations? This is but a single data point…

Google has the advantage in this space. eBay could have it, but doesn’t quite understand that yet.

-johnbodine

Rian December 2, 2009 at 12:33 pm

Hey John – thanks for the comment. Yes, this is one data point, but it’s actually part of a systemic issue for me. Maybe I have weird taste in movies… I’d be curious to hear what others’ experience with Netflix recommendations has been.

In general, I find Amazon knows me a little better — maybe because I buy across multiple categories so they have a more complete picture of me…

Chris December 2, 2009 at 12:46 pm

Hey Rian – Sorry that you had a negative experience. I actually work at Netflix now and would be interested in getting more of your input if you’re up for a coffee or lunch. – Chris M

Rian December 2, 2009 at 12:57 pm

Oops, sorry Chris – hope we’re still friends :) I’d love to catch up. I heard a talk by Bill Scott recently and was very impressed. In general I’m a huge Netflix fan — I just think recommendation engines have a long way to go, and in this particular case there is a significant UI issue due to lack of user feedback.

Chris December 2, 2009 at 2:03 pm

No apologies necessary, Rian. I agree: though we have made a lot of progress, there will always be room for improvement. Our offices are right around the corner from yours, so I would really love to get more of your input if you’re up for it.

-johnbodine December 3, 2009 at 4:18 pm

Perhaps art is more challenging to predict than consumer goods? I find I am personally drawn to all sorts of artistic things others wouldn’t imagine for me. Movies fall into this category, perhaps more so than music, and certainly more so than hard goods such as electronics.

It’s also a tremendous problem to tackle: $1M for a 10% improvement (the Netflix Prize), anybody? :)

It should be interesting to see any published results on Apple’s Genius feature and how well it does at driving sales.

-johnbodine

Jimi December 22, 2009 at 10:24 pm

Good topic, and I came here because I was trying to figure out how to turn the best guess system off.

The major problem I have is this: while a lot of people really like movies that I don’t, GOOD movies generally get GOOD ratings from the masses, even if you have to sift through some of those big-budget action, robot, and star-chaser movies with cheesy plots and lots of explosions that always seem to get high ratings.

This is my major problem with the system: I see movies with best-guess ratings in the 3-4 star range that everyone else rates around 1-2 stars, and they are literally terrible movies. This is most apparent in the horror genre. I’m sure we can all agree that most horror movies are pretty bad, and that many people rate them low for various reasons, but there are very good ones out there. What I often see with the best-guess ratings is horror classics outrated by complete trash.

With my best-guess ratings, when I sort through most genres I’m lucky to find any distinction between the movies at all; they are all rated so close to each other. Take Sci-Fi Horror, for example…

This means I might see something like “Alien 51” (some horror movie with Heidi Fleiss, that call girl, or whatever she is) guessed to have the same rating as “Event Horizon”, which many consider a bit of a classic. Actually, it was rated a bit HIGHER than Event Horizon.

How can anyone, or any system, think that someone would want to watch one of the worst sci-fi horror movies ever instead of a sci-fi classic, no matter what their ratings are? It just doesn’t take enough factors into account.

I think one of the major issues is that it gives too much juice to movies you have watched instead of having a fallback mechanism that reverts to the mass averages. As a horror fan I watch a lot of crappy movies, and this isn’t because there aren’t better ones out there. It’s because I have already seen the good ones, and obviously I can’t remember them all to go out and rate them unless they just happen to show up. This makes it appear as if I LOVE crappy movies, so I get recommended a ton more of them, even over genuinely good movies, and I can’t find those good movies because, based on my viewing and rating history, they are rated the same as or lower than the terrible ones.

I rarely give a movie a 1-star rating. I often give B-movies a 2 simply because I didn’t hate them, and they shouldn’t be lumped in with the most awful ones I have ever seen. I think I’m a bit fairer in this regard than most people who don’t watch a lot of that type of movie. It is very common for their review to be 1 star and contain the words “WORST MOVIE I HAVE EVER SEEN!” because they haven’t seen many from the genre and are rating in a completely illogical, immediate-gratification sort of way.

Netflix takes a few things for granted in their overall system. First of all, a 5-star system is way too limiting for the type of data they are trying to extract. It just doesn’t allow a realistic view of how people feel about movies, and we are saddled with the majority of films having averages of 3 or lower. I don’t know about you, but I see average movies all the time, and some are clearly better than others.

Change that to a 10-point scale and the entire system gets better immediately.
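The fallback Jimi describes, reverting to the mass averages, can be sketched as shrinkage toward the crowd average: the personalized guess counts for more as more ratings back it, and the result falls back to the plain customer average when there is little or no evidence. This is a minimal sketch assuming a hypothetical `n_support` count and a hypothetical damping constant `k`; it is a common damped-mean technique, not how Netflix actually computes its best guess.

```python
def blended_rating(personal_guess, n_support, crowd_average, k=10.0):
    """Shrink a personalized guess toward the crowd average.

    personal_guess: the engine's personalized prediction, or None.
    n_support: how many ratings back that prediction (hypothetical measure).
    crowd_average: plain average rating over all customers.
    k: hypothetical damping constant; the crowd average behaves like
       k extra ratings pulling the result toward the consensus.
    """
    if personal_guess is None or n_support <= 0:
        # Nothing personal to go on: fall back to the mass average.
        return crowd_average
    return (n_support * personal_guess + k * crowd_average) / (n_support + k)


if __name__ == "__main__":
    # Hypothetical numbers, not real Netflix data.
    movies = {
        "b_movie": {"guess": 3.5, "support": 2, "crowd": 1.5},
        "classic": {"guess": 3.4, "support": 2, "crowd": 3.8},
    }

    def score(name):
        m = movies[name]
        return blended_rating(m["guess"], m["support"], m["crowd"])

    # Highest blended rating first.
    for name in sorted(movies, key=score, reverse=True):
        print(name, round(score(name), 2))
```

With these made-up numbers, the B-movie’s personalized guess of 3.5 is backed by only 2 ratings, so it blends down to about 1.83, while the classic blends to about 3.73 and ranks first. A larger `k` trusts the crowd more; as `n_support` grows, the personalized guess dominates.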

Rian December 22, 2009 at 10:32 pm

Hi Jimi,

Great observations, thanks for sharing. I particularly agree with one of your central points: Netflix just doesn’t take enough factors into account. I am also a sci-fi horror fan, and it is extremely difficult to find those hidden gems (btw, check out Let the Right One In; it’s not sci-fi, but it’s amazing). I think if Netflix can figure out how to find those diamonds in the rough for the genres you’re interested in, it could be a huge win for them (and us). Let’s keep hoping they do.

And yes, Event Horizon is one of my favorites too :)

-Rian

Jimi December 22, 2009 at 10:45 pm

Hi Rian,

You replied pretty quickly… haha

I have been eyeing Let the Right One In for a long time now and have yet to watch it. Guess it’s time… :D

That is probably the hardest genre for ratings to work in. Obviously, for most movies you can throw out a lot of the ratings at the far ends as outliers, but that middle 3 is way too broad. Suppose you could rate one movie a 4 (watchable, but not particularly good) and another a 6 (still rather average, but much better made, with a better story). Now we have more movement among the sort of average movies we tend to watch, at least in the horror genre.

This is roughly how I learned to distinguish pure rubbish from a decent flick, on IMDb at least. You pretty much know that all horror movies there will be rated lower than others, but if you see one flirting with 6s and 7s, you know it is probably a very good movie to a horror fan like me. On Netflix, a movie that would be a 3-4 on IMDb can show the same rating as one that would be a 6-7 there. There is a huge difference between those ratings, but not when you are only allowed a 1-5 rating.

JBMONCO February 24, 2010 at 8:40 am

First let me say that I LOVE August Rush. I’m a sucker for that kind of movie. I took a quick look at what Netflix recommends that I have already seen, and it did peg that I would like Food, Inc. and Frost/Nixon (that’s a range). It also gave me reasons why I would like them based on my previous ratings of other movies. It also suggested Capitalism: A Love Story and The Soloist, with ratings close to what I would give them. Of the movies it recommends that I have NOT seen, I’m skeptical about whether I would like Waitress and Master and Commander. For what it’s worth, I have rated 1404 movies.
