Earlier this month, Watson, a computer built by IBM, faced off against Ken Jennings and Brad Rutter, the two greatest Jeopardy players of all time, and trounced them. In the two-day match, Watson earned $77,147 to Ken Jennings' $24,000 and Brad Rutter's $21,600. While the results seem to show that the human race's crown as Jeopardy masters has been passed, a deeper analysis of the facts tells a different story. I have no doubt that one day a computer will be Jeopardy champion, but that day isn't today. What you saw wasn't a fair match among opponents but something closer to an infomercial demonstration, where the product delivers "too good to be true" results on a tilted playing field.
We'll get to how the game was rigged in Watson's favor in a moment, but first let's look at the Watson backstory. Back in the 1990s, IBM invested heavily in a project that resulted in Deep Blue, the chess-playing phenomenon that went on to beat the best human player, Garry Kasparov. While the company won bragging rights, it turned out there were no commercial applications for the technology. IBM didn't want to make that mistake again, so when Watson was approved, management explicitly required that the technology be commercializable. The belief was that its question-answering technology could be applied to many fields, including healthcare, as an expert diagnostic assistant to help doctors, and retail, as a next-generation recommendation engine.
Over the years, I have seen many mind-blowing demonstrations of gee-whiz technologies that never achieved commercial success. Each time investors got frustrated with the progress of the business, the management team would cook up another demo promising that a breakthrough was just around the corner…and, in the process, relieve the investors of several million more dollars. The most egregious cases produced some of the highest-profile public flops (think the Apple Newton). My analysis is that IBM execs just witnessed one of the best gee-whiz demos of all time, and before they sink in any more money, they should have independent market and technology due diligence performed on Watson's commercial prospects. The critical question they need to answer: can this generalized question-answering technology actually provide enough value over the purpose-built expert systems that already exist in fields like medicine to justify its cost? It's a question the Watson team cannot answer; they have too much personally invested in the program to come up with any answer other than 'Yes.'
Now back to the game… In Jeopardy, you're not allowed to push the buzzer right away. You have to wait until Alex finishes reading the question. At that point, a light goes off, and then you can ring in to answer. If you try to anticipate the light and ring in too early, you are locked out for a quarter second, which leaves next to no chance of winning the buzzer race. This is where Watson's unfair advantage comes in. If, during the period Alex is reading the question, Watson comes up with an answer it thinks is right (based on my observation, one it has scored at an 80% or higher probability), it can ring in just 10 milliseconds after the light goes off, enabling it to beat the human contestants, with their mere mortal reflexes, to the buzzer every time. So even when the human contestants know the answers, Watson gets all of the points. The Jeopardy results didn't accurately reflect Watson's question-answering ability; they reflected its question-answering ability plus its superhuman reflexes.
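To see just how lopsided a 10-millisecond trigger is, here's a minimal buzzer-race sketch in Python. It's my own illustration, not anything from IBM or Jeopardy: the human reaction-time distribution (mean 150 ms, standard deviation 40 ms) is an assumption I'm making for the sake of the example; only the 10 ms figure and the quarter-second lockout come from the discussion above.

```python
import random

# Minimal buzzer-race sketch -- my own illustration, not IBM's or
# Jeopardy's numbers. Assumption: a strong human reacting to the enable
# light takes somewhere around 100-250 ms, modeled here as a normal
# distribution (mean 150 ms, std dev 40 ms). Anticipating the light
# instead is risky: buzzing early triggers the quarter-second lockout.
WATSON_MS = 10.0  # Watson's fixed ring-in delay after the light

def human_reaction_ms(mean=150.0, stdev=40.0):
    """One human buzz attempt, in milliseconds after the light."""
    return max(0.0, random.gauss(mean, stdev))

def human_win_rate(trials=100_000):
    """Fraction of buzzer races the human wins against Watson."""
    wins = sum(human_reaction_ms() < WATSON_MS for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print(f"Human beats Watson to the buzzer {human_win_rate():.4%} of the time")
```

Under these assumed numbers, the human wins the race a few hundredths of a percent of the time; whatever distribution you pick for human reflexes, a fixed 10 ms trigger sits far below it.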
So how would Watson have fared if it had to rely on just its question-answering ability? To answer that question, we analyzed the results of Game 2 of the two-game series. (Ideally we would have analyzed both games, but since we only TiVo'd Game 2 and the match isn't available online, it will have to do.) In Game 2, the three contestants scored as follows:
- Watson: $41,413
- Ken Jennings: $19,200
- Brad Rutter: $11,200
However, final scores aren't necessarily a good measure of how each player fared. They depend heavily on how the players bet in Final Jeopardy, who lands the Daily Doubles, and how much they wager on them. Taking out Final Jeopardy and the Daily Doubles, the players scored as follows:
- Watson: $25,200
- Ken Jennings: $14,600
- Brad Rutter: $5,600
In watching the game, it was pretty easy to tell when Ken Jennings wanted to ring in but was beaten to the buzzer by Watson: he held the buzzer chest high, and you could see when he pressed the trigger and lost. Since Brad Rutter kept his buzzer below the podium, it wasn't possible to tell when he tried to ring in. But the data from Ken Jennings is enough to gauge the impact of reflexes. Of Watson's $25,200, $19,200 (all but $6,000) was won on questions where Ken Jennings tried to ring in. Had Watson and Ken had equal reflexes, it stands to reason that Ken would have buzzed in first in half of those cases. Adjusting for reflexes (including the possibility that Ken would have rung in first and gotten the question wrong, hurting him instead of helping him) would add $9,088 to Ken's score and take $9,344 off Watson's, giving revised scores for those two players of (see the worked calculation after these figures):
- Watson: $15,856
- Ken Jennings: $23,688
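For anyone who wants to check the arithmetic, here is a minimal sketch of the adjustment in Python. The $25,200, $14,600, and $19,200 figures come straight from the analysis above; the error rate is my assumption, chosen because it reproduces the $9,088 and $9,344 adjustments exactly. It works out to giving Ken roughly a 2.7% chance of missing a question he wins the buzzer on, with Watson ringing in behind him and keeping the money on a miss.

```python
# Reflex-adjustment sketch. The dollar figures come from the post; the
# error rate is my assumption, chosen because it reproduces the post's
# published adjustments ($9,088 and $9,344).
WATSON_RAW = 25_200   # Watson, Game 2, excluding Daily Doubles / Final Jeopardy
KEN_RAW = 14_600      # Ken Jennings, same basis
CONTESTED = 19_200    # Watson winnings on questions where Ken also tried to buzz

KEN_ERROR_RATE = 1 / 37.5  # assumed chance Ken misses a question he buzzes first on

# With equal reflexes, assume Ken wins the buzzer on half the contested value.
swing = CONTESTED / 2

# If Ken answers correctly he gains the value; if he misses, he loses it
# (and, in this model, Watson rings in behind him and keeps the question).
ken_gain = swing * (1 - KEN_ERROR_RATE) - swing * KEN_ERROR_RATE
watson_loss = swing * (1 - KEN_ERROR_RATE)

print(f"Ken revised:    ${KEN_RAW + ken_gain:,.0f}")        # -> $23,688
print(f"Watson revised: ${WATSON_RAW - watson_loss:,.0f}")  # -> $15,856
```

Note the asymmetry: a miss docks Ken twice (he loses the dollar value and Watson keeps the question), which is why the $9,088 added to Ken differs from the $9,344 taken from Watson.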
Since both Watson and Ken Jennings got the Final Jeopardy question right, Ken, instead of losing, would have scored a sizable victory over Watson. We'll end this post with our own game of Jeopardy.
Category: Man vs. Machine
$1,000 Clue: As of February 16, 2011, although not the fastest to the buzzer, these biological beings were still the best at answering Jeopardy questions.
Question: What are Humans?
I expect that there are many folks out there who will challenge our analysis. I’ve anticipated some of the objections and have addressed what I think are the three major ones below.
1. Shouldn't some of the points we reallocated from Watson to Ken have gone to Brad, lowering Ken's revised total? True, but Brad would also have taken additional points from Watson. If we had data from Brad, we'd expect the gap between Watson and Ken to be narrower, but Ken would still enjoy a solid lead.
2. What about Game 1? Watson did even better in Game 1 than in Game 2. Wouldn't that have kept Watson the winner? Probably not. The reason Watson racked up such a huge total in Game 1 was that it answered 29 of 32 questions correctly in Double Jeopardy. I don't have a tape, but I believe Ken and Brad also knew many of those answers and were shut out by the buzzer. Allocating those responses across the players would have put one or both of them within striking distance heading into Final Jeopardy, and Watson blew Final Jeopardy with a comically bad answer to an easy question. So what likely would have happened is that Watson would have been in second, if not third, place heading into Game 2.
3. What about the humans' own "unfair advantage"? Humans tend to ring in before they know the answer and then have several seconds to figure it out. If they had to answer right away like Watson, wouldn't Watson cream them? While this is true, I take exception to the notion that it represents an advantage for the humans. Instead, it reflects a fundamental difference in how computers and humans process information. While it can take humans a few seconds to work out the right answer, we can intuit nearly instantaneously whether or not we will be able to answer the question. Great Jeopardy players have great intuition and rarely get questions wrong after they ring in, as Ken Jennings demonstrated by getting just one question wrong in Game 2. Watson, on the other hand, seemed either to come to an answer very quickly or never get there. It doesn't have intuition, and more time didn't appear to help it significantly. Changing the rules to take out the intuition factor would shift the advantage to Watson, but it would run counter to the goal of the contest: figuring out who is better at answering questions.