(2010), Google Now (2012), Cortana (2014)
  • multiple languages
• “Eugene Goostman” passing Turing Test
  • 5-minute text-based conversation
  • fooled 33% of judges into thinking it was human
Scoring
  • Boolean model, TF/IDF, Vector space model → Lucene’s practical scoring function
• Plus “secret sauce”
  • constantly adjusted term boost + custom rules
  • Who is your user?
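To make the TF/IDF and vector space bullets concrete, here is a minimal sketch in Python. It is not Lucene's actual scoring function: the square-root term frequency and smoothed IDF are only loosely modeled on Lucene's classic similarity, and the names `search`, `weight_vector`, and the `boosts` parameter are hypothetical, introduced here for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real analyzer would also stem, drop stop words, etc.
    return text.lower().split()


def idf_table(docs_tokens):
    # Smoothed inverse document frequency: 1 + ln(N / (df + 1)).
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {term: 1.0 + math.log(n / (freq + 1)) for term, freq in df.items()}


def weight_vector(tokens, idf, boosts=None):
    # TF/IDF weight per term; `boosts` is a hypothetical per-term multiplier
    # standing in for the "constantly adjusted term boost".
    boosts = boosts or {}
    counts = Counter(tokens)
    return {
        term: math.sqrt(count) * idf[term] * boosts.get(term, 1.0)
        for term, count in counts.items()
        if term in idf  # terms unseen in the collection carry no weight
    }


def cosine(a, b):
    # Vector space model: score is the cosine of the angle between two term vectors.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, boosts=None):
    # Returns (score, doc_index) pairs, best match first.
    docs_tokens = [tokenize(d) for d in docs]
    idf = idf_table(docs_tokens)
    query_vec = weight_vector(tokenize(query), idf, boosts)
    scored = [
        (cosine(query_vec, weight_vector(tokens, idf)), i)
        for i, tokens in enumerate(docs_tokens)
    ]
    return sorted(scored, reverse=True)
```

Calling `search("lucene scoring", docs)` ranks documents sharing rare query terms first; passing something like `boosts={"lucene": 3.0}` shifts weight toward that term, which is one simple way to picture the “secret sauce” tuning.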
global
  • Need better communication across multiple languages
  • Not only search, but understand & translate
• Manual translation highly accurate but doesn’t scale
  • Use computers to supplement
  • e.g. statistical models on “aligned text”
  • Example: UN reports published in 23+ languages
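As one illustration of a statistical model on aligned text, the sketch below trains word translation probabilities with expectation-maximization in the style of IBM Model 1. The slides do not say which model is meant, so this choice is an assumption, and the function names `train_ibm_model1` and `best_translation` are hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(f, e)] = P(target word f | source word e) from aligned sentences.

    `pairs` is a list of (source_tokens, target_tokens) tuples.
    """
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start with a uniform probability for every co-occurring word pair.
    t = {(f, e): uniform for src, tgt in pairs for f in tgt for e in src}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: distribute each target word over the source words in its sentence.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    fraction = t[(f, e)] / norm
                    count[(f, e)] += fraction
                    total[e] += fraction
        # M-step: re-estimate probabilities from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


def best_translation(t, source_word):
    # Most probable target word for a given source word.
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


# Toy aligned corpus (German source, English target), for illustration only.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
model = train_ibm_model1(corpus)
```

No dictionary is supplied here: the model only sees which sentences are aligned, and words that co-occur consistently end up with high translation probability. Real systems train on far larger aligned collections and add models for word order and phrases.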
  • cannot tweak proprietary “black boxes”
• Succeed by constantly
  • incorporating user feedback and
  • tweaking search results
• Open Source allows you to do that
  • Ultimate “white box”
Java
  • gets you going real “quick”
• Last chapter contains fact-based Q&A kickstarter
  • think “light-weight” IBM Watson
  • obvious where to optimize from there
• Invitation to contribute to Open Source!