• Fuzzy searching is possible, but this only fixes spelling errors
• Can also deal with varying suffixes (stemming), synonyms, etc.
• Fast, but not necessarily “accurate”
• Great for individual documents, not so much for linked documents

Monday, 24 October, 11
from keywords
• Users familiar with interface
• Users don’t have to remember exactly what they want
• Users think “my class at 5 PM on Thursday,” not “Databases”
• This system could scale to support thousands of concurrent users
“Hey there!”)
• Powerful macro system
• Built on top of the JVM
• Plethora of Java libraries at our disposal
• Pretty fast
• Ubiquitous
ubiquitous nature, OS choice is a personal preference more than anything
• I find a Unix-like platform preferable to develop on
• Nothing is stopping us from using Windows
three indices:
1. Values - Set of distinct strings in each table (n-gram’d)
2. Entities - Each row in a table becomes an entity
3. Groups - Linked entities combined into groups
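As a rough illustration of what "n-gram’d" could mean for the Values index, here is a minimal sketch that splits each distinct string into character n-grams. The function names `n-grams` and `values-index`, the lower-casing step, and the choice of n are assumptions for illustration; the slides do not say how the n-gram analysis is actually done.

```clojure
(require '[clojure.string :as str])

(defn n-grams
  "Returns the character n-grams of s (lower-cased) as a vector of strings.
  Strings shorter than n yield an empty vector."
  [n s]
  (->> (str/lower-case s)
       (partition n 1)            ; sliding window of n characters, step 1
       (mapv #(apply str %))))    ; turn each window back into a string

(defn values-index
  "Hypothetical Values index: maps each distinct string to its n-grams."
  [n strings]
  (into {} (map (fn [s] [s (n-grams n s)])) (distinct strings)))

;; Example call, using a word from the course title in the slides
(n-grams 3 "Survey")
```

Matching on n-grams rather than whole words is one way to let partial input (for example "surv") still find the stored value.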
• To here?

Row:
  csci 5010g | Survey of Computer Science | This course is a survey of some of the main...

Entity:
  Type: Course
  ID: csci 5010g
  Attributes:
    code: csci 5010g
    title: Survey of Computer Science
    description: This course is a survey of some of the main...
column
• Rest are considered attributes
• Make use of Clojure data structures (specifically maps)

  {:__type__ "Course"
   :__id__ "csci 5010g"
   :__attrib__ {:code "csci 5010g"
                :title "Survey of Computer Science"
                :description "This course is a survey of some of the main..."}}
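The row-to-entity step could be sketched as below. `row->entity` and `course-row` are hypothetical names, and the row is assumed to arrive as a map from column keyword to value. Following the example map above, the ID column is also kept among the attributes.

```clojure
(defn row->entity
  "Builds an entity map from a table row (a map of column keyword -> value).
  type-name and id-column are supplied by the caller. Every column, including
  the ID column, is kept under :__attrib__, as in the example map in the slide."
  [type-name id-column row]
  {:__type__   type-name
   :__id__     (get row id-column)
   :__attrib__ row})

;; Sample row taken from the slide's course example
(def course-row
  {:code        "csci 5010g"
   :title       "Survey of Computer Science"
   :description "This course is a survey of some of the main..."})

(row->entity "Course" :code course-row)
```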
the hierarchy
• Use this instead of foreign key relationships from the data store
• Store links to many entities in a single document
• Hypergraph > graph
• Matching one entity in a document pulls out all related entities
• E.g., finding a course would also pull out section entities, instructor entities, etc.
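One way to picture a group document is as a set of links to its member entities, so that matching any one member returns the rest. This is only a sketch: the `:members` key, the `[type id]` link format, and the IDs `"sec-1"`, `"inst-1"`, `"course-2"`, and `"sec-2"` are made up for illustration and do not come from the slides.

```clojure
(def groups
  "Hypothetical Groups index: each group holds the [type id] keys of its linked entities."
  [{:members #{["Course" "csci 5010g"] ["Section" "sec-1"] ["Instructor" "inst-1"]}}
   {:members #{["Course" "course-2"] ["Section" "sec-2"]}}])

(defn related-entities
  "Returns the keys of all entities that share a group with entity-key,
  excluding entity-key itself."
  [groups entity-key]
  (->> groups
       (filter #(contains? (:members %) entity-key)) ; groups that mention the entity
       (mapcat :members)                             ; every member of those groups
       (remove #{entity-key})
       set))

;; Finding the course also pulls out its section and instructor
(related-entities groups ["Course" "csci 5010g"])
```

Because one group can link any number of entities at once, it behaves like a hyperedge rather than a pairwise foreign-key link.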
Not repeated, n-gram analyzed
• Every entity stored as a separate document
• Can search for any text and return an entity that contains it
• All entity groups
• Can find related entities based on unique identifier
• Traversing across groups requires recursive searches
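Traversal across groups can be sketched as a repeated lookup: find the groups containing the current entities, collect their members, and search again from any newly found ones. The sketch below uses `loop`/`recur` over an in-memory vector instead of real index queries. `reachable-entities`, `sample-groups`, the `:members` key, and the IDs are all hypothetical.

```clojure
(def sample-groups
  "Hypothetical groups: the section links the course group to the instructor group."
  [{:members #{["Course" "csci 5010g"] ["Section" "sec-1"]}}
   {:members #{["Section" "sec-1"] ["Instructor" "inst-1"]}}
   {:members #{["Course" "course-2"] ["Section" "sec-2"]}}])

(defn reachable-entities
  "Repeatedly follows group membership from start and returns the set of
  every entity key reached, including start."
  [groups start]
  (loop [seen     #{start}
         frontier [start]]
    (if (empty? frontier)
      seen
      (let [found (->> groups
                       ;; groups containing at least one frontier entity
                       (filter (fn [g] (some (:members g) frontier)))
                       (mapcat :members)
                       (remove seen)   ; keep only entities not visited yet
                       set)]
        (recur (into seen found) (vec found))))))

;; Reaches the instructor via the shared section; the unrelated group is untouched
(reachable-entities sample-groups ["Course" "csci 5010g"])
```

Each pass of the loop stands in for one round of searches against the Groups index, which is why deep traversals cost more than a single lookup.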
• Users help train the system based on their input and reaction
• The system must scale up
• Far slower than traditional keyword-based search
• User interface is a challenge
• How do we present the results to users in a useful manner?

†subject to change