Full-Text Search Demystified with Java

Today’s applications are expected to provide powerful full-text search. But how does it work in general, and how do I implement it on my site or in my application?

Actually, this is not as hard as it sounds at first. This talk covers:
* How full-text search works in general and how it differs from querying a database.
* How to implement common requirements with Elasticsearch.

Attendees will learn how to add common search patterns to their applications without breaking a sweat.
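To make the first point above concrete: rather than scanning every record for a substring, a full-text engine looks terms up in an inverted index that maps each term to the documents, and the positions within them, where it occurs. The following is a minimal Java sketch of that idea, assuming a naive tokenizer (lowercase, split on anything that is not a letter or digit). The class `InvertedIndex` and its methods are hypothetical illustrations, not part of Elasticsearch or Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy inverted index: term -> (docId -> positions of the term in that doc). */
final class InvertedIndex {
    // TreeMap keeps terms sorted, loosely like a term dictionary
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    /** Naively tokenizes the text and records the position of every token. */
    void add(int docId, String text) {
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // a leading delimiter yields an empty first token
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(position++);
        }
    }

    /** Returns docId -> positions for a term, or an empty map if the term was never indexed. */
    Map<Integer, List<Integer>> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of());
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(0, "The two lazy dogs");
        index.add(1, "The quick dog");
        System.out.println(index.lookup("dog"));  // {1=[2]}
        System.out.println(index.lookup("lazy")); // {0=[2]}
    }
}
```

Note that `lookup("dog")` misses document 0's "dogs": without a stemming step the two forms are different terms, which is what the analysis chain in the transcript addresses.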

Philipp Krenn

June 12, 2016

Transcript

  1. Testing it in Elasticsearch

    GET _analyze
    {
      "char_filter": [ "html_strip" ],
      "tokenizer": "standard",
      "token_filter": [ "lowercase", "stop", "snowball" ],
      "text": "The two <em>lazy</em> dogs, were slower than the less lazy <em>dog</em>, Rover."
    }
  2. {
      "tokens": [
        { "token": "two", "start_offset": 4, "end_offset": 7, "type": "<ALPHANUM>", "position": 1 },
        { "token": "lazi", "start_offset": 12, "end_offset": 21, "type": "<ALPHANUM>", "position": 2 },
        ...
  3. Inverted index (metadata: term frequency [positions])

             doc0       doc1     doc2
    two      1 [1]      1 [5]    1 [3]
    lazi     2 [2,9]    0        1 [7]
    dog      1 [3]      0        3 [1,8,13]
    quick    0          1 [1]    0
  4. Storing data

    PUT /quotes/quote/1
    {
      "text": "The two <em>lazy</em> dogs, were slower than the less lazy <em>dog</em>, Rover."
    }
  5. Storing data

    {
      "_index": "quotes",
      "_type": "quote",
      "_id": "1",
      "_version": 1,
      "_shards": { "total": 2, "successful": 1, "failed": 0 },
      "created": false
    }
  6. Searching

    ...
        "hits": [
          {
            "_index": "quotes",
            "_type": "quote",
            "_id": "1",
            "_score": 0.25,
            "_source": {
              "char_filter": [ "html_strip" ],
              "tokenizer": "standard",
              "token_filter": [ "lowercase", "stop", "snowball" ],
              "text": "The two <em>lazy</em> dogs, were slower than the less lazy <em>dog</em>, Rover."
            }
          }
        ]
      }
    }
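  7. $$\mathrm{score}(q,d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q,d) \cdot \sum_{t \in q} \Big( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{t.getBoost}() \cdot \mathrm{norm}(t,d) \Big)$$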
  8. Search: happy hippopotamus

    * I am happy in summer.
    * After Christmas I’m a hippopotamus.
    * The happy hippopotamus helped Harry.
  9. @OneToMany(mappedBy = "destCustomerId")
    @ManyToMany
    @Fetch(FetchMode.SUBSELECT)
    @JoinTable(name = "customer_dealer_map",
        joinColumns = { @JoinColumn(name = "customer_id", referencedColumnName = "id") },
        inverseJoinColumns = { @JoinColumn(name = "dealer_id", referencedColumnName = "id") })
    private Collection<Client> dealers;
  10. Ted Neward: ORM is "The Vietnam of Computer Science"

    http://blogs.tedneward.com/post/the-vietnam-of-computer-science/
  11. import static org.elasticsearch.common.xcontent.XContentFactory.*;
    import org.elasticsearch.action.index.IndexResponse;

    // `client` is an already connected org.elasticsearch.client.Client
    IndexResponse response = client.prepareIndex("movies", "movie", "1")
        .setSource(jsonBuilder()
            .startObject()
                .field("title", "The Godfather")
                .field("director", "Francis Ford Coppola")
                .field("year", 1972)
                .startArray("genres")
                    .value("Crime").value("Drama")
                .endArray()
            .endObject()
        )
        .get();
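  12. import java.util.List;
    import org.springframework.data.annotation.Id;
    import org.springframework.data.elasticsearch.annotations.Document;
    import org.springframework.data.elasticsearch.annotations.Field;
    import org.springframework.data.elasticsearch.annotations.FieldIndex;
    import org.springframework.data.elasticsearch.annotations.FieldType;

    // Spring Data Elasticsearch 2.x-style mapping annotations
    @Document(indexName = "movies", type = "movie")
    public class Movie {
        @Id
        private String id;
        private String title;
        @Field(type = FieldType.String, index = FieldIndex.analyzed, store = true)
        private String director;
        private Integer year;
        // A list of plain strings is an ordinary multi-valued field; FieldType.Nested is meant for lists of objects
        private List<String> genres;
        // Standard constructors, getters and setters
    }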
  13. Movie movie = new Movie("1");
    movie.setTitle("The Godfather");
    movie.setDirector("Francis Ford Coppola");
    movie.setYear(1972);
    movie.setGenres(Arrays.asList("Crime", "Drama"));
    movieService.save(movie);
  14. public abstract class EmployeeEntity {
        protected String name;
    }

    public class ManagerEntity extends EmployeeEntity {
        private Boolean approveFunds;
    }

    public class WorkerEntity extends EmployeeEntity {
        private Integer yearsExperience;
    }
  15. public abstract class EmployeeEntity {
        @Id
        protected String id;
        protected String name;
    }

    @Document(indexName = "employees", type = "manager")
    public class ManagerEntity extends EmployeeEntity {
        private Boolean approveFunds;
    }

    @Document(indexName = "employees", type = "worker")
    public class WorkerEntity extends EmployeeEntity {
        private Integer yearsExperience;
    }
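Slides 1 and 2 run the sample sentence through an analysis chain (`html_strip` → `standard` → `lowercase` → `stop` → `snowball`). The Java sketch below imitates that pipeline step by step so that the position gaps left by removed stop words are visible. It is an approximation under stated assumptions: tags are removed with a regex instead of Elasticsearch's `html_strip`, tokens are split on non-alphanumerics instead of by the Unicode-aware standard tokenizer, the stop list is Lucene's default English list, `crudeStem` is a hypothetical two-rule stemmer (not Snowball), and offsets are not tracked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical toy analysis chain loosely mirroring html_strip -> standard -> lowercase -> stop -> snowball. */
final class ToyAnalyzer {
    /** A token and its position; removed stop words leave gaps, as in slide 2. */
    record Token(String term, int position) {}

    // Lucene's default English stop-word list
    private static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into",
        "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then",
        "there", "these", "they", "this", "to", "was", "will", "with");

    static List<Token> analyze(String text) {
        // 1. Character filter: drop HTML tags (crude stand-in for html_strip)
        String stripped = text.replaceAll("<[^>]*>", "");
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        // 2. Tokenizer: split on anything that is not a letter or digit
        for (String raw : stripped.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) continue;
            // 3. Lowercase filter
            String term = raw.toLowerCase(Locale.ROOT);
            // 4. Stop filter: drop the token but still consume its position
            if (!STOP_WORDS.contains(term)) {
                // 5. Stemming: hypothetical crude rules, NOT the Snowball algorithm
                tokens.add(new Token(crudeStem(term), position));
            }
            position++;
        }
        return tokens;
    }

    /** Hypothetical toy stemmer: "dogs" -> "dog", "lazy" -> "lazi", "less" unchanged. */
    static String crudeStem(String term) {
        if (term.length() > 3 && term.endsWith("s") && !term.endsWith("ss")) {
            return term.substring(0, term.length() - 1);
        }
        if (term.length() > 3 && term.endsWith("y")) {
            return term.substring(0, term.length() - 1) + "i";
        }
        return term;
    }
}
```

For the slide's sentence, `ToyAnalyzer.analyze(...)` returns `two` at position 1 and `lazi` at position 2, matching the first tokens shown in slide 2; position 0 is consumed by the removed stop word "The".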
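Slide 7 is Lucene's classic TF/IDF practical scoring function, which Elasticsearch used by default at the time (Elasticsearch 5.0 later switched the default to BM25). Below is a Java sketch of it, assuming the formulas of Lucene's `DefaultSimilarity`: tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / √(number of terms in the field), coord = matching query terms / query terms, and queryNorm = 1 / √(sum of squared idfs). Boosts are fixed at 1 and Lucene's lossy one-byte encoding of the norm is ignored, so real scores will differ slightly. `ClassicScoring` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the classic Lucene TF/IDF practical scoring function (no boosts, no norm encoding). */
final class ClassicScoring {
    /** Naive tokenizer: lowercase, split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    static double norm(int fieldLength) { return 1.0 / Math.sqrt(fieldLength); }

    /** Scores every document against the query terms; every boost is taken to be 1. */
    static double[] score(List<String> query, List<List<String>> docs) {
        int numDocs = docs.size();
        double[] idfs = new double[query.size()];
        double sumOfSquaredWeights = 0;
        for (int i = 0; i < query.size(); i++) {
            String term = query.get(i);
            int docFreq = (int) docs.stream().filter(d -> d.contains(term)).count();
            idfs[i] = idf(docFreq, numDocs);
            sumOfSquaredWeights += idfs[i] * idfs[i];
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);
        double[] scores = new double[numDocs];
        for (int d = 0; d < numDocs; d++) {
            List<String> doc = docs.get(d);
            double sum = 0;
            int matched = 0;
            for (int i = 0; i < query.size(); i++) {
                int freq = Collections.frequency(doc, query.get(i));
                if (freq == 0) continue;
                matched++;
                // tf(t in d) * idf(t)^2 * t.getBoost() (= 1) * norm(t,d)
                sum += tf(freq) * idfs[i] * idfs[i] * norm(doc.size());
            }
            double coord = matched / (double) query.size();
            scores[d] = queryNorm * coord * sum;
        }
        return scores;
    }
}
```

Scoring the query `happy hippopotamus` against the three sentences from slide 8 ranks "The happy hippopotamus helped Harry." first (score 2/√10 ≈ 0.63): it matches both terms and so gets coord = 1, while each of the other two sentences matches a single term and is halved by coord.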
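The term positions stored in the inverted index (slide 3) are what make phrase search possible. As a hedged illustration using slide 8's sentences: searched as a phrase, `happy hippopotamus` matches only the third sentence, whereas a plain term query matches all three. The sketch below checks for consecutive positions; `PhraseMatch` and its methods are hypothetical, and it assumes the same naive lowercase/split tokenizer without stop words or stemming.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical positional phrase matcher over a single document. */
final class PhraseMatch {
    /** Builds term -> positions for one document (lowercase, split on non-alphanumerics). */
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> map = new HashMap<>();
        int pos = 0;
        for (String s : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (s.isEmpty()) continue;
            map.computeIfAbsent(s, k -> new ArrayList<>()).add(pos++);
        }
        return map;
    }

    /** True if, for some position p of the first term, phrase term i occurs at p + i. */
    static boolean matchesPhrase(Map<String, List<Integer>> doc, List<String> phrase) {
        if (phrase.isEmpty()) return false;
        for (int start : doc.getOrDefault(phrase.get(0), List.of())) {
            boolean all = true;
            for (int i = 1; i < phrase.size() && all; i++) {
                all = doc.getOrDefault(phrase.get(i), List.of()).contains(start + i);
            }
            if (all) return true;
        }
        return false;
    }

    /** For contrast: a plain term query only needs any of the terms to occur. */
    static boolean matchesAnyTerm(Map<String, List<Integer>> doc, List<String> terms) {
        return terms.stream().anyMatch(doc::containsKey);
    }
}
```

Starting from the positions of the first term and probing the remaining terms at offset +i avoids rescanning the document text, which is the general idea behind positional phrase queries; Lucene's actual implementation differs in detail.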