Clustering your e-commerce products (in Solr)

Clustering & learnings from e-commerce products

Tobias Kässmann

November 19, 2014

Transcript

  1. AGENDA • Why am I talking about… • Clustering ➡ the lower level ➡ the higher level
  2. What I've done with Hadoop/Mahout: • data: s24 product text data (titles and descriptions of products) • preprocessing • vectorization • clustering • visualization & evaluation
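
Mahout ran these steps as distributed jobs on Hadoop. As a single-machine illustration of only the clustering step, here is a minimal k-means sketch in Python. The deck does not say which clustering algorithm was used, so k-means, the "first k vectors as seeds" strategy, the toy vectors and all function names (`squared_distance`, `nearest`, `kmeans`) are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, centroids):
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(vector, centroids[c]))


def kmeans(vectors, k, max_iterations=100):
    """Plain k-means (Lloyd's algorithm); returns (assignments, centroids)."""
    # Seed with the first k vectors: a hypothetical choice that keeps the sketch deterministic.
    centroids = [list(v) for v in vectors[:k]]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: every vector joins its nearest centroid.
        new_assignments = [nearest(v, centroids) for v in vectors]
        if new_assignments == assignments:
            break  # converged: no vector changed its cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Toy binary term vectors (made up, not s24 data) over the terms
# samsung, wifi, nokia, apple, tablet.
products = [
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
labels, _ = kmeans(products, k=2)
print(labels)  # [0, 1, 0, 1]
```

k-means only finds a local optimum, so the choice of starting centroids can change the resulting split; that sensitivity is one reason to evaluate parameter choices systematically.
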
  3. Text to Vector: "Samsung Multimedia-Tablet Galaxy Note 8 (GT-N5100) 3G + Wi-Fi"

     $$\begin{pmatrix} \text{Samsung} \\ \text{WiFi} \\ \text{Nokia} \\ \text{Apple} \\ \text{Tablet} \\ \vdots \end{pmatrix} \Rightarrow \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 1 \\ \vdots \end{pmatrix}$$
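
The "Text to Vector" step can be sketched in a few lines of Python. This is a minimal illustration, not the Mahout vectorizer used in the deck: the five-term vocabulary comes from the slide, while the tokenizer (lowercasing, splitting on non-alphanumeric characters) and the hypothetical `SYNONYMS` rule that maps "Wi-Fi" to "wifi" are assumptions.

```python
import re

# Hypothetical normalization rule; the deck does not say how "Wi-Fi" was mapped to "WiFi".
SYNONYMS = {"wi-fi": "wifi"}


def tokenize(text):
    """Lowercase, apply synonym rules, then split on non-alphanumeric characters."""
    text = text.lower()
    for source, target in SYNONYMS.items():
        text = text.replace(source, target)
    # e.g. "multimedia-tablet" -> ["multimedia", "tablet"]
    return [token for token in re.split(r"[^0-9a-z]+", text) if token]


def to_binary_vector(text, vocabulary):
    """1 if the vocabulary term occurs in the text, else 0."""
    tokens = set(tokenize(text))
    return [1 if term in tokens else 0 for term in vocabulary]


vocabulary = ["samsung", "wifi", "nokia", "apple", "tablet"]
title = "Samsung Multimedia-Tablet Galaxy Note 8 (GT-N5100) 3G + Wi-Fi"
print(to_binary_vector(title, vocabulary))  # [1, 1, 0, 0, 1]
```

Binary presence is the simplest weighting; a real pipeline would more likely use TF or TF-IDF weights over a much larger vocabulary.
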
  4. Visualize the raw vectors: transform them to 2D/3D using Principal Component Analysis, Singular Value Decomposition, …
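
Reducing the vectors to 2D for plotting does not require a library. The sketch below is an illustration under assumptions, not the tooling from the deck: it computes PCA by power iteration on the sample covariance matrix, removing already-found components with Gram-Schmidt (a different route from the SVD the slide also mentions). The function names and the `iterations`/`seed` parameters are hypothetical.

```python
import random


def _matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def _orthonormalize(vector, basis):
    """Project out the (orthonormal) `basis` vectors, then scale to unit length.

    Returns None when nothing is left, i.e. the vector lay in the span of `basis`.
    """
    for b in basis:
        dot = sum(x * y for x, y in zip(vector, b))
        vector = [x - dot * y for x, y in zip(vector, b)]
    norm = sum(x * x for x in vector) ** 0.5
    return [x / norm for x in vector] if norm > 1e-12 else None


def pca(rows, n_components=2, iterations=200, seed=0):
    """Project rows onto their top principal components.

    Assumes at least two rows and n_components <= number of columns.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Random start vector, kept orthogonal to the components found so far.
        v = _orthonormalize([rng.random() for _ in range(d)], components)
        for _ in range(iterations):
            w = _orthonormalize(_matvec(cov, v), components)
            if w is None:  # no variance left outside the components found so far
                break
            v = w
        components.append(v)

    # 2D (or n_components-D) coordinates of each centered row.
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


coords = pca([[1, 1, 0, 0, 1], [0, 0, 1, 1, 0], [1, 1, 0, 0, 0], [0, 0, 1, 1, 1]])
for x, y in coords:
    print(f"{x:+.2f} {y:+.2f}")
```

The signs of the axes are arbitrary (they depend on the random start), which does not matter for a scatter plot.
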
  5. Learning 4: create an automated workflow • use your Lucene index as direct input for Hadoop/Mahout • filter the results • automate your evaluation (should I change the params?) • monitoring • create visualizations
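
One way to automate the evaluation is to score every clustering run with an internal quality measure and compare parameter settings (for example, different numbers of clusters) by that score. The sketch below uses the silhouette coefficient for this; the deck does not name a metric, so that choice and the helper name `silhouette` are assumptions.

```python
import math


def silhouette(vectors, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means better-separated clusters."""
    clusters = {}
    for v, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(v)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for v, label in zip(vectors, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (v itself is in `own` but adds a distance of 0, hence len(own) - 1).
        a = sum(math.dist(v, o) for o in own) / (len(own) - 1)
        # b: mean distance to the members of the closest other cluster.
        b = min(sum(math.dist(v, o) for o in members) / len(members)
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette(points, [0, 0, 1, 1])
bad = silhouette(points, [0, 1, 0, 1])
print(round(good, 2), round(bad, 2))  # 0.9 -0.45
```

A workflow could run this after every parameter change and keep the setting with the highest score; tracking the score over time could also feed the monitoring step.
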
  6. PoC with: • scikit-learn • Hadoop/Mahout • Spark

     Image: https://www.flickr.com/photos/lox/9408028555
  7. Integration [diagram labels: search results (highlighted or full); search results and clusters]

     Source: http://carrot2.github.io/solr-integration-strategies/carrot2-3.6.3/index.html
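
For a setup where Solr's own clustering component (backed by Carrot2) returns search results and clusters in one response, a request could be built as sketched below. The parameters `clustering`, `clustering.results` and `clustering.engine` belong to Solr's clustering component; the host, the `products` core, the `/clustering` handler path, the engine names `lingo`/`stc` and the helper `clustering_url` are assumptions that depend on your own solrconfig.xml.

```python
from urllib.parse import urlencode


def clustering_url(base_url, query, engine="lingo", rows=100):
    """Build a Solr request URL that clusters the top `rows` search results."""
    params = {
        "q": query,
        "rows": rows,                  # fewer rows -> faster real-time clustering
        "clustering": "true",          # enable the clustering component
        "clustering.results": "true",  # cluster the returned search results
        "clustering.engine": engine,   # engine name as configured in solrconfig.xml
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical core and handler path.
print(clustering_url("http://localhost:8983/solr/products/clustering", "tablet", engine="stc"))
```

Keeping `rows` small and choosing the engine per request are the two knobs that matter most for latency.
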
  8. Usage: • ad hoc analysis • take a look at "Other Topics": are these outliers? • attribute extraction • real-time autocomplete • dynamic filters
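
Checking the "Other Topics" group can be scripted against Solr's clustering output. A minimal sketch, assuming the JSON response has already been parsed into a dict with a top-level `clusters` list, where each cluster carries `labels` and `docs` and the catch-all group is flagged with `"other-topics": true`. These key names are assumed from Solr 4.x-era output and may differ between versions, which is why the code also falls back to the label. The helper name and the sample response are made up for illustration.

```python
def split_other_topics(response):
    """Split a parsed Solr clustering response into regular clusters and
    the documents of the catch-all "Other Topics" cluster.

    Assumed shape: {"clusters": [{"labels": [...], "docs": [...], ...}, ...]}.
    """
    regular, other_docs = {}, []
    for cluster in response.get("clusters", []):
        labels = cluster.get("labels", [])
        docs = cluster.get("docs", [])
        # Assumed flag for the catch-all cluster, with a label-based fallback.
        if cluster.get("other-topics") or labels == ["Other Topics"]:
            other_docs.extend(docs)
        else:
            regular[" / ".join(labels)] = docs
    return regular, other_docs


# Hypothetical, hand-written response (not real output).
sample = {
    "clusters": [
        {"labels": ["Galaxy Note"], "score": 12.3, "docs": ["p1", "p2"]},
        {"labels": ["iPad Mini"], "score": 9.8, "docs": ["p3", "p4"]},
        {"labels": ["Other Topics"], "score": 0.0, "other-topics": True, "docs": ["p5"]},
    ]
}
clusters, outliers = split_other_topics(sample)
print(outliers)  # ['p5']
```

The returned document IDs can then be fetched and reviewed to decide whether they really are outliers or point to missing attributes.
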
  9. Learning 2: (Carrot2) clustering in real time is slow with a large result set! • Lingo: 300 products ~3 s, 100 products ~150 ms • STC: 100 products ~15 ms
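
The Lingo timings above also hint at how the cost grows with the result set. As a rough back-of-the-envelope estimate, assume run time follows a power law $t \propto n^{p}$ and take the two approximate Lingo measurements at face value (two points are not enough for a reliable fit):

$$\frac{t(300)}{t(100)} \approx \frac{3000\ \text{ms}}{150\ \text{ms}} = 20, \qquad p \approx \frac{\ln 20}{\ln 3} \approx \frac{2.996}{1.099} \approx 2.7$$

Tripling the number of products made Lingo about 20 times slower, far more than the roughly threefold increase a linear algorithm would show. At 100 products, STC (~15 ms) was about 10 times faster than Lingo (~150 ms).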