Solr and search in an e-commerce site

Adrien Brault

April 07, 2014

Transcript

  1. Standalone enterprise search server • Apache project (like Hadoop/CouchDB/Subversion) • Advanced full-text search features • Optimized for high-traffic sites • "REST" API with XML/JSON/CSV output • Web administration • Replication and sharding (SolrCloud) • "Near real-time indexing"
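To make the "REST" API bullet concrete, here is a minimal sketch that only builds a select URL and sends nothing. The host, port, core name (`collection1`) and the `name:shirt` query are assumptions for illustration; `q`, `rows` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, query, rows=10, fmt="json"):
    """Build a Solr /select URL (string construction only, no HTTP call)."""
    params = {"q": query, "rows": rows, "wt": fmt}
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local example server and core name.
url = build_select_url("http://localhost:8983/solr/collection1", "name:shirt")
print(url)
```

Switching `fmt` to `xml` or `csv` is all it takes to get the other response formats listed on the slide.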
  2. Why not MySQL/PostgreSQL? • Performance • Facets • Full-text features • Enterprise support for search
  3. Solr vs ElasticSearch • Both are based on Lucene • ElasticSearch is newer and better designed / more "developer friendly" • Solr can be summed up as "a simple HTTP interface to Lucene" • With Solr, not every feature works with sharding • solr-vs-elasticsearch.com
  4. Separate workers • Inventory • Lots of messages ... must be fast • Popularity • Computed data, so loaded periodically • Main • Something has to handle the rest!
  5. # Get latest "Download" link from
     # https://lucene.apache.org/solr
     wget http://mir2.ovh.net/ftp.apache.org/dist/lucene/solr/4.7.1/solr-4.7.1.tgz
     tar -xf solr-4.7.1.tgz
     cd solr-4.7.1/example
     java -jar start.jar
  6. <!-- schema.xml -->
     <fields>
       <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
       <field name="name" type="text_general" indexed="true" stored="true"/>
       <field name="color" type="string" indexed="true" stored="true"/>
       <field name="price" type="float" indexed="true" stored="true"/>
       <field name="popularity" type="int" indexed="true" stored="true"/>
       <field name="inStock" type="boolean" indexed="true" stored="true"/>
       <field name="isFeatured" type="boolean" indexed="true" stored="true"/>

       <dynamicField name="t_*" type="text_general" indexed="true" stored="false"/>
     </fields>

     <uniqueKey>sku</uniqueKey>

     <copyField source="color" dest="t_color"/>
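As an illustration of what a document matching this schema could look like, here is a small sketch that serializes one product as JSON. The product values and the `t_category` field are made up for the example. `t_color` is deliberately absent because the `copyField` rule fills it from `color`.

```python
import json

# Hypothetical product; the field names follow the schema.xml slide.
doc = {
    "sku": "TSHIRT-RED-M",       # uniqueKey
    "name": "Red cotton t-shirt",
    "color": "red",              # copied into t_color by the copyField rule
    "price": 19.9,
    "popularity": 42,
    "inStock": True,
    "isFeatured": False,
    "t_category": "t-shirts",    # matched by the t_* dynamic field
}

# Solr's JSON update handler accepts a list of documents as the request body.
payload = json.dumps([doc])
print(payload)
```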
  7. <!-- schema.xml -->
     <types>
       <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
       <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

       <fieldType name="text_general"
                  class="solr.TextField"
                  positionIncrementGap="100">
         <analyzer type="index">
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.StopFilterFactory"
                   ignoreCase="true"
                   words="stopwords.txt"/>
           <filter class="solr.LowerCaseFilterFactory"/>
         </analyzer>
         <analyzer type="query">
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.StopFilterFactory"
                   ignoreCase="true"
                   words="stopwords.txt"/>
           <filter class="solr.SynonymFilterFactory"
                   synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
           <filter class="solr.LowerCaseFilterFactory"/>
         </analyzer>
       </fieldType>
     </types>
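The two analyzer chains can be pictured with a toy model. This is only a rough approximation: the regular expression is a crude stand-in for `StandardTokenizerFactory`, and the stopword and synonym tables are invented placeholders for `stopwords.txt` and `synonyms.txt`. It shows the order of the filters and the fact that synonyms are expanded only at query time.

```python
import re

STOPWORDS = {"the", "a", "of"}            # stand-in for stopwords.txt
SYNONYMS = {"tv": ["tv", "television"]}   # stand-in for synonyms.txt


def tokenize(text):
    # Crude approximation of a standard tokenizer: runs of word characters.
    return re.findall(r"\w+", text)


def analyze(text, query_time=False):
    # Stop filter with ignoreCase="true".
    tokens = [t for t in tokenize(text) if t.lower() not in STOPWORDS]
    if query_time:
        # Synonym filter with expand="true": emit every synonym of a token.
        expanded = []
        for token in tokens:
            expanded.extend(SYNONYMS.get(token.lower(), [token]))
        tokens = expanded
    # Lowercase filter runs last in both chains.
    return [t.lower() for t in tokens]


print(analyze("The Stand of a TV"))          # index-time chain
print(analyze("TV stand", query_time=True))  # query-time chain
```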
  8. <!-- solrconfig.xml -->
     <updateHandler class="solr.DirectUpdateHandler2">
       <autoCommit>
         <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
         <maxDocs>${solr.autoCommit.maxDocs:100000}</maxDocs>
         <openSearcher>true</openSearcher>
       </autoCommit>

       <autoSoftCommit>
         <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
       </autoSoftCommit>
     </updateHandler>

     <requestHandler name="/replication" class="solr.ReplicationHandler">
       <lst name="master">
         <str name="enable">${enable.master:false}</str>
         <str name="replicateAfter">commit</str>
         <str name="replicateAfter">startup</str>
         <str name="confFiles">solrconfig.xml,schema.xml</str>
       </lst>
       <lst name="slave">
         <str name="enable">${enable.slave:false}</str>
         <str name="masterUrl">${solr.slave.masterUrl:}</str>
         <str name="pollInterval">00:00:30</str>
       </lst>
     </requestHandler>
  9. <!-- solrconfig.xml -->
     <requestHandler name="/select" class="solr.SearchHandler">
       <lst name="defaults">
         <int name="rows">10</int>

         <str name="defType">edismax</str>
         <str name="qf">name t_category^50 t_color^40 t_size</str>
         <str name="bf">sum(sqrt(popularity),if(isFeatured,5,0))</str>
         <str name="tie">0.1</str>
         <str name="mm">100%</str>

         <str name="spellcheck">true</str>
         ...
         <str name="spellcheck.count">10</str>
       </lst>

       <arr name="last-components">
         <str>spellcheck</str>
       </arr>
     </requestHandler>
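The `bf` function is easier to reason about with a few numbers. In (e)dismax the result of `bf` is added to the relevance score, so the sketch below simply mirrors the arithmetic of `sum(sqrt(popularity),if(isFeatured,5,0))`. The popularity values are arbitrary examples.

```python
import math


def boost(popularity, is_featured):
    """Mirror of sum(sqrt(popularity), if(isFeatured,5,0))."""
    return math.sqrt(popularity) + (5 if is_featured else 0)


print(boost(100, False))  # 10.0
print(boost(100, True))   # 15.0
print(boost(400, False))  # 20.0
```

The square root dampens popularity: with these example values, a product needs four times the popularity to double its boost, while being featured is worth a flat 5 points.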
  10. solarium/solarium • Complete API for interacting with Solr • QueryBuilder • Batch updates • http://wiki.solarium-project.org/index.php/V3:Basic_usage
  11. internations/solr-query-component
      use InterNations\Component\Solr\Expression\ExpressionBuilder;

      $eb = new ExpressionBuilder();
      echo $eb->field(
          'name',
          $eb->boost(
              $eb->eq('John Doe'),
              100
          )
      );

      // name:"John Doe"^100
      • https://github.com/InterNations/SolrQueryComponent
  12. hautelook/solarium-cache
      $client = ...;
      $cache = new RedisCache(); // or any doctrine cache adapter

      $plugin = new CachePlugin();
      $plugin->setCache($cache);
      $client->registerPlugin('cache', $plugin);

      $query = $client->createSelect(array(
          'cache_lifetime' => 60,
      ));
      $result = $client->execute($query);
      • Bundle included
  13. floriansemm/solr-bundle • Similar in principle to friendsofsymfony/elastica-bundle • Too simple for our use case, however • Synchronous • Indexing is too simplistic
  14. Architecture goals • IndexerBundle • Asynchronous • Easily testable ETL • ClientBundle • Usable from both the web and the API • "Public" API decoupled from Solr/Solarium
  15. namespace Acme\Search\IndexerBundle\ETL\Inventory;

      class Extractor
      {
          // Fetch the stock quantity of a single SKU.
          public function getSkuQuantity($sku)
          {
              $qb = $this->createQB();

              $qb
                  ->andWhere('sku.sku = :sku')
                  ->setParameter('sku', $sku)
              ;

              return $qb->execute()->fetch();
          }

          // Fetch the stock quantities of every SKU of a product.
          public function getProductSkusQuantities($productId)
          {
              $qb = $this->createQB();

              $qb
                  ->andWhere('sku.product = :product_id')
                  ->setParameter('product_id', $productId)
              ;

              return $qb->execute()->fetchAll();
          }

          private function createQB() {}
      }
  16. namespace Acme\Search\IndexerBundle\ETL\Inventory;

      class Transformer
      {
          public function updateDocument(
              Document $document,
              $sku,
              $quantity
          ) {
              $document->setSku($sku); // PK
              $document->setQuantity($quantity);
          }
      }
  17. namespace Acme\Search\IndexerBundle\ETL;

      class Loader
      {
          public function updateSkuQuantity($sku)
          {
              return $this->updateQuantities(
                  array($this->inventoryExtractor->getSkuQuantity($sku))
              );
          }

          public function updateProductSkusQuantities($productId)
          {
              return $this->updateQuantities(
                  $this->inventoryExtractor->getProductSkusQuantities($productId)
              );
          }

          private function updateQuantities(array $extractorResults)
          {
              $documents = array();
              foreach ($extractorResults as $extractorResult) {
                  $documents[] = $document = new Document();
                  $sku = $extractorResult['sku'];
                  $quantity = $extractorResult['quantity'];

                  $this->itemTransformer->updateDocument($document, $sku, $quantity);
              }

              return $this->loaderClient->sendDocuments($documents);
          }
      }
  18. namespace Acme\Search\IndexerBundle\ETL;

      class LoaderClient
      {
          private $solarium;
          private $commitOnUpdate;

          public function sendDocuments(array $documents)
          {
              $update = $this->solarium->createUpdate();
              $update->addDocuments($documents);

              return $this->sendUpdate($update);
          }

          private function sendUpdate(Query $update)
          {
              if ($this->commitOnUpdate) {
                  $update->addCommit();
              }

              $result = $this->solarium->update($update);

              if ($result->getStatus() !== 0) {
                  throw new UpdateErrorException($result);
              }

              return $result;
          }
      }
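The "easily testable ETL" goal follows from this split: the extractor, transformer and client are separate collaborators, so each can be swapped for a fake. Below is a minimal sketch of that idea, not the deck's actual code. The class and method names loosely mirror the slides, the dict-backed extractor and the recording client are invented stand-ins, and no Solr or database is involved.

```python
class Extractor:
    """Returns inventory rows; a plain dict stands in for the database."""

    def __init__(self, quantity_by_sku):
        self.quantity_by_sku = quantity_by_sku

    def get_sku_quantity(self, sku):
        return {"sku": sku, "quantity": self.quantity_by_sku[sku]}


class Transformer:
    """Copies extracted values onto the document to be indexed."""

    def update_document(self, document, sku, quantity):
        document["sku"] = sku
        document["quantity"] = quantity


class FakeLoaderClient:
    """Records documents instead of sending them to a search server."""

    def __init__(self):
        self.sent = []

    def send_documents(self, documents):
        self.sent.extend(documents)
        return len(documents)


class Loader:
    """Wires extract -> transform -> load for a single SKU."""

    def __init__(self, extractor, transformer, client):
        self.extractor = extractor
        self.transformer = transformer
        self.client = client

    def update_sku_quantity(self, sku):
        row = self.extractor.get_sku_quantity(sku)
        document = {}
        self.transformer.update_document(document, row["sku"], row["quantity"])
        return self.client.send_documents([document])


client = FakeLoaderClient()
loader = Loader(Extractor({"ABC-1": 3}), Transformer(), client)
loader.update_sku_quantity("ABC-1")
print(client.sent)
```

A test then asserts on `client.sent` without any running infrastructure.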
  19. Filter
      namespace Acme\Search\ClientBundle\Filter;

      class Filter
      {
          private $query;
          private $colors;
          private $sizes;
          private $page;
          private $limit;

          public function setQuery($query = null) {}
          public function getQuery() {}

          public function setColors(array $colors) {}
          public function getColors() {}

          // ...
      }
  20. Abstracting the filter
      namespace Acme\Search\ClientBundle\Filter;

      class Filter
      {
          // ...

          const SORT_RELEVANCY = 'relevancy';
          const SORT_MOST_POPULAR = 'most_popular';
          const SORT_PRICE_DESC = 'price_desc';

          private $sort = self::SORT_MOST_POPULAR;

          public function getSort() {}
          public function setSort($sort) {}
      }
  21. Client for the catalog
      namespace Acme\Search\ClientBundle\Catalog;

      class Client
      {
          /**
           * @param Filter $filter
           * @return Result
           */
          public function search(Filter $filter)
          {
              $query = $this->queryFactory->createCatalogQuery($filter);
              $result = $this->client->execute($query);

              return $this->resultBuilder->create($result);
          }
      }
  22. QueryFactory
      namespace Acme\Search\ClientBundle\Solarium\Select;

      class QueryFactory
      {
          public function createCatalogQuery(Filter $filter)
          {
              $query = new CatalogQuery();
              // Match everything unless the user typed a full-text query.
              $query->setQuery((new ExpressionBuilder())->all());

              if ($filter->hasFullTextQuery()) {
                  $query->setQuery($filter->getQuery());
              }

              $query->addFilterColorFilterQuery($filter);
              $query->addFilterSizeFilterQuery($filter);

              return $query;
          }
      }
  23. Query
      namespace Acme\Search\ClientBundle\Solarium\Select;

      class CatalogQuery extends Query
      {
          protected function init()
          {
              parent::init();

              // Always applied: restrict results on the quantity field
              // with an open-ended range starting at 0.
              $this
                  ->createFilterQuery(Document::FIELD_QUANTITY)
                  ->setQuery(
                      $this->getExpressionBuilder()->field(
                          Document::FIELD_QUANTITY,
                          $this->getExpressionBuilder()->btwnRange(0, null)
                      )
                  )
              ;
          }
      }
  24. namespace Acme\Search\ClientBundle\Solarium\Select;

      class CatalogQuery extends Query
      {
          public function addFilterColorFilterQuery(Filter $filter)
          {
              $this->addFieldTermsFilterQuery(
                  Document::FIELD_COLOR,
                  $filter->getColors()
              );
          }

          public function addFieldTermsFilterQuery($field, $terms)
          {
              // No terms selected: do not add a filter query at all.
              if (count($terms) === 0) {
                  return;
              }

              $this
                  ->createFilterQuery($field)
                  ->setQuery(
                      $this->getExpressionBuilder()->field(
                          $field,
                          $this->getExpressionBuilder()->grp($terms)
                      )
                  )
              ;
          }
      }
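To picture what such a per-field filter query can look like on the wire, here is a small hand-rolled sketch. It does not reproduce the exact string the ExpressionBuilder emits. The quoting and the explicit `OR` are choices made for this example, and the helper names are hypothetical.

```python
def quote_term(term):
    """Quote a term as a phrase, escaping backslashes and double quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def field_terms_filter(field, terms):
    """Build a filter query matching any of the terms, or None if empty."""
    if not terms:
        return None  # same early exit as the PHP method above
    joined = " OR ".join(quote_term(t) for t in terms)
    return f"{field}:({joined})"


print(field_terms_filter("color", ["red", "navy blue"]))
print(field_terms_filter("color", []))
```

Returning `None` for an empty selection keeps the facet unconstrained, as the early `return` does in the slide's code.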
  25. Result
      namespace Acme\Search\ClientBundle\Catalog;

      class Result implements \IteratorAggregate
      {
          /** @var Pagerfanta */
          private $pager;

          /** @var Product[] */
          private $products;

          /** @var FieldCount[] */
          private $colors;

          /** @var FieldCount[] */
          private $sizes;

          /** @var null|string */
          private $suggestedSpelling;
      }
  26. public function indexAction(Request $request)
      {
          $filter = new Filter();

          // First form: only binds the request parameters onto the filter.
          $bindingForm = $this->get('form.factory')->createNamed(
              '',
              'filter_binding',
              $filter
          );
          $bindingForm->submit($request->query->all(), false);

          $catalogClient = $this->get('acme_search_client.catalog.client');
          $result = $catalogClient->search($filter);

          // Second form: rendered in the view, built from the search result.
          $viewForm = $this->get('form.factory')->createNamed(
              '',
              'filter',
              $filter,
              [
                  'method' => 'GET',
                  'catalog_result' => $result,
              ]
          );

          return array(
              'filter' => $filter,
              'form' => $viewForm->createView(),
              'result' => $result,
          );
      }
  27. /**
       * @Method("GET")
       * @Route("/search")
       */
      public function searchAction(Request $request)
      {
          $filter = new Filter();

          $bindingForm = $this->get('form.factory')->createNamed(
              '',
              'filter_binding',
              $filter
          );
          $bindingForm->submit($request->query->all(), false);

          $catalogClient = $this->get('acme_search_client.catalog.client');
          $result = $catalogClient->search($filter);

          // Hateoas goodness!
          $catalogRepresentation = new CatalogRepresentation($result, $filter);

          return View::create($catalogRepresentation);
      }
  28. namespace Acme\AppBundle\Form\Type;

      class FilterBindingType extends AbstractType
      {
          public function buildForm(FormBuilderInterface $builder, array $options)
          {
              $builder
                  ->add('query', 'hidden')
                  ->add('sizes', 'collection', ['allow_add' => true])
                  ->add('colors', 'collection', ['allow_add' => true])
                  ->add('sort', 'hidden')
                  ->add('page', 'hidden')
                  ->add('limit', 'hidden')
              ;

              // Default to relevancy sorting when a full-text query is
              // submitted without an explicit sort.
              $builder->addEventListener(
                  FormEvents::PRE_SUBMIT,
                  function (FormEvent $event) {
                      $submittedData = $event->getData();

                      if (isset($submittedData['query'])
                          && !isset($submittedData['sort'])
                      ) {
                          $submittedData['sort'] = Filter::SORT_RELEVANCY;
                      }

                      $event->setData($submittedData);
                  }
              );
          }
      }
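The `PRE_SUBMIT` listener encodes a small rule that is easy to check in isolation. The sketch below restates it as a pure function. The function name is hypothetical, and `None` plays the role of an unset or null value, as with PHP's `isset`.

```python
SORT_RELEVANCY = "relevancy"


def apply_default_sort(submitted):
    """Return a copy of the submitted data with the default sort applied."""
    data = dict(submitted)
    # A full-text query without an explicit sort is sorted by relevancy.
    if data.get("query") is not None and data.get("sort") is None:
        data["sort"] = SORT_RELEVANCY
    return data


print(apply_default_sort({"query": "shirt"}))
print(apply_default_sort({"query": "shirt", "sort": "price_desc"}))
print(apply_default_sort({"colors": ["red"]}))
```

When no query is submitted the data is left untouched, so the `Filter` class default (`SORT_MOST_POPULAR`) still applies.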
  29. namespace Acme\AppBundle\Form\Type;

      class FilterType extends AbstractType
      {
          public function buildForm(
              FormBuilderInterface $builder,
              array $options
          ) {
              // Body shown on the next slide.
          }

          public function setDefaultOptions(OptionsResolverInterface $resolver)
          {
              $resolver->setDefaults([
                  'data_class' => Filter::class,
                  'csrf_protection' => false,
              ]);
              $resolver->setRequired([
                  'catalog_result'
              ]);
              $resolver->setAllowedTypes([
                  'catalog_result' => Result::class,
              ]);
          }
      }
  30. public function buildForm(
          FormBuilderInterface $builder,
          array $options
      ) {
          // Facet counts from the search result drive the available choices.
          $builder
              ->add(
                  'sizes',
                  'choice',
                  [
                      'choice_list' => new FieldCountChoiceList(
                          $options['catalog_result']->getSizes()
                      ),
                      'required' => false,
                  ]
              )
              ->add(
                  'colors',
                  'choice',
                  [
                      'choice_list' => new FieldCountChoiceList(
                          $options['catalog_result']->getColors()
                      ),
                      'required' => false,
                  ]
              )
              ->add(
                  'sort',
                  'choice',
                  [
                      'choices' => self::getSortChoices(),
                  ]
              )
          ;
      }
  31. use Solarium\QueryType\Select\Query\Component\EdisMax;

      class EDisMaxType extends AbstractType
      {
          public function buildForm(FormBuilderInterface $builder, array $options)
          {
              $options = ['required' => false];
              $builder
                  ->add('queryFields', 'textarea', $options)
                  ->add('queryAlternative', 'text', $options)
                  ->add('minimumMatch', 'text', $options)
                  ->add('phraseFields', 'text', $options)
                  ->add('phraseSlop', 'text', $options)
                  ->add('queryPhraseSlop', 'text', $options)
                  ->add('tie', 'text', $options)
                  ->add('boostQuery', 'text', $options)
                  ->add('boostFunctions', 'text', $options)
                  ->add('boostFunctionsMult', 'text', $options)
                  ->add('phraseBigramFields', 'text', $options)
                  ->add('phraseBigramSlop', 'text', $options)
                  ->add('phraseTrigramFields', 'text', $options)
                  ->add('phraseTrigramSlop', 'text', $options)
                  ->add('userFields', 'text', $options)
              ;
          }

          public function setDefaultOptions(OptionsResolverInterface $resolver)
          {
              $resolver->setDefaults([
                  'data_class' => EdisMax::class,
              ]);
          }
      }
  32. DebugController
      // Two copies of the same query, so that their edismax settings
      // can be tuned side by side.
      $query1 = $catalogClient->createQuery($filter);
      $query2 = $catalogClient->createQuery($filter);

      $debugForm = $this->get('form.factory')
          ->createBuilder(
              'form',
              [
                  'left' => $query1->getEDisMax(),
                  'right' => $query2->getEDisMax(),
                  'filter' => $filter,
              ],
              ['method' => 'GET']
          )
          ->add('query', 'text', ['property_path' => '[filter].query'])
          ->add('left', 'edismax')
          ->add('right', 'edismax')
          ->getForm()
      ;
      $debugForm->submit($request->query->all(), false);

      $catalogResult1 = $catalogClient->createSearchResults($filter, $query1);
      $catalogResult2 = $catalogClient->createSearchResults($filter, $query2);
  33. Suggestions • The Suggester suggests names ... not searches • We are considering creating a second core for suggestions
  34. Performance "killers" • group=true / group.ngroups=true / group.facet=true • Form choice field with 2000 choices ... -> AJAX • AutoCommit too frequent • Too many slaves • Not using SolrCloud -> hard commit required for replication