Solr and search in an e-commerce site

Adrien Brault

April 07, 2014

  1. Standalone enterprise search server
     • Apache project (like Hadoop, CouchDB, Subversion)
     • Advanced full-text search features
     • Optimized for high-traffic sites
     • “REST” API returning XML/JSON/CSV
     • Web administration interface
     • Replication and sharding (SolrCloud)
     • “Near real-time indexing”
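The HTTP API mentioned above can be illustrated with a minimal sketch that only builds a select URL. The host, port and core name (`localhost:8983`, `collection1`) and the function name `buildSelectUrl` are assumptions for illustration, not values from the talk.

```php
<?php
// Build a Solr select URL; nothing is sent over the network here.
function buildSelectUrl(string $baseUrl, string $query, string $format = 'json', int $rows = 10): string
{
    $params = [
        'q'    => $query,
        'wt'   => $format, // response writer: json, xml or csv
        'rows' => $rows,
    ];

    // RFC 3986 encoding, so a space becomes %20 rather than "+".
    return rtrim($baseUrl, '/') . '/select?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo buildSelectUrl('http://localhost:8983/solr/collection1', 'name:shirt'), PHP_EOL;
```

Switching `wt` is enough to move between the JSON, XML and CSV response formats.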
  2. Why not MySQL/PostgreSQL?
     • Performance
     • Facets
     • Full-text features
     • Enterprise support for search
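To make the facets point concrete, here is a rough sketch of the request parameters involved. The field names `color` and `size` and the helper name are assumptions; `facet`, `facet.field` and `facet.mincount` are standard Solr parameters.

```php
<?php
// Solr expects repeated keys (facet.field=a&facet.field=b). http_build_query()
// would emit indexed keys for an array value, so the pairs are encoded by hand.
function buildFacetQueryString(string $query, array $facetFields): string
{
    $pairs = [['q', $query], ['facet', 'true'], ['facet.mincount', '1']];
    foreach ($facetFields as $field) {
        $pairs[] = ['facet.field', $field];
    }

    $encoded = [];
    foreach ($pairs as [$key, $value]) {
        $encoded[] = rawurlencode($key) . '=' . rawurlencode($value);
    }

    return implode('&', $encoded);
}

echo buildFacetQueryString('*:*', ['color', 'size']), PHP_EOL;
```

A single request of this shape returns the matching documents together with per-value counts for each faceted field, which is awkward to reproduce with plain SQL.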
  3. Solr vs ElasticSearch
     • Both are built on Lucene
     • ElasticSearch is newer and better designed / more “developer friendly”
     • Solr can be summed up as “a simple HTTP interface to Lucene”
     • With Solr, not every feature works with sharding
     • solr-vs-elasticsearch.com
  4. Different workers
     • Inventory
       • Lots of messages … must be fast
     • Popularity
       • Computed data, so loaded periodically
     • Main
       • Something has to handle the rest!
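One way to picture the split into workers is a small routing table. This is only a sketch: the message type names are hypothetical, and the talk does not say how messages are actually dispatched.

```php
<?php
// Route a message type to one of the three workers named on the slide.
// The message type strings are hypothetical.
function routeToWorker(string $messageType): string
{
    $routes = [
        'inventory.updated'   => 'inventory',  // high volume, must stay fast
        'popularity.computed' => 'popularity', // periodic bulk load
    ];

    // Everything else falls through to the main worker.
    return $routes[$messageType] ?? 'main';
}

echo routeToWorker('inventory.updated'), PHP_EOL;
```

Keeping inventory messages on their own worker means a slow popularity load cannot delay stock updates.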
  5. # Get the latest “Download” link from
     # https://lucene.apache.org/solr
     wget http://mir2.ovh.net/ftp.apache.org/dist/lucene/solr/4.7.1/solr-4.7.1.tgz
     tar -xf solr-4.7.1.tgz
     cd solr-4.7.1/example
     java -jar start.jar
  6. schema.xml
     <fields>
       <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
       <field name="name" type="text_general" indexed="true" stored="true"/>
       <field name="color" type="string" indexed="true" stored="true"/>
       <field name="price" type="float" indexed="true" stored="true"/>
       <field name="popularity" type="int" indexed="true" stored="true"/>
       <field name="inStock" type="boolean" indexed="true" stored="true"/>
       <field name="isFeatured" type="boolean" indexed="true" stored="true"/>
       <dynamicField name="t_*" type="text_general" indexed="true" stored="false"/>
     </fields>
     <copyField source="color" dest="t_color"/>
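A document matching these fields could be serialized for Solr's JSON update handler roughly as follows. The function name and the sample product are made up for illustration. `t_color` is deliberately absent, since the `copyField` rule fills it from `color` at index time.

```php
<?php
// Serialize products into a JSON array of documents shaped like the schema fields.
function buildUpdatePayload(array $products): string
{
    $documents = [];
    foreach ($products as $product) {
        $documents[] = [
            'sku'        => (string) $product['sku'],
            'name'       => (string) $product['name'],
            'color'      => (string) $product['color'],
            'price'      => (float) $product['price'],
            'popularity' => (int) $product['popularity'],
            'inStock'    => (bool) $product['inStock'],
            'isFeatured' => (bool) $product['isFeatured'],
        ];
    }

    return json_encode($documents);
}

// Hypothetical sample product.
echo buildUpdatePayload([[
    'sku' => 'TS-001', 'name' => 'Basic T-Shirt', 'color' => 'red',
    'price' => 19.9, 'popularity' => 42, 'inStock' => 1, 'isFeatured' => 0,
]]), PHP_EOL;
```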
  7. schema.xml
     <types>
       <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
       <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
       <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
         <analyzer type="index">
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
           <filter class="solr.LowerCaseFilterFactory"/>
         </analyzer>
         <analyzer type="query">
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
           <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
           <filter class="solr.LowerCaseFilterFactory"/>
         </analyzer>
       </fieldType>
     </types>
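The effect of a query-time analyzer chain like this one (tokenize, drop stop words, expand synonyms, lowercase) can be approximated in a few lines. This is a loose imitation for intuition only, not Lucene's actual tokenizer or filters. It only lowercases ASCII, and the stop word and synonym lists passed in are made up.

```php
<?php
// Rough imitation of a tokenizer + stop filter + synonym filter + lowercase chain.
function analyze(string $text, array $stopWords, array $synonyms = []): array
{
    // Stand-in for the tokenizer: split on anything that is not a letter or digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $terms = [];
    foreach ($tokens as $token) {
        $lower = strtolower($token);
        if (in_array($lower, $stopWords, true)) {
            continue; // case-insensitive stop word removal
        }
        // With expansion enabled, a term is replaced by every synonym in its group.
        foreach ($synonyms[$lower] ?? [$lower] as $term) {
            $terms[] = $term;
        }
    }

    return $terms;
}

print_r(analyze('The Red T-Shirt', ['the'], ['shirt' => ['shirt', 'tee']]));
```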
  8. solrconfig.xml
     <updateHandler class="solr.DirectUpdateHandler2">
       <!-- ... -->
     </updateHandler>

     <requestHandler name="/replication" class="solr.ReplicationHandler">
       <lst name="master">
         <str name="enable">${enable.master:false}</str>
         <str name="replicateAfter">commit</str>
         <str name="replicateAfter">startup</str>
         <str name="confFiles">solrconfig.xml,schema.xml</str>
       </lst>
       <lst name="slave">
         <str name="enable">${enable.slave:false}</str>
         <str name="masterUrl">${solr.slave.masterUrl:}</str>
         <str name="pollInterval">00:00:30</str>
       </lst>
     </requestHandler>
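The `${enable.master:false}`, `${enable.slave:false}` and `${solr.slave.masterUrl:}` placeholders are resolved from Java system properties, so the same solrconfig.xml can serve both roles. The sketch below assembles a plausible start command for each role. The helper name is hypothetical, and the way the speaker's servers were actually started is not shown in the slides.

```php
<?php
// Build a "java ... -jar start.jar" command that sets the replication properties.
function solrStartCommand(string $role, ?string $masterUrl = null): string
{
    if ($role === 'master') {
        $flags = ['-Denable.master=true'];
    } elseif ($role === 'slave' && $masterUrl !== null) {
        $flags = [
            '-Denable.slave=true',
            '-Dsolr.slave.masterUrl=' . escapeshellarg($masterUrl),
        ];
    } else {
        throw new InvalidArgumentException('Expected "master", or "slave" with a master URL.');
    }

    return 'java ' . implode(' ', $flags) . ' -jar start.jar';
}

echo solrStartCommand('master'), PHP_EOL;
```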
  9. solrconfig.xml
     <requestHandler name="/select" class="solr.SearchHandler">
       <lst name="defaults">
         <int name="rows">10</int>
         <str name="defType">edismax</str>
         <str name="qf">name t_category^50 t_color^40 t_size</str>
         <str name="bf">sum(sqrt(popularity),if(isFeatured,5,0))</str>
         <str name="tie">0.1</str>
         <str name="mm">100%</str>
         <str name="spellcheck">true</str>
         <str name="spellcheck.count">10</str>
       </lst>
       <arr name="last-components">
         <str>spellcheck</str>
       </arr>
     </requestHandler>
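The `bf` parameter adds the value of a function to the relevance score. As a sketch, the expression `sum(sqrt(popularity),if(isFeatured,5,0))` corresponds to the arithmetic below. The PHP function name is made up, and the final score Solr reports also depends on the text match and on query normalization.

```php
<?php
// Plain-PHP equivalent of sum(sqrt(popularity), if(isFeatured, 5, 0)).
function boostFunction(int $popularity, bool $isFeatured): float
{
    return sqrt($popularity) + ($isFeatured ? 5 : 0);
}

echo boostFunction(100, true), PHP_EOL;
```

The square root dampens popularity, so a product with 100 times the popularity only gets 10 times the boost, and being featured is worth a flat 5 points.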
  10. solarium/solarium
      • Complete API for interacting with Solr
      • QueryBuilder
      • Batch updates
      • http://wiki.solarium-project.org/index.php/V3:Basic_usage
  11. internations/solr-query-component
      use InterNations\Component\Solr\Expression\ExpressionBuilder;

      $eb = new ExpressionBuilder();
      echo $eb->field(
          'name',
          $eb->boost(
              $eb->eq('John Doe'),
              100
          )
      );
      // name:"John Doe"^100
      • https://github.com/InterNations/SolrQueryComponent
  12. hautelook/solarium-cache
      $client = ...;
      $cache = new RedisCache(); // or any doctrine cache adapter
      $plugin = new CachePlugin();
      // ...
      $client->registerPlugin('cache', $plugin);
      $query = $client->createSelect(array(
          'cache_lifetime' => 60,
      ));
      $result = $client->execute($query);
      • Bundle included
  13. floriansemm/solr-bundle
      • Similar in principle to friendsofsymfony/elastica-bundle
      • Too simple for our use case, however
        • Synchronous
        • Indexing is too simplistic
  14. Architecture goals
      • IndexerBundle
        • Asynchronous
        • Easily testable ETL
      • ClientBundle
        • Usable from both the web front end and the API
        • “Public” API decoupled from Solr/Solarium
  15. namespace Acme\Search\IndexerBundle\ETL\Inventory;

      class Extractor
      {
          public function getSkuQuantity($sku)
          {
              $qb = $this->createQB()
                  ->andWhere('sku.sku = :sku')
                  ->setParameter('sku', $sku);

              return $qb->execute()->fetch();
          }

          public function getProductSkusQuantities($productId)
          {
              $qb = $this->createQB()
                  ->andWhere('sku.product = :product_id')
                  ->setParameter('product_id', $productId);

              return $qb->execute()->fetchAll();
          }

          private function createQB() {}
      }
  16. namespace Acme\Search\IndexerBundle\ETL\Inventory;

      class Transformer
      {
          public function updateDocument(
              $document,
              $sku,
              $quantity
          ) {
              $document->setSku($sku); // PK
              // ...
          }
      }
  17. namespace Acme\Search\IndexerBundle\ETL;

      class Loader
      {
          public function updateSkuQuantity($sku)
          {
              return $this->updateQuantities(
                  // ...
              );
          }

          public function updateProductSkusQuantities($productId)
          {
              return $this->updateQuantities(
                  // ...
              );
          }

          private function updateQuantities(array $extractorResults)
          {
              $documents = array();
              foreach ($extractorResults as $extractorResult) {
                  $documents[] = $document = new Document();
                  $sku = $extractorResult['sku'];
                  $quantity = $extractorResult['quantity'];
                  $this->itemTransformer->updateDocument($document, $sku, $quantity);
              }

              return $this->loaderClient->sendDocuments($documents);
          }
      }
  18. namespace Acme\Search\IndexerBundle\ETL;

      class LoaderClient
      {
          private $solarium;

          public function sendDocuments(array $documents)
          {
              $update = $this->solarium->createUpdate();
              $update->addDocuments($documents);

              return $this->sendUpdate($update);
          }

          private function sendUpdate(Query $update)
          {
              if ($this->commitOnUpdate) {
                  $update->addCommit();
              }
              $result = $this->solarium->update($update);
              if ($result->getStatus() !== 0) {
                  throw new UpdateErrorException($result);
              }

              return $result;
          }
      }
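The "easily testable ETL" goal can be illustrated by wiring the three steps with in-memory stand-ins, so no database or Solr instance is needed. Everything below is a hypothetical sketch: the class and function names are not the deck's, and mapping `quantity` to an `inStock` flag is an assumption about what the transformer does.

```php
<?php
// Extract: an in-memory replacement for the database-backed extractor.
final class ArrayExtractor
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function getSkuQuantity(string $sku): array
    {
        return array_values(array_filter($this->rows, function (array $row) use ($sku) {
            return $row['sku'] === $sku;
        }));
    }
}

// Load: records documents instead of sending them to Solr.
final class RecordingLoaderClient
{
    public $sent = [];

    public function sendDocuments(array $documents): int
    {
        $this->sent = array_merge($this->sent, $documents);

        return count($documents);
    }
}

// Transform: turn an extractor row into a document (assumed mapping).
function transform(array $row): array
{
    return ['sku' => $row['sku'], 'inStock' => $row['quantity'] > 0];
}

function updateSkuQuantity(ArrayExtractor $extractor, RecordingLoaderClient $client, string $sku): int
{
    return $client->sendDocuments(array_map('transform', $extractor->getSkuQuantity($sku)));
}

$client = new RecordingLoaderClient();
$extractor = new ArrayExtractor([['sku' => 'A1', 'quantity' => 3], ['sku' => 'B2', 'quantity' => 0]]);
echo updateSkuQuantity($extractor, $client, 'A1'), PHP_EOL;
```

Because each step only sees plain arrays, a unit test can assert on what would have been sent without touching any infrastructure.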
  19. Filter
      namespace Acme\Search\ClientBundle\Filter;

      class Filter
      {
          private $query;
          private $colors;
          private $sizes;
          private $page;
          private $limit;

          public function setQuery($query = null) {}
          public function getQuery() {}
          public function setColors(array $colors) {}
          public function getColors() {}
          // ...
      }
  20. Abstracting the filter
      namespace Acme\Search\ClientBundle\Filter;

      class Filter
      {
          const SORT_RELEVANCY = 'relevancy';
          const SORT_MOST_POPULAR = 'most_popular';
          const SORT_PRICE_DESC = 'price_desc';
          // ...

          private $sort = self::SORT_MOST_POPULAR;

          public function getSort() {}
          public function setSort($sort) {}
      }
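The point of abstract sort constants is that callers never see Solr field names. A sketch of the translation layer might look like this. The mapping to `score`, `popularity` and `price` is an assumption based on the schema fields shown earlier, and the function name is hypothetical.

```php
<?php
// Translate an abstract sort key into a Solr "sort" parameter value.
function solrSort(string $sort): string
{
    $map = [
        'relevancy'    => 'score desc',
        'most_popular' => 'popularity desc',
        'price_desc'   => 'price desc',
    ];

    if (!isset($map[$sort])) {
        throw new InvalidArgumentException(sprintf('Unknown sort "%s".', $sort));
    }

    return $map[$sort];
}

echo solrSort('most_popular'), PHP_EOL;
```

If the index later changes, only this mapping moves, and URLs or API clients using `most_popular` keep working.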
  21. Client for the catalog
      namespace Acme\Search\ClientBundle\Catalog;

      class Client
      {
          /**
           * @param Filter $filter
           * @return Result
           */
          public function search(Filter $filter)
          {
              $query = $this->queryFactory->createQuery($filter);
              $result = $this->client->execute($query);

              return $this->resultBuilder->create($result);
          }
      }
  22. QueryFactory
      namespace Acme\Search\ClientBundle\Solarium\Select;

      class QueryFactory
      {
          public function createCatalogQuery(Filter $filter)
          {
              $query = new CatalogQuery();
              $query->setQuery((new ExpressionBuilder())->all());
              if ($filter->hasFullTextQuery()) {
                  // ...
              }
              // ...

              return $query;
          }
      }
  23. Query
      namespace Acme\Search\ClientBundle\Solarium\Select;

      class CatalogQuery extends Query
      {
          protected function init()
          {
              // ...
                  $this->getExpressionBuilder()->btwnRange(0, null)
              // ...
          }
      }
  24. namespace Acme\Search\ClientBundle\Solarium\Select;

      class CatalogQuery extends Query
      {
          public function addFilterColorFilterQuery(Filter $filter)
          {
              // ...
          }

          public function addFieldTermsFilterQuery($field, $terms)
          {
              if (count($terms) === 0) {
                  // ...
              }
              // ...
          }
      }
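A filter query for a list of terms on one field typically ends up as `field:("a" OR "b")`. The following is a minimal sketch of that string building in plain PHP, with hypothetical function names. It is not the deck's `addFieldTermsFilterQuery` body, which was cut from the slide, and returning `null` for an empty list is an assumption.

```php
<?php
// Quote a term as a phrase; inside quotes only backslash and double quote need escaping.
function escapePhrase(string $term): string
{
    return '"' . addcslashes($term, '"\\') . '"';
}

// Build field:("a" OR "b"), or null when there is nothing to filter on.
function fieldTermsFilterQuery(string $field, array $terms): ?string
{
    if (count($terms) === 0) {
        return null;
    }

    return $field . ':(' . implode(' OR ', array_map('escapePhrase', $terms)) . ')';
}

echo fieldTermsFilterQuery('color', ['red', 'dark blue']), PHP_EOL;
```

Sending such a clause as `fq` rather than appending it to `q` keeps it out of relevance scoring and lets Solr cache it independently.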
  25. Result
      namespace Acme\Search\ClientBundle\Catalog;

      class Result implements \IteratorAggregate
      {
          /** @var Pagerfanta */
          private $pager;

          /** @var Product[] */
          private $products;

          /** @var FieldCount[] */
          private $colors;

          /** @var FieldCount[] */
          private $sizes;

          /** @var null|string */
          private $suggestedSpelling;

          // ...
      }
  26. public function indexAction(Request $request)
      {
          $filter = new Filter();
          $bindingForm = $this->get('form.factory')->createNamed(
              // ...
          );
          $bindingForm->submit($request->query->all(), false);

          $catalogClient = $this->get('acme_search_client.catalog.client');
          $result = $catalogClient->search($filter);

          $viewForm = $this->get('form.factory')->createNamed(
              // ...
              [
                  'method' => 'GET',
                  'catalog_result' => $result,
              ]
          );

          return array(
              'filter' => $filter,
              'form' => $viewForm->createView(),
              'result' => $result,
          );
      }
  27. /**
       * @Method("GET")
       * @Route("/search")
       */
      public function searchAction(Request $request)
      {
          $filter = new Filter();
          $bindingForm = $this->get('form.factory')->createNamed(
              // ...
          );
          $bindingForm->submit($request->query->all(), false);

          $catalogClient = $this->get('acme_search_client.catalog.client');
          $result = $catalogClient->search($filter);

          // Hateoas goodness!
          $catalogRepresentation = new CatalogRepresentation($result, $filter);

          return View::create($catalogRepresentation);
      }
  28. namespace Acme\AppBundle\Form\Type;

      class FilterBindingType extends AbstractType
      {
          public function buildForm(FormBuilderInterface $builder, array $options)
          {
              $builder
                  ->add('query', 'hidden')
                  ->add('sizes', 'collection', ['allow_add' => true])
                  ->add('colors', 'collection', ['allow_add' => true])
                  ->add('sort', 'hidden')
                  ->add('page', 'hidden')
                  ->add('limit', 'hidden');

              $builder->addEventListener(
                  // ...
                  function (FormEvent $event) {
                      $submittedData = $event->getData();
                      if (isset($submittedData['query'])
                          && !isset($submittedData['sort'])
                      ) {
                          $submittedData['sort'] = Filter::SORT_RELEVANCY;
                      }
                      // ...
                  }
              );
          }
      }
  29. namespace Acme\AppBundle\Form\Type;

      class FilterType extends AbstractType
      {
          public function buildForm(
              FormBuilderInterface $builder,
              array $options
          ) {
              // ...
          }

          public function setDefaultOptions(OptionsResolverInterface $resolver)
          {
              $resolver->setDefaults([
                  'data_class' => Filter::CLASS,
                  'csrf_protection' => false,
              ]);
              // ...
                  'catalog_result' => Result::CLASS,
              // ...
          }
      }
  30. public function buildForm(
          FormBuilderInterface $builder,
          array $options
      ) {
          // ...
              'choice_list' => new FieldCountChoiceList(
                  // ...
              ),
              'required' => false,
          // ...
              'choice_list' => new FieldCountChoiceList(
                  // ...
              ),
              'required' => false,
          // ...
              'choices' => self::getSortChoices(),
          // ...
      }
  31. use Solarium\QueryType\Select\Query\Component\EdisMax;

      class EDisMaxType extends AbstractType
      {
          public function buildForm(FormBuilderInterface $builder, array $options)
          {
              $options = ['required' => false];

              $builder
                  ->add('queryFields', 'textarea', $options)
                  ->add('queryAlternative', 'text', $options)
                  ->add('minimumMatch', 'text', $options)
                  ->add('phraseFields', 'text', $options)
                  ->add('phraseSlop', 'text', $options)
                  ->add('queryPhraseSlop', 'text', $options)
                  ->add('tie', 'text', $options)
                  ->add('boostQuery', 'text', $options)
                  ->add('boostFunctions', 'text', $options)
                  ->add('boostFunctionsMult', 'text', $options)
                  ->add('phraseBigramFields', 'text', $options)
                  ->add('phraseBigramSlop', 'text', $options)
                  ->add('phraseTrigramFields', 'text', $options)
                  ->add('phraseTrigramSlop', 'text', $options)
                  ->add('userFields', 'text', $options);
          }

          public function setDefaultOptions(OptionsResolverInterface $resolver)
          {
              $resolver->setDefaults([
                  'data_class' => EDisMax::CLASS,
              ]);
          }
      }
  32. DebugController
      $query1 = $catalogClient->createQuery($filter);
      $query2 = $catalogClient->createQuery($filter);

      $debugForm = $this->get('form.factory')
          // ...
              'left' => $query1->getEDisMax(),
              'right' => $query2->getEDisMax(),
              'filter' => $filter,
          // ...
              ['method' => 'GET']
          // ...
          ->add('query', 'text', ['property_path' => '[filter].query'])
          ->add('left', 'edismax')
          ->add('right', 'edismax')
          // ...
      $debugForm->submit($request->query->all(), false);

      $catalogResult1 = $catalogClient->createSearchResults($filter, $query1);
      $catalogResult2 = $catalogClient->createSearchResults($filter, $query2);
  33. Suggestions
      • The Suggester suggests names … not searches
      • We are considering creating a second core for suggestions
  34. Performance “killers”
      • group=true / group.ngroups=true / group.facet=true
      • Form choice field with 2000 choices … -> AJAX
      • AutoCommit too frequent
      • Too many slaves
      • Not using SolrCloud -> a hard commit is required for replication
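One common way to keep commits infrequent is to batch documents and let Solr schedule the commit through the `commitWithin` request parameter, instead of committing after each update. The sketch below only prepares the requests. The helper name, batch size and delay are assumptions, and the slides do not say which approach the speaker used.

```php
<?php
// Split documents into batches and prepare one /update request per batch,
// asking Solr to commit within the given number of milliseconds.
function buildUpdateRequests(array $documents, int $batchSize, int $commitWithinMs): array
{
    $requests = [];
    foreach (array_chunk($documents, $batchSize) as $batch) {
        $requests[] = [
            'path' => '/update?commitWithin=' . $commitWithinMs,
            'body' => json_encode($batch),
        ];
    }

    return $requests;
}

$requests = buildUpdateRequests([['sku' => 'a'], ['sku' => 'b'], ['sku' => 'c']], 2, 10000);
echo count($requests), PHP_EOL;
```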