Solr and search in an e-commerce site

Adrien Brault

April 07, 2014

Transcript

  1. Solr and search in an e-commerce site
    @AdrienBrault

  2. Search in an e-commerce site

  3. Standalone enterprise search server
    • Apache project (like Hadoop/CouchDB/Subversion)
    • Advanced full-text search features
    • Optimized for high-traffic sites
    • "REST" API returning XML/JSON/CSV
    • Web administration interface
    • Replication and sharding (SolrCloud)
    • "Near real-time indexing"
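
As a rough illustration of the HTTP API mentioned above, the sketch below builds a select URL and asks for JSON through the `wt` parameter. The host, port and core name (`localhost:8983`, `collection1`) are assumptions based on the stock Solr 4 example setup, and `buildSelectUrl` is a hypothetical helper, not part of any library.

```php
<?php
declare(strict_types=1);

/**
 * Builds a Solr select URL. Base URL and core name are illustrative.
 */
function buildSelectUrl(string $baseUrl, string $core, array $params): string
{
    // wt selects the response format (json, xml, csv); default to JSON.
    $params += ['wt' => 'json'];

    // PHP_QUERY_RFC3986 percent-encodes reserved characters (":" becomes %3A).
    return rtrim($baseUrl, '/') . '/' . rawurlencode($core) . '/select?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo buildSelectUrl('http://localhost:8983/solr', 'collection1', [
    'q' => 'name:shirt',
    'rows' => 10,
]), PHP_EOL;
// http://localhost:8983/solr/collection1/select?q=name%3Ashirt&rows=10&wt=json
```

Switching `wt` to `xml` or `csv` changes only the response format; the query itself stays the same.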

  4. Why not MySQL/PostgreSQL?
    • Performance
    • Facets
    • Full-text features
    • Enterprise support for search

  5. Solr vs ElasticSearch
    • Both are built on Lucene
    • ElasticSearch is more recent, better designed and more "developer friendly"
    • Solr can be summed up as "a simple HTTP interface to Lucene"
    • With Solr, not every feature works with sharding
    • solr-vs-elasticsearch.com

  6. Why not ElasticSearch?

  7. [Diagram] MySQL/PostgreSQL: PRODUCT (1) → SKU (1..n)
    Indexing/denormalization
    Solr/ElasticSearch: SKU, SKU
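
One way to picture the denormalization step is to flatten a product row and its SKU rows into one flat document per SKU. This is only a sketch: the field names (`product_id`, `name`, `size`, `quantity`) and the `denormalize` helper are hypothetical and do not come from the talk.

```php
<?php
declare(strict_types=1);

/**
 * Flattens one product row and its SKU rows into one document per SKU.
 * All field names are hypothetical.
 */
function denormalize(array $product, array $skus): array
{
    $documents = [];
    foreach ($skus as $sku) {
        $documents[] = [
            'sku' => $sku['sku'],            // unique key of the document
            'product_id' => $product['id'],  // kept so results can be grouped by product
            'name' => $product['name'],      // product data is copied onto every SKU
            'size' => $sku['size'],
            'quantity' => $sku['quantity'],
        ];
    }

    return $documents;
}

$documents = denormalize(
    ['id' => 7, 'name' => 'Blue shirt'],
    [
        ['sku' => 'SHIRT-42', 'size' => '42', 'quantity' => 3],
        ['sku' => 'SHIRT-44', 'size' => '44', 'quantity' => 0],
    ]
);
echo count($documents), " documents\n";
```

The duplication of product fields on every SKU document is deliberate: the search index trades storage for queries that need no joins.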

  8. No grouping = bad experience
    [Screenshot; size labels shown: 46, 42, 42 44 46, 40]

  9. Grouping = happy customer
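
In Solr, result grouping is switched on with request parameters. The sketch below assembles them, assuming each SKU document carries a `product_id` field (a hypothetical field name) so that all sizes of one product collapse into a single result. `groupingParams` is an illustrative helper, not a library function.

```php
<?php
declare(strict_types=1);

/**
 * Returns Solr result-grouping parameters that collapse documents
 * sharing the same value in $field.
 */
function groupingParams(string $field, int $docsPerGroup = 1): array
{
    return [
        'group' => 'true',
        'group.field' => $field,
        'group.limit' => $docsPerGroup, // documents returned per group
        'group.ngroups' => 'true',      // also return the number of groups
    ];
}

echo http_build_query(groupingParams('product_id')), PHP_EOL;
// group=true&group.field=product_id&group.limit=1&group.ngroups=true
```

The boolean flags are passed as the string `'true'` because `http_build_query` would serialize a PHP `true` as `1`.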

  10. ES parent/child?
    • Some queries are impossible
    • But probably faster than a simple grouping

  11. Hautelook and Solr
    … which will partly guide what follows

  12. Event type, with MySQL

  13. Catalog type … with Solr

  14. [Diagram] Triggers (CRON, MySQL triggers) → indexing workers (Main, Inventory, Popularity) → Solr documents

  15. Different workers
    • Inventory
      • Lots of messages … must be fast
    • Popularity
      • Computed data, so loaded periodically
    • Main
      • Something has to handle the rest!

  16. What follows is not a tutorial …
    but it should save you time

  17. Installation


  18. # Get the latest "Download" link from
    # https://lucene.apache.org/solr
    wget http://mir2.ovh.net/ftp.apache.org/dist/lucene/solr/4.7.1/solr-4.7.1.tgz
    tar -xf solr-4.7.1.tgz
    cd solr-4.7.1/example
    java -jar start.jar

  19. Examples to delete

  20. Solr … lives in solr.war

  21. $SOLR_HOME
    Core
    What you will version with git

  22. Important config files


  23. schema.xml
    [XML tags were not preserved in this transcript; surviving fragments:]
    … stored="true" omitNorms="true"/>
    … stored="false"/>
    sku

  24. schema.xml
    [XML tags were not preserved in this transcript; surviving fragments:]
    … class="solr.TextField" positionIncrementGap="100">
    … ignoreCase="true" words="stopwords.txt"/>
    … ignoreCase="true" words="stopwords.txt"/>
    … synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

  25. solrconfig.xml
    [XML tags were not preserved in this transcript; surviving values:]
    ${solr.autoCommit.maxTime:15000}
    ${solr.autoCommit.maxDocs:100000}
    true

    ${solr.autoSoftCommit.maxTime:-1}

    ${enable.master:false}
    commit
    startup
    solrconfig.xml,schema.xml

    ${enable.slave:false}
    ${solr.slave.masterUrl:}
    00:00:30

  26. solrconfig.xml
    [XML tags were not preserved in this transcript; surviving values:]
    10
    edismax
    name t_category^50 t_color^40 t_size
    sum(sqrt(popularity),if(isFeatured,5,0))
    0.1
    100%
    true
    ...
    10

    spellcheck
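
The function expression on this slide, `sum(sqrt(popularity),if(isFeatured,5,0))`, can be worked through by hand: the square root dampens large popularity values, and a featured product gets a flat bonus of 5. The sketch below only evaluates that arithmetic in PHP. How the value is combined with the text relevance score (added or multiplied) is not recoverable from the transcript, and `boostValue` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/** Evaluates sum(sqrt(popularity), if(isFeatured, 5, 0)) for one document. */
function boostValue(float $popularity, bool $isFeatured): float
{
    return sqrt($popularity) + ($isFeatured ? 5.0 : 0.0);
}

printf("%.1f\n", boostValue(16.0, false));  // 4.0
printf("%.1f\n", boostValue(16.0, true));   // 9.0
printf("%.1f\n", boostValue(100.0, false)); // 10.0
```

With these numbers, a featured product with popularity 16 scores close to a non-featured one with popularity 100, which shows how strongly the square root flattens the popularity signal.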

  27. PHP packages for Solr

  28. solarium/solarium
    • Complete API for interacting with Solr
    • QueryBuilder
    • Batch updates
    • http://wiki.solarium-project.org/index.php/V3:Basic_usage

  29. internations/solr-query-component
    use InterNations\Component\Solr\Expression\ExpressionBuilder;

    $eb = new ExpressionBuilder();
    echo $eb->field(
        'name',
        $eb->boost(
            $eb->eq('John Doe'),
            100
        )
    );
    // name:"John Doe"^100
    • https://github.com/InterNations/SolrQueryComponent

  30. nelmio/solarium-bundle


  31. hautelook/solarium-cache
    $client = ...;
    $cache = new RedisCache(); // or any Doctrine cache adapter

    $plugin = new CachePlugin();
    $plugin->setCache($cache);
    $client->registerPlugin('cache', $plugin);

    $query = $client->createSelect(array(
        'cache_lifetime' => 60,
    ));
    $result = $client->execute($query);
    • Bundle included

  32. floriansemm/solr-bundle
    • Similar in principle to friendsofsymfony/elastica-bundle
    • Too simple for our use case, however
      • synchronous
      • indexing too simplistic

  33. Implementation in Symfony2

  34. Architecture goals
    • IndexerBundle
      • Asynchronous
      • Easily testable ETL
    • ClientBundle
      • Usable from both the web and the API
      • "Public" API decoupled from Solr/Solarium
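
The "easily testable ETL" goal can be illustrated with a sketch in which extraction, transformation and loading are separate callables, so that a test can replace the database and Solr with in-memory stand-ins. Everything here (the `runEtl` function, the array shapes, the sample SKUs) is hypothetical and is not the talk's actual code.

```php
<?php
declare(strict_types=1);

/**
 * Runs one extract-transform-load pass and returns the number of documents.
 *
 * @param callable $extract   returns a list of rows
 * @param callable $transform maps one row to one search document
 * @param callable $load      receives the list of documents
 */
function runEtl(callable $extract, callable $transform, callable $load): int
{
    $documents = array_map($transform, $extract());
    $load($documents);

    return count($documents);
}

// In-memory stand-ins: neither a database nor Solr is needed for a test.
$extract = function (): array {
    return [
        ['sku' => 'TSHIRT-42', 'quantity' => '3'],
        ['sku' => 'TSHIRT-44', 'quantity' => '0'],
    ];
};
$transform = function (array $row): array {
    // Cast the quantity so the document carries an integer, not a string.
    return ['sku' => $row['sku'], 'quantity' => (int) $row['quantity']];
};
$sent = [];
$load = function (array $documents) use (&$sent): void {
    $sent = $documents; // record what would have been sent to Solr
};

$count = runEtl($extract, $transform, $load);
echo $count, " documents sent\n";
```

Because each stage only sees plain arrays, the transformer can be unit-tested without any infrastructure, and the loader can be swapped for a real Solr client in production.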

  35. Acme\Search\IndexerBundle



  36. namespace Acme\Search\IndexerBundle\ETL\Inventory;

    class Extractor
    {
        // Fetches the quantity row for a single SKU.
        public function getSkuQuantity($sku)
        {
            $qb = $this->createQB();
            $qb
                ->andWhere('sku.sku = :sku')
                ->setParameter('sku', $sku)
            ;

            return $qb->execute()->fetch();
        }

        // Fetches the quantity rows for every SKU of a product.
        public function getProductSkusQuantities($productId)
        {
            $qb = $this->createQB();
            $qb
                ->andWhere('sku.product = :product_id')
                ->setParameter('product_id', $productId)
            ;

            return $qb->execute()->fetchAll();
        }

        private function createQB() {}
    }

  37. namespace Acme\Search\IndexerBundle\ETL\Inventory;

    class Transformer
    {
        public function updateDocument(
            Document $document,
            $sku,
            $quantity
        ) {
            $document->setSku($sku); // PK
            $document->setQuantity($quantity);
        }
    }

  38. namespace Acme\Search\IndexerBundle\ETL;

    class Loader
    {
        public function updateSkuQuantity($sku)
        {
            return $this->updateQuantities(
                array($this->inventoryExtractor->getSkuQuantity($sku))
            );
        }

        public function updateProductSkusQuantities($productId)
        {
            return $this->updateQuantities(
                $this->inventoryExtractor->getProductSkusQuantities($productId)
            );
        }

        // Builds one document per extracted row and sends them in one batch.
        private function updateQuantities(array $extractorResults)
        {
            $documents = array();
            foreach ($extractorResults as $extractorResult) {
                $documents[] = $document = new Document();
                $sku = $extractorResult['sku'];
                $quantity = $extractorResult['quantity'];

                $this->itemTransformer->updateDocument($document, $sku, $quantity);
            }

            return $this->loaderClient->sendDocuments($documents);
        }
    }


  39. namespace Acme\Search\IndexerBundle\ETL;

    class LoaderClient
    {
        private $solarium;
        private $commitOnUpdate;

        public function sendDocuments(array $documents)
        {
            $update = $this->solarium->createUpdate();
            $update->addDocuments($documents);

            return $this->sendUpdate($update);
        }

        private function sendUpdate(Query $update)
        {
            // Only commit explicitly when configured to do so.
            if ($this->commitOnUpdate) {
                $update->addCommit();
            }

            $result = $this->solarium->update($update);

            if ($result->getStatus() !== 0) {
                throw new UpdateErrorException($result);
            }

            return $result;
        }
    }

  40. Acme\Search\ClientBundle

  41. Filter
    namespace Acme\Search\ClientBundle\Filter;

    class Filter
    {
        private $query;
        private $colors;
        private $sizes;
        private $page;
        private $limit;

        public function setQuery($query = null) {}
        public function getQuery() {}

        public function setColors(array $colors) {}
        public function getColors() {}

        // ...
    }

  42. Abstracting the filter
    namespace Acme\Search\ClientBundle\Filter;

    class Filter
    {
        // ...

        const SORT_RELEVANCY = 'relevancy';
        const SORT_MOST_POPULAR = 'most_popular';
        const SORT_PRICE_DESC = 'price_desc';

        private $sort = self::SORT_MOST_POPULAR;

        public function getSort() {}
        public function setSort($sort) {}
    }

  43. Client for the catalog
    namespace Acme\Search\ClientBundle\Catalog;

    class Client
    {
        /**
         * @param Filter $filter
         * @return Result
         */
        public function search(Filter $filter)
        {
            $query = $this->queryFactory->createCatalogQuery($filter);
            $result = $this->client->execute($query);

            return $this->resultBuilder->create($result);
        }
    }

  44. QueryFactory
    namespace Acme\Search\ClientBundle\Solarium\Select;

    class QueryFactory
    {
        public function createCatalogQuery(Filter $filter)
        {
            $query = new CatalogQuery();
            // Match everything unless a full-text query was given.
            $query->setQuery((new ExpressionBuilder())->all());

            if ($filter->hasFullTextQuery()) {
                $query->setQuery($filter->getQuery());
            }

            $query->addFilterColorFilterQuery($filter);
            $query->addFilterSizeFilterQuery($filter);

            return $query;
        }
    }

  45. Query
    namespace Acme\Search\ClientBundle\Solarium\Select;

    class CatalogQuery extends Query
    {
        protected function init()
        {
            parent::init();

            // Always filter on the quantity field.
            $this
                ->createFilterQuery(Document::FIELD_QUANTITY)
                ->setQuery(
                    $this->getExpressionBuilder()->field(
                        Document::FIELD_QUANTITY,
                        $this->getExpressionBuilder()->btwnRange(0, null)
                    )
                )
            ;
        }
    }

  46. namespace Acme\Search\ClientBundle\Solarium\Select;

    class CatalogQuery extends Query
    {
        public function addFilterColorFilterQuery(Filter $filter)
        {
            $this->addFieldTermsFilterQuery(
                Document::FIELD_COLOR,
                $filter->getColors()
            );
        }

        public function addFieldTermsFilterQuery($field, $terms)
        {
            // No terms selected: do not restrict on this field.
            if (count($terms) === 0) {
                return;
            }

            $this
                ->createFilterQuery($field)
                ->setQuery(
                    $this->getExpressionBuilder()->field(
                        $field,
                        $this->getExpressionBuilder()->grp($terms)
                    )
                )
            ;
        }
    }
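
To make the shape of such a filter query concrete, here is a dependency-free sketch that builds a string like `color:("red" OR "light blue")` in Solr query syntax. It assumes the selected terms are combined with OR, which the slide does not state, and it is not the output of the ExpressionBuilder library. `quoteTerm` and `fieldTermsFilter` are hypothetical helpers.

```php
<?php
declare(strict_types=1);

/** Quotes a term as a Solr phrase, escaping backslashes and double quotes. */
function quoteTerm(string $term): string
{
    return '"' . strtr($term, ['\\' => '\\\\', '"' => '\\"']) . '"';
}

/**
 * Builds a filter query such as color:("red" OR "blue").
 * Returns null when there are no terms, mirroring the early return on the slide.
 */
function fieldTermsFilter(string $field, array $terms): ?string
{
    if (count($terms) === 0) {
        return null;
    }

    return $field . ':(' . implode(' OR ', array_map('quoteTerm', $terms)) . ')';
}

echo fieldTermsFilter('color', ['red', 'light blue']), PHP_EOL;
// color:("red" OR "light blue")
```

Quoting every term keeps multi-word values such as "light blue" together as one phrase instead of two separate clauses.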

  47. Result
    namespace Acme\Search\ClientBundle\Catalog;

    class Result implements \IteratorAggregate
    {
        /** @var Pagerfanta */
        private $pager;

        /** @var Product[] */
        private $products;

        /** @var FieldCount[] */
        private $colors;

        /** @var FieldCount[] */
        private $sizes;

        /** @var null|string */
        private $suggestedSpelling;
    }

  48. Web + API usage

  49. public function indexAction(Request $request)
    {
        $filter = new Filter();

        // Bind the query-string parameters onto the filter.
        $bindingForm = $this->get('form.factory')->createNamed(
            '',
            'filter_binding',
            $filter
        );
        $bindingForm->submit($request->query->all(), false);

        $catalogClient = $this->get('acme_search_client.catalog.client');
        $result = $catalogClient->search($filter);

        // Build the form shown to the user from the search result.
        $viewForm = $this->get('form.factory')->createNamed(
            '',
            'filter',
            $filter,
            [
                'method' => 'GET',
                'catalog_result' => $result,
            ]
        );

        return array(
            'filter' => $filter,
            'form' => $viewForm->createView(),
            'result' => $result,
        );
    }

  50. /**
     * @Method("GET")
     * @Route("/search")
     */
    public function searchAction(Request $request)
    {
        $filter = new Filter();

        $bindingForm = $this->get('form.factory')->createNamed(
            '',
            'filter_binding',
            $filter
        );
        $bindingForm->submit($request->query->all(), false);

        $catalogClient = $this->get('acme_search_client.catalog.client');
        $result = $catalogClient->search($filter);

        // Hateoas goodness!
        $catalogRepresentation = new CatalogRepresentation($result, $filter);

        return View::create($catalogRepresentation);
    }

  51. namespace Acme\AppBundle\Form\Type;

    class FilterBindingType extends AbstractType
    {
        public function buildForm(FormBuilderInterface $builder, array $options)
        {
            $builder
                ->add('query', 'hidden')
                ->add('sizes', 'collection', ['allow_add' => true])
                ->add('colors', 'collection', ['allow_add' => true])
                ->add('sort', 'hidden')
                ->add('page', 'hidden')
                ->add('limit', 'hidden')
            ;

            // Default to relevancy sorting when a query is given without a sort.
            $builder->addEventListener(
                FormEvents::PRE_SUBMIT,
                function (FormEvent $event) {
                    $submittedData = $event->getData();

                    if (isset($submittedData['query'])
                        && !isset($submittedData['sort'])
                    ) {
                        $submittedData['sort'] = Filter::SORT_RELEVANCY;
                    }

                    $event->setData($submittedData);
                }
            );
        }
    }
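
The rule applied by the PRE_SUBMIT listener can be isolated as a plain function, which makes its behavior easy to check without the Symfony Form component. This is a sketch only: `applyDefaultSort` is a hypothetical name, and the constant mirrors the `'relevancy'` value of `Filter::SORT_RELEVANCY` shown earlier in the deck.

```php
<?php
declare(strict_types=1);

const SORT_RELEVANCY = 'relevancy';

/**
 * Mirrors the listener's rule: when a full-text query is submitted
 * without an explicit sort, sort by relevancy.
 */
function applyDefaultSort(array $submittedData): array
{
    if (isset($submittedData['query']) && !isset($submittedData['sort'])) {
        $submittedData['sort'] = SORT_RELEVANCY;
    }

    return $submittedData;
}

var_export(applyDefaultSort(['query' => 'red shirt']));
echo PHP_EOL;
```

An explicit `sort` always wins, and a request without a `query` is left untouched, so the filter's own default sort still applies to plain catalog browsing.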

  52. namespace Acme\AppBundle\Form\Type;

    class FilterType extends AbstractType
    {
        public function buildForm(
            FormBuilderInterface $builder,
            array $options
        ) {

        }

        public function setDefaultOptions(OptionsResolverInterface $resolver)
        {
            $resolver->setDefaults([
                'data_class' => Filter::class,
                'csrf_protection' => false,
            ]);
            // The search result is needed to build the facet choices.
            $resolver->setRequired([
                'catalog_result'
            ]);
            $resolver->setAllowedTypes([
                'catalog_result' => Result::class,
            ]);
        }
    }

  53. public function buildForm(
        FormBuilderInterface $builder,
        array $options
    ) {
        $builder
            ->add(
                'sizes',
                'choice',
                [
                    'choice_list' => new FieldCountChoiceList(
                        $options['catalog_result']->getSizes()
                    ),
                    'required' => false,
                ]
            )
            ->add(
                'colors',
                'choice',
                [
                    'choice_list' => new FieldCountChoiceList(
                        $options['catalog_result']->getColors()
                    ),
                    'required' => false,
                ]
            )
            ->add(
                'sort',
                'choice',
                [
                    'choices' => self::getSortChoices(),
                ]
            )
        ;
    }

  54. Working on relevance

  55. use Solarium\QueryType\Select\Query\Component\EdisMax;

    class EDisMaxType extends AbstractType
    {
        public function buildForm(FormBuilderInterface $builder, array $options)
        {
            // Every field is optional; named separately so the $options argument is not overwritten.
            $fieldOptions = ['required' => false];
            $builder
                ->add('queryFields', 'textarea', $fieldOptions)
                ->add('queryAlternative', 'text', $fieldOptions)
                ->add('minimumMatch', 'text', $fieldOptions)
                ->add('phraseFields', 'text', $fieldOptions)
                ->add('phraseSlop', 'text', $fieldOptions)
                ->add('queryPhraseSlop', 'text', $fieldOptions)
                ->add('tie', 'text', $fieldOptions)
                ->add('boostQuery', 'text', $fieldOptions)
                ->add('boostFunctions', 'text', $fieldOptions)
                ->add('boostFunctionsMult', 'text', $fieldOptions)
                ->add('phraseBigramFields', 'text', $fieldOptions)
                ->add('phraseBigramSlop', 'text', $fieldOptions)
                ->add('phraseTrigramFields', 'text', $fieldOptions)
                ->add('phraseTrigramSlop', 'text', $fieldOptions)
                ->add('userFields', 'text', $fieldOptions)
            ;
        }

        public function setDefaultOptions(OptionsResolverInterface $resolver)
        {
            $resolver->setDefaults([
                'data_class' => EdisMax::class,
            ]);
        }
    }

  56. DebugController
    // Two copies of the same query, so two EDisMax setups can be compared.
    $query1 = $catalogClient->createQuery($filter);
    $query2 = $catalogClient->createQuery($filter);

    $debugForm = $this->get('form.factory')
        ->createBuilder(
            'form',
            [
                'left' => $query1->getEDisMax(),
                'right' => $query2->getEDisMax(),
                'filter' => $filter,
            ],
            ['method' => 'GET']
        )
        ->add('query', 'text', ['property_path' => '[filter].query'])
        ->add('left', 'edismax')
        ->add('right', 'edismax')
        ->getForm()
    ;
    $debugForm->submit($request->query->all(), false);

    $catalogResult1 = $catalogClient->createSearchResults($filter, $query1);
    $catalogResult2 = $catalogClient->createSearchResults($filter, $query2);

  57. Lessons learned

  58. Suggestions
    • The Suggester suggests names … not searches
    • We are considering creating a second core for suggestions

  59. Performance

  60. Performance "killers"
    • group=true / group.ngroups=true / group.facet=true
    • Form choice field with 2000 choices … -> use AJAX
    • Autocommit too frequent
    • Too many slaves
    • Not using SolrCloud -> a hard commit is required for replication

  61. Questions?
    • Thank you!
    • speakerdeck.com/adrienbrault/solr-et-recherche-dans-un-site-ecommerce
    • @AdrienBrault on Twitter/GitHub