2018-10-16 Elasticsearch in Softonic

Softonic
October 17, 2018

How we use the Elastic Stack at Softonic:
- Logs processing
- General-purpose database for real-time usage and search
- Percolated searches

Originally presented at the Barcelona Elastic Fantastics meetup: https://www.meetup.com/Barcelona-Elastic-Fantastics/events/254984861/

Video of the talk (in Spanish): https://www.elastic.co/videos/caso-de-uso-de-softonic-migraci-n-de-un-monolito-a-microservicios-con-elasticsearch-como-fuente-central-de-datos-spanish

Transcript

  2. Who are we?
    • 20-year-old Internet property
    • Mainly a download portal
    • Translated into many languages: EN, DE, ES, IT... 16 in total, including exotic ones like Vietnamese
    • 4M daily visits, 12M daily page views
    • 10K docs/s written to logs by our services at peak time
  3. Who are we?
    • Basilio Vera (@basi)
      ◦ Senior Principal Software Engineer
      ◦ 15 years working at Softonic
    • Riccardo Piccoli (no Twitter)
      ◦ Senior Software Engineer
      ◦ Elasticsearch Engineer II
  4. Softonic use cases
    A. Elastic Stack (aka ELK) used for logs processing
    B. Elasticsearch as a runtime database used by Softonic Web
    C. Percolator feature for auto-generating internal links
  6. Logs processing
    • Typical use case
    • Using this stack for 5 years
    • It has evolved over time, but essentially it works the same way:
      ◦ Elasticsearch
      ◦ Kibana
      ◦ Logstash
      ◦ Logstash-forwarder -> Filebeat
  7. Logs processing
    NAMING CONVENTION AND DATA STRUCTURE DEFINED FOR ACCESS LOGS AND REQUEST LOGS:
    • Using different server technologies such as Node.js, Apache httpd and nginx
    • Access logs generated in JSON format when possible
    • Some libraries for Node.js are available:
      - https://github.com/softonic/hapi-error-logger
      - https://github.com/softonic/hapi-access-logger
      - https://github.com/softonic/axios-logger
    • A sample Docker image for nginx is available:
      - https://hub.docker.com/r/ricc/nginx-jsonlog
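The slides don't list the field names of the convention. The sketch below shows the idea of one JSON object per request, with the same keys in every service so that Kibana dashboards can span services. It assumes a hypothetical set of shared field names (`service`, `request.method`, `request.path`, `response.status`, `duration_ms`, `@timestamp`). It is not the schema produced by the Softonic libraries listed above.

```python
import json
from datetime import datetime, timezone


def access_log_line(service, method, path, status, duration_ms, now=None):
    """Return one access-log entry serialized as a single JSON line.

    The field names are hypothetical; the point is that every service
    (Node.js, httpd, nginx...) emits the same keys.
    """
    now = now or datetime.now(timezone.utc)
    entry = {
        "@timestamp": now.isoformat(),
        "service": service,
        "request": {"method": method, "path": path},
        "response": {"status": status},
        "duration_ms": duration_ms,
    }
    # One compact JSON object per line: a log shipper can decode it
    # without custom grok patterns.
    return json.dumps(entry, separators=(",", ":"))


if __name__ == "__main__":
    print(access_log_line("program-page", "GET", "/app/123", 200, 12.5))
```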
  8. Logs processing
    Following this convention allows us to create cross-service Kibana dashboards: https://kibana.my-site.com
  9. Logs processing
    HOW WE DEPLOYED IT
    We deploy the clusters using a slightly modified Helm chart for Elasticsearch:
    • https://github.com/softonic/charts/tree/feature/elasticsearch-multiple-data-node-types
    • Waiting for this PR to be accepted: https://github.com/helm/charts/pull/7819
    • It just adds the ability to deploy multiple data node types
    • We use this feature to deploy hot/warm data nodes: ⬆ Capacity ⬆ Data retention ⬇ $$$
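Hot/warm setups usually rely on shard allocation filtering. Data nodes start with a custom attribute, and an index moves between tiers when its `index.routing.allocation.require.<attr>` setting changes. The attribute name `box_type` is a common convention, not something the slides specify. The sketch assumes daily log indices and a hypothetical 2-day hot period, and decides which settings body to send to `PUT <index>/_settings`:

```python
from datetime import date

HOT_DAYS = 2  # hypothetical: how long an index stays on hot (fast, expensive) nodes

# Node side (elasticsearch.yml), with an assumed attribute name:
#   hot data nodes:  node.attr.box_type: hot
#   warm data nodes: node.attr.box_type: warm


def allocation_settings(index_day, today):
    """Settings body for PUT <index>/_settings that pins a daily index to a tier."""
    age_days = (today - index_day).days
    tier = "hot" if age_days < HOT_DAYS else "warm"
    return {"index.routing.allocation.require.box_type": tier}


if __name__ == "__main__":
    print(allocation_settings(date(2018, 10, 10), date(2018, 10, 16)))
```

Warm nodes can presumably use cheaper, denser storage. That is where the "more capacity and retention for less money" trade-off on the slide would come from.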
  10. Logs processing
    TECHNICAL DETAILS ABOUT OUR LOGS ARCHITECTURE
    • We have 3 different regions where logs are produced
    • Each region has a local Elasticsearch cluster
    • The data is processed by Logstash
    • The data is sent to Logstash by Filebeat
    • Filebeat collects data from the containers running in Kubernetes
    • We have a specific cluster in Europe configured for cross-cluster search
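With cross-cluster search, the querying cluster registers each regional cluster as a remote under an alias. It addresses remote indices with the `<alias>:<index-pattern>` syntax. The sketch assumes hypothetical aliases `eu`, `us` and `asia` and a `logs-*` index pattern, and builds the target of one search request spanning all three regions:

```python
REGIONS = ["eu", "us", "asia"]  # hypothetical remote-cluster aliases


def cross_cluster_target(index_pattern, regions=REGIONS, include_local=False):
    """Build the comma-separated index list for GET <target>/_search."""
    targets = [f"{alias}:{index_pattern}" for alias in regions]
    if include_local:
        # A pattern without an alias prefix is resolved on the cluster
        # that receives the request.
        targets.insert(0, index_pattern)
    return ",".join(targets)


if __name__ == "__main__":
    print(f"GET {cross_cluster_target('logs-*')}/_search")
```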
  11. Runtime database
    A. Legacy Softonic architecture: monolith
    B. Moving to SOA: problems with SOA
    C. Distributed “materialized view”
  12. Initial product design
    • We have an initial product that grows a lot
    • We follow best practices such as MVC
    • Many independent teams work on this code base
    • There are many independent features in this code base
  14. B. SOA: Service-oriented architecture
    PRO
    • Ownership of data
    • Best tool for the best job
    • Development speed
    • Independent deploys
    CONS
    • Public API is a contract
    • Slow communication
    • Complex dependencies
  15. B. Orchestration layer
    PRO
    • Faster average response time
    CONS
    • Non-cached responses are still very slow
    • Staleness of cached responses
    • Complex runtime dependencies
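This toy sketch makes the orchestration-layer trade-off concrete. The layer calls several services and caches the merged response with a time-to-live. The service callables, the TTL and the injectable clock are all hypothetical. A cache hit is fast. A miss still pays for every downstream call. A change made upstream during the TTL stays invisible until the entry expires.

```python
import time


class OrchestrationLayer:
    """Toy aggregator: calls several services and caches the merged response."""

    def __init__(self, services, ttl_seconds=60.0, clock=time.monotonic):
        self.services = services      # hypothetical: name -> callable(key) returning a dict
        self.ttl = ttl_seconds        # hypothetical cache lifetime
        self.clock = clock
        self.cache = {}               # key -> (stored_at, merged response)
        self.downstream_calls = 0

    def get(self, key):
        hit = self.cache.get(key)
        if hit and self.clock() - hit[0] < self.ttl:
            return hit[1]             # fast path, but possibly stale
        # Slow path: one runtime dependency per service on every miss.
        response = {}
        for name, call in self.services.items():
            self.downstream_calls += 1
            response[name] = call(key)
        self.cache[key] = (self.clock(), response)
        return response
```

Every service in `services` is still a runtime dependency of the page. That is the cost the materialized-view approach on the next slide tries to remove.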
  16. C. Materialized view: keeping the cache updated
    PRO
    • Reduced number of runtime dependencies
    • Faster non-cached responses
    CONS
    • Failure of one component (projection) means stale data for everything
    • Somewhat more complex logic to build state
    (CQRS-like projection)
  17. Implementation
    • Services notify their write operations
    • All events are persisted, ordered by operation time (not ingest time)
    • Partial state is persisted locally
    • Final state is stored in Elasticsearch through bulk upsert operations
    • The initial state needs “creation”
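The following is a minimal sketch of this projection, not Softonic's code. It assumes that each event carries a document `id`, an `op_time` set by the service that performed the write, and the changed `fields` (all hypothetical names). It also assumes an index called `app`, as in the `GET /app/123` example on the next slide.

Events are folded into partial state in operation-time order. The state is then turned into a `_bulk` body of `update` actions with `doc_as_upsert`, which creates a missing document or merges the fields into an existing one. The action lines use the typeless form. Elasticsearch 6.x clusters, current at the time of the talk, may also expect a `_type`.

```python
import json


def project(events):
    """Fold events into per-document partial state, ordered by operation time."""
    state = {}
    # Sort by when the write happened in the source service, not by ingest order.
    for event in sorted(events, key=lambda e: e["op_time"]):
        doc = state.setdefault(event["id"], {})
        doc.update(event["fields"])
    return state


def bulk_upsert_body(state, index="app"):
    """Build an NDJSON body for POST /_bulk using update actions with doc_as_upsert."""
    lines = []
    for doc_id, doc in state.items():
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline
```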
  18. Why Elasticsearch
    • Highly available distributed system (CP**)
    • Fast key-value store (GET /app/123, for example for the program page)
    • Powerful search engine (GET /app/search)
    • Relatively easy to scale up
    • Supports the upsert operation (key for building state)
    ** some people might disagree with this
  19. Elasticsearch deployment
    • Deployed in k8s
    • 4 shard copies in total (1 primary shard, 3 replicas) for high read throughput
    • Separate master and coordinating nodes
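The shard layout translates into index settings roughly as follows. The index name `app` is an assumption, and the node-role flags are the boolean `elasticsearch.yml` settings of the 6.x era, shown as Python dicts for illustration. With a single primary, every copy holds the whole index, so each extra replica on its own node adds another node that can answer reads.

```python
SHARDS = 1
REPLICAS = 3

# Body for PUT /app (index name assumed): one primary shard plus three replicas.
create_index_body = {
    "settings": {
        "index": {"number_of_shards": SHARDS, "number_of_replicas": REPLICAS}
    }
}


def total_shard_copies(shards, replicas):
    """Each primary shard has `replicas` additional copies."""
    return shards * (1 + replicas)


# 6.x-style node role flags (elasticsearch.yml), illustrative only.
MASTER_NODE = {"node.master": True, "node.data": False, "node.ingest": False}
COORDINATING_NODE = {"node.master": False, "node.data": False, "node.ingest": False}
```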
  20. Percolator
    Documents and query:
    [
      {
        "_source": {
          "id": "87911e582397a1f5d545869a4cf18a1f815df675",
          "query": {
            "match": {
              "message": { "query": "m4a to mp3", "operator": "and" }
            }
          },
          "meta": {
            "keyword": "m4a to mp3",
            "url": "https://free-m4a-to-mp3-converter.de.softonic.com",
            "volume": 20000
          }
        }
      },
      {
        "_source": {
          "id": "87911e582397a1f5d545869a4cf18a1f815df675",
          "query": {
            "match": {
              "message": { "query": "windows 10 download kostenlos", "operator": "and" }
            }
          },
          "meta": {
            "keyword": "windows 10 download kostenlos",
            "url": "https://windows-10.de.softonic.com",
            "volume": 26000
          }
        }
      }
    ]
    GET internal-links-en/_search
    {
      "query": {
        "percolate": {
          "field": "query",
          "document": {
            "message": "Skype is the most popular application on the market for making video calls..."
          }
        }
      }
    }