Handling a tremendous amount of images with Fastly / Yamagoya Traverse 2020

These are the slides from my talk on Day 2 of Yamagoya Traverse 2020.

https://www.fastly.jp/yamagoya2020

Tatsuhiko Kubo

November 26, 2020

Transcript

  1. Handling a tremendous amount of images with Fastly Tatsuhiko Kubo@cubicdaiya

    Yamagoya Traverse 2020 2020/11/26
  2. None
  3. What Is Mercari? • Service start: July 2013 • OS:

    Android, iOS *Can also be accessed by web browsers • Usage fee: Free *Commission fee for sold items: 10% of the sales price • Regions/languages supported: Base specs for Japan/Japanese • Total number of listings to date: More than 1.5 billion Many sellers enjoy having the items they no longer need purchased and used by buyers who need them, and buyers enjoy the feeling of hunting for treasure as they search through unique and diverse items for lucky finds. In addition to buying and selling, users actively communicate through the buyer/seller chat and the “Like” feature. The Mercari app is a C2C marketplace where individuals can easily sell used items. We want to provide both buyers and sellers with a service where they can enjoy safe and secure transactions. Mercari offers a unique customer experience, with a transaction environment that uses the payments Mercari holds in escrow, and simple and affordable shipping options.
  4. GitHub, Twitter: @cubicdaiya Name: Tatsuhiko Kubo Tech Lead, Network at

    Mercari, Inc.
  5. Responsibilities of the Network team • Ensure the reliability of

    the Mercari Edge system • CDN, TLS, DNS, Load Balancing, Reverse Proxy, … • Networking for Cloud and On-Premises • Routing between multiple DCs, Cloud Interconnect, … • Service Mesh • Istio, mTLS, …
  6. Topics • Fastly in Mercari • System architecture of Mercari

    with Fastly • CI/CD pipeline with Fastly • Monitoring with Fastly • Mercari Image Delivery with Fastly • Keeping high Cache Hit Ratio • Optimizing Images • Automating cache purge
  7. Fastly in Mercari

  8. Fastly in Mercari • Both static and dynamic contents are

    handled with Fastly • Images (item photo, user profile photo, …) • Static assets (JavaScript, CSS, …) • API / Web
  9. Fastly in Mercari • Scale of traffic • 300k+ RPS

    at peak • 20+ Gbps at peak • Other stats • 40+ services • 10+ TLS domains • Images account for 80+% of total traffic volume
  10. Edge of Mercari JP Infrastructure API/Web Static assets (js, css,

    etc…) Image ImageFlux Amazon S3 Cloud Load Balancing GKE GCS
  11. CI/CD pipeline for Fastly Pull Request Run CI Terraform plan/apply

    Configure Store tfstate GCS
  12. Monitoring Fastly metrics

  13. Datadog Integration with Fastly + https://docs.datadoghq.com/integrations/fastly/

  14. Datadog Integration with Fastly • Fastly metrics can be shown

    and customized on Datadog • e.g. hit_ratio, requests, bandwidth, status_4xx, status_5xx, etc… • Advantages of Datadog Integration with Fastly • Easy to integrate (Only need to register Fastly API token and Fastly Service IDs) • We can combine multiple metrics and create original metrics
  15. CI/CD pipeline for Datadog dashboard and monitor Pull Request Terraform

    plan/apply Run CI by GitHub Actions Configure Store tfstate GCS
  16. Mercari Image Delivery

  17. Images on Mercari app • Images are the main content

    on a lot of screens • Timeline, Search Results, Recommended Items • Liked Items, Browse Item History • Item Details, … Timeline Item Details → A lot of images are displayed
  18. A tremendous amount of images are delivered from CDN •

    Mercari JP • Total number of listings to date: More than 1.5 billion • Up to 10 photos can be uploaded per listed item • Item photos displayed in the Mercari app are resized and converted from JPEG to WebP on the fly • Cached objects on the CDN keep increasing → The number of images handled by the CDN snowballs
  19. Mercari Image Delivery in JP ImageFlux Amazon S3

  20. Mercari Image Delivery in US Amazon S3

  21. Mercari Image Delivery in US Amazon S3 + Image Optimizer

  22. Fastly Image Optimizer in Mercari US • Originally, we used

    an internal image conversion proxy in Go • To resize, crop, convert format, … on-the-fly • We switched to Fastly Image Optimizer in 2018 • Fastly VCL was useful to keep the original manipulation rule at that time • sub vcl_recv { # absorb the difference between our proxy and Image Optimizer … set req.url = regsub(req.url, "([&\?])w=([0-9]+)", "\1width=\2"); set req.url = regsub(req.url, "([&\?])h=([0-9]+)", "\1height=\2"); set req.url = regsub(req.url, "([&\?])fmt=([a-z]+)", "\1format=\2"); … }
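The three regsub rules on this slide can be tried outside VCL. The following is a minimal Python sketch of the same rewrite, not Mercari's actual code: the function name `rewrite_url` and the sample URL are assumptions made for illustration, and `count=1` is used because `regsub` replaces only the first match.

```python
import re

# Same three rules as the slide's VCL: w -> width, h -> height, fmt -> format.
RULES = [
    (re.compile(r"([&?])w=([0-9]+)"), r"\1width=\2"),
    (re.compile(r"([&?])h=([0-9]+)"), r"\1height=\2"),
    (re.compile(r"([&?])fmt=([a-z]+)"), r"\1format=\2"),
]


def rewrite_url(url: str) -> str:
    """Translate legacy-proxy query parameters into Image Optimizer names."""
    for pattern, replacement in RULES:
        # count=1 mirrors regsub, which rewrites only the first occurrence.
        url = pattern.sub(replacement, url, count=1)
    return url


if __name__ == "__main__":
    # Hypothetical URL, used only to exercise the rules.
    print(rewrite_url("/photos/1.jpg?w=200&h=100&fmt=webp"))
```

Because each pattern requires a leading `?` or `&`, a parameter such as `width=200` that has already been rewritten is not matched again by the `w=` rule.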
  23. Our best practice for Image Delivery • Keep high Cache

    Hit Ratio(CHR) in any case! • Enable Origin Shielding • Set long TTL in Cache-Control: max-age=… • Optimize image while keeping appropriate quality • Balance UX and cost saving • Pay attention to the image size distribution • Automate cache purge
  24. Origin Shielding

  25. Origin Shielding • Sandwiching a POP between Edge POP and

    Origin • Cover cache miss on Edge POP • Official document • https://docs.fastly.com/en/guides/shielding
  26. Sandwich Shielding POP between Edge POP and Origin Edge POP

    Edge POP Edge POP Shielding POP ImageFlux Amazon S3 Cache Hit on Edge POP Cache Hit on Shielding POP Cache Miss
  27. Pros/Cons of Origin Shielding • Pros • Cache Hit Ratio

    improves significantly • Cons • Traffic through the Shielding POP incurs an additional fee
  28. Cache Hit Ratio on Fastly-Stats

  29. Cache Hit Ratio on Fastly-Stats

  30. Cache Hit Ratio on Fastly-Stats HIT RATIO does not include

    Shielding hits
  31. Cache Hit Ratio on Fastly Stats • HIT RATIO does

    not include Shielding hits • The same applies to hit_ratio in Historical Stats • We need to calculate Cache Hit Ratio with Shielding by combining other metrics
  32. CHR Calculation (If Shielding is enabled)

    Cache Hit Ratio(True) = (1 − (miss − shield) / (requests − shield)) × 100 • miss: Number of cache misses • shield: Number of requests from edge to the shield POP • requests: Number of Requests Processed • The truth about cache hit ratios: https://www.fastly.com/blog/truth-about-cache-hit-ratios * This does not account for some states, such as pass
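As a worked example of the formula on this slide, here is a small Python sketch. The function name and the sample counts are made up for illustration; only the three inputs (miss, shield, requests) come from the slide.

```python
def true_cache_hit_ratio(requests: int, miss: int, shield: int) -> float:
    """Cache hit ratio (%) with shielding, following the slide's formula.

    requests: number of requests processed
    miss:     number of cache misses
    shield:   number of requests from edge to the shield POP
    """
    denominator = requests - shield
    if denominator <= 0:
        raise ValueError("requests must be greater than shield")
    return (1 - (miss - shield) / denominator) * 100


if __name__ == "__main__":
    # Made-up counts: 1000 requests, 100 misses, 80 edge-to-shield requests.
    print(round(true_cache_hit_ratio(requests=1000, miss=100, shield=80), 2))
```

With shield set to 0 the expression reduces to the plain (1 − miss / requests) × 100, which is a quick sanity check on the reconstruction.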
  33. CHR Calculation on Datadog

  34. CHR Calculation on Datadog widget { query_value_definition { autoscale =

    false custom_unit = "%" precision = 2 request { aggregator = "avg" q = "(1-(avg:fastly.miss{${local.datadog_tag}}-avg:fastly.shield{${local.datadog_tag}})/(avg:fastly.requests{${local.datadog_tag}}-avg:fastly.shield{${local.datadog_tag}}))*100" } title = "Cache Hit Rate (True)" } } Terraforming
  35. Daily CHR (Mercari Image Delivery in JP) CHR with Shielding

    hit_ratio in Historical Stats
  36. Daily CHR (Mercari Image Delivery in US) CHR with Shielding

    hit_ratio in Historical Stats
  37. Impact of Origin Shielding • Mercari Image Delivery’s Cache Hit

    Ratio improves significantly • Approximately: • JP: 96.x% -> 98.x% • US: 60~70+% -> 80~90+% CHR for a given month when Shielding is enabled Cache Hit Rate(Edge): hit_ratio in Historical Stats Cache Hit Rate(True): CHR with Shielding
  38. Why is there such a big difference in CHR between

    JP and US? • The United States is larger than Japan • Fastly has more POPs in the United States than in Japan • As the number of POPs increases, CHR on the edge decreases • Japan: 3 POPs, North America: 20+ POPs • References • Fastly Network Map: https://www.fastly.com/network-map • Why having more POPs isn’t always better: https://www.fastly.com/blog/why-having-more-pops-isnt-always-better
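The claim that edge CHR falls as the number of POPs grows can be illustrated with a toy model. This Python sketch is not from the talk and rests on strong assumptions: requests for one object are routed uniformly at random across POPs, nothing expires or is evicted, and each POP pays exactly one cold miss the first time it sees the object.

```python
def expected_edge_chr(num_pops: int, requests_per_object: int) -> float:
    """Expected edge cache hit ratio (%) for one object in the toy model."""
    # Probability that a given POP receives none of the requests.
    p_untouched = (1 - 1 / num_pops) ** requests_per_object
    # Every POP that is touched at least once contributes one cold miss.
    expected_misses = num_pops * (1 - p_untouched)
    return (1 - expected_misses / requests_per_object) * 100


if __name__ == "__main__":
    # POP counts taken from the slide: 3 in Japan, 20 as a stand-in for 20+.
    for pops in (3, 20):
        print(pops, round(expected_edge_chr(pops, requests_per_object=10), 1))
```

In this model the same request volume spread over more POPs produces more cold misses, which is why a shield POP behind the edge recovers more hits in the US than in Japan.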
  39. Optimizing Images

  40. Image size distribution in Mercari Image Delivery in JP 1k: ~1KB

    10k: 1KB~10KB 100k: 10KB~100KB 1m: 100KB~1MB 10m: 1MB~10MB 100m: 10MB~100MB 1g: 100MB~1GB
  41. It’s useful to know the image size distribution 1k: ~1KB 10k: 1KB~10KB

    100k: 10KB~100KB 1m: 100KB~1MB 10m: 1MB~10MB 100m: 10MB~100MB 1g: 100MB~1GB
  42. 100k size objects started to increase 1k: ~1KB 10k: 1KB~10KB 100k: 10KB~100KB 1m: 100KB~1MB

    10m: 1MB~10MB 100m: 10MB~100MB 1g: 100MB~1GB It’s useful to know the image size distribution
  43. 100k size objects started to increase 1k: ~1KB 10k: 1KB~10KB 100k: 10KB~100KB 1m: 100KB~1MB

    10m: 1MB~10MB 100m: 10MB~100MB 1g: 100MB~1GB It’s useful to know the image size distribution Detected and fixed the cause
  44. 100k size objects started to increase Detected and fixed the

    cause The cause was that JPEG was being delivered instead of WebP in some microservices 1k: ~1KB 10k: 1KB~10KB 100k: 10KB~100KB 1m: 100KB~1MB 10m: 1MB~10MB 100m: 10MB~100MB 1g: 100MB~1GB It’s useful to know the image size distribution
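The size buckets in the chart legend can be reproduced from raw object sizes. The Python sketch below is illustrative only: it assumes 1KB = 1024 bytes and inclusive upper bounds, neither of which is stated on the slides, and the `over_1g` label is a hypothetical catch-all.

```python
from collections import Counter

KB = 1024  # assumption: the slides do not say whether 1KB is 1000 or 1024 bytes

# (label, inclusive upper bound in bytes), in the order used by the legend.
BUCKETS = [
    ("1k", 1 * KB),
    ("10k", 10 * KB),
    ("100k", 100 * KB),
    ("1m", 1024 * KB),
    ("10m", 10 * 1024 * KB),
    ("100m", 100 * 1024 * KB),
    ("1g", 1024 * 1024 * KB),
]


def bucket_for(size_bytes: int) -> str:
    """Return the legend label for a single object size."""
    for label, upper in BUCKETS:
        if size_bytes <= upper:
            return label
    return "over_1g"  # hypothetical label for anything larger than 1GB


def size_distribution(sizes) -> Counter:
    """Count how many objects fall into each bucket."""
    return Counter(bucket_for(size) for size in sizes)


if __name__ == "__main__":
    # Made-up response sizes in bytes.
    print(size_distribution([500, 2000, 60000, 70000]))
```

Tracking these counts over time is what makes a shift like the one on these slides visible: images served as JPEG instead of WebP show up as growth in the 100k bucket.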
  45. Optimizing Images • Displayed images on Mercari app are resized

    and transformed from JPEG to WebP on-the-fly • Around 2017~2018, on-the-fly resizing and WebP transformation were introduced for item photo and user profile photo • We decreased the traffic volume by 30~40% at that time
  46. Why is Optimizing Images important? • To balance UX and

    cost saving for CDN • By optimizing images, the UX impacted by network latency can be improved while saving costs • Optimizing images also reduces users' monthly data usage
  47. Automating cache purge

  48. Cache purge on Slack

  49. Cache purge on Slack ① Type /purge_cache URL ② Build

    and transfer an API payload ③ Issue a cache purge API request Google Cloud Functions
  50. Cache purge on Slack • Just type /purge_cache URL in

    Slack • Implemented by Slack Slash Commands • https://api.slack.com/interactivity/slash-commands • Finally, Google Cloud Functions in Go runs a cache purge for multiple CDNs
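The flow from slash command to purge request can be sketched as follows. This is a Python illustration under stated assumptions, not the Go function described in the talk: it covers Fastly only, it builds the request without sending it, and the function names, the sample hostname, and the token value are hypothetical. It relies on Slack posting slash commands as a form-encoded body with `command` and `text` fields, and on Fastly's single-URL purge endpoint (`POST /purge/<host><path>` with a `Fastly-Key` header).

```python
from urllib.parse import parse_qs, urlsplit
from urllib.request import Request

FASTLY_API = "https://api.fastly.com"


def parse_slash_command(body: str) -> str:
    """Extract the target URL from a form-encoded Slack slash command body."""
    fields = parse_qs(body)
    if fields.get("command", [""])[0] != "/purge_cache":
        raise ValueError("unexpected command")
    target = fields.get("text", [""])[0].strip()
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("text must be an absolute http(s) URL")
    return target


def build_fastly_purge_request(target_url: str, api_token: str) -> Request:
    """Build (but do not send) a single-URL purge request for Fastly."""
    parts = urlsplit(target_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    endpoint = f"{FASTLY_API}/purge/{parts.netloc}{path}"
    return Request(endpoint, method="POST", headers={"Fastly-Key": api_token})


if __name__ == "__main__":
    # Hypothetical slash command body and token.
    body = "command=%2Fpurge_cache&text=https%3A%2F%2Fstatic.example.com%2Fimg%2F1.jpg"
    request = build_fastly_purge_request(parse_slash_command(body), "dummy-token")
    print(request.get_method(), request.full_url)
```

Validating the command name and the URL before building the request keeps malformed input in Slack from turning into an API call. A production handler would also need to verify Slack's request signature, which is omitted here.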
  51. Cloud Functions with Cloud Pub/Sub trigger

  52. Cloud Functions with Cloud Pub/Sub trigger • Cloud Functions can

    be triggered by message published to Pub/Sub topics • https://cloud.google.com/functions/docs/calling/pubsub • It’s useful to automate event-driven cache purge
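A Pub/Sub-triggered function receives the message body base64-encoded in the event's `data` field. The Python sketch below shows the decoding step for an event-driven purge. The talk's functions are written in Go, the message schema (`{"urls": [...]}`) is an assumption made for this example, and the purge itself is replaced by a placeholder print.

```python
import base64
import json


def extract_purge_urls(event: dict) -> list:
    """Decode a Pub/Sub event and return the URLs it asks to purge."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    message = json.loads(payload)
    # Hypothetical schema: {"urls": ["https://...", ...]}
    return list(message["urls"])


def handle_pubsub(event: dict, context) -> None:
    """Entry point shaped like a Pub/Sub-triggered background function."""
    for url in extract_purge_urls(event):
        # Placeholder: a real function would call the CDN purge API here.
        print(f"would purge: {url}")


if __name__ == "__main__":
    # Simulate the event that Pub/Sub would deliver.
    body = json.dumps({"urls": ["https://static.example.com/img/1.jpg"]})
    sample = {"data": base64.b64encode(body.encode("utf-8")).decode("ascii")}
    handle_pubsub(sample, context=None)
```

Keeping the decoding in its own function makes it testable without deploying anything or publishing a real message.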
  53. References • System Integration with Fastly • https://speakerdeck.com/cubicdaiya/system-integration-with-fastly

    • How we made it easy to purge CDN caches from Slack using Google Cloud Functions (in Japanese) • https://engineering.mercari.com/blog/entry/2019-09-20-110000/ • A Cloud Functions story: purging caches of old images that live on in the CDN (in Japanese) • https://engineering.mercari.com/blog/entry/2019-12-05-180000/