Analyzing the Barcode Scans of Millions of Users - The Backend Infrastructure at Scandit

Christof Roduner

August 27, 2014

Transcript

  1. Cassandra User Group Zurich, August 14, 2014

    Dr. Christof Roduner, COO and co-founder, [email protected]
    Analyzing the Barcode Scans of Millions of Users: The Backend Infrastructure at Scandit
  2. AMBITION

    «Be the leading provider of software-based barcode scanning and data capture solutions for smartphones, tablets and wearables.»
  3. COMPANY PROFILE

    - Scandit AG, founded in 2009, headquartered in Zurich, Switzerland
    - Founded by ETH Zurich and MIT alumni (former Auto-ID Center research scientists)
    - Specialists in mobile image processing and cloud computing
    - Over 12,000 licensees worldwide
    - Key customers include Ahold, CapitalOne, Bayer, NASA, Coop, Intuit, Saks 5th Avenue, Tetrapak, Shopkick and GS1
    - Scandit Inc. with offices in San Francisco and Boston
  4. SCANDIT ENABLES SW-BASED AIDC ON SMARTPHONES, TABLETS, WEARABLES

    - First column: Great scan performance; Ruggedized; Clunky; High TCO; Outdated HW and SW; Only for heavy users
    - Second column: Great scan** performance; Ruggedizable*; Excellent UX; Low TCO; Modern HW and SW; Dedicated and BYOD users; Cloud Services

    \* 60% of current customers do not actually require ruggedness. Rugged accessories / cases can provide certified drop, dust, water and fire protection and improved ergonomics.
    \*\* Not limited to ID but also supporting mobile OCR, image recognition, etc.
  5. ENTERPRISE-GRADE BARCODE SCANNING

    - Proven Technology
      - Installed on 25+ million devices
      - More than 12,000 licensees
    - Works on any device
      - Scans on 3,000+ Android devices
      - Even those without autofocus, with low-resolution cameras (240p)
      - Wearables like Google Glass, Vuzix, etc.
      - Support for Xamarin, Titanium, PhoneGap
    - Cloud-based management platform
      - “Scanalytics”: scan management and analytics (top products, categories, at-home vs. in-store, etc.)
      - Device management capabilities
      - Product information management
  6. OVERVIEW AND DATA FLOW

    Diagram labels: Scandit Cloud; Mobile Device; Mobile App; Scandit SDK; Barcode Recognition Engine; Device Activations; Scans; User, Device and License Management; Scan Performance Analysis; Scanalytics Engine; Realtime Scanalytics Dashboard; Reporting Engine; Product Information Management; Cassandra
  7. OUR UNIQUE COMPUTER VISION ALGORITHMS

    [Figure: two plots of brightness value against position in scan line (pixels), titled "Brightness values along blurry scan line" and "Brightness values along sharp scan line"]
  8. ARCHITECTURE OVERVIEW

    Diagram labels: Device Activations; Scans; Load Balancing; Backend Node 1 through Backend Node n, each containing API, Scan Performance Analysis, Scanalytics Engine, Cassandra and RabbitMQ

    All nodes identical
  9. WHY WE CHOSE CASSANDRA

    - Scalability
      - High-volume storage (permanently store billions of scans)
      - High-volume throughput
      - Support large number of concurrent client requests (millions of mobile devices, each client operation creates lots of writes)
      - Write-heavy environment
    - Availability
    - Low maintenance - even as our customer base grows
    - Multiple data centers
    - Data model (wide rows) is a good fit
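To illustrate why wide rows suit scan data, here is a minimal in-memory sketch of the idea: one row per partition key, with many timestamp-ordered columns appended to it. The partitioning by app key and day, the `ScanStore` class and its method names are assumptions made for illustration; the slides do not show Scandit's actual schema.

```python
import bisect
from collections import defaultdict


class ScanStore:
    """Toy model of a wide-row table: partition key -> columns sorted by timestamp."""

    def __init__(self):
        # Hypothetical layout: one wide row per (app_key, day) pair.
        self._rows = defaultdict(list)

    def record_scan(self, app_key, ts, barcode):
        # A write adds one column to one row; nothing has to be read first.
        day = ts.strftime("%Y-%m-%d")
        bisect.insort(self._rows[(app_key, day)], (ts, barcode))

    def scans_between(self, app_key, day, start, end):
        # A time-range query is a contiguous slice of a single sorted row.
        # The range is half-open: start <= timestamp < end.
        row = self._rows.get((app_key, day), [])
        lo = bisect.bisect_left(row, (start,))
        hi = bisect.bisect_left(row, (end,))
        return [barcode for _, barcode in row[lo:hi]]
```

In this sketch every scan is a cheap append-style write, and a dashboard query for one app and one time window touches a single row, which is roughly the access pattern a write-heavy analytics workload needs.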
  10. FOUR YEARS OF CASSANDRA USE AT SCANDIT…

    - We’ve had Cassandra in production use for almost 4 years
    - Linux machines
    - Stable and mature – no serious incidents
    - Upgrades mostly painless (we’ve followed most releases from 0.6.x to 2.0.9)
    - But a few hiccups…
  11. PRODUCTION EXPERIENCE

    - Main challenge: Ruby community driver
      - Missing features: load balancing, retry policy
      - Transparent failover does not work
      - Rolling restart of cluster will kill your application
      - No support for new features (e.g., user-defined types)
      - Switch to JRuby and use DataStax driver…
    - Repair
      - Very important to run regularly
      - Standard «nodetool repair -pr» never worked for us
      - We found that repair coordinator hangs waiting for snapshots
      - Workaround: use «-par» option to prevent snapshots (undocumented)
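The two features listed as missing, load balancing and a retry policy, are what make failover transparent to an application. The sketch below shows the idea in plain Python. It is not the API of any Cassandra driver: `RoundRobinClient`, `NodeDown` and the `execute` callback are hypothetical names introduced for illustration.

```python
import itertools


class NodeDown(Exception):
    """Raised by the (hypothetical) transport when a node is unreachable."""


class RoundRobinClient:
    """Spreads requests over all nodes and retries on the next node on failure."""

    def __init__(self, nodes, execute, max_attempts=None):
        self._nodes = list(nodes)
        self._execute = execute  # callable(node, query) -> result
        self._cycle = itertools.cycle(self._nodes)
        # By default, try each node at most once per request.
        self._max_attempts = max_attempts or len(self._nodes)

    def query(self, q):
        last_error = None
        for _ in range(self._max_attempts):
            node = next(self._cycle)  # load balancing: rotate through the nodes
            try:
                return self._execute(node, q)
            except NodeDown as err:  # retry policy: move on to the next node
                last_error = err
        raise NodeDown(f"all {self._max_attempts} attempts failed") from last_error
```

With behaviour along these lines, a rolling restart that takes down one node at a time is absorbed by the client, because each failed request is retried on another node instead of surfacing as an application error.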
  12. PRODUCTION EXPERIENCE

    - Occasional table corruption
      - Easy to fix: remove SSTable file and repair
    - IP address renumbering
      - Changing a node’s IP address is not straightforward
      - We decided to play it safe and used iptables redirection instead
      - Any experiences or ideas?
  13. TIPS FOR PRODUCTION USE

    - If you have to delete data, always delete entire rows (not columns)
      - Dreaded “tombstone overwhelming” exception
      - Aborted queries (or outright crashes pre C* 2.0.x)
    - Have good log monitoring in place
      - Cassandra is not “set up and forget”
      - We use Kibana
    - OpsCenter is your friend
      - Helps with debugging
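The advice to delete whole rows follows from how deletes are stored: each delete writes a tombstone marker, and a later read has to scan past the tombstones in its range. The toy model below counts tombstones under both strategies. The `Row` class is hypothetical, and the limit of 100 is an arbitrary stand-in for Cassandra's configurable threshold, chosen only to keep the example small.

```python
TOMBSTONE_LIMIT = 100  # arbitrary stand-in for Cassandra's configurable limit


class TombstoneOverwhelming(Exception):
    """Toy counterpart of the exception named on the slide."""


class Row:
    """Toy wide row: live columns plus the tombstones left behind by deletes."""

    def __init__(self, columns):
        self.columns = dict(columns)
        self.column_tombstones = set()
        self.row_tombstone = False

    def delete_column(self, name):
        # Each column delete leaves its own marker behind.
        self.columns.pop(name, None)
        self.column_tombstones.add(name)

    def delete_row(self):
        # One marker covers the whole row, however many columns it had.
        self.columns.clear()
        self.row_tombstone = True

    def tombstones_scanned(self):
        return len(self.column_tombstones) + int(self.row_tombstone)

    def read(self):
        # Abort the query when too many tombstones would have to be scanned.
        if self.tombstones_scanned() > TOMBSTONE_LIMIT:
            raise TombstoneOverwhelming(
                f"scanned {self.tombstones_scanned()} tombstones"
            )
        return dict(self.columns)
```

Deleting 500 columns one by one leaves 500 markers in this model and the next read aborts, whereas deleting the row leaves a single marker and the read returns an empty result.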