Scalability @ Sale Stock

Scalability @ Sale Stock March 29th, 2016

Welcome!

Who are we?

Who are we?
• Tech startup that sells mid- to low-end women's fashion
• Engineering team started ~1 year ago
• Launched our in-house website ~8 months ago

Increase in various metrics
• Revenue
• Team size
• User base
• Traffic

Scalability Problems:
• Iteration Speed
• Code Quality
• Backend Infrastructure
• etc.

Iteration Speed Scalability

GitFlow

GitFlow
• Two main branches: master & develop
• Long-lived feature branches

GitFlow downsides
• Isolated feature branches
• Horribly painful merges
• Horribly risky deploys

Trunk-based Development

Trunk-based Development
• Single main branch (master only)
• Long-lived feature branches are discouraged

Trunk-based Development gives us:
• Fewer merge conflicts
• Less risky deploys
• Faster iteration speed
• Fewer dedicated non-prod environments

More frequent merges directly to master… that’s scary.

What we’re doing:
• Automated test suite
• Feature gating

Automated Test Suite

Automated Test Suite
1. Core Test Suite
2. Comprehensive Test Suite
3. Continuous Production Smoke Test

Core Test Execution
• Runs on every merge cycle of our www codebase
• Its results decide whether the latest merge is auto-deployed
• Optimized for the best coverage-to-speed ratio
• Consists of hundreds of functional test cases
• Runs on a 20-node test cluster for fast execution
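A minimal sketch of the merge gate described above, assuming each functional test is a callable that returns True on pass. Threads stand in for the 20 test-cluster nodes, and the names `shard`, `run_shard` and `should_auto_deploy` are hypothetical; the deck does not say how tests are distributed or how the deploy decision is wired.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical: each functional test is a callable that returns True on pass.
TestCase = Callable[[], bool]


def shard(names: List[str], nodes: int) -> List[List[str]]:
    """Round-robin assignment of test names to `nodes` shards."""
    shards: List[List[str]] = [[] for _ in range(nodes)]
    for i, name in enumerate(names):
        shards[i % nodes].append(name)
    return shards


def run_shard(names: List[str], registry: Dict[str, TestCase]) -> Dict[str, bool]:
    """Run one shard's tests; a raised exception counts as a failure."""
    results: Dict[str, bool] = {}
    for name in names:
        try:
            results[name] = bool(registry[name]())
        except Exception:
            results[name] = False
    return results


def should_auto_deploy(registry: Dict[str, TestCase], nodes: int = 20) -> bool:
    """Run the core suite across `nodes` workers; deploy only if every test passed."""
    results: Dict[str, bool] = {}
    # Threads stand in for the test-cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        for partial in pool.map(lambda s: run_shard(s, registry), shard(sorted(registry), nodes)):
            results.update(partial)
    return all(results.values())
```

Gating on `all(...)` means a single failing or crashing test blocks the auto-deploy of that merge.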

Comprehensive Test Execution
• Very complete test coverage: covers all user paths
• Runs on multiple devices and browsers
• Runs periodically, outside the merge cycle
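Running "all user paths on multiple devices and browsers" amounts to a cross product. The sketch below enumerates that matrix; the device, browser and path names are hypothetical placeholders, since the deck does not list the real ones.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical values: the deck does not list the real devices, browsers or paths.
DEVICES = ["android-low-end", "android-mid-range", "desktop"]
BROWSERS = ["chrome", "firefox"]
USER_PATHS = ["browse", "search", "checkout"]


def build_matrix(devices: List[str], browsers: List[str], paths: List[str]) -> List[Tuple[str, str, str]]:
    """Every (device, browser, user path) combination for one periodic run."""
    return list(product(devices, browsers, paths))
```

Because the matrix grows multiplicatively (3 × 2 × 3 = 18 runs even for this tiny example), it makes sense to run this suite periodically rather than on every merge.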

Continuous Prod Smoke Test
• Runs continuously against the prod environment
• Simulates real users
• A saner, more useful and more accurate form of continuous monitoring than regular uptime alerting
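A sketch of why a simulated user journey beats an uptime ping: it reports *which* step broke, not just whether the homepage answered. The step names, the `alert` callback and the 60-second interval are assumptions for illustration; real steps would drive a browser or HTTP client against prod.

```python
import time
from typing import Callable, List, Optional, Tuple

# A step is (name, action); the action returns True if the user-visible check passed.
Step = Tuple[str, Callable[[], bool]]


def run_journey(steps: List[Step]) -> Optional[str]:
    """Walk the steps in order, like a real user; return the first failing step's name or None."""
    for name, action in steps:
        try:
            ok = bool(action())
        except Exception:
            ok = False
        if not ok:
            return name
    return None


def smoke_loop(steps: List[Step], alert: Callable[[str], None],
               interval_s: float = 60.0, iterations: Optional[int] = None) -> None:
    """Repeat the journey against prod; `iterations=None` means run forever."""
    done = 0
    while iterations is None or done < iterations:
        failed = run_journey(steps)
        if failed is not None:
            alert(f"smoke test failed at step: {failed}")
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
```

A plain uptime check would stay green while, say, checkout is broken; the journey catches it.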

Feature Gating

Feature Gating
• Allows code paths to be activated for a subset of users or only for employees
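A minimal feature-gate sketch, assuming gates are either employee-only or rolled out to a percentage of users chosen by a stable hash. `User`, `Gate`, `bucket` and `is_enabled` are hypothetical names, and letting employees always see gated code is an assumption, not something the deck states.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: str
    is_employee: bool = False


@dataclass(frozen=True)
class Gate:
    name: str
    rollout_percent: int = 0      # 0-100: share of regular users who get the new code path
    employees_only: bool = False  # if True, only employees get it


def bucket(gate_name: str, user_id: str) -> int:
    """Stable bucket in [0, 100): a user stays in (or out of) a rollout across requests."""
    digest = hashlib.sha256(f"{gate_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(gate: Gate, user: User) -> bool:
    # Assumption: employees always see gated code paths, so they can dogfood them first.
    if user.is_employee:
        return True
    if gate.employees_only:
        return False
    return bucket(gate.name, user.id) < gate.rollout_percent
```

Usage: `if is_enabled(Gate("new-checkout", rollout_percent=10), user): ...`. Since a user is in if `bucket < rollout_percent`, raising the percentage only ever adds users, which keeps a gradual rollout stable.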

Codebase Scalability

SOA / Microservice Architecture
• One domain → one service
• Clear engineer / team ownership
• Downside:
  ◦ A growing number of features and services makes development & deployment complex

Problems:
• No standards for developing against a many-service cluster
• No standards for deploying a many-service cluster to production

Development Requirements
• Download the software needed for each service / stack type
• Run each service (preferably in topological order)
• Run dependency processes (MySQL / Redis / Kafka)
• Connect the services & databases properly (through env vars)
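"Topological order" means every service starts after the services and databases it depends on. Python's standard-library `graphlib` does exactly this; the service names below are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List


def startup_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return services ordered so each one starts after everything it depends on.

    `deps` maps a service to the services/databases it needs; raises CycleError on a cycle.
    """
    return list(TopologicalSorter(deps).static_order())
```

For example, `startup_order({"web": ["api"], "api": ["mysql", "redis"]})` places `mysql` and `redis` before `api`, and `api` before `web`. A dependency cycle surfaces as `CycleError` instead of a hung startup.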

Deployment Requirements
• Create & run containers for each service
• Run each service (preferably in topological order)
• Scale the services properly
• Connect the services & databases properly (through env vars)
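"Connecting services through env vars" can be sketched as: for each dependency, hand the service a host and port variable. The `<DEP>_HOST` / `<DEP>_PORT` naming convention and the endpoint values are hypothetical; the deck does not say which variable names are used.

```python
from typing import Dict, List, Tuple


def env_for(service: str, deps: Dict[str, List[str]],
            endpoints: Dict[str, Tuple[str, int]]) -> Dict[str, str]:
    """Build the env vars a service needs to reach each of its dependencies.

    Hypothetical naming convention: dependency "user-svc" -> USER_SVC_HOST / USER_SVC_PORT.
    """
    env: Dict[str, str] = {}
    for dep in deps.get(service, []):
        host, port = endpoints[dep]
        key = dep.upper().replace("-", "_")
        env[f"{key}_HOST"] = host
        env[f"{key}_PORT"] = str(port)
    return env
```

For comparison, Kubernetes itself injects `<NAME>_SERVICE_HOST` and `<NAME>_SERVICE_PORT` variables for Services that exist when a pod starts.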

ClusterGraph

ClusterGraph
A data structure describing how a cluster is formed from different services.

How do we build this?

ClusterGraph
• Monorepo
• Each microservice lives in its own top-level folder
• Each top-level folder defines a service.yaml, which contains:
  ◦ name
  ◦ stack
  ◦ dependency list (names of other services)
  ◦ database dependencies
  ◦ etc.
• The service.yaml files of all services are then used to build the cluster graph statically
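A sketch of building the graph statically from the service manifests. It assumes each service.yaml has already been loaded into a dict (for instance with PyYAML's `yaml.safe_load`); the key names `name`, `stack`, `dependencies` and `databases` mirror the list above but are assumptions about the real schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Service:
    name: str
    stack: str
    dependencies: List[str] = field(default_factory=list)  # other services' names
    databases: List[str] = field(default_factory=list)     # e.g. "mysql", "redis"


@dataclass
class ClusterGraph:
    services: Dict[str, Service]

    @classmethod
    def from_manifests(cls, manifests: List[dict]) -> "ClusterGraph":
        """Build and validate the graph from already-parsed service.yaml dicts."""
        services: Dict[str, Service] = {}
        for m in manifests:
            svc = Service(m["name"], m["stack"],
                          list(m.get("dependencies", [])), list(m.get("databases", [])))
            if svc.name in services:
                raise ValueError(f"duplicate service name: {svc.name}")
            services[svc.name] = svc
        # Static check: every declared dependency must be a service in the repo.
        for svc in services.values():
            missing = [d for d in svc.dependencies if d not in services]
            if missing:
                raise ValueError(f"{svc.name} depends on unknown services: {missing}")
        return cls(services)

    def dependents_of(self, name: str) -> List[str]:
        """Services that list `name` as a dependency (useful before changing it)."""
        return sorted(s.name for s in self.services.values() if name in s.dependencies)
```

Because the manifests live in the repo, a broken dependency is caught when the graph is built, before anything is deployed.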

ClusterGraph
• This also means the cluster graph is versioned with every git commit
• Technically, the graph can be refactored atomically in a single commit

ssi
• Internal command-line program
• Constructs the cluster graph from our source code
• Runs the cluster locally for development
• Instantiates databases

komandan
• Production-stage executor of ClusterGraph
• Uses Kubernetes under the hood

Kubernetes

komandan
• Stores multiple cluster graph versions
• Can deploy a complete cluster in ~15 seconds
• Reverts in the same amount of time
• Handles service discovery through env var injection
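Why a revert can take the same time as a deploy: if every cluster graph version is stored, reverting is just deploying the previous version again. A minimal sketch of that bookkeeping follows; `GraphHistory` and the simplified `Graph` type are hypothetical and do not reflect komandan's real internals.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # service -> dependencies (simplified)


class GraphHistory:
    """Stores cluster graph versions by git commit and tracks deployment order."""

    def __init__(self) -> None:
        self._versions: Dict[str, Graph] = {}
        self._deployed: List[str] = []

    def store(self, commit: str, graph: Graph) -> None:
        self._versions[commit] = graph

    def deploy(self, commit: str) -> Graph:
        """Mark a stored version live and return the graph to apply."""
        if commit not in self._versions:
            raise KeyError(f"unknown graph version: {commit}")
        self._deployed.append(commit)
        return self._versions[commit]

    def revert(self) -> Graph:
        """Roll back to the previously deployed version; it is just another deploy."""
        if len(self._deployed) < 2:
            raise RuntimeError("no earlier deployment to revert to")
        self._deployed.pop()
        return self._versions[self._deployed[-1]]

    @property
    def live(self) -> Optional[str]:
        return self._deployed[-1] if self._deployed else None
```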

komandan
• Since creating new clusters is so cheap (and fast), we can run:
  ◦ Transient clusters for test suite executions
  ◦ Transient clusters for open PRs
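The key property of a transient cluster is that it is always torn down, even when the tests inside it fail. A context-manager sketch follows, where `create`/`destroy` stand in for whatever actually provisions the cluster and the `pr-<number>` naming is a hypothetical scheme.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

Handle = TypeVar("Handle")


def pr_cluster_name(pr_number: int) -> str:
    """Hypothetical naming scheme: one isolated cluster per open PR."""
    return f"pr-{pr_number}"


@contextmanager
def transient_cluster(name: str, create: Callable[[str], Handle],
                      destroy: Callable[[Handle], None]) -> Iterator[Handle]:
    """Create a throwaway cluster and always tear it down, even if the work inside fails."""
    handle = create(name)
    try:
        yield handle
    finally:
        destroy(handle)
```

Usage: `with transient_cluster(pr_cluster_name(42), create, destroy) as cluster: run_suite(cluster)`, where `run_suite` is whatever test runner you use.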

Why is this important?
• Developing complex clusters becomes more productive
• Deploying complex clusters becomes simpler and more robust
• Allows us to build more features, faster

Thomas Diong

Scaling Sale Stock with Products

Machine Learning & AI Products
• NLP
• Recommender System

Customer Behavior
• Most customers live outside major cities
• They don't own a desktop or laptop
• Their first computer is a low-end Android phone, on a terrible internet connection
• Their buying habits are still rooted in offline shops; they are risk-averse
• They come to understand a purchase through conversation

AI Needs to Be Able To
• Speak Indonesian
• Sound natural
• Understand eCommerce context

Usual Customer’s Chat

AI Needs to Be Able To
• Speak Indonesian
• Sound natural
• Understand eCommerce context
• Speak Alay (informal Indonesian text-speak)

Process
Preprocessing
- Tokenize
- Vectorize
Learning
- Deep learning (TensorFlow)
Output
- Word-by-word generation until end of line
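The three stages can be traced end to end in a few lines. The deck's learning step is a TensorFlow deep learning model; this sketch plainly swaps in a toy bigram counter in its place, so only the pipeline shape (tokenize, vectorize, generate word by word until an end-of-line token) matches. The `<eol>` token and all function names are assumptions.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List

EOL = "<eol>"  # assumed end-of-line token that stops generation


def tokenize(text: str) -> List[str]:
    """Lowercase and split on word characters."""
    return re.findall(r"\w+", text.lower())


def build_vocab(corpus: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token an integer id; id 0 is reserved for EOL."""
    vocab = {EOL: 0}
    for tokens in corpus:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids and terminate the sequence with EOL."""
    return [vocab[t] for t in tokens] + [vocab[EOL]]


def train_bigrams(sequences: List[List[int]]) -> Dict[int, Counter]:
    """Toy stand-in for the deep model: count which id follows which."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(start: str, counts: Dict[int, Counter], vocab: Dict[str, int], max_len: int = 20) -> str:
    """Greedy word-by-word generation until EOL (or max_len words)."""
    inverse = {i: w for w, i in vocab.items()}
    out = [vocab[start]]
    while len(out) < max_len:
        followers = counts.get(out[-1])
        if not followers:
            break
        nxt = followers.most_common(1)[0][0]
        if nxt == vocab[EOL]:
            break
        out.append(nxt)
    return " ".join(inverse[i] for i in out)
```

Trained on "Ada ukuran M?", "ada ukuran L" and "Ada ukuran M", `generate("ada", ...)` yields "ada ukuran m": the most frequent continuation, stopping at the end-of-line token.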

Usual Customer’s Chat

Current Limitations

Recommender System

Personalization
• Over 20k SKUs and increasing
• Different types of items: Muslim wear, dresses, skirts, tops, bottoms, bags, shoes, accessories, etc.
• Different people have very different tastes
• Customers complain about not finding things they like

Recommender System
• Many ways to do it
• Costly and time-consuming to experiment and iterate with different methods

Recommender System Ideals
• Add new models from new data points
• Improve existing models
• Continuously A/B test

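Modular Design
$$\text{score} = W_1 \cdot (\text{item-to-item similarity score}) + W_2 \cdot (\text{interest in item based on views}) + W_3 \cdot (\text{interest in item based on historical transactions}) + \dots = \sum_k W_k \cdot m_k$$
where $m_k$ is the score produced by module $k$.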
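The weighted sum translates directly into code. In this sketch each module is assumed to be a function `(user, item) -> float`; the module names and signature are hypothetical, not the deck's actual interfaces.

```python
from typing import Callable, Dict, List

# A module scores how relevant an item is to a user; the signature is an assumption.
Module = Callable[[str, str], float]


def combined_score(user: str, item: str, modules: Dict[str, Module], weights: Dict[str, float]) -> float:
    """Weighted sum over the modules named in `weights` (W_k * m_k)."""
    return sum(w * modules[name](user, item) for name, w in weights.items())


def rank(user: str, items: List[str], modules: Dict[str, Module],
         weights: Dict[str, float], k: int = 10) -> List[str]:
    """Top-k items by combined score."""
    return sorted(items, key=lambda it: combined_score(user, it, modules, weights), reverse=True)[:k]
```

Because the weights are plain data, an A/B test arm can simply be a different `weights` dict over the same modules, with no model rebuild. Adding a module means registering one more function and giving it a weight.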

Advantages
1) Each individual module can be reused to build other interesting projects outside the Recommender System:
- "Produk Menarik Lain" ("Other Interesting Products")
- Marketing push
2) Modules can be improved or added independently of each other
3) Continuous, aggressive A/B testing without having to rebuild

Next On Recommender • Online learning

SALE STOCK DATA INFRASTRUCTURE

1. FILE STORAGE: HDFS
- Scalable, fault-tolerant distributed file system for fast reads/writes
- Data locality for faster access

2. DATA MANAGEMENT & ETL: Hive
- Define tables, partitions, bucketing and file formats to fit specific requirements
- Translates SQL into MapReduce jobs
- Custom UDFs can be written for special requirements

3. RANDOM READ / WRITE: HBase
- Consistent random reads/writes on top of HDFS
- Flexible key distribution and column design
- Apache Phoenix as a SQL skin
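One common way to exercise that flexibility in key distribution is salting: prefixing the row key with a stable hash bucket so that sequential keys (timestamps, incrementing ids) spread across regions instead of hot-spotting one. The sketch below is illustrative only; the bucket count and the `NN|key` layout are assumptions, not the deck's schema.

```python
import hashlib


def salted_row_key(natural_key: str, buckets: int = 16) -> bytes:
    """Prefix the key with a stable hash bucket so sequential keys spread across regions.

    Hypothetical layout: b"<NN>|<natural key>". A point read recomputes the same prefix.
    """
    salt = int(hashlib.md5(natural_key.encode("utf-8")).hexdigest(), 16) % buckets
    return f"{salt:02d}|{natural_key}".encode("utf-8")
```

The trade-off is that range scans must fan out over every bucket. Apache Phoenix can apply salting for you through its `SALT_BUCKETS` table option.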

4. SQL QUERY & ETL: Impala
- Translates SQL into MPP (massively parallel processing) jobs
- Uses the Hive Metastore & UDFs
- Does not use MapReduce to process queries
- Can read files from HDFS/HBase/S3

5. COMPLEX ETL + MACHINE LEARNING: Spark
- In-memory processing: faster than MapReduce, and parallel processing is easier to express
- Can read/write from multiple sources: HDFS/HBase/S3

6. FRONT END PORTAL: Hue
- Impala is used heavily by non-developers, so we need a good GUI that makes it easy for them to use
- Also provides a decent HDFS/HBase explorer
- Can query an RDBMS if needed

7. JOB SCHEDULING: Azkaban
- Good DAG visualization
- Simple job configuration
- Easier log inspection when an exception occurs

8. ARCHIVING: AWS S3

9. DATA INGESTION: Kafka + Spark Streaming, MySQL + Sqoop
- Import MySQL tables into Hive tables (Sqoop)
- Real-time data streams (Kafka + Spark Streaming)

We’re Hiring!

We’re Hiring! Some of our team members hail from:

We’re Hiring! Positions:
• DevOps Engineer
• Front-end Engineer
• Back-end Engineer
• Quality Assurance Engineer
• Data Scientist
• Data Infrastructure Engineer
• Business Intelligence Analyst

We’re Hiring!
• Competitive salary
• Company shares
• Option to work remotely
• Relocation support
• Skills development support
• Health benefits that cover family
• Lunch and meals provided
• Flexible working hours
• Career development
• Periodic team gatherings
• Regular company hackathons

Reach us at: [email protected]

Thanks for your time!
