new users each month • Static storage is over 150TB of data – Adding over 1TB of files every day • 3 data centers + 2 clouds (Google AE, Amazon) – Around 300 servers • Over 40,000,000 server API calls per day • ~400 people work at Wix – Over 100 people in R&D 20:37
feature-centric life • Make small, frequent releases as early as possible • Automate everything – TDD/CI/CD • Measure everything – A/B test every new feature – Monitor real KPIs (business, not CPU)
is responsible from product idea to 1M active users – Remove every obstacle in the developer’s path – Big cultural change from waterfall – affects the whole company
Git without being fully tested • Before fixing a bug, first write a test that reproduces it • Cover legacy (untested) systems with integration tests
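The "write a test that reproduces the bug first" rule can be sketched as follows. This is only an illustration: `parse_price` and the bug it had (failing on thousands separators) are hypothetical and not taken from the talk.

```python
import unittest


def parse_price(text: str) -> float:
    """Hypothetical function under repair.

    Assumed bug: the original version called float(text) directly and
    failed on inputs such as "1,250.50".
    """
    # The fix, written only after the regression test below had failed.
    return float(text.replace(",", ""))


class ParsePriceRegressionTest(unittest.TestCase):
    def test_thousands_separator(self):
        # Written first: reproduces the reported bug.
        self.assertEqual(parse_price("1,250.50"), 1250.50)

    def test_plain_number_still_works(self):
        # Guards the behaviour that already worked before the fix.
        self.assertEqual(parse_price("42"), 42.0)
```

The test stays in the suite after the fix, so the same bug cannot return unnoticed.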
on development – TDD slows down development – With TDD we write more code (product + test code) • Current test count (unit tests + integration tests) – over 10,000
products faster – Removes fear of change – Easier to enter someone else’s project – Do we really need QA? (Yes, they code tests) – Writing a feature is 10-30% slower, with 45-90% fewer bugs – 50% faster to reach production – Considerably faster time to fix bugs
checkout to a random computer. • Tests that cannot be debugged on a developer machine will never run consistently for any period of time • Tests should run fast • Tests have to be readable – They are the project’s specs • Fixtures are evil!
Specific users or group of users • Percentage of traffic • By GEO • By Language • By user-agent • User Profile based • By context (site id or some kind of hash on site id)
test system load – New database flows/migration – Refactoring that may affect performance and memory usage • By URL parameter – Enable internal testing – Product acceptance – Faking GEO • By FT cookie value – Testing – When working with API on a single page application
We open the new feature to a % of users – Define KPIs to check if the new feature is better or worse – If it is better, we keep it – If worse, we check why and improve – If we find flaws, the impact is limited to that % of our users (a kind of Feature Toggle) • An interesting side effect on product • How many times did you have the conversation “what is better”? – Put the menu on top / on the side • Well, how about building both and A/B testing?
user – Toss is randomly determined – Cannot guarantee a persistent experience if the user changes browsers • Registered User – Toss is determined by the user ID – Guarantees toss persistency across browsers – Allows setting additional tossing criteria (for example new users only) – Only use this for sections that require the user to be authenticated
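The two toss strategies above could look roughly like this. It is a sketch under assumptions, not the system described in the talk: the SHA-256 bucketing, the `ab_<experiment>` cookie name and both function names are hypothetical.

```python
import hashlib
import random


def toss_registered(experiment: str, user_id: str, percent_b: int) -> str:
    """Deterministic toss: the same user ID always lands in the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable value in 0..99
    return "B" if bucket < percent_b else "A"


def toss_anonymous(cookies: dict, experiment: str, percent_b: int) -> str:
    """Random toss, remembered in a cookie (hypothetical name ab_<experiment>).

    The result lives in the browser, so another browser may get another toss.
    """
    key = f"ab_{experiment}"
    if key not in cookies:
        cookies[key] = "B" if random.randrange(100) < percent_b else "A"
    return cookies[key]
```

Hashing the experiment name together with the user ID keeps tosses independent between experiments, so a user who gets "B" in one test is not automatically in "B" for every other test.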
test percentage of users with optional filters – New users only (registered users only) – By language – By GEO – By browser – user-agent – OS – Any other criteria you have on your users
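Optional filters like these can be modelled as an eligibility check that runs before the toss. The sketch below is an assumption about how such a check might be shaped; the `User` and `Filters` types and their field names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class User:
    user_id: str
    language: str
    country: str
    is_new: bool


@dataclass(frozen=True)
class Filters:
    new_users_only: bool = False
    languages: Optional[FrozenSet[str]] = None  # None means "any language"
    countries: Optional[FrozenSet[str]] = None  # None means "any GEO"


def is_eligible(user: User, filters: Filters) -> bool:
    """Return True only if the user passes every configured filter."""
    if filters.new_users_only and not user.is_new:
        return False
    if filters.languages is not None and user.language not in filters.languages:
        return False
    if filters.countries is not None and user.country not in filters.countries:
        return False
    return True
```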
a value of a test for validation – Helps support experience what users are experiencing • Override methods – Via URL parameter – Via cookie • Pause tests • Start/Stop test • Bots always get “A”
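One possible way to combine overrides, pausing and the bot rule is a single resolution function. The order of precedence, the `ab_<experiment>` parameter/cookie name, the bot heuristic and the meaning of "paused" (serve "A" without tossing) are all assumptions made for this sketch.

```python
from typing import Callable, Mapping

# Hypothetical, deliberately crude bot heuristic for illustration only.
BOT_MARKERS = ("bot", "crawler", "spider")


def resolve_variant(
    experiment: str,
    user_agent: str,
    query: Mapping[str, str],
    cookies: Mapping[str, str],
    paused: bool,
    toss: Callable[[], str],
) -> str:
    """Decide which variant to serve for one request."""
    # 1. Bots always get "A".
    if any(marker in user_agent.lower() for marker in BOT_MARKERS):
        return "A"
    # 2. Explicit override, URL parameter first, then cookie.
    key = f"ab_{experiment}"
    override = query.get(key) or cookies.get(key)
    if override in ("A", "B"):
        return override
    # 3. Assumed meaning of a paused test: no tossing, serve "A".
    if paused:
        return "A"
    # 4. Otherwise fall back to the regular toss.
    return toss()
```

Putting the override ahead of the pause check lets support and QA still force a variant while a test is paused.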
tests – Legacy code usually covered by integration tests • Refactor from inside out – Small iterations with tests – Refactor small methods - make sure the tests don’t break – Deploy often • Re-write from the outside in – Write from scratch – Code duplication sometimes needed (temporary) – Protected by Feature Toggle
by feature toggle 1. Write to old / Read from old 2. Write to both / Read from old 3. Write to both / Read from new, fallback to old 4. Write to new / Read from new, fallback to old 5. Eagerly migrate data in the background 6. Write to new / Read from new
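The six steps above can be sketched as a store wrapper whose behaviour is switched by a toggle. This is a minimal illustration with plain dictionaries standing in for the old and new databases; the `Phase` and `MigratingStore` names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    OLD_ONLY = 1                  # step 1: write old / read old
    WRITE_BOTH_READ_OLD = 2       # step 2
    WRITE_BOTH_READ_NEW = 3       # step 3: read new, fall back to old
    WRITE_NEW_READ_FALLBACK = 4   # step 4 (step 5 runs during this phase)
    NEW_ONLY = 6                  # step 6: write new / read new


class MigratingStore:
    """Routes reads and writes according to the current toggle value."""

    def __init__(self, old: dict, new: dict, phase: Phase):
        self.old, self.new, self.phase = old, new, phase

    def write(self, key, value):
        if self.phase in (Phase.OLD_ONLY, Phase.WRITE_BOTH_READ_OLD,
                          Phase.WRITE_BOTH_READ_NEW):
            self.old[key] = value
        if self.phase is not Phase.OLD_ONLY:
            self.new[key] = value

    def read(self, key):
        if self.phase in (Phase.OLD_ONLY, Phase.WRITE_BOTH_READ_OLD):
            return self.old.get(key)
        if self.phase is Phase.NEW_ONLY:
            return self.new.get(key)
        # Steps 3 and 4: prefer new, fall back to old.
        return self.new[key] if key in self.new else self.old.get(key)

    def migrate_in_background(self):
        """Step 5: copy records that exist only in the old store."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)
```

Because each step is only a toggle change, moving back one step is a configuration change rather than a redeploy, until the old store stops receiving writes in step 4.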
one and install the new version on it. It is not active yet • Do self test • Activate the new server if it passes self test • Continue deploying the other servers, a few at a time, checking each one with self test [Diagram: rolling deployment of service B from version 1.1 to 1.2 across servers, while service A stays at 1.1]
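The rollout procedure above can be sketched as a loop over batches of servers. The callbacks `install`, `self_test` and `activate`, the batch size and the choice to stop at the first failed self test are assumptions for illustration, not details given in the talk.

```python
from typing import Callable, List, Sequence


def rolling_deploy(
    servers: Sequence[str],
    install: Callable[[str], None],
    self_test: Callable[[str], bool],
    activate: Callable[[str], None],
    batch_size: int = 2,
) -> List[str]:
    """Deploy to one server first, then to the rest a few at a time.

    Returns the servers that were activated. Stops at the first server
    whose self test fails, leaving it installed but inactive.
    """
    activated: List[str] = []
    # First batch is a single server; later batches hold batch_size servers.
    batches = [list(servers[:1])] + [
        list(servers[i:i + batch_size])
        for i in range(1, len(servers), batch_size)
    ]
    for batch in batches:
        for server in batch:
            install(server)  # new version is installed but not active yet
        for server in batch:
            if not self_test(server):
                return activated
            activate(server)
            activated.append(server)
    return activated
```

For example, with servers `["s1", "s2", "s3", "s4"]` and a self test that fails on `"s3"`, only `"s1"` and `"s2"` are activated and `"s4"` is never touched.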
run a self test before deploying the next server. • Checking server configuration and topology – Make sure database is accessible (DB connection string) – Is the schema the one I expect – Access required local resources (data files, other config files, templates, etc.) – Access remote resources – RPC / REST endpoints reachable and operational • Server will refuse requests unless it passes the self test • Allow a way to skip self test (and continue deployment)
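A self test of this kind might be structured as a set of named checks that gate request handling. The sketch below is an assumption about the shape of such a component; the `SelfTest` class, the check names and the 503 response are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class SelfTest:
    """Runs named checks; the server refuses requests until they pass."""

    def __init__(self, checks: Dict[str, Callable[[], bool]], skip: bool = False):
        self.checks = checks
        self.skip = skip      # escape hatch: continue deployment anyway
        self.passed = False   # nothing is served before run() is called

    def run(self) -> List[str]:
        """Execute every check and return the names of the failed ones."""
        failures: List[str] = []
        for name, check in self.checks.items():
            try:
                ok = check()
            except Exception:
                ok = False  # a check that raises counts as a failure
            if not ok:
                failures.append(name)
        self.passed = self.skip or not failures
        return failures

    def handle(self, handler: Callable[[], str]) -> Tuple[int, str]:
        """Serve the request only if the self test has passed."""
        if not self.passed:
            return 503, "self test not passed"
        return 200, handler()
```

In a real server the checks would open a database connection, compare the schema version, read the required local files and call the remote endpoints; here they are plain callables so the gating logic stays visible.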
release a new version of one • Now roll back the other… [Diagram: servers running mixed versions (1.0, 1.1, 1.2) of services A and B]