Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Scaling a million databases on 000webhost

D9958f8c7158fde442c374dab746facb?s=47 Balys
September 26, 2017

Scaling a million databases on 000webhost

000webhost handles millions of user queries on close to million unique databases. In this talk we will present the obstacles we encountered on such scale. We will show why we moved away from the traditional webserver->dbserver to a more dynamic architecture using HAProxy, ProxySQL and MariaDB in LXC containers. We will present our model, query routing logic and the outcome from the collaboration with ProxySQL developer Rene.

D9958f8c7158fde442c374dab746facb?s=128

Balys

September 26, 2017
Tweet

Transcript

  1. Scaling a million databases @

  2. 000Webhost.com No.1 Free Hosting Provider > 15+ million users >

    1 million databases > 300+ million reqs/day > IPv6-only infrastructure > SSD-only
  3. Powered by Hostinger > 3+ million active users > 2

    million databases > 300+ million reqs/day
  4. The Before, our Database problem

  5. Manual logical sharding for databases • Tied to hardware resources

    • Not scalable The Before, Our Database Problem 1
  6. Uneven load on servers • Different workloads on servers •

    Spiky performance The Before, Our Database Problem 2
  7. Error-prone migrations • Involving downtime • All users use hardcoded

    mysql hostnames • Change of DNS required The Before, Our Database Problem 3
  8. Difficult Maintenance • Huge blast radius if downtime is scheduled

    • Hard to upgrade The Before, Our Database Problem 4
  9. What If we could just use localhost, always?

  10. The Solution Initial Draft of Architecture

  11. No more manual routing Initial Draft of Architecture • Route

    clients using ProxySQL • Distribute load to ProxySQL instances using HAProxy • Just use localhost
  12. Automate LB via migration • Heterogeneous hardware • Any user

    can be on any server Initial Draft of Architecture
  13. Easier to maintain! • Immutable infrastructure • Rebuild LXC containers

    periodically • Easy to add new features • Easy to upgrade • Easy to rollback (or rollforward) Initial Draft of Architecture
  14. Evaluations

  15. • Too many databases acl_get() too heavy for show databases

    Physical MariaDB Servers Evaluations Samples: 15K of event 'cycles', Event count (approx.): 41470603374 Children Self Command Shared Object Symbol + 97.89% 0.00% mysqld mysqld [.] mysql_execute_command + 97.89% 0.00% mysqld mysqld [.] execute_sqlcom_select + 97.89% 0.00% mysqld mysqld [.] handle_select + 97.89% 0.00% mysqld mysqld [.] mysql_select + 97.89% 0.00% mysqld mysqld [.] JOIN::exec + 97.89% 0.00% mysqld mysqld [.] JOIN::exec_inner + 97.89% 0.00% mysqld mysqld [.] get_schema_tables_result + 97.88% 0.03% mysqld mysqld [.] fill_schema_schemata + 72.06% 71.89% mysqld libc-2.17.so [.] __strcmp_sse42 + 19.57% 18.73% mysqld mysqld [.] acl_get + 5.77% 5.77% mysqld mysqld [.] strcmp@plt Take a look at acl_get() global elapsed; probe process("/usr/sbin/mysqld").function("acl_get").return { elapsed += gettimeofday_ms() - @entry(gettimeofday_ms()); } probe end { printf("%d ms\n", elapsed); } output: 16052 ms (~16s). Summary ========= Root user has all rights, thus it doesn't check for acl_get(). Regular user has strict rights, thus it takes to return on every table..
  16. Ok, Lets use containers. Evaluations • OpenVZ → LXC ◦

    Old kernel, missing a lot of features ◦ Initially we used CephFS and OpenVZ kernel did not support it fully • CephFS → Ceph RBD • Ditched Ceph for SSD eventually
  17. mysql-proxy (deprecated) • Rewrite show databases queries using information for

    IS • Could not route by username, so it was useless on our case • Weak performance Evaluations
  18. MaxScale -> 000scale for 000webhost • Probably OK, but.. •

    Stores all configuration/user data in files, which did not scale with millions of users • So we did not evaluate it further at the time Evaluations
  19. Final Solution

  20. ProxySQL • Looked very promising • Could route by username

    to different shards (instances) • Rewrite queries based on regex • Apply limits to users • Save configuration to SQLite • Native clustering on the way Evaluations
  21. Routing solution with ProxySQL Route users from ProxySQL to exact

    LXC container (use database username as routing key) Evaluations MySQL [(none)]> select * from mysql_servers where hostgroup_id=1018\G *************************** 1. row *************************** hostgroup_id: 1018 hostname: 2a02:4780:bad:6::21 port: 3306 status: ONLINE weight: 1 compression: 0 max_connections: 10000 max_replication_lag: 0 use_ssl: 0 max_latency_ms: 0 comment: 1 row in set (0.00 sec) MySQL [(none)]> select * from mysql_users where default_hostgroup=1018 limit 1\G *************************** 1. row *************************** username: id10000_wp password: XXXXXXXXXXXXXXXX active: 1 use_ssl: 0 default_hostgroup: 1018 default_schema: NULL schema_locked: 0 transaction_persistent: 1 fast_forward: 0 backend: 1 frontend: 1 max_connections: 50 1 row in set (0.00 sec)
  22. Issues we had with ProxySQL • Each user modifications required

    ProxySQL to reload (`LOAD MYSQL USERS`) all users to memory which was very expensive operation and took up to 20 minutes and increasing • After few iterations Rene improved this from 20min+ to sub-second ◦ https://github.com/sysown/proxysql/commit/61396cbcdb58578576871a43e0e3b208dcee1962 ◦ https://github.com/sysown/proxysql/commit/17f9953a1836e7b30c61528ddda2fd0552e9dffe ◦ https://github.com/sysown/proxysql/commit/f550cf1340047b65147be83df75251f640897291 • To fulfill all requirement for 000webhost project we added IPv6 support for ProxySQL https://github.com/sysown/proxysql/pull/726 https://github.com/sysown/proxysql/pull/781 Evaluations
  23. ‘\0’