Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Scaling a million databases on 000webhost

Balys
September 26, 2017

Scaling a million databases on 000webhost

000webhost handles millions of user queries on close to million unique databases. In this talk we will present the obstacles we encountered on such scale. We will show why we moved away from the traditional webserver->dbserver to a more dynamic architecture using HAProxy, ProxySQL and MariaDB in LXC containers. We will present our model, query routing logic and the outcome from the collaboration with ProxySQL developer Rene.

Balys

September 26, 2017
Tweet

Other Decks in Technology

Transcript

  1. Scaling a million databases
    @

    View Slide

  2. 000Webhost.com
    No.1 Free Hosting Provider
    > 15+ million users
    > 1 million databases
    > 300+ million reqs/day
    > IPv6-only infrastructure
    > SSD-only

    View Slide

  3. Powered by
    Hostinger
    > 3+ million active users
    > 2 million databases
    > 300+ million reqs/day

    View Slide

  4. The Before, our Database problem

    View Slide

  5. Manual logical sharding for databases
    ● Tied to hardware resources
    ● Not scalable
    The Before, Our Database Problem 1

    View Slide

  6. Uneven load on servers
    ● Different workloads on
    servers
    ● Spiky performance
    The Before, Our Database Problem 2

    View Slide

  7. Error-prone migrations
    ● Involving downtime
    ● All users use hardcoded
    mysql hostnames
    ● Change of DNS required
    The Before, Our Database Problem 3

    View Slide

  8. Difficult Maintenance
    ● Huge blast radius if
    downtime is scheduled
    ● Hard to upgrade
    The Before, Our Database Problem 4

    View Slide

  9. What If we could just use localhost, always?

    View Slide

  10. The Solution
    Initial Draft of Architecture

    View Slide

  11. No more manual routing
    Initial Draft of Architecture
    ● Route clients using
    ProxySQL
    ● Distribute load to
    ProxySQL instances using
    HAProxy
    ● Just use localhost

    View Slide

  12. Automate LB via migration
    ● Heterogeneous hardware
    ● Any user can be on any
    server
    Initial Draft of Architecture

    View Slide

  13. Easier to maintain!
    ● Immutable infrastructure
    ● Rebuild LXC containers
    periodically
    ● Easy to add new features
    ● Easy to upgrade
    ● Easy to rollback (or
    rollforward)
    Initial Draft of Architecture

    View Slide

  14. Evaluations

    View Slide

  15. ● Too many databases
    acl_get() too heavy for
    show databases
    Physical MariaDB Servers
    Evaluations
    Samples: 15K of event 'cycles', Event count (approx.): 41470603374
    Children Self Command Shared Object
    Symbol
    + 97.89% 0.00% mysqld mysqld
    [.] mysql_execute_command
    + 97.89% 0.00% mysqld mysqld
    [.] execute_sqlcom_select
    + 97.89% 0.00% mysqld mysqld
    [.] handle_select
    + 97.89% 0.00% mysqld mysqld
    [.] mysql_select
    + 97.89% 0.00% mysqld mysqld
    [.] JOIN::exec
    + 97.89% 0.00% mysqld mysqld
    [.] JOIN::exec_inner
    + 97.89% 0.00% mysqld mysqld
    [.] get_schema_tables_result
    + 97.88% 0.03% mysqld mysqld
    [.] fill_schema_schemata
    + 72.06% 71.89% mysqld libc-2.17.so
    [.] __strcmp_sse42
    + 19.57% 18.73% mysqld mysqld
    [.] acl_get
    + 5.77% 5.77% mysqld mysqld
    [.] strcmp@plt
    Take a look at acl_get()
    global elapsed;
    probe process("/usr/sbin/mysqld").function("acl_get").return
    {
    elapsed += gettimeofday_ms() - @entry(gettimeofday_ms());
    }
    probe end { printf("%d ms\n", elapsed); }
    output:
    16052 ms (~16s).
    Summary
    =========
    Root user has all rights, thus it doesn't check for acl_get().
    Regular user has strict rights, thus it takes to return on every
    table..

    View Slide

  16. Ok, Lets use containers.
    Evaluations
    ● OpenVZ → LXC
    ○ Old kernel, missing a lot of features
    ○ Initially we used CephFS and OpenVZ kernel did
    not support it fully
    ● CephFS → Ceph RBD
    ● Ditched Ceph for SSD eventually

    View Slide

  17. mysql-proxy (deprecated)
    ● Rewrite show databases queries using information for IS
    ● Could not route by username, so it was useless on our
    case
    ● Weak performance
    Evaluations

    View Slide

  18. MaxScale -> 000scale for 000webhost
    ● Probably OK, but..
    ● Stores all configuration/user data in files, which
    did not scale with millions of users
    ● So we did not evaluate it further at the time
    Evaluations

    View Slide

  19. Final Solution

    View Slide

  20. ProxySQL
    ● Looked very promising
    ● Could route by username to different
    shards (instances)
    ● Rewrite queries based on regex
    ● Apply limits to users
    ● Save configuration to SQLite
    ● Native clustering on the way
    Evaluations

    View Slide

  21. Routing solution with ProxySQL
    Route users from ProxySQL
    to exact LXC container
    (use database username as
    routing key)
    Evaluations
    MySQL [(none)]> select * from mysql_servers where
    hostgroup_id=1018\G
    *************************** 1. row
    ***************************
    hostgroup_id: 1018
    hostname: 2a02:4780:bad:6::21
    port: 3306
    status: ONLINE
    weight: 1
    compression: 0
    max_connections: 10000
    max_replication_lag: 0
    use_ssl: 0
    max_latency_ms: 0
    comment:
    1 row in set (0.00 sec)
    MySQL [(none)]> select * from mysql_users where
    default_hostgroup=1018 limit 1\G
    *************************** 1. row
    ***************************
    username: id10000_wp
    password: XXXXXXXXXXXXXXXX
    active: 1
    use_ssl: 0
    default_hostgroup: 1018
    default_schema: NULL
    schema_locked: 0
    transaction_persistent: 1
    fast_forward: 0
    backend: 1
    frontend: 1
    max_connections: 50
    1 row in set (0.00 sec)

    View Slide

  22. Issues we had with ProxySQL
    ● Each user modifications required ProxySQL to reload (`LOAD MYSQL USERS`) all users to memory
    which was very expensive operation and took up to 20 minutes and increasing
    ● After few iterations Rene improved this from 20min+ to sub-second
    ○ https://github.com/sysown/proxysql/commit/61396cbcdb58578576871a43e0e3b208dcee1962
    ○ https://github.com/sysown/proxysql/commit/17f9953a1836e7b30c61528ddda2fd0552e9dffe
    ○ https://github.com/sysown/proxysql/commit/f550cf1340047b65147be83df75251f640897291
    ● To fulfill all requirement for 000webhost project we added IPv6 support for ProxySQL
    https://github.com/sysown/proxysql/pull/726
    https://github.com/sysown/proxysql/pull/781
    Evaluations

    View Slide

  23. ‘\0’

    View Slide