Upgrade to Pro — share decks privately, control downloads, hide ads and more …

libSQL: Taking Sqlite To The Moon

libSQL: Taking Sqlite To The Moon

SQLite is everywhere. The original implementation's stability and compatibility guarantees are hard to beat. It's one of the reasons why forks have come and gone. This session explores a new fork that consistently delivered fantastic works.

libSQL is one of the most fabulous forks ever of SQLite. It allows for features like user-defined WASM functions, a server mode with PostgreSQL wire protocol compatibility, virtual WAL and bottomless storage. This session dives into the modifications and features of libSQL and the new ways SQLite can be useful.

Prior knowledge of SQLite usage and internals is appreciated though not required. A hands-on demo of libSQL features will be provided. The session will also touch on the subject of how this fork is a different take by analyzing the motivation, team profile, and previous track record.

Abdur-Rahmaan Janhangeer

July 21, 2023
Tweet

More Decks by Abdur-Rahmaan Janhangeer

Other Decks in Programming

Transcript

  1. libSQL:
    Taking Sqlite
    To The Moon

    View Slide

  2. View Slide

  3. About Me:
    compileralchemy.com

    View Slide

  4. slides

    View Slide

  5. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 1/41
    libSQL: Taking SQLite to the Moon
    Preparing SQLite for the upcoming decades
    ABDUR-RAHMAAN JANHANGEER
    JUL 21, 2023
    Share
    Few pieces of software enjoy a success as wild as SQLite. It’s now shipped by default on Mac OS,
    Windows, Android, iOS and many flavors of Linux. It’s used by countless apps in a wide range of
    industries including aviation, military and space. It’s unsurprisingly the most widely used and
    deployed database. Not only that, it’s also in the top 5 most widely used and deployed software
    library across all languages at any point in human history.
    This is not possible without strong engineering ethics. Being used in several critical fields where
    human lives could be at risk means bending to the stringent rules of the industry in question.
    Aviation for example had relatively insane rules for shipping softwares even before the era of
    TDD. And SQLite, despite being run by volunteers incorporated those standards early on. The
    project on top of this provides long-time support and a guarantee of backward compatibility.
    The massive adoption of SQLite by the biggest players in the tech field is a testimony to the
    value of the project, but, it’s also an extraordinary burden placed on the developers not to bring
    in mediocrity.

    View Slide

  6. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 2/41
    A fork of SQLite, if it can dare be forked, is pure madness if they intend to be serious about it.
    They would need to provide the same guarantees as SQLite. Stable, reliable, and dependable
    software means the introduction of careful, slow-moving changes. Innovation might sometimes
    be a radical change of components to unleash more powers, which in turn spells out instability, a
    huge put-off. The team behind the fork must be competent. The SQLite team with decades of
    experience, codebase intimacy and field specialization already pushes improvements to the limit.
    To be able to propose alternatives, you must have a good notion of databases to craft an even
    better product. And wisdom. If a fork of a car is a plane, there are two drawbacks: car users will
    most probably abandon it as it’s no longer a car. And, since it’s a plane, it will be compared to
    planes, it must be up to the level and satisfy plane users. The leadership team must be able to
    find a purpose to the project, fuel motivation for a long time to come and make sure great people
    remain around even if nobody adopts the project, a hard challenge to answer.
    And indeed, many projects do not fork SQLite. They add components over SQLite and package it
    as an improved project. Forks of SQLite exist. They come and crumble. But, sometimes ago an
    unusual fork appeared: libSQL. This post discusses libSQL and it’s jaw-dropping team, the
    technical challenges they overcame, the adoption strategy they put in place, the production
    validation scheme they concocted, the ongoing development, the coma of future roadmap
    checkpoints, how features resonate more than fully with a cloud-dominated industry and why it’s
    the grandest, noblest and most soundly engineered forks of sqlite of all time.

    View Slide

  7. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 3/41
    Author of vlcn.io on libSQL
    The legend that is SQLite

    View Slide

  8. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 4/41
    To understand libSQL, we must first understand SQLite. The inexorable pervasion of SQLite
    calls for a look at the maintainers, vision, codebase and coding standards. We’ll also have a brief
    overview of the internals.
    The SQLite devs currently aim to support the project through 2050. Whatever happens, they
    commit to support the project. They also have the source code automatically replicated in several
    locations around the world. The project has extensive testing both code-wise and end-user-wise.
    Many Open Source projects have so many users that they don’t worry about unit tests. SQLite
    introduced aviation-grade tests to ensure high-quality software. Throughout the many
    interactions with big companies, the deployment at massive scale contributed to solve bugs that
    the testing suite could not ever catch. Thus, SQLite enjoys top notch excellence. You can read
    more about the evolution from a hobby project to where it is today. They also pledge not to give
    in to new fashion trends just for the sake of it, labeling under the “old school“ point.
    Every machine-code branch instruction is tested in both directions. Multiple times. On
    multiple platforms and with multiple compilers. This helps make the code robust for future
    migrations. The intense testing also means that new developers can make experimental
    enhancements to SQLite and, assuming legacy tests all pass, be reasonably sure that the
    enhancement does not break legacy.
    The codebase is a pleasure to read. It is well-commented. The project is also well documented.
    Well-commented means that you can sit down and read the source code like a book. This project
    is the personification of how good Open Source projects should be. It is a robust piece, but it’s
    one we can recommend any beginner to start with. Even people who don’t know C can browse
    and follow along. It’s a long read for sure but very enjoyable.

    View Slide

  9. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 5/41
    The SQLite source code is over 35% comment. Not boiler-plate comments, but useful
    comments that explain the meaning of variables and objects and the intent of methods and
    procedures. The code is designed to be accessible to new programmers and maintainable over
    a span of decades.
    SQLite devs and particularly the author has a solid background in programming. He worked at
    bell labs and did his Phd in compiler craft. SQLite was designed with a bytecode engine from the
    start. The author is not afraid to spin up his own implementations. He is dissatisfied with Git
    and similar CVS systems. He wrote his own, named Fossil, used by the project. He likes to build
    his own stuffs. It trickles down to the database engine, B-tree layer, parser (he is allergic to
    YACC, BISON) and even the editor he writes SQLite in. This shows his incredible mastery over a
    wide range of topics as well as a bold spirit. Generally people avoid this. But, for SQLite, it
    turned out to be a predictable advantage in terms of stability as they don’t have to worry about
    3rd party codes surprise breaking changes and being license-worry-free.
    As expected, SQLite has a consortium to preserve and keep the project ongoing. It has a team of
    3 developers. Yes, not a crowd-full driven project. It’s also one of the dark points. SQLite rarely
    accepts contributions. It is maybe warranted by the fear of sacrificing quality. The source is open
    but the project is not open to participation. There is no contribution guide to begin with. It’s a
    model that works, we agree. It challenges the popular concept of Open Source being open to all
    and everything under the sun, even poor quality contributions, just for the sake of being open.
    How libSQL masterfully navigates the challenges of
    forking

    View Slide

  10. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 6/41
    libSQL began with strong footholds. First and foremost, the team has prior experience with the
    Linux kernel, being on the contributing as well as maintaining side. They also enrolled top
    ScyllaDB contributors with members having 500 to 1000+s commits. ScyllaDB is the production
    db which Discord sings it’s praise in the trillion messages handling post.
    The people who forked libSQL come primarily from a company called ChiselStrike Turso. But,
    libSQL is an independent project, accepting outside contributions. To this date, libSQL has seen
    many outside contributions.
    To test out libSQL’s worth, libSQL is used by Turso’s edge service. This brings in feedback about
    the state of the fork in production. There is at least one serious adopter of libSQL.

    View Slide

  11. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 7/41
    Turso brings SQLite to the edge. This is a fine testimony to the feats that libSQL is capable of.
    People ranted a lot when libSQL was announced. The project was deemed as a startup fad with
    no serious work and a fork doomed to fail. Some 09 months later, it’s functional with validated
    efficiency and pleasant community interaction experience.

    View Slide

  12. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 8/41
    It also periodically merges in the advancements on main SQLite.
    To understand the brillance of libSQL, we need to understand how SQLite works first. The SQL
    code is passed through the library which in turn deals with the files. It’s as illustrated below.
    How SQLite works

    View Slide

  13. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 9/41

    View Slide

  14. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 10/41
    This big picture of things is enough to conceptually understand SQLite. We’ll zoom in on
    specific aspects as we focus on libSQL features.
    When we talk of libSQL, we are talking of 2 main projects:
    The fork of SQLite
    The server version setup on top of the fork, called sqld
    The features of libSQL

    View Slide

  15. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 11/41
    A wire protocol is just a way to talk with the server.
    The client takes the query and uses the driver to send it to the server.
    Wire protocols

    View Slide

  16. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 12/41
    Once the connection is establish then exchanges take place between the client and server.
    libSQL supports some wire protocols including Postgres and http.

    View Slide

  17. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 13/41
    libSQL allows you to communicate with it’s server via tools implementing the Postgres wire
    protocol.
    This works perfectly.
    Using the feature involves spinning up sqld and using psql (A terminal-based client for postgres)
    to connect.
    Psql does not detect it’s not really Postgres at all!
    Postgres wire protocol
    $ sqld -d foo.db -p 127.0.0.1:5432 --http-listen-addr=127.0.0.1:8000
    $ psql -q postgres://127.0.0.1
    appinv=>

    View Slide

  18. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 14/41
    However, many tools relying on the protocol also look for internal postgres table. So, this option
    is limited in use.
    Hrana (from Czech "hrana", which means "edge") is a libSQL protocol to connect to sqld via web
    sockets.
    Tcp connections are banned from many edge runtimes like cloudflare workers, but not web
    sockets. Hrna is a simple protocol, requiring fewer rountrips than postgres wire.
    Servers traditionally connect to a server.
    The Hrana protocol

    View Slide

  19. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 15/41
    The sqld using Hrana connects through a stream.
    Traditional servers have connection pools.

    View Slide

  20. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 16/41
    sqld’s stream supports multiple sql streams. Only once connection is needed to support
    concurrent requests.
    Though intended to be used with sqld, Hrana can also be used with SQLite via a proxy. Hrana
    protocol defines arbitrary strings with optional parameters, making it easily adaptable to SQL
    dialects.
    Hrana operates on top of the web socket protocol as a sub-protocol. It is stated in the Sec-
    WebSocket-Protocol.

    View Slide

  21. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 17/41
    The HTTP protocol allows us to stream Hrana specs where there is little or no web-socket
    support. Requests and reponses might look like this.
    The HTTP protocol

    View Slide

  22. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 18/41
    sqld groups requests together using baton values in streams. A baton value is a value sent by the
    server which should be sent back by the client.

    View Slide

  23. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 19/41

    View Slide

  24. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 20/41
    This makes it very convenient to communicate with Hrnana with curl or code simple clients.
    Here is a json example.
    There are two modes (read this) by which SQLite ensures data is not corrupted and can be
    recovered in case of a power loss. These are the Rollback and WAL modes. They are mutually
    $ curl -s -d "{\"statements\": [\"SELECT * from databases;\"] }" \
    http://127.0.0.1:8000
    [[{"name":"libsql"}]]
    Virtual WAL

    View Slide

  25. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 21/41
    exclusive. The WAL mode is more powerful but, for compatibility reasons, the Rollback mode
    has not been removed. The WAL mode allows reading and single writer writing at the same time.
    To understand what a virtual WAL means, it’s good to understand how the Virtual File System
    (VFS) or OS layer works. Since SQLite is available on many operating systems, writing files to the
    system is done by a call to the VFS layer. Windows and Unix VFS is included by default.
    It is also possible to author our own VFS if we want to support another operating system for
    example.

    View Slide

  26. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 22/41
    Sometimes we might want to do additional tasks with an established VFS. In this case we wrap
    the VFS with our VFS along with our code and register our own VFS. Such a VFS is called a
    shim.

    View Slide

  27. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 23/41
    Virtualizing the WAL operations lets us have our own backends, just like we do with VFS. We
    can specify where to write our WAL files and what to do when these files are being written. Let’s
    say we want to write them to AWS, we can. libSQL provides some virtual WAL methods
    libSQL (sqld) offers a way to backup WAL files to S3 for backup.
    The steps for bottomless WAL is as follows:
    1. libSQL writes WAL frames
    2. These frames are asynchronously replicated to S3 in the background, in batches
    3. When a CHECKPOINT operation occurs in libSQL (checkpoint is an op which compacts
    WAL frames to drop ones that are out-of-date), we also upload a compressed snapshot of the
    main database file to S3
    4. The snapshot files can be managed through the bottomless-cli
    libsql_wal_methods_find
    libsql_wal_methods_register
    libsql_wal_methods_unregister
    Bottomless WAL

    View Slide

  28. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 24/41
    To understand how it works, we must first understand what consists of a WAL file.
    A WAL file can have 0 or more frames. A frame consists of a header and a page.
    After 1000 frames or 10s or as configured, the frames are gziped and sent to s3.

    View Slide

  29. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 25/41
    A checkpoint operation is one where the contents of the WAL file is applied to the database.
    After a checkpoint operation, a database snapshot is sent to s3.

    View Slide

  30. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 26/41
    The s3 is organized jnto namespaces called generations. A generation is the period between
    checkpoint operations. The most recent generation is at the top.

    View Slide

  31. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 27/41
    To do a backup, we specify a snapshot and pull the frames.

    View Slide

  32. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 28/41

    View Slide

  33. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 29/41
    To apply the transactions, libSQL makes use of a transaction cache to verify transactions.

    View Slide

  34. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 30/41
    Now, when applying the transactions, if it is too big to fit in memory, it is flushed to a temporary
    file.

    View Slide

  35. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 31/41

    View Slide

  36. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 32/41
    libSQL has 2 counters which when they tally, a checkpoint operation occurs. libSQL has WAL
    method xframes to increment counter if transaction is committed.
    libSQL xcheckpoint to delay checkpoint until a task is completed.
    A virtual table is an object which does not mirror a table on disk.
    libSQL exposes a callback xPreparedSql that allows a virtual table implementation to receive
    the SQL string submitted by the application for execution.
    SQLite allows you to define your own function using low-level C api.
    Virtual table callback
    User-defined WASM functions

    View Slide

  37. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 33/41
    libSQL in addition allows you to define your functions in WASM.
    We can write and compile our functions using any language we want to WASM. Then we copy
    paste it in our source code.
    You can try it live.
    libSQL can also be replicated to the edge. Bringing data to the edge currently setting up a
    primary database and replicas.
    CREATE FUNCTION doGood LANGUAGE wasm AS ''
    SELECT doGood(col) ... -- use here
    Replication: Bringing data to the edge

    View Slide

  38. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 34/41
    Replicas can be server-based or non-server-based called embedded replicas.

    View Slide

  39. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 35/41
    Data requested is read from the closest replica.

    View Slide

  40. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 36/41
    Once a write occurs, the replica sends data to the primary which replicates the data.

    View Slide

  41. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 37/41
    This occurs as follows:
    1. The client sends the data to the replica
    2. The replica sends the data to the primary
    3. The primary replicates the data everywhere

    View Slide

  42. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 38/41
    libSQL operates on a pull-based mechanism. The replica routinely asks for new frames.
    Replication mechanism

    View Slide

  43. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 39/41
    Replicas also keep log files.

    View Slide

  44. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 40/41
    The log files are routinely compacted.

    View Slide

  45. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon
    https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e 41/41
    © 2023 Abdur-Rahmaan Janhangeer ∙ Privacy ∙ Terms ∙ Collection notice
    Substack is the home for great writing

    View Slide

  46. [email protected]
    @osdotsystem
    github.com/abdur-rahmaanj
    compileralchemy.com
    Thank you

    View Slide