Upgrade to Pro — share decks privately, control downloads, hide ads and more …

libSQL: Taking Sqlite To The Moon

libSQL: Taking Sqlite To The Moon

SQLite is everywhere. The original implementation's stability and compatibility guarantees are hard to beat. It's one of the reasons why forks have come and gone. This session explores a new fork that consistently delivered fantastic works.

libSQL is one of the most fabulous forks ever of SQLite. It allows for features like user-defined WASM functions, a server mode with PostgreSQL wire protocol compatibility, virtual WAL and bottomless storage. This session dives into the modifications and features of libSQL and the new ways SQLite can be useful.

Prior knowledge of SQLite usage and internals is appreciated though not required. A hands-on demo of libSQL features will be provided. The session will also touch on the subject of how this fork is a different take by analyzing the motivation, team profile, and previous track record.

Abdur-Rahmaan Janhangeer

July 21, 2023
Tweet

More Decks by Abdur-Rahmaan Janhangeer

Other Decks in Programming

Transcript

  1. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    1/41 libSQL: Taking SQLite to the Moon Preparing SQLite for the upcoming decades ABDUR-RAHMAAN JANHANGEER JUL 21, 2023 Share Few pieces of software enjoy a success as wild as SQLite. It’s now shipped by default on Mac OS, Windows, Android, iOS and many flavors of Linux. It’s used by countless apps in a wide range of industries including aviation, military and space. It’s unsurprisingly the most widely used and deployed database. Not only that, it’s also in the top 5 most widely used and deployed software library across all languages at any point in human history. This is not possible without strong engineering ethics. Being used in several critical fields where human lives could be at risk means bending to the stringent rules of the industry in question. Aviation for example had relatively insane rules for shipping softwares even before the era of TDD. And SQLite, despite being run by volunteers incorporated those standards early on. The project on top of this provides long-time support and a guarantee of backward compatibility. The massive adoption of SQLite by the biggest players in the tech field is a testimony to the value of the project, but, it’s also an extraordinary burden placed on the developers not to bring in mediocrity.
  2. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    2/41 A fork of SQLite, if it can dare be forked, is pure madness if they intend to be serious about it. They would need to provide the same guarantees as SQLite. Stable, reliable, and dependable software means the introduction of careful, slow-moving changes. Innovation might sometimes be a radical change of components to unleash more powers, which in turn spells out instability, a huge put-off. The team behind the fork must be competent. The SQLite team with decades of experience, codebase intimacy and field specialization already pushes improvements to the limit. To be able to propose alternatives, you must have a good notion of databases to craft an even better product. And wisdom. If a fork of a car is a plane, there are two drawbacks: car users will most probably abandon it as it’s no longer a car. And, since it’s a plane, it will be compared to planes, it must be up to the level and satisfy plane users. The leadership team must be able to find a purpose to the project, fuel motivation for a long time to come and make sure great people remain around even if nobody adopts the project, a hard challenge to answer. And indeed, many projects do not fork SQLite. They add components over SQLite and package it as an improved project. Forks of SQLite exist. They come and crumble. But, sometimes ago an unusual fork appeared: libSQL. This post discusses libSQL and it’s jaw-dropping team, the technical challenges they overcame, the adoption strategy they put in place, the production validation scheme they concocted, the ongoing development, the coma of future roadmap checkpoints, how features resonate more than fully with a cloud-dominated industry and why it’s the grandest, noblest and most soundly engineered forks of sqlite of all time.
  3. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    4/41 To understand libSQL, we must first understand SQLite. The inexorable pervasion of SQLite calls for a look at the maintainers, vision, codebase and coding standards. We’ll also have a brief overview of the internals. The SQLite devs currently aim to support the project through 2050. Whatever happens, they commit to support the project. They also have the source code automatically replicated in several locations around the world. The project has extensive testing both code-wise and end-user-wise. Many Open Source projects have so many users that they don’t worry about unit tests. SQLite introduced aviation-grade tests to ensure high-quality software. Throughout the many interactions with big companies, the deployment at massive scale contributed to solve bugs that the testing suite could not ever catch. Thus, SQLite enjoys top notch excellence. You can read more about the evolution from a hobby project to where it is today. They also pledge not to give in to new fashion trends just for the sake of it, labeling under the “old school“ point. Every machine-code branch instruction is tested in both directions. Multiple times. On multiple platforms and with multiple compilers. This helps make the code robust for future migrations. The intense testing also means that new developers can make experimental enhancements to SQLite and, assuming legacy tests all pass, be reasonably sure that the enhancement does not break legacy. The codebase is a pleasure to read. It is well-commented. The project is also well documented. Well-commented means that you can sit down and read the source code like a book. This project is the personification of how good Open Source projects should be. It is a robust piece, but it’s one we can recommend any beginner to start with. Even people who don’t know C can browse and follow along. It’s a long read for sure but very enjoyable.
  4. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    5/41 The SQLite source code is over 35% comment. Not boiler-plate comments, but useful comments that explain the meaning of variables and objects and the intent of methods and procedures. The code is designed to be accessible to new programmers and maintainable over a span of decades. SQLite devs and particularly the author has a solid background in programming. He worked at bell labs and did his Phd in compiler craft. SQLite was designed with a bytecode engine from the start. The author is not afraid to spin up his own implementations. He is dissatisfied with Git and similar CVS systems. He wrote his own, named Fossil, used by the project. He likes to build his own stuffs. It trickles down to the database engine, B-tree layer, parser (he is allergic to YACC, BISON) and even the editor he writes SQLite in. This shows his incredible mastery over a wide range of topics as well as a bold spirit. Generally people avoid this. But, for SQLite, it turned out to be a predictable advantage in terms of stability as they don’t have to worry about 3rd party codes surprise breaking changes and being license-worry-free. As expected, SQLite has a consortium to preserve and keep the project ongoing. It has a team of 3 developers. Yes, not a crowd-full driven project. It’s also one of the dark points. SQLite rarely accepts contributions. It is maybe warranted by the fear of sacrificing quality. The source is open but the project is not open to participation. There is no contribution guide to begin with. It’s a model that works, we agree. It challenges the popular concept of Open Source being open to all and everything under the sun, even poor quality contributions, just for the sake of being open. How libSQL masterfully navigates the challenges of forking
  5. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    6/41 libSQL began with strong footholds. First and foremost, the team has prior experience with the Linux kernel, being on the contributing as well as maintaining side. They also enrolled top ScyllaDB contributors with members having 500 to 1000+s commits. ScyllaDB is the production db which Discord sings it’s praise in the trillion messages handling post. The people who forked libSQL come primarily from a company called ChiselStrike Turso. But, libSQL is an independent project, accepting outside contributions. To this date, libSQL has seen many outside contributions. To test out libSQL’s worth, libSQL is used by Turso’s edge service. This brings in feedback about the state of the fork in production. There is at least one serious adopter of libSQL.
  6. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    7/41 Turso brings SQLite to the edge. This is a fine testimony to the feats that libSQL is capable of. People ranted a lot when libSQL was announced. The project was deemed as a startup fad with no serious work and a fork doomed to fail. Some 09 months later, it’s functional with validated efficiency and pleasant community interaction experience.
  7. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    8/41 It also periodically merges in the advancements on main SQLite. To understand the brillance of libSQL, we need to understand how SQLite works first. The SQL code is passed through the library which in turn deals with the files. It’s as illustrated below. How SQLite works
  8. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    10/41 This big picture of things is enough to conceptually understand SQLite. We’ll zoom in on specific aspects as we focus on libSQL features. When we talk of libSQL, we are talking of 2 main projects: The fork of SQLite The server version setup on top of the fork, called sqld The features of libSQL
  9. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    11/41 A wire protocol is just a way to talk with the server. The client takes the query and uses the driver to send it to the server. Wire protocols
  10. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    12/41 Once the connection is establish then exchanges take place between the client and server. libSQL supports some wire protocols including Postgres and http.
  11. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    13/41 libSQL allows you to communicate with it’s server via tools implementing the Postgres wire protocol. This works perfectly. Using the feature involves spinning up sqld and using psql (A terminal-based client for postgres) to connect. Psql does not detect it’s not really Postgres at all! Postgres wire protocol $ sqld -d foo.db -p 127.0.0.1:5432 --http-listen-addr=127.0.0.1:8000 $ psql -q postgres://127.0.0.1 appinv=>
  12. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    14/41 However, many tools relying on the protocol also look for internal postgres table. So, this option is limited in use. Hrana (from Czech "hrana", which means "edge") is a libSQL protocol to connect to sqld via web sockets. Tcp connections are banned from many edge runtimes like cloudflare workers, but not web sockets. Hrna is a simple protocol, requiring fewer rountrips than postgres wire. Servers traditionally connect to a server. The Hrana protocol
  13. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    15/41 The sqld using Hrana connects through a stream. Traditional servers have connection pools.
  14. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    16/41 sqld’s stream supports multiple sql streams. Only once connection is needed to support concurrent requests. Though intended to be used with sqld, Hrana can also be used with SQLite via a proxy. Hrana protocol defines arbitrary strings with optional parameters, making it easily adaptable to SQL dialects. Hrana operates on top of the web socket protocol as a sub-protocol. It is stated in the Sec- WebSocket-Protocol.
  15. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    17/41 The HTTP protocol allows us to stream Hrana specs where there is little or no web-socket support. Requests and reponses might look like this. The HTTP protocol
  16. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    18/41 sqld groups requests together using baton values in streams. A baton value is a value sent by the server which should be sent back by the client.
  17. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    20/41 This makes it very convenient to communicate with Hrnana with curl or code simple clients. Here is a json example. There are two modes (read this) by which SQLite ensures data is not corrupted and can be recovered in case of a power loss. These are the Rollback and WAL modes. They are mutually $ curl -s -d "{\"statements\": [\"SELECT * from databases;\"] }" \ http://127.0.0.1:8000 [[{"name":"libsql"}]] Virtual WAL
  18. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    21/41 exclusive. The WAL mode is more powerful but, for compatibility reasons, the Rollback mode has not been removed. The WAL mode allows reading and single writer writing at the same time. To understand what a virtual WAL means, it’s good to understand how the Virtual File System (VFS) or OS layer works. Since SQLite is available on many operating systems, writing files to the system is done by a call to the VFS layer. Windows and Unix VFS is included by default. It is also possible to author our own VFS if we want to support another operating system for example.
  19. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    22/41 Sometimes we might want to do additional tasks with an established VFS. In this case we wrap the VFS with our VFS along with our code and register our own VFS. Such a VFS is called a shim.
  20. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    23/41 Virtualizing the WAL operations lets us have our own backends, just like we do with VFS. We can specify where to write our WAL files and what to do when these files are being written. Let’s say we want to write them to AWS, we can. libSQL provides some virtual WAL methods libSQL (sqld) offers a way to backup WAL files to S3 for backup. The steps for bottomless WAL is as follows: 1. libSQL writes WAL frames 2. These frames are asynchronously replicated to S3 in the background, in batches 3. When a CHECKPOINT operation occurs in libSQL (checkpoint is an op which compacts WAL frames to drop ones that are out-of-date), we also upload a compressed snapshot of the main database file to S3 4. The snapshot files can be managed through the bottomless-cli libsql_wal_methods_find libsql_wal_methods_register libsql_wal_methods_unregister Bottomless WAL
  21. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    24/41 To understand how it works, we must first understand what consists of a WAL file. A WAL file can have 0 or more frames. A frame consists of a header and a page. After 1000 frames or 10s or as configured, the frames are gziped and sent to s3.
  22. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    25/41 A checkpoint operation is one where the contents of the WAL file is applied to the database. After a checkpoint operation, a database snapshot is sent to s3.
  23. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    26/41 The s3 is organized jnto namespaces called generations. A generation is the period between checkpoint operations. The most recent generation is at the top.
  24. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    29/41 To apply the transactions, libSQL makes use of a transaction cache to verify transactions.
  25. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    30/41 Now, when applying the transactions, if it is too big to fit in memory, it is flushed to a temporary file.
  26. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    32/41 libSQL has 2 counters which when they tally, a checkpoint operation occurs. libSQL has WAL method xframes to increment counter if transaction is committed. libSQL xcheckpoint to delay checkpoint until a task is completed. A virtual table is an object which does not mirror a table on disk. libSQL exposes a callback xPreparedSql that allows a virtual table implementation to receive the SQL string submitted by the application for execution. SQLite allows you to define your own function using low-level C api. Virtual table callback User-defined WASM functions
  27. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    33/41 libSQL in addition allows you to define your functions in WASM. We can write and compile our functions using any language we want to WASM. Then we copy paste it in our source code. You can try it live. libSQL can also be replicated to the edge. Bringing data to the edge currently setting up a primary database and replicas. CREATE FUNCTION doGood LANGUAGE wasm AS '<paste WASM bytecodes here>' SELECT doGood(col) ... -- use here Replication: Bringing data to the edge
  28. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    36/41 Once a write occurs, the replica sends data to the primary which replicates the data.
  29. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    37/41 This occurs as follows: 1. The client sends the data to the replica 2. The replica sends the data to the primary 3. The primary replicates the data everywhere
  30. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    38/41 libSQL operates on a pull-based mechanism. The replica routinely asks for new frames. Replication mechanism
  31. 7/21/23, 8:04 AM libSQL: Taking SQLite to the Moon https://compileralchemy.substack.com/p/f5ee57d9-3127-45d5-88c8-4d55034d3c1e

    41/41 © 2023 Abdur-Rahmaan Janhangeer ∙ Privacy ∙ Terms ∙ Collection notice Substack is the home for great writing