Adventures in the land of Language Servers

Jakub Kozłowski | Lambda Days | June 5, 2023

Who's this talk for?
If you...
• Want to build tooling for your language
• Want to understand more about your language tooling, or
• Just want to learn
This talk is for you

Storytime - meet Jane. 👩‍💻

Jane is tasked with building a language...
• A small DSL meant to be used in an existing application
• The application is JVM-based (Scala)
  • the compiler also needs to work on the JVM
• 2-3 weeks later, the compiler is done and embedded in the application
• Jane's life is good!
• until...

😳 Jane starts getting users
And they want editor support

Editor support
Users want a smooth editing experience
• Analysis (error highlighting, go-to-definition, ...)
• Refactoring (rename, remove unused, ...)
• Completions

Editor support
Options?
• Writing a special editor ❌
  • Users would have to learn a new editor
  • A lot of frontend work for a small language
• Integrating with an existing editor ✅
  • Which one(s)?

Integrating with existing editors
Users are asking for IntelliJ
• IntelliJ is Java-based ✅
  • Can easily be integrated with Jane's compiler
• Still requires learning IntelliJ's extension API...
• Some time later, the extension is up and running
• Jane's life is good
• until...

😭 More users
And this time they want VS Code

VS Code support
• VS Code is Node.js-based ❌
  • interop with JVM-based tooling will be difficult
• Another extension API to learn 😩

Node.js <-> JVM interop
Options?
• Rewrite the compiler
• Transpile the compiler
• Shell out ✅

Jane decides to shell out 🐚
Client-server architecture
• The editor extension (language frontend) will start a server in a separate process
• A compilation server (language backend) will run on the JVM ☕
• The editor extension will send RPCs to the server ☎
• When the editor is closed, the extension will kill the server process 💀

How it'll all work
Client responsibilities
• The editor extension (language frontend) will:
  • Listen to user events 👂 (hover, clicks, keyboard shortcuts...)
  • Ask the server for information 🙋 (completions at position...)
  • Provide parameters 🧱 (cursor position, file contents...)
  • Apply actions and present results 🏋 (complete statement, go to definition...)

How it'll all work
Server responsibilities
• The language backend (compilation server) will:
  • Await requests from the extension ⏳
  • Use parameters to analyze the files and derive information 🧐
  • Respond to the client 🗣

Jane needs an API for the communication
• Jane wants to call it "Language Server Protocol", but a quick Google search tells her it's taken...
• She decides to call it "Language Backend Protocol" ✅
• It'll use HTTP over local TCP ✅
trait LanguageBackend {
  def definition(cursor: Position, file: Path): Location
  def rename(position: Position, file: Path, newName: String): List[TextEdit]
  //...
}

Jane implements the changes
• Implementing language backend
• Implementing VS Code extension (client)
• Adapting the IntelliJ extension (in-memory client)
• Jane's life is good
• until...

🤔 Jane gets a DM
Meet Trish

Trish has a similar issue
• Her team is maintaining their own language
• They also got asked to support multiple editors
• Trish is thinking of doing the same (a language backend)
• She's asking for guidance
• Jane gets an idea 💡

The frontend knows (almost) nothing
...about the language
• It knows what file extension the language uses
• It knows how to launch the server, so...
• The editors can talk to any language backend
• Trish decides to use Jane's Language Backend Protocol ✅

There's one problem...
Trish's language needs different features
• Trish needs support for "call hierarchy"
• Jane can't implement that - the language has no functions (no hierarchy)
• The problem can be generalized
• Server features will become optional ✅
• Servers will describe their features to the client ✅

Server capabilities
A means for a server to describe what it can do
• A new method in the protocol: initialize
  • Called by the client after the server launches
  • The server responds with a list of supported methods
• Unsupported features aren't offered to users by the client ✅
• Servers and clients can add features at their own pace ✅
• Only features that make sense have to be implemented ✅
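
The capability handshake described on this slide could look roughly like the Scala sketch below. Every name in it (ServerCapabilities, supportedMethods, JaneBackend, TrishBackend, EditorExtension, and the method-name strings) is made up for illustration; none of it is taken from Jane's actual protocol or from LSP.

```scala
// Illustrative sketch only: every name below is hypothetical.
final case class ServerCapabilities(supportedMethods: Set[String])

trait LanguageBackend {
  // Called once by the client right after it launches the server.
  def initialize(): ServerCapabilities
}

// Jane's DSL has no functions, so call hierarchy is not advertised.
object JaneBackend extends LanguageBackend {
  def initialize(): ServerCapabilities =
    ServerCapabilities(Set("definition", "rename"))
}

// Trish's language does have functions, so it advertises one more method.
object TrishBackend extends LanguageBackend {
  def initialize(): ServerCapabilities =
    ServerCapabilities(Set("definition", "rename", "callHierarchy"))
}

object EditorExtension {
  // Everything this (hypothetical) extension knows how to show in the UI.
  val knownFeatures: List[String] = List("definition", "rename", "callHierarchy")

  // Only features the server declared are shown to the user.
  def visibleFeatures(backend: LanguageBackend): List[String] = {
    val capabilities = backend.initialize()
    knownFeatures.filter(capabilities.supportedMethods.contains)
  }
}
```

With this shape, the same extension code can front either backend: it simply hides whatever the server did not declare.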

Jane and Trish get to work...
• Jane's server declares its capabilities
• Trish's new methods are added to the protocol
• Trish implements a language backend
• The extensions are updated to hide unsupported features
• The extensions can be configured to launch different backends
• Jane's life is good
• until...

😱 User complaint
Some actions are visibly slow...

Users can't see any feedback
In the middle of an action
• Some actions are inherently slow ⏳
• Sometimes you just need to tell the user to wait a little more, or ask them for confirmation
• Only the client can initiate calls ↪
• The protocol doesn't give the server the ability to respond with intermediate results ↩

Jane decides to add feedback
The protocol needs changing
• The backend must be able to call the client
• HTTP is not enough
• Jane decides to use JSON-RPC
  • Both sides can run requests on a single channel 🔄 (duplex)
  • Can use std I/O, TCP, WebSockets, ...

A new API is created
Requests to the client will use a new part of the protocol
trait LanguageFrontend {
  def showMessage(
    tpe: "ERROR" | "WARNING" | "INFO",
    text: String
  ): IO[Unit]
  //...
}

Updates to the initialize request
• Not every client can handle features like notifications
• Clients will advertise their capabilities in the initialize request
• Servers will adjust behavior based on client capabilities
trait LanguageBackend {
  def initialize(clientCapabilities: ClientCapabilities): ServerCapabilities
  //...
}
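
As a rough illustration of "servers will adjust behavior based on client capabilities", here is a minimal Scala sketch. The field showMessage, the Feedback types, and the Backend class are assumptions introduced for this example; the slide only tells us that initialize takes ClientCapabilities and returns ServerCapabilities.

```scala
// Illustrative sketch only: field and type names are hypothetical.
final case class ClientCapabilities(showMessage: Boolean)
final case class ServerCapabilities(supportedMethods: Set[String])

sealed trait Feedback
final case class Notification(text: String) extends Feedback // shown in the editor UI
final case class LogLine(text: String) extends Feedback      // fallback: server-side log only

final class Backend {
  // Until initialize is called, assume the client supports nothing optional.
  private var client: ClientCapabilities = ClientCapabilities(showMessage = false)

  def initialize(clientCapabilities: ClientCapabilities): ServerCapabilities = {
    client = clientCapabilities
    ServerCapabilities(Set("definition", "rename"))
  }

  // Pick the richest feedback channel the client said it can handle.
  def report(text: String): Feedback =
    if (client.showMessage) Notification(text) else LogLine(text)
}
```

A client that never declared showMessage still works; it just gets less feedback.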

Jane gets to work
• The editor extensions are updated to advertise their capabilities
• The server and extensions add support for the server -> client requests
• Language backends can now provide estimations, notifications, logs ✅
• Jane's life is good
• until...

💀 User complaint
The editors are often breaking...

Users can't do anything without saving the file
Jane immediately knows the cause of the issue
• The protocol identifies files by their disk path
• Editors don't save files on each keystroke (disk I/O is slow)
• The files aren't always up to date with the editor state ❌
• The protocol needs to account for unsaved files 💾
trait LanguageBackend {
  def definition(cursor: Position, file: Path): Location
  def rename(position: Position, file: Path, newName: String): List[TextEdit]
  //...
}

How to support unsaved files?
• Passing file text in every request?
• Asking the client for the latest text on-demand?
• Wasteful, non-incremental
• What if there are many unsaved files? (e.g. no autosave)
• Jane has an idea 💡

Syncing text
New methods in the protocol
• New methods: onChanged/onSaved/onClosed
• When the file changes, editor extension sends updates
• These can be patches (if the server is capable) or entire files
• The server will keep these in memory
• When the file is closed/saved, the extension informs the server
• The server can delete these from memory
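
The server side of this text sync could be sketched as below. This is a simplified assumption, not Jane's real design: files are keyed by a plain string, patches address character offsets rather than line/column positions, and the readDisk parameter stands in for whatever file access the server uses.

```scala
// Illustrative sketch only: names and the offset-based patch format are hypothetical.
sealed trait Change
final case class FullText(text: String) extends Change
// Replace the characters in [start, end) with `replacement`.
final case class Patch(start: Int, end: Int, replacement: String) extends Change

final class DocumentStore {
  // Unsaved editor contents, kept in memory while they differ from disk.
  private val unsaved = scala.collection.mutable.Map.empty[String, String]

  def onChanged(file: String, change: Change): Unit = change match {
    case FullText(text) =>
      unsaved.update(file, text)
    case Patch(start, end, replacement) =>
      // Assumes a FullText was received earlier and the offsets are valid.
      val old = unsaved.getOrElse(file, "")
      unsaved.update(file, old.substring(0, start) + replacement + old.substring(end))
  }

  // After a save or close the in-memory copy is no longer needed.
  def onSaved(file: String): Unit = { unsaved.remove(file); () }
  def onClosed(file: String): Unit = { unsaved.remove(file); () }

  // Prefer the editor's view of the file, fall back to disk otherwise.
  def currentText(file: String, readDisk: String => String): String =
    unsaved.getOrElse(file, readDisk(file))
}
```

Request handlers such as definition or rename would then call currentText instead of reading the path from disk directly.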

Bonus points
Caching / pre-computing
• The server is told about all file changes, so it can cache results of analysis
• A change in one file could trigger diagnostics in other files
• Diagnostics, completions etc. can be computed on file change, and before they're requested

Jane and Trish get to work...
• Servers will listen to events and use the in-memory file caches if available
• Editor extensions will send file changes
• Jane's life is good
• until...

🥰 Giving back
Why keep this to ourselves?

Jane wants to open source the protocol
• The idea is generic enough to handle many more languages
• The community might integrate more editors
• More features could be included in the editors

Jane gets to work
• Travels in time back to ~2015
• Joins Microsoft
• Works on the Language Server Protocol
• Jane's life is good!

Jane will return (Not really, this is the end)

🔄 LSP - a summary

No LSP: M * N integrations
Source: https://code.visualstudio.com/api/language-extensions/language-server-extension-guide

Yes LSP: M + N integrations
Source: https://code.visualstudio.com/api/language-extensions/language-server-extension-guide

Language Server Protocol
• A common specification for language features in editors and tools
• Supported by most modern editors
• JSON-RPC 2.0 (with headers) for communication
• Bi-directional information exchange
• Capability mechanism
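
To make "JSON-RPC 2.0 (with headers)" concrete: in LSP's base protocol each JSON message is preceded by a header part containing Content-Length (the size of the content in bytes), and the header part is separated from the content by \r\n\r\n. The Scala sketch below frames and unframes a single message under that rule. The Framing object and its method names are hypothetical helpers written for this example, not part of any library.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

// Illustrative sketch only: `Framing`, `frame` and `unframe` are hypothetical names.
object Framing {
  private val Separator = "\r\n\r\n"
  private val LengthHeader = "content-length:"

  // Prefix a JSON payload with its byte length, as the base protocol requires.
  def frame(json: String): Array[Byte] = {
    val body = json.getBytes(UTF_8)
    val header = s"Content-Length: ${body.length}$Separator".getBytes(UTF_8)
    header ++ body
  }

  // Read one complete message; None if the frame is malformed or incomplete.
  def unframe(bytes: Array[Byte]): Option[String] = {
    // ISO-8859-1 maps bytes to chars one-to-one, so string indices equal byte offsets.
    val raw = new String(bytes, ISO_8859_1)
    val sep = raw.indexOf(Separator)
    if (sep < 0) None
    else {
      val bodyStart = sep + Separator.length
      raw.take(sep).split("\r\n").toList
        .find(_.toLowerCase.startsWith(LengthHeader))
        .flatMap(_.drop(LengthHeader.length).trim.toIntOption)
        .filter(n => n >= 0 && bodyStart + n <= bytes.length)
        .map(n => new String(bytes, bodyStart, n, UTF_8))
    }
  }
}
```

For example, Framing.frame("{}") yields the bytes of "Content-Length: 2\r\n\r\n{}". The length counts bytes, not characters, which matters as soon as the payload contains non-ASCII text.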

Example: go to definition
https://github.com/kubukoz/badlang

Go to definition in LSP

🤺 Challenges I've faced

🔎 Parsing

Parsing
Making text structured

Parsing
• Necessary to do virtually anything
• Must keep track of locations/ranges in the input
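
A minimal sketch of what "keeping track of ranges" can mean in practice: a whitespace tokenizer that records where each token starts and ends, plus a lookup by cursor offset. The names Span, Token and Tokenizer are hypothetical, and a real language would need a proper lexer and line/column positions.

```scala
// Illustrative sketch only: all names are hypothetical.
// `end` is exclusive; both fields are character offsets into the input.
final case class Span(start: Int, end: Int)
final case class Token(text: String, span: Span)

object Tokenizer {
  // Split on whitespace, remembering where every token came from.
  def tokenize(input: String): List[Token] = {
    val tokens = List.newBuilder[Token]
    var i = 0
    while (i < input.length) {
      if (input.charAt(i).isWhitespace) i += 1
      else {
        val start = i
        while (i < input.length && !input.charAt(i).isWhitespace) i += 1
        tokens += Token(input.substring(start, i), Span(start, i))
      }
    }
    tokens.result()
  }

  // What hover or go-to-definition needs first: which token is under the cursor?
  def tokenAt(tokens: List[Token], offset: Int): Option[Token] =
    tokens.find(t => t.span.start <= offset && offset < t.span.end)
}
```

Without the spans, the server could still parse the file but could not answer any request that starts from a cursor position.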

Parsing, more honest
Ranges included

Parsing test harness
Derive tests from directory structure

Parsing
Graceful failure
• Features should work even when parsing doesn't succeed
• This is actually really hard

Tree-sitter
• Parser generator tool
• Fast
• Incremental
• Graceful error handling
• Bindings for WASM, Node, Rust, ∞
https://tree-sitter.github.io/tree-sitter

More ideas for graceful parsing
"we want parsing to always succeed at producing some kind of structured result. The result can contain error nodes inside it, but the error nodes don't have to replace the entire result"
"(...) every string matches a rule, by adding rules for erroneous inputs"
https://duriansoftware.com/joe/constructing-human-grade-parsers
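
The "error nodes" idea from the quotes can be sketched on a toy language in which every line is supposed to be an assignment of the form name = number. The grammar, node types and GracefulParser object are all invented for this example; the point is only that one bad line becomes an ErrorNode while its neighbours still parse.

```scala
// Illustrative sketch only: toy grammar and hypothetical names.
sealed trait Node
final case class Assignment(name: String, value: Int) extends Node
// The "rule for erroneous input": anything else is kept as an error node.
final case class ErrorNode(text: String) extends Node

object GracefulParser {
  private val assignment = """\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*""".r

  // Always returns a structured result, one node per non-blank line.
  def parse(source: String): List[Node] =
    source.linesIterator
      .filter(_.trim.nonEmpty)
      .map {
        case line @ assignment(name, digits) =>
          // A number too large for Int is also reported as an error node.
          digits.toIntOption.fold[Node](ErrorNode(line))(Assignment(name, _))
        case line =>
          ErrorNode(line)
      }
      .toList
}
```

Features such as completions can then keep working on the valid nodes while diagnostics are produced from the error nodes.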

🖨 Formatting

Formatting
• Many ways to do this
• Main problem (for me): keeping code comments
  • Keep them while parsing, or
  • Don't fully parse (work on token stream)
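
A very small sketch of the second option, formatting without fully parsing: each line is split into code and a trailing comment, only the code part has its whitespace normalized, and the comment is reattached untouched. It assumes a hypothetical language with // line comments and no string literals that could contain //; the Formatter object is not taken from any of the projects mentioned in the talk.

```scala
// Illustrative sketch only: assumes `//` line comments and no string literals.
object Formatter {
  def formatLine(line: String): String = {
    val commentStart = line.indexOf("//")
    val (code, comment) =
      if (commentStart < 0) (line, "")
      else (line.substring(0, commentStart), line.substring(commentStart))

    // Collapse runs of whitespace between code tokens into single spaces.
    val tidyCode = code.trim.split("\\s+").filter(_.nonEmpty).mkString(" ")

    // The comment text itself is preserved; only surrounding whitespace is trimmed.
    List(tidyCode, comment.trim).filter(_.nonEmpty).mkString(" ")
  }

  def format(source: String): String =
    source.linesIterator.map(formatLine).mkString("\n")
}
```

Because the formatter never builds a syntax tree, there is no step at which comments could be dropped, which is the main appeal of working at this level.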

Formatting
Tests - you're gonna need these
https://github.com/kubukoz/smithy-playground (FormattingTests.scala)

🔬 Testing

Testing
The pyramid still applies

Unit testing
Pure logic - no LSP in sight
• Fine-grained
• Data-in, data-out, no state, no files
• Follow generally accepted best practices

Component testing
e.g. testing an entire diagnostic request handler
• Input: e.g. string, position
• Output: LSP-like*
• Still not actual LSP

Integration testing
Testing with LSP and a workspace
• Doesn't have to involve JSON-RPC
• No need to run in a separate process
• Definitely want to use real files, state (if the server has any)

Server end to end testing
Covering everything the server is doing
• Run the actual server process
• Send requests with an LSP client

End to end testing
Includes an editor
• Effectively, tests the extension
• Launch editor, execute its commands, assert on result
• 1 test per feature*editor, if you're paranoid
• 1 test per editor if you're not

🐞 Debugging

Debugging
Hints
• Reproduce and minimize
• For the fastest feedback loop, write a test
• Use your server runtime's debugging tools
• Check out Langoustine Tracer
• ...println debugging

Debugging

📚 Library choice

Library choice
• Standard implementations from M$ are in Node.js
• Lsp4j for Java, libs for Haskell, Rust, ...
• Until recently, no good library for Scala

LSP in Scala
Langoustine
• Cross-platform, Scala-first API, functional, no reflection, no mutability, auto-cancellation
• Easy setup with LangoustineApp
• Still in the early stages
Complete example of an LSP server ->

Resources
• Slides/contact: linktr.ee/kubukoz
• My YouTube: yt.kubukoz.com
• Example server: github.com/kubukoz/badlang
• LSP website: microsoft.github.io/language-server-protocol
• Langoustine: github.com/neandertech/langoustine
• LSP history (old): github.com/microsoft/language-server-protocol/wiki/Protocol-History
