Slide 1

Slide 1 text

Tickling not too thick ticks! A story of bitcoins, asyncio, and websockets Giovanni Lanzani @gglanzani

Slide 2

Slide 2 text

Who is Giovanni — Born and raised in Italy — He claims to be a doctor of Theoretical Physics — Surprisingly, Leiden University backs his claim — Nobody else believes it — Past: Powerpoint wizard at KPMG — Present: Senior Shoe Designer at GoDataDriven

Slide 3

Slide 3 text

This talk is about a — Side project — Fun with new things in Python and tech land — Something you can try at home if you don't dabble with real money — Although real money was made with this tool

Slide 4

Slide 4 text

Python 3.4 introduced asyncio — New module for writing single-threaded concurrent code using coroutines — It supports pluggable event loops and ships with one (we will use it) — Since Python 3.5 we have the async/await keywords to work with asyncio

Slide 5

Slide 5 text

What the heck is an event loop? — Imagine giving a list of tasks to a (single) friend — Your friend processes them sequentially — But each task can say: do A, wait a minute, then do B — Our friend will do A, and then immediately start with the next task in the queue — At the end of the minute, B will be picked up

Slide 6

Slide 6 text

Illusion of parallelism — Only one thing at a time is handled by our friend — But for us, we have the impression things happened in parallel — Because our friend did other things while we waited — Our friend is a Python program using asyncio
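The "friend" analogy can be tried out with a minimal sketch. The task names and delays below are made up for illustration, and asyncio.run needs Python 3.7 or newer (on older versions, loop.run_until_complete plays the same role).

```python
import asyncio


async def task(name, delay, log):
    # "Do A": runs as soon as the event loop picks up the task.
    log.append(f"{name}: A")
    # "Wait": hand control back to the event loop instead of blocking it.
    await asyncio.sleep(delay)
    # "Do B": the event loop resumes the task once the wait is over.
    log.append(f"{name}: B")


async def main():
    log = []
    # Two tasks handed to the same single-threaded event loop.
    await asyncio.gather(task("first", 0.2, log), task("second", 0.1, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both tasks do their A before either does its B, and the task with the shorter wait finishes first, even though only one thing runs at any given moment.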

Slide 7

Slide 7 text

This is stupid — Why did we say to wait a minute? — I want it done now! — Well, usually we wait on I/O — Disk, network, other systems (such as a database)

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

    import asyncio
    import datetime

    now = datetime.datetime.now

    async def display_date():
        end_time = now() + datetime.timedelta(seconds=5.0)
        # Keep printing until five seconds have passed
        while now() < end_time:
            print(now())
            await asyncio.sleep(1)

    loop = asyncio.get_event_loop()
    # Blocking call which returns when the display_date() coroutine is done
    loop.run_until_complete(display_date())
    loop.close()

Slide 10

Slide 10 text

    $ python pydata_scratch.py
    2018-05-11 11:32:13.469731
    2018-05-11 11:32:14.470254
    2018-05-11 11:32:15.473507
    2018-05-11 11:32:16.478435
    2018-05-11 11:32:17.479408
    2018-05-11 11:32:18.483605

Slide 11

Slide 11 text

— This is even more useless — I could do the same thing just by doing

    import datetime
    import time

    now = datetime.datetime.now

    def display_date():
        end_time = now() + datetime.timedelta(seconds=5.0)
        # Keep printing until five seconds have passed
        while now() < end_time:
            print(now())
            time.sleep(1)

    display_date()

Slide 12

Slide 12 text

Well, it's useful when you add more async tasks!

    ...
    async def hey():
        intercalations = ['hey', 'let me speak', 'wait a second', 'oh, f* it']
        for intercalation in intercalations:
            print(intercalation)
            await asyncio.sleep(1.5)
    ...
    # I ask the event loop to run two tasks concurrently!
    loop.run_until_complete(asyncio.gather(display_date(), hey()))

Slide 13

Slide 13 text

    $ python pydata_scratch.py
    hey
    2018-05-11 11:33:28.424171
    2018-05-11 11:33:29.428733
    let me speak
    2018-05-11 11:33:30.430536
    wait a second
    2018-05-11 11:33:31.432197
    2018-05-11 11:33:32.436205
    oh, f* it
    2018-05-11 11:33:33.440074

Slide 14

Slide 14 text

Enough async for now! Bitcoin now — Ok, I won't offend anyone — No Bitcoin/crypto explanation here — But let's do something fun — Create a bot that collects data from Bitcoin exchanges; — And signals when an arbitrage opens up.

Slide 15

Slide 15 text

We can do market-neutral trading — Short-sell where the price is higher — Go long where the price is lower — You carry no risk from BTC price fluctuations — You don't need to transfer funds between exchanges — You do need a buffer in each exchange!!! — More about this later — Or: https://github.com/butor/blackbird/issues/100
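Why the BTC price itself drops out can be checked with a little arithmetic. This is a sketch under simplifying assumptions that are not from the talk: fees are ignored, both legs have the same size, and both are later closed at the same price (that is, the spread has closed). The function name and the numbers are illustrative.

```python
def market_neutral_pnl(short_entry, long_entry, exit_price, amount=1.0):
    """Profit of a short on the expensive exchange plus a long on the cheap one.

    Assumes both positions are closed at the same exit price; fees are ignored.
    """
    short_leg = (short_entry - exit_price) * amount  # gains when the price falls
    long_leg = (exit_price - long_entry) * amount    # gains when the price rises
    return short_leg + long_leg


# Same entry spread (7200 vs 7100), very different exit prices.
print(market_neutral_pnl(7200, 7100, exit_price=5000))
print(market_neutral_pnl(7200, 7100, exit_price=9000))
```

Both calls give the same profit, the entry spread of 100, because whatever one leg loses on a price move the other leg gains.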

Slide 16

Slide 16 text

There are a number of Bitcoin exchanges we can use — Kraken — Bitstamp — GDAX — Bitonic — … — …… — ………

Slide 17

Slide 17 text

(At least) two of them offer websockets — The WebSocket protocol enables interaction between a client and a server with lower overhead, facilitating two-way real-time transfers — This is much better than polling an API continuously — By polling an API, you can miss trades — Using websockets, the exchange notifies the client — (Websockets are also used in tech such as Jupyter.)

Slide 18

Slide 18 text

We could architect our solution like this

[Architecture diagram with the components: GDAX, Bitonic, …; pydata_async_insert.py; Postgres; emailer.py (out of scope)]

Slide 19

Slide 19 text

Websockets and asyncio are a match made in heaven. Instead of writing:

    result = make_call()  # get the data
    await asyncio.sleep(n)  # wait before calling again

we can do

    result = await websocket.recv()

While the server doesn't send anything, asyncio can let the program do other things!

Slide 20

Slide 20 text

In Python there's a nice library for websockets — It's called (!) websockets — https://websockets.readthedocs.io/en/stable/

Slide 21

Slide 21 text

    import asyncio
    import websockets

    async def hello(address, port):
        # No http(s) but ws(s)
        async with websockets.connect(f'ws://{address}:{port}') as websocket:
            # here we assume we can just start listening!
            greeting = await websocket.recv()
            print("< {}".format(greeting))

    # 'localhost' and 8765 are placeholders: use the address and port of your server
    asyncio.get_event_loop().run_until_complete(hello('localhost', 8765))

Slide 22

Slide 22 text

Ok, tell me how to connect to an exchange — Find the docs! — Read the docs! — Bitonic: — Address is wss://api.bl3p.eu/ — Then /<version>/<market>/<channel> — For example wss://api.bl3p.eu/1/BTCEUR/trades

Slide 23

Slide 23 text

    import json
    import websockets

    async def get_bitonic_async(*, insert_function):
        bitonic_address = "wss://api.bl3p.eu/1/BTCEUR/trades"
        async with websockets.connect(bitonic_address) as websocket:
            while True:
                message_str = await websocket.recv()
                message = json.loads(message_str)
                # create_bitonic_response is a project helper (not shown in the talk)
                response = create_bitonic_response(response=message)
                await insert_function(response)

Slide 24

Slide 24 text

GDAX is a bit more complex — Full docs at https://docs.gdax.com — They first require a subscription message

    subscribe = json.dumps({
        "type": "subscribe",
        "product_ids": ["BTC-EUR"],
        "channels": [
            {"name": "ticker", "product_ids": ["BTC-EUR"]}
        ]
    })

Slide 25

Slide 25 text

We can send the subscription

    async def get_gdax_async(*, insert_function):
        gdax_ws_address = "wss://ws-feed.gdax.com"
        async with websockets.connect(gdax_ws_address) as websocket:
            await websocket.send(subscribe)
            ...

Slide 26

Slide 26 text

After the subscription we can listen to tickers

    async def get_gdax_async(*, insert_function):
        gdax_ws_address = "wss://ws-feed.gdax.com"
        async with websockets.connect(gdax_ws_address) as websocket:
            await websocket.send(subscribe)
            while True:
                message_str = await asyncio.wait_for(websocket.recv(), WAIT_TIMEOUT)
                message = json.loads(message_str)
                # the first message is not a ticker, but we'll ignore it for now
                response = create_gdax_response(response=message, pair="BTC-EUR")
                await insert_function(response)
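Two details of the listener are glossed over: what happens when asyncio.wait_for times out, and how to skip the first, non-ticker message. A possible sketch, runnable offline, follows. FakeWebsocket, collect_tickers, the WAIT_TIMEOUT value, and the sample messages are all assumptions made for illustration; only the "type" field, with the value "ticker" for ticker messages, is taken to match the exchange's format.

```python
import asyncio
import json

WAIT_TIMEOUT = 0.1  # seconds; hypothetical value, the talk does not show it


class FakeWebsocket:
    """Stand-in for a websocket connection, so the sketch runs offline."""

    def __init__(self, messages):
        self._queue = asyncio.Queue()
        for message in messages:
            self._queue.put_nowait(json.dumps(message))

    async def recv(self):
        # Waits forever once the queue is empty, like a silent server.
        return await self._queue.get()


async def collect_tickers(websocket):
    tickers = []
    while True:
        try:
            message_str = await asyncio.wait_for(websocket.recv(), WAIT_TIMEOUT)
        except asyncio.TimeoutError:
            # Nothing arrived in time: stop here (a real bot might reconnect).
            break
        message = json.loads(message_str)
        if message.get("type") != "ticker":
            # For example the confirmation of the subscription: skip it.
            continue
        tickers.append(message)
    return tickers


async def main():
    websocket = FakeWebsocket([
        {"type": "subscriptions"},
        {"type": "ticker", "product_id": "BTC-EUR", "price": "7166.00"},
    ])
    return await collect_tickers(websocket)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a long-running collector, the timeout branch is a natural place to reconnect and resubscribe rather than to stop.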

Slide 27

Slide 27 text

Ok, where do we store this data? I already gave it away: Postgres!

[Architecture diagram with the components: GDAX, Bitonic, …; pydata_async_insert.py; Postgres; emailer.py (out of scope)]

Slide 28

Slide 28 text

Async/sync? — Since we're in async land, inserting into the database should also happen asynchronously — (Okay, could happen asynchronously) — My favorite Postgres client is synchronous (psycopg2) — Enter asyncpg!

Slide 29

Slide 29 text

asyncpg — Developed mostly by Yury Selivanov (Python core dev) and Elvis Pranskevichus (EdgeDB) — https://github.com/MagicStack/asyncpg — Basic example (from the website):

    import asyncio
    import asyncpg

    async def run():
        conn = await asyncpg.connect(user='user', password='password',
                                     database='database', host='127.0.0.1')
        values = await conn.fetch('''SELECT * FROM mytable''')
        await conn.close()

    loop = asyncio.get_event_loop()
    loop.run_until_complete(run())

Slide 30

Slide 30 text

In our case we don't have to write much

    from typing import NamedTuple
    from asyncpg.pool import Pool

    async def insert_ticker(tick: NamedTuple, *, pool: Pool, table: str) -> None:
        fields = tick._fields
        # this is definitely uglier than psycopg2
        placeholders = ['${}'.format(i) for i, _ in enumerate(fields, 1)]
        query = 'INSERT INTO {} ({}) VALUES ({})'.format(
            table, ', '.join(fields), ', '.join(placeholders))
        async with pool.acquire() as connection:
            # yes, this is a thing!
            async with connection.transaction():
                await connection.execute(query, *tick)
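To see what the string formatting produces, the query-building part can be pulled out and run without a database. The Tick class and its fields are hypothetical (the talk does not show the tuple); build_insert_query is just the first half of insert_ticker under a made-up name.

```python
from typing import NamedTuple


class Tick(NamedTuple):
    # Hypothetical fields; the talk does not show the actual tuple.
    ts: str
    exchange: str
    bid_price: float
    ask_price: float


def build_insert_query(tick, table):
    fields = tick._fields
    # asyncpg uses numbered placeholders: $1, $2, ...
    placeholders = ['${}'.format(i) for i, _ in enumerate(fields, 1)]
    return 'INSERT INTO {} ({}) VALUES ({})'.format(
        table, ', '.join(fields), ', '.join(placeholders))


tick = Tick('2018-05-11 13:46:52+00', 'GDAX', 7166.00, 7166.01)
print(build_insert_query(tick, 'ticker'))
# INSERT INTO ticker (ts, exchange, bid_price, ask_price) VALUES ($1, $2, $3, $4)
```

Only the values travel as query parameters. The table and column names are interpolated into the SQL text, so they should come from your own code and never from user input.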

Slide 31

Slide 31 text

Is it done already? — One extra detail: we're handling time series — Postgres can handle it just fine, but can we do better?

Slide 32

Slide 32 text

Timescale: Time series Postgres! — Open source extension on top of Postgres — Already used in production at a number of companies — Since it's an extension to Postgres, you don't need to change anything (almost) — It enables easier querying of data too (easier dropping as well, higher performance, etc.)

Slide 33

Slide 33 text

Timescale goes a bit beyond the talk — But say you have a table where ts is a timestamp with time zone — Then you can enable Timescale magic with this

    SELECT create_hypertable('ticker', 'ts', chunk_time_interval => interval '1 day');

— Timescale will create a hypertable, where querying is optimized within a day — Not the end of the world to span 2 days, but change the interval if you span weeks

Slide 34

Slide 34 text

More Timescale goodness — They keep data ordered on disk even when the incoming data is not ordered (if you're querying multiple exchanges this is important); — They do it efficiently because of partitioning

Slide 35

Slide 35 text

They have a cool logo!

Slide 36

Slide 36 text

Get latest values per exchange

    SELECT LAST(ask_price, ts) ask,  -- last by timestamp, new!
           LAST(bid_price, ts) bid,
           LAST(ts, ts) ts,
           exchange
    FROM ticker
    WHERE ts > now() - INTERVAL '1 minute'
    GROUP BY exchange

       ask   |   bid   |             ts             | exchange
    ---------+---------+----------------------------+----------
             | 7145.65 | 2018-05-11 13:46:29+00     | Bitonic
     7166.01 | 7166.00 | 2018-05-11 13:46:52.005+00 | GDAX
    (2 rows)

Slide 37

Slide 37 text

We can now already write all the logic of the mailer!

    WITH latest_spreads AS (
        SELECT LAST(ask_price, ts) ask,
               LAST(bid_price, ts) bid,
               LAST(ts, ts) ts,
               exchange
        FROM ticker
        WHERE ts > now() - INTERVAL '1 minute'
        GROUP BY exchange
    )
    ...

Slide 38

Slide 38 text

    ...
    SELECT (sell_to.bid - buy_from.ask) AS spread,
           ROUND((sell_to.bid - buy_from.ask) / buy_from.ask * 100, 2) AS ask_pct,
           buy_from.ask,
           sell_to.exchange sell_to_exchange,
           buy_from.exchange buy_from_exchange
    FROM latest_spreads sell_to
    CROSS JOIN latest_spreads buy_from
    WHERE (sell_to.bid - buy_from.ask > (sell_to.bid * 0.0025 + buy_from.ask * 0.0025))
      AND (buy_from.ts - sell_to.ts BETWEEN INTERVAL '-5 seconds' AND INTERVAL '5 seconds')
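The WHERE clause is easier to reason about as a small function. The sketch below mirrors only the price condition, not the 5-second timestamp check. Reading the 0.0025 factor as a per-leg cost of 0.25% is an assumption, and the function name and example prices are made up.

```python
COST_PER_LEG = 0.0025  # the factor from the query, read here as 0.25% per leg


def spread_signal(sell_bid, buy_ask, cost_per_leg=COST_PER_LEG):
    """Return the spread details if they exceed the costs, else None."""
    spread = sell_bid - buy_ask
    costs = sell_bid * cost_per_leg + buy_ask * cost_per_leg
    if spread <= costs:
        return None
    return {"spread": spread, "ask_pct": round(spread / buy_ask * 100, 2)}


print(spread_signal(sell_bid=7200, buy_ask=7100))  # spread of 100 beats the costs
print(spread_signal(sell_bid=7110, buy_ask=7100))  # spread of 10 does not: None
```

A positive spread alone is not a signal: with these numbers the costs on the two legs add up to roughly 35, so the second pair is filtered out.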

Slide 39

Slide 39 text

What a gigantic query, is the thing fast? — 4ms (data for a day fits in RAM, and that's all you care about for most queries!) — (Network delays are at least two orders of magnitude larger!) — Query on a (hyper)table with 15M rows, 4GB — Smallest and crappiest VM Google offers, with a single CPU

Slide 40

Slide 40 text

Show us the real code!

Slide 41

Slide 41 text

I thought you were cool, no Docker?

    # This is to build the cryptrage image
    # docker build -t cryptrage .
    FROM python:3.6.2
    ENV PYTHONUNBUFFERED 1
    RUN mkdir /code
    ADD . /code/
    WORKDIR /code
    RUN pip install .

Slide 42

Slide 42 text

    # docker-compose.yaml
    version: '3'
    services:
      db:
        image: timescale/timescaledb
        volumes:
          - ./data/timescaledb:/var/lib/postgresql/data
        ports:
          - "5432:5432"
        command: postgres
      bitonic:
        image: cryptrage:latest
        command: python3 /code/async_insert.py bitonic
        volumes:
          - .:/code
        depends_on:
          - db
        env_file:
          - timescale.env
      gdax:
        image: cryptrage:latest
        command: python3 /code/async_insert.py gdax
        volumes:
          - .:/code
        depends_on:
          - db
        env_file:
          - timescale.env

Slide 43

Slide 43 text

Demo

Slide 44

Slide 44 text

Final disclaimer and open points — Do not use this in production — https://github.com/butor/blackbird is much more mature! — They still have a thick disclaimer! — Do not overestimate the amount of money that can be made — You will destroy the order book (and profitability) if you just go 100 BTC short/long

Slide 45

Slide 45 text

Is it possible to go 100 BTC short/long? — Yes, because we don't own them — We own the risk — But the more we short, the lower the price will get — The more we "long", the higher the price — You need to "tickle"

Slide 46

Slide 46 text

This is the order book (buy) — The more you buy, the higher the price

Slide 47

Slide 47 text

This is the order book (sell) — The more you sell, the lower the price
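The effect shown on the two order-book slides can be mimicked by "walking" a book. The book below is entirely made up for illustration. The sketch covers the buy side; the sell side works the same way with bids sorted from the highest price down.

```python
def average_buy_price(asks, amount):
    """Average price paid when buying `amount` BTC from (price, size) asks."""
    remaining = amount
    cost = 0.0
    # Cheapest offers are consumed first.
    for price, size in sorted(asks):
        take = min(size, remaining)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("not enough liquidity in the order book")
    return cost / amount


# Hypothetical offers: (price in EUR, size in BTC)
ASKS = [(7100.0, 1.0), (7110.0, 4.0), (7150.0, 20.0), (7250.0, 100.0)]

for amount in (1, 5, 100):
    print(amount, average_buy_price(ASKS, amount))
```

With this book, 1 BTC costs 7100 on average and 5 BTC cost 7108, while 100 BTC average well above 7200: the larger the order, the deeper it eats into worse-priced offers.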

Slide 48

Slide 48 text

So buying a lot reduces the margins to 0 — Assuming you have a price difference between exchanges of 1pct — 0.5pct usually goes into transaction costs — 0.5pct is your margin — Buying 100 increases the buy price by 0.8pct — Selling 100 decreases the sell price by 0.5pct — You're losing money if you do this!
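The numbers on this slide add up as follows. This is a back-of-the-envelope sketch that simply subtracts percentages, which is a reasonable approximation for moves this small; the function name is made up.

```python
def net_margin_pct(price_difference, transaction_costs, buy_impact=0.0, sell_impact=0.0):
    """Margin left (in percent) after costs and after moving both prices."""
    return price_difference - transaction_costs - buy_impact - sell_impact


# Small order: prices barely move.
print(round(net_margin_pct(1.0, 0.5), 2))  # 0.5

# 100 BTC: buy price up 0.8pct, sell price down 0.5pct.
print(round(net_margin_pct(1.0, 0.5, buy_impact=0.8, sell_impact=0.5), 2))  # -0.8
```

The 0.5pct margin of a small trade turns into a loss of about 0.8pct once the order is large enough to move both books.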

Slide 49

Slide 49 text

Visually — Selling here will drive the red line below the blue — Buying here will drive the blue line above the red

Slide 50

Slide 50 text

Thanks — Questions — We're hiring data and machine learning engineers — Data scientists as well — [email protected] — @gglanzani