Giovanni Lanzani - Tickling not too thick ticks

pydata
May 27, 2018

The talk was given at PyData Amsterdam 2018.

It is a story of combining asyncio and websockets to create a bot to find arbitrage opportunities between crypto exchanges.


Transcript

  1. Tickling not too thick ticks! A story of bitcoins, asyncio, and websockets — Giovanni Lanzani @gglanzani

  2. Who is Giovanni — Born and raised in Italy — He claims to be a doctor in Theoretical Physics — Surprisingly, Leiden University backs his claim — Nobody else believes it — Past: PowerPoint wizard at KPMG — Present: Senior Shoe Designer at GoDataDriven

  3. This talk is about — A side project — Fun with new things in Python and tech land — Something you can try at home if you don't dabble with real money — Although real money was made with this tool

  4. Python 3.4 introduced asyncio — New module for writing single-threaded concurrent code using coroutines — It supports pluggable event loops and ships with one (we will use it) — Since Python 3.5 we have the async/await keywords to work with asyncio

  5. What the heck is an event loop — Imagine giving a list of tasks to a (single) friend — The friend processes them sequentially — But each task can say: do A, wait a minute, then do B — Our friend will do A, and then immediately start with the next task in the queue — At the end of the minute, B will be picked up

  6. Illusion of parallelism — Only one thing at a time is handled by our friend — But to us, it looks like things happened in parallel — Because our friend did other things while we waited — Our friend is a Python program using asyncio

  7. This is stupid — Why did we say to wait a minute? — I want it done now! — Well, usually we wait on I/O — Disk, network, other systems (such as a database)

  8. import asyncio
     import datetime

     now = datetime.datetime.now

     async def display_date():
         end_time = now() + datetime.timedelta(seconds=5.0)
         while now() < end_time:
             print(now())
             await asyncio.sleep(1)

     loop = asyncio.get_event_loop()
     # Blocking call which returns when the display_date() coroutine is done
     loop.run_until_complete(display_date())
     loop.close()

  9. — This is even more useless — I could do the same thing just by doing

     import datetime
     import time

     now = datetime.datetime.now

     def display_date():
         end_time = now() + datetime.timedelta(seconds=5.0)
         while now() < end_time:
             print(now())
             time.sleep(1)

     display_date()

  10. Well, it's useful when you add more async tasks!

      ...
      async def hey():
          intercalations = ['hey', 'let me speak', 'wait a second', 'oh, f* it']
          for intercalation in intercalations:
              print(intercalation)
              await asyncio.sleep(1.5)
      ...
      # I ask the event loop to run two tasks concurrently!
      loop.run_until_complete(asyncio.gather(display_date(), hey()))

  11. $ python pydata_scratch.py
      hey
      2018-05-11 11:33:28.424171
      2018-05-11 11:33:29.428733
      let me speak
      2018-05-11 11:33:30.430536
      wait a second
      2018-05-11 11:33:31.432197
      2018-05-11 11:33:32.436205
      oh, f* it
      2018-05-11 11:33:33.440074

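A quick timing check (not from the slides) confirms that the two coroutines really overlap: the wall time of asyncio.gather is roughly that of the longest task, not the sum. asyncio.run (Python 3.7+) is used here for brevity.

```python
import asyncio
import time

async def task(delay: float) -> None:
    # awaiting sleep yields control to the event loop, as real I/O would
    await asyncio.sleep(delay)

async def main() -> float:
    start = time.perf_counter()
    # schedule both coroutines concurrently on the single-threaded loop
    await asyncio.gather(task(0.2), task(0.3))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"elapsed ~ {elapsed:.2f}s")  # close to 0.3 (the longest), not 0.5 (the sum)
```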
  12. Enough async for now! Bitcoin now — Ok, I won't offend anyone — No Bitcoin/crypto explanation here — But let's do something fun — Create a bot that collects data from Bitcoin exchanges; — And signals when an arbitrage opens up.

  13. We can do market-neutral trading — Short-selling at the higher price — Long-buying at the lower price — You're not exposed to BTC price fluctuations — You don't need to transfer funds between exchanges — You do need a buffer in each exchange!!! — More about this later — Or: https://github.com/butor/blackbird/issues/100

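To see why this is market-neutral, here is a toy calculation with made-up prices: short 1 BTC at the expensive exchange's bid, buy 1 BTC at the cheap exchange's ask, and later unwind both positions at whatever price BTC has moved to. The closing price cancels out.

```python
def market_neutral_pnl(ask_low: float, bid_high: float, close: float) -> float:
    """P&L of shorting 1 BTC at bid_high and buying 1 BTC at ask_low,
    then unwinding both positions at the same price `close`."""
    short_pnl = bid_high - close   # the short gains when the price falls
    long_pnl = close - ask_low     # the long gains when the price rises
    return short_pnl + long_pnl    # the `close` terms cancel out

# The same 100 EUR spread is captured whether BTC crashes or moons
print(market_neutral_pnl(7100.0, 7200.0, close=5000.0))  # 100.0
print(market_neutral_pnl(7100.0, 7200.0, close=9000.0))  # 100.0
```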
  14. There are a number of Bitcoin exchanges we can use — Kraken — Bitstamp — GDAX — Bitonic — … — …… — ………

  15. (At least) two of them offer websockets — The WebSocket protocol enables interaction between a client and a server with lower overhead, facilitating two-way real-time transfers — This is much better than polling an API continuously — By polling an API, you can lose trades — Using websockets, the exchange notifies the client — (Websockets are also used in tech such as Jupyter.)

  16. Websockets and asyncio are a match made in heaven

      Instead of writing:

      result = make_call()     # get the data
      await asyncio.sleep(n)   # wait before calling again

      we can do

      result = await websocket.recv()

      While the server doesn't send anything, asyncio can let the program do other things!

  17. In Python there's a nice library for websockets — It's called (!) websockets; — https://websockets.readthedocs.io/en/stable/

  18. import asyncio
      import websockets

      async def hello(address, port):
          # No http(s) but ws(s)
          async with websockets.connect(f'ws://{address}:{port}') as websocket:
              # here we assume we can just start listening!
              greeting = await websocket.recv()
              print("< {}".format(greeting))

      # the slide called hello() without arguments; it needs an address and a port
      asyncio.get_event_loop().run_until_complete(hello('localhost', 8765))

  19. Ok, tell me how to connect to an exchange — Find the docs! — Read the docs! — Bitonic: — Address is wss://api.bl3p.eu/ — Then /<version>/<market>/<channel> — For example wss://api.bl3p.eu/1/BTCEUR/trades

  20. async def get_bitonic_async(*, insert_function):
          bitonic_address = "wss://api.bl3p.eu/1/BTCEUR/trades"
          async with websockets.connect(bitonic_address) as websocket:
              while True:
                  message_str = await websocket.recv()
                  message = json.loads(message_str)
                  response = create_bitonic_response(response=message)
                  await insert_function(response)

  21. GDAX is a bit more complex — Full docs at https://docs.gdax.com — They first require a subscription message

      subscribe = json.dumps({
          "type": "subscribe",
          "product_ids": ["BTC-EUR"],
          "channels": [
              {"name": "ticker", "product_ids": ["BTC-EUR"]}
          ]
      })

  22. We can send the subscription

      async def get_gdax_async(*, insert_function):
          gdax_ws_address = "wss://ws-feed.gdax.com"
          async with websockets.connect(gdax_ws_address) as websocket:
              await websocket.send(subscribe)
              ...

  23. After the subscription we can listen to tickers

      async def get_gdax_async(*, insert_function):
          gdax_ws_address = "wss://ws-feed.gdax.com"
          async with websockets.connect(gdax_ws_address) as websocket:
              await websocket.send(subscribe)
              while True:
                  message_str = await asyncio.wait_for(websocket.recv(), WAIT_TIMEOUT)
                  message = json.loads(message_str)
                  # the first message is not a ticker, but we'll ignore it for now
                  response = create_gdax_response(response=message, pair="BTC-EUR")
                  await insert_function(response)

  24. Ok, where do we store this data? I already gave it away: Postgres! (Diagram: GDAX and Bitonic feed pydata_async_insert.py, which writes to Postgres; emailer.py reads from it — out of scope)

  25. Async/sync? — Since we're in async land, inserting into the database should also happen asynchronously — (Okay, could happen asynchronously) — My favorite Postgres client is synchronous (psycopg2) — Enter asyncpg!

  26. asyncpg — Developed mostly by Yury Selivanov (Python core dev) and Elvis Pranskevichus (EdgeDB); — https://github.com/MagicStack/asyncpg — Basic example (from the website):

      async def run():
          conn = await asyncpg.connect(user='user', password='password',
                                       database='database', host='127.0.0.1')
          values = await conn.fetch('''SELECT * FROM mytable''')
          await conn.close()

      loop = asyncio.get_event_loop()
      loop.run_until_complete(run())

  27. In our case we don't have to write much

      async def insert_ticker(tick: NamedTuple, *, pool: Pool, table: str) -> None:
          fields = tick._fields
          # this is definitely uglier than psycopg2 placeholders
          placeholders = ['${}'.format(i) for i, _ in enumerate(fields, 1)]
          query = 'INSERT INTO {} ({}) VALUES ({})'.format(
              table, ', '.join(fields), ', '.join(placeholders))
          async with pool.acquire() as connection:
              # yes, this is a thing!
              async with connection.transaction():
                  await connection.execute(query, *tick)

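The placeholder construction can be exercised without a database. A minimal sketch; the `Tick` tuple and the `build_insert` helper are made up for illustration, only the `$1, $2, …` placeholder logic mirrors the slide:

```python
from typing import NamedTuple

class Tick(NamedTuple):
    # hypothetical ticker row; field names match the query on slide 32
    ts: str
    ask_price: float
    bid_price: float
    exchange: str

def build_insert(tick: NamedTuple, *, table: str) -> str:
    # asyncpg uses numbered $1, $2, ... placeholders instead of psycopg2's %s
    fields = tick._fields
    placeholders = ['${}'.format(i) for i, _ in enumerate(fields, 1)]
    return 'INSERT INTO {} ({}) VALUES ({})'.format(
        table, ', '.join(fields), ', '.join(placeholders))

tick = Tick('2018-05-11 13:46:52+00', 7166.01, 7166.00, 'GDAX')
print(build_insert(tick, table='ticker'))
# INSERT INTO ticker (ts, ask_price, bid_price, exchange) VALUES ($1, $2, $3, $4)
```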
  28. Is it done already? — One extra detail: we're handling time series — Postgres can handle it just fine, but can we do better?

  29. Timescale: time-series Postgres! — Open source extension on top of Postgres — Already used in production at a number of companies — Since it's an extension to Postgres, you (almost) don't need to change anything — It enables easier querying of data too (easier dropping as well, higher performance, etc.)

  30. Timescale goes a bit beyond the talk — But say you have a table where ts is a timestamp with time zone — Then you can enable Timescale magic with

      SELECT create_hypertable('ticker', 'ts', chunk_time_interval => interval '1 day');

      — Timescale will create a hypertable, where intra-day querying is optimized — Not the end of the world if a query spans 2 days, but change the interval if you span weeks

  31. More Timescale goodness — They keep data ordered on disk even when the incoming data is not ordered (if you're querying multiple exchanges this is important); — They do it efficiently thanks to partitioning

  32. Get latest values per exchange

      SELECT LAST(ask_price, ts) ask,  -- last by timestamp, new!
             LAST(bid_price, ts) bid,
             LAST(ts, ts) ts,
             exchange
      FROM ticker
      WHERE ts > now() - INTERVAL '1 minute'
      GROUP BY exchange

         ask   |   bid   |             ts             | exchange
      ---------+---------+----------------------------+----------
               | 7145.65 | 2018-05-11 13:46:29+00     | Bitonic
       7166.01 | 7166.00 | 2018-05-11 13:46:52.005+00 | GDAX
      (2 rows)

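The LAST(value, ts) aggregate ("pick the value from the row with the latest timestamp, per group") can be emulated in plain Python to check the logic; the sample ticks below are made up:

```python
# made-up sample ticks: (ts, exchange, ask, bid)
ticks = [
    (1, 'GDAX', 7160.00, 7159.00),
    (2, 'Bitonic', 7150.00, 7145.65),
    (3, 'GDAX', 7166.01, 7166.00),
]

# keep, per exchange, the row with the highest timestamp,
# mimicking LAST(ask_price, ts), LAST(bid_price, ts), LAST(ts, ts)
latest = {}
for ts, exchange, ask, bid in ticks:
    if exchange not in latest or ts > latest[exchange][0]:
        latest[exchange] = (ts, ask, bid)

print(latest['GDAX'])     # (3, 7166.01, 7166.0)
print(latest['Bitonic'])  # (2, 7150.0, 7145.65)
```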
  33. We can now already write all the logic of the mailer!

      WITH latest_spreads AS (
          SELECT LAST(ask_price, ts) ask,
                 LAST(bid_price, ts) bid,
                 LAST(ts, ts) ts,
                 exchange
          FROM ticker
          WHERE ts > now() - INTERVAL '1 minute'
          GROUP BY exchange
      )
      ...

  34. ...
      SELECT (sell_to.bid - buy_from.ask) AS spread,
             ROUND((sell_to.bid - buy_from.ask) / buy_from.ask * 100, 2) AS ask_pct,
             buy_from.ask,
             sell_to.exchange sell_to_exchange,
             buy_from.exchange buy_from_exchange
      FROM latest_spreads sell_to
      CROSS JOIN latest_spreads buy_from
      WHERE (sell_to.bid - buy_from.ask >
             (sell_to.bid * 0.0025 + buy_from.ask * 0.0025))
        AND (buy_from.ts - sell_to.ts BETWEEN INTERVAL '-5 seconds' AND INTERVAL '5 seconds')

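The WHERE clause (the spread must exceed 0.25% fees on each leg, and the two quotes must be at most 5 seconds apart) translates to a few lines of Python. A sketch, not the mailer's actual code; `arbitrage_open` is a made-up name:

```python
FEE = 0.0025  # assumed 0.25% transaction cost per leg, as in the query

def arbitrage_open(sell_to_bid: float, buy_from_ask: float,
                   ts_delta_seconds: float) -> bool:
    # profitable only if the spread exceeds the fees on both legs
    spread = sell_to_bid - buy_from_ask
    costs = sell_to_bid * FEE + buy_from_ask * FEE
    # and only if the two quotes are (almost) simultaneous
    fresh = abs(ts_delta_seconds) <= 5
    return spread > costs and fresh

print(arbitrage_open(7200.00, 7145.65, ts_delta_seconds=2))  # True
print(arbitrage_open(7166.00, 7145.65, ts_delta_seconds=2))  # False: spread < fees
```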
  35. What a gigantic query, is the thing fast? — 4ms (data for a day fits in RAM, and that's all you care about for most queries!) — (The network delays are at least two orders of magnitude slower!) — Query on a (hyper)table with 15M rows, 4GB — Smallest and crappiest VM Google offers, with a single CPU

  36. I thought you were cool, no Docker?

      # This is to build the cryptrage image
      # docker build -t cryptrage .
      FROM python:3.6.2
      ENV PYTHONUNBUFFERED 1
      RUN mkdir /code
      ADD . /code/
      WORKDIR /code
      RUN pip install .

  37. # docker-compose.yaml
      version: '3'
      services:
        db:
          image: timescale/timescaledb
          volumes:
            - ./data/timescaledb:/var/lib/postgresql/data
          ports:
            - "5432:5432"
          command: postgres
        bitonic:
          image: cryptrage:latest
          command: python3 /code/async_insert.py bitonic
          volumes:
            - .:/code
          depends_on:
            - db
          env_file:
            - timescale.env
        gdax:
          image: cryptrage:latest
          command: python3 /code/async_insert.py gdax
          volumes:
            - .:/code
          depends_on:
            - db
          env_file:
            - timescale.env

  38. Final disclaimer and open points — Do not use this in production — https://github.com/butor/blackbird is much more mature! — They still have a tick thick disclaimer! — Do not overestimate the amount of money that can be made — You will destroy the order book (and profitability) if you just go 100 BTC short/long

  39. Is it possible to go 100 BTC short/long? — Yes, because we don't own them — We own the risk — But the more we short, the lower the price will get — The more we "long", the higher the price — You need to "tickle"

  40. So buying a lot reduces the margins to 0 — Assuming you have a price difference between exchanges of 1pct — 0.5pct usually goes into transaction costs — 0.5pct is your margin — Buying 100 increases the buy price by 0.8pct — Selling 100 decreases the sell price by 0.5pct — You're losing money if you do this!

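The slide's arithmetic, spelled out (all percentages are the slide's illustrative figures):

```python
spread_pct = 1.0   # price difference between the two exchanges
costs_pct = 0.5    # transaction costs across both legs

margin_pct = spread_pct - costs_pct  # 0.5 pct left as profit

# market impact of trading 100 BTC (the slide's example figures)
buy_impact_pct = 0.8   # buying 100 pushes the buy price up
sell_impact_pct = 0.5  # selling 100 pushes the sell price down

net_pct = margin_pct - buy_impact_pct - sell_impact_pct
print(round(net_pct, 2))  # -0.8: the "arbitrage" now loses money
```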
  41. Visually — Selling here will drive the red line below the blue — Buying here will drive the blue line above the red

  42. Thanks — Questions? — We're hiring data and machine learning engineers — Data scientists as well — [email protected] — @gglanzani