Giovanni Lanzani - Tickling not too thick ticks

The talk was given at PyData Amsterdam 2018.

It is a story of combining asyncio and websockets to create a bot to find arbitrage opportunities between crypto exchanges.

pydata

May 27, 2018

Transcript

  1. Tickling not too thick ticks! A story of bitcoins, asyncio, and websockets
     Giovanni Lanzani, @gglanzani
  2. Who is Giovanni
     - Born and raised in Italy
     - He claims to be a doctor of Theoretical Physics
     - Surprisingly, Leiden University backs his claim
     - Nobody else believes it
     - Past: PowerPoint wizard at KPMG
     - Present: Senior Shoe Designer at GoDataDriven
  3. This talk is about a
     - Side project
     - Fun with new things in Python and tech land
     - Something you can try at home if you don't dabble with real money
     - Although real money was made with this tool
  4. Python 3.4 introduced asyncio
     - New module for writing single-threaded concurrent code using coroutines
     - It supports pluggable event loops and ships with one (we will use it)
     - Since Python 3.5 we have the async/await keywords to work with asyncio
  5. What the heck is an event loop
     - Imagine giving a list of tasks to a (single) friend
     - Your friend processes them sequentially
     - But each task can say: do A, wait a minute, then do B
     - Our friend will do A, and then immediately start with the next task in the queue
     - At the end of the minute, B will be picked up
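
The friend analogy above can be sketched with two coroutines. This is only an illustration: the task names and the shortened delay are made up for the example, and `asyncio.run` requires Python 3.7 or later (the slides use the older `loop.run_until_complete`).

```python
import asyncio

log = []  # records the order in which the steps actually run


async def task_one():
    log.append("A")            # do A
    await asyncio.sleep(0.05)  # "wait a minute" (shortened for the example)
    log.append("B")            # do B once the wait is over


async def task_two():
    log.append("C")            # a second task that never waits


async def main():
    # The event loop (our friend) starts task_one and, while it is waiting,
    # picks up task_two before coming back for B.
    await asyncio.gather(task_one(), task_two())


asyncio.run(main())
print(log)  # ['A', 'C', 'B']
```

B runs last even though it belongs to the first task, because the loop used the waiting time to finish the second task.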
  6. Illusion of parallelism
     - Only one thing at a time is handled by our friend
     - But we have the impression that things happened in parallel
     - Because our friend did other things while we waited
     - Our friend is a Python program using asyncio
  7. This is stupid
     - Why did we say to wait a minute?
     - I want it done now!
     - Well, usually we wait on I/O
     - Disk, network, other systems (such as a database)
  8. import asyncio
     import datetime

     now = datetime.datetime.now

     async def display_date():
         end_time = now() + datetime.timedelta(seconds=5.0)
         # keep printing until five seconds have passed
         while now() < end_time:
             print(now())
             await asyncio.sleep(1)

     loop = asyncio.get_event_loop()
     # Blocking call which returns when the display_date() coroutine is done
     loop.run_until_complete(display_date())
     loop.close()
  9. - This is even more useless
     - I could do the same thing just by doing

     import datetime
     import time

     now = datetime.datetime.now

     def display_date():
         end_time = now() + datetime.timedelta(seconds=5.0)
         # keep printing until five seconds have passed
         while now() < end_time:
             print(now())
             time.sleep(1)

     display_date()
  10. Well, it's useful when you add more async tasks!
      ...
      async def hey():
          intercalations = ['hey', 'let me speak', 'wait a second', 'oh, f* it']
          for intercalation in intercalations:
              print(intercalation)
              await asyncio.sleep(1.5)
      ...
      # I ask the event loop to run two tasks concurrently!
      loop.run_until_complete(asyncio.gather(display_date(), hey()))
  11. $ python pydata_scratch.py
      hey
      2018-05-11 11:33:28.424171
      2018-05-11 11:33:29.428733
      let me speak
      2018-05-11 11:33:30.430536
      wait a second
      2018-05-11 11:33:31.432197
      2018-05-11 11:33:32.436205
      oh, f* it
      2018-05-11 11:33:33.440074
  12. Enough async for now! Bitcoin now
      - Ok, I won't offend anyone
      - No Bitcoin/crypto explanation here
      - But let's do something fun
      - Create a bot that collects data from Bitcoin exchanges
      - And signals when an arbitrage opportunity opens up
  13. We can do market-neutral trading
      - Short-selling at the higher price
      - Long-buying at the lower price
      - You are not exposed to BTC price fluctuations
      - You don't need to transfer funds between exchanges
      - You do need a buffer in each exchange!!!
      - More about this later
      - Or: https://github.com/butor/blackbird/issues/100
  14. There are a number of Bitcoin exchanges we can use
      - Kraken
      - Bitstamp
      - GDAX
      - Bitonic
      - …
      - ……
      - ………
  15. (At least) two of them offer websockets
      - The WebSocket protocol enables interaction between a client and a server with lower overhead, facilitating two-way real-time transfers
      - This is much better than polling an API continuously
      - By polling an API, you can miss trades that happen between two calls
      - Using websockets, the exchange notifies the client
      - (Websockets are also used in tools such as Jupyter.)
  16. Websockets and asyncio are a match made in heaven
      Instead of writing:
          result = make_call()      # get the data
          await asyncio.sleep(n)    # wait before calling again
      we can do:
          result = await websocket.recv()
      While the server doesn't send anything, asyncio can let the program do other things!
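
To make the "do other things while waiting" point concrete without a real exchange, here is a minimal sketch in which an `asyncio.Queue` stands in for the websocket. Everything in it (`fake_exchange`, `listener`, `heartbeat`, the prices) is hypothetical and only mimics the shape of `await websocket.recv()`.

```python
import asyncio


async def fake_exchange(feed: asyncio.Queue) -> None:
    # Stand-in for the server side: pushes three ticks, then a sentinel.
    for price in (7145.65, 7166.00, 7150.10):
        await asyncio.sleep(0.01)
        await feed.put(price)
    await feed.put(None)


async def listener(feed: asyncio.Queue) -> list:
    received = []
    while True:
        # Like `await websocket.recv()`: suspends here until data is pushed.
        tick = await feed.get()
        if tick is None:
            return received
        received.append(tick)


async def heartbeat(counter: list) -> None:
    # Other work the event loop can do while the listener is waiting.
    while True:
        counter[0] += 1
        await asyncio.sleep(0.001)


async def main():
    feed = asyncio.Queue()
    counter = [0]
    beat = asyncio.ensure_future(heartbeat(counter))
    _, ticks = await asyncio.gather(fake_exchange(feed), listener(feed))
    beat.cancel()
    return ticks, counter[0]


ticks, beats = asyncio.run(main())
print(ticks, beats)
```

The listener never polls: it receives every pushed tick, while the heartbeat keeps running in the gaps between messages.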
  17. In Python there's a nice library for websockets
      - It's called (!) websockets
      - https://websockets.readthedocs.io/en/stable/
  18. import asyncio
      import websockets

      async def hello(address, port):
          # No http(s) but ws(s)
          async with websockets.connect(f'ws://{address}:{port}') as websocket:
              # here we assume we can just start listening!
              greeting = await websocket.recv()
              print("< {}".format(greeting))

      # hello() needs an address and a port; 'localhost' and 8765 are placeholders
      asyncio.get_event_loop().run_until_complete(hello('localhost', 8765))
  19. Ok, tell me how to connect to an exchange
      - Find the docs!
      - Read the docs!
      - Bitonic:
        - Address is wss://api.bl3p.eu/
        - Then /<version>/<market>/<channel>
        - For example wss://api.bl3p.eu/1/BTCEUR/trades
  20. import json
      import websockets

      async def get_bitonic_async(*, insert_function):
          bitonic_address = "wss://api.bl3p.eu/1/BTCEUR/trades"
          async with websockets.connect(bitonic_address) as websocket:
              while True:
                  message_str = await websocket.recv()
                  message = json.loads(message_str)
                  # create_bitonic_response is a project helper not shown in the slides
                  response = create_bitonic_response(response=message)
                  await insert_function(response)
  21. GDAX is a bit more complex
      - Full docs at https://docs.gdax.com
      - They first require a subscription message

      subscribe = json.dumps({
          "type": "subscribe",
          "product_ids": ["BTC-EUR"],
          "channels": [
              {"name": "ticker", "product_ids": ["BTC-EUR"]}
          ]
      })
  22. We can send the subscription
      async def get_gdax_async(*, insert_function):
          gdax_ws_address = "wss://ws-feed.gdax.com"
          async with websockets.connect(gdax_ws_address) as websocket:
              await websocket.send(subscribe)
              ...
  23. After the subscription we can listen to tickers
      async def get_gdax_async(*, insert_function):
          gdax_ws_address = "wss://ws-feed.gdax.com"
          async with websockets.connect(gdax_ws_address) as websocket:
              await websocket.send(subscribe)
              while True:
                  message_str = await asyncio.wait_for(websocket.recv(), WAIT_TIMEOUT)
                  message = json.loads(message_str)
                  # the first message is not a ticker, but we'll ignore it for now
                  response = create_gdax_response(response=message, pair="BTC-EUR")
                  await insert_function(response)
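
The slides do not show what happens when `asyncio.wait_for` gives up. As a hedged sketch: it raises `asyncio.TimeoutError`, which a listener could catch in order to reconnect. The `silent_recv` coroutine and the `WAIT_TIMEOUT` value below are assumptions for illustration, not values from the talk.

```python
import asyncio

WAIT_TIMEOUT = 0.05  # seconds; hypothetical value, the slides do not give one


async def silent_recv():
    # Stand-in for websocket.recv() on a connection that went quiet.
    await asyncio.sleep(10)


async def recv_or_none():
    try:
        return await asyncio.wait_for(silent_recv(), WAIT_TIMEOUT)
    except asyncio.TimeoutError:
        # A real listener might close the socket and reconnect here.
        return None


result = asyncio.run(recv_or_none())
print(result)  # None
```

`wait_for` also cancels the pending `recv()` when the timeout expires, so no stale coroutine is left behind.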
  24. Ok, where do we store this data? I already gave it away: Postgres!
      [Diagram labels: GDAX, Bitonic, …; pydata_async_insert.py; Postgres; emailer.py (out of scope)]
  25. Async/sync?
      - Since we're in async land, inserting into the database should also happen asynchronously
      - (Okay, could happen asynchronously)
      - My favorite Postgres client is synchronous (psycopg2)
      - Enter asyncpg!
  26. asyncpg
      - Developed mostly by Yury Selivanov (Python core dev) and Elvis Pranskevichus (EdgeDB)
      - https://github.com/MagicStack/asyncpg
      - Basic example (from the website):

      import asyncio
      import asyncpg

      async def run():
          conn = await asyncpg.connect(user='user', password='password',
                                       database='database', host='127.0.0.1')
          values = await conn.fetch('''SELECT * FROM mytable''')
          await conn.close()

      loop = asyncio.get_event_loop()
      loop.run_until_complete(run())
  27. In our case we don't have to write much
      async def insert_ticker(tick: NamedTuple, *, pool: Pool, table: str) -> None:
          fields = tick._fields
          # this is definitely uglier than psycopg2
          placeholders = ['${}'.format(i) for i, _ in enumerate(fields, 1)]
          query = 'INSERT INTO {} ({}) VALUES ({})'.format(
              table, ', '.join(fields), ', '.join(placeholders))
          async with pool.acquire() as connection:  # yes, this is a thing!
              async with connection.transaction():
                  await connection.execute(query, *tick)
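
To see what query string `insert_ticker` ends up sending, the string-building part can be run on its own, without a database. The `Tick` schema below is an assumption: its column names are borrowed from the SQL queries later in the talk, and `build_insert_query` is a hypothetical helper, not part of the original code.

```python
from typing import NamedTuple


class Tick(NamedTuple):
    # Hypothetical schema; column names follow the queries later in the talk.
    ts: str
    exchange: str
    bid_price: float
    ask_price: float


def build_insert_query(tick: NamedTuple, table: str) -> str:
    fields = tick._fields
    # asyncpg uses numbered placeholders: $1, $2, ...
    placeholders = ['${}'.format(i) for i, _ in enumerate(fields, 1)]
    return 'INSERT INTO {} ({}) VALUES ({})'.format(
        table, ', '.join(fields), ', '.join(placeholders))


tick = Tick('2018-05-11 13:46:52+00', 'GDAX', 7166.00, 7166.01)
query = build_insert_query(tick, 'ticker')
print(query)
# INSERT INTO ticker (ts, exchange, bid_price, ask_price) VALUES ($1, $2, $3, $4)
```

Only the values travel as query parameters. The table and column names are formatted straight into the string, so they should come from trusted code, never from user input.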
  28. Is it done already?
      - One extra detail: we're handling time series
      - Postgres can handle it just fine, but can we do better?
  29. Timescale: Time series Postgres!
      - Open source extension on top of Postgres
      - Already used in production at a number of companies
      - Since it's an extension to Postgres, you don't need to change anything (almost)
      - It enables easier querying of data too (easier dropping as well, higher performance, etc.)
  30. Timescale goes a bit beyond the talk
      - But say you have a table where ts is a timestamp with timezone
      - Then you can enable Timescale magic with this:
        SELECT create_hypertable('ticker', 'ts', chunk_time_interval => interval '1 day');
      - Timescale will create a hypertable, where querying is optimized within a day
      - It is not the end of the world if a query spans 2 days, but change the interval if your queries span weeks
  31. More Timescale goodness
      - They keep data ordered on disk even when the incoming data is not ordered (if you're querying multiple exchanges this is important)
      - They do it efficiently because of partitioning
  32. Get latest values per exchange
      SELECT LAST(ask_price, ts) ask,  -- last by timestamp, new!
             LAST(bid_price, ts) bid,
             LAST(ts, ts) ts,
             exchange
      FROM ticker
      WHERE ts > now() - INTERVAL '1 minute'
      GROUP BY exchange

         ask   |   bid   |             ts             | exchange
      ---------+---------+----------------------------+----------
               | 7145.65 | 2018-05-11 13:46:29+00     | Bitonic
       7166.01 | 7166.00 | 2018-05-11 13:46:52.005+00 | GDAX
      (2 rows)
  33. We can now already write all the logic of the mailer!
      WITH latest_spreads AS (
          SELECT LAST(ask_price, ts) ask,
                 LAST(bid_price, ts) bid,
                 LAST(ts, ts) ts,
                 exchange
          FROM ticker
          WHERE ts > now() - INTERVAL '1 minute'
          GROUP BY exchange
      )
      ...
  34. ...
      SELECT (sell_to.bid - buy_from.ask) AS spread,
             ROUND((sell_to.bid - buy_from.ask) / buy_from.ask * 100, 2) AS ask_pct,
             buy_from.ask,
             sell_to.exchange sell_to_exchange,
             buy_from.exchange buy_from_exchange
      FROM latest_spreads sell_to
      CROSS JOIN latest_spreads buy_from
      WHERE (sell_to.bid - buy_from.ask > (sell_to.bid * 0.0025 + buy_from.ask * 0.0025))
        AND (buy_from.ts - sell_to.ts BETWEEN INTERVAL '-5 seconds' AND INTERVAL '5 seconds')
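
For readers more at home in Python than SQL, the same filter can be sketched as a plain function. This mirrors the query's logic as I read it (0.25% cost per leg, quotes at most 5 seconds apart). The `Quote` type, `find_arbitrage`, and the sample prices are invented for illustration.

```python
from datetime import datetime, timedelta
from itertools import product
from typing import List, NamedTuple, Tuple


class Quote(NamedTuple):
    exchange: str
    bid: float
    ask: float
    ts: datetime


FEE = 0.0025                      # 0.25% per leg, as in the WHERE clause
MAX_SKEW = timedelta(seconds=5)   # quotes must be at most 5 seconds apart


def find_arbitrage(quotes: List[Quote]) -> List[Tuple[str, str, float, float]]:
    """Return (buy_from, sell_to, spread, spread_pct) for profitable pairs."""
    found = []
    # product(..., repeat=2) plays the role of the CROSS JOIN.
    for sell_to, buy_from in product(quotes, repeat=2):
        spread = sell_to.bid - buy_from.ask
        costs = sell_to.bid * FEE + buy_from.ask * FEE
        close_in_time = abs(buy_from.ts - sell_to.ts) <= MAX_SKEW
        if spread > costs and close_in_time:
            pct = round(spread / buy_from.ask * 100, 2)
            found.append((buy_from.exchange, sell_to.exchange, round(spread, 2), pct))
    return found


now = datetime(2018, 5, 11, 13, 46, 52)
quotes = [
    Quote("A", bid=7200.00, ask=7201.00, ts=now),
    Quote("B", bid=7140.00, ask=7145.00, ts=now - timedelta(seconds=2)),
]
print(find_arbitrage(quotes))  # [('B', 'A', 55.0, 0.77)]
```

Buying on B at 7145 and selling on A at 7200 leaves a spread of 55, which exceeds the combined cost of about 35.86, so the pair is reported. Pairing an exchange with itself never passes, because its bid is below its ask.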
  35. What a gigantic query, is the thing fast?
      - 4ms (data for a day fits in RAM, and that's all you care about for most queries!)
      - (Network delays are at least two orders of magnitude larger!)
      - Query on a (hyper)table with 15M rows, 4GB
      - Smallest and crappiest VM Google offers, with a single CPU
  36. I thought you were cool, no Docker?
      # This is to build the cryptrage image
      # docker build -t cryptrage .
      FROM python:3.6.2
      ENV PYTHONUNBUFFERED 1
      RUN mkdir /code
      ADD . /code/
      WORKDIR /code
      RUN pip install .
  37. # docker-compose.yaml
      version: '3'
      services:
        db:
          image: timescale/timescaledb
          volumes:
            - ./data/timescaledb:/var/lib/postgresql/data
          ports:
            - "5432:5432"
          command: postgres
        bitonic:
          image: cryptrage:latest
          command: python3 /code/async_insert.py bitonic
          volumes:
            - .:/code
          depends_on:
            - db
          env_file:
            - timescale.env
        gdax:
          image: cryptrage:latest
          command: python3 /code/async_insert.py gdax
          volumes:
            - .:/code
          depends_on:
            - db
          env_file:
            - timescale.env
  38. Final disclaimer and open points
      - Do not use this in production
      - https://github.com/butor/blackbird is much more mature!
      - They still have a tick thick disclaimer!
      - Do not overestimate the amount of money that can be made
      - You will destroy the order book (and profitability) if you just go 100 BTC short/long
  39. Is it possible to go 100 BTC short/long?
      - Yes, because we don't own them
      - We own the risk
      - But the more we short, the lower the price will get
      - The more we "long", the higher the price
      - You need to "tickle"
  40. So buying a lot reduces the margins to 0
      - Assume you have a price difference between exchanges of 1pct
      - 0.5pct usually goes into transaction costs
      - 0.5pct is your margin
      - Buying 100 increases the buy price by 0.8pct
      - Selling 100 decreases the sell price by 0.5pct
      - You're losing money if you do this!
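
The arithmetic on this slide can be written out directly. The variable names are mine; the percentages are the ones quoted above.

```python
price_gap_pct = 1.0         # price difference between the two exchanges
transaction_cost_pct = 0.5  # what usually goes into transaction costs
buy_slippage_pct = 0.8      # buying 100 BTC pushes the buy price up
sell_slippage_pct = 0.5     # selling 100 BTC pushes the sell price down

# Small order: the price does not move, so only costs are subtracted.
margin_small_order = price_gap_pct - transaction_cost_pct

# Large order: both price moves eat into what is left.
margin_large_order = margin_small_order - buy_slippage_pct - sell_slippage_pct

print(round(margin_small_order, 2))  # 0.5
print(round(margin_large_order, 2))  # -0.8
```

With these numbers the 0.5pct margin turns into a loss of 0.8pct, which is why the order size has to stay small enough not to move the book.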
  41. Visually
      [Plot annotations]
      - Selling here will drive the red line below the blue
      - Buying here will drive the blue line above the red
  42. Thanks
      - Questions
      - We're hiring data and machine learning engineers
      - Data scientists as well
      - [email protected]
      - @gglanzani