Building scalable monitoring infrastructure from scratch

Arseny Reutov
December 02, 2023

In this talk we will share our experience of building a transaction monitoring solution for EVM-compatible networks. Starting from a standalone Rust application that queries blockchain RPCs and ending with a scalable solution that can handle thousands of transactions per second, we will cover each step and explain how to catch DeFi exploits before they happen. The technology stack is based on Apache Flink, a popular framework for stateful computations over streaming data. We believe it has not yet found widespread use in blockchain security, although it has solid capabilities for processing transactions, logs, traces, and all other available on-chain information in real time. We will also share a set of detection rules that can be used to spot potential exploits, as well as techniques for preventing attacks on DeFi protocols. And of course we will share our experience of running this solution in production over the past months, from findings to operating costs and lessons learned.

Transcript

  1. Agenda
     [0x0] Let's catch smart contract exploits
     [0x1] Architecture
     [0x2] Approaches to implementation
     [0x3] Exploit detection techniques
     [0x4] Results
  2. Let's catch smart contract exploits
     Goal: detect DeFi exploits
     Requirements: declarative and concise rules with unit testing and great maintainability
     Approach: monitor activity of addresses with anonymous funding and look for specific patterns in call traces
  3. ETL - Extract, Transform, Load
     ethereum-etl
     • can push data to message queues
     • disassembles bytecode with evm-dasm
     • written in Python
     cryo
     • relatively new project
     • extracts storage and balance diffs
     • written in Rust
  4. Data enrichment
     So we forked ethereum-etl
     • added bulk extraction with eth_blockReceipts
     • integrated the DefiLlama API to calculate balance changes
     • added support for Geth-style traces in streaming mode
     • integrated gigahorse-toolchain instead of evm-dasm
  5. Bytecode analysis
     gigahorse-toolchain
     • powers Dedaub's bytecode decompiler
     • written in Datalog
     • allows creating custom rules (yay)
     heimdall
     • fairly new but powerful tool for analyzing bytecode
     • written in Rust
     • generates Solidity code
  6. Exploit bytecode analysis
     • Replaced evm-dasm with gigahorse-toolchain in ethereum-etl
     • Created custom analysis rules in Datalog
     • Detect exploits based on specific bytecode features
  7. Exploit features
     • Not many functions (usually < 10)
     • High rate of unknown selectors
     • Presence of flashloan selectors
     • Lots of external calls
     • No emitted events
     • Debugging symbols (e.g. console.log)
     • SELFDESTRUCT
     • CREATE2
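
Some of the features above can be read directly from deployed bytecode. The following is a minimal sketch, not the Datalog rules from the talk: it assumes a simple linear sweep that skips PUSH immediates, and the function name `scan_features` is hypothetical. It flags the SELFDESTRUCT (0xff) and CREATE2 (0xf5) opcodes and collects PUSH4 immediates as candidate function selectors.

```python
SELFDESTRUCT = 0xFF
CREATE2 = 0xF5
PUSH1, PUSH4, PUSH32 = 0x60, 0x63, 0x7F


def scan_features(bytecode_hex: str) -> dict:
    """Linear sweep over EVM bytecode that skips PUSH immediates (hypothetical helper)."""
    code = bytes.fromhex(bytecode_hex.removeprefix("0x"))
    features = {"selfdestruct": False, "create2": False, "push4_values": set()}
    i = 0
    while i < len(code):
        op = code[i]
        if PUSH1 <= op <= PUSH32:
            size = op - PUSH1 + 1  # number of immediate bytes following the opcode
            if op == PUSH4 and i + 5 <= len(code):
                # PUSH4 immediates are often (but not always) function selectors.
                features["push4_values"].add("0x" + code[i + 1 : i + 5].hex())
            i += 1 + size
            continue
        if op == SELFDESTRUCT:
            features["selfdestruct"] = True
        elif op == CREATE2:
            features["create2"] = True
        i += 1
    return features


# PUSH4 0x150b7a02, PUSH1 0xff (data, not an opcode), SELFDESTRUCT
print(scan_features("0x63150b7a0260ffff"))
```

A linear sweep can misread trailing data or metadata as code, which is one reason a decompiler-based analysis such as gigahorse-toolchain is the more robust choice.
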
  8. Stream analysis
     Apache Kafka
     • Supported by ethereum-etl
     • Straightforward and reliable
     Apache Flink
     • Enables real-time stream processing
     • Performs stateful computations expressed in SQL
     • Supports Complex Event Processing (CEP)
  9. Data processing workflow
     1) Describe sources using DDL: transactions, contracts, logs, token_transfers, etc.
     2) Define data sinks for alerts
     3) Execute continuous SQL queries
  11. Source DDL example
      CREATE TABLE {{ NETWORK }}_logs (
        log_index int,
        transaction_hash string,
        transaction_index int,
        address varchar,
        data string,
        topics array<string>,
        block_timestamp int,
        block_number int,
        block_hash varchar,
        proc_time as PROCTIME(),
        event_time as TO_TIMESTAMP_LTZ(block_timestamp, 0),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
      ) WITH (
        'connector' = 'kafka',
        'topic' = '{{ NETWORK }}.logs',
        'properties.bootstrap.servers' = '{{ KAFKA_BROKER }}',
        'properties.group.id' = 'defimon',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
      )
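
The `{{ NETWORK }}` and `{{ KAFKA_BROKER }}` placeholders suggest that the DDL is rendered once per network before it is submitted. The talk does not say which template engine is used, so the sketch below assumes plain placeholder substitution; the `render` helper and the broker address are hypothetical.

```python
import re

# Matches {{ NAME }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, params: dict) -> str:
    """Replace every {{ NAME }} with params[NAME]; fail loudly on unknown names."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for placeholder {name!r}")
        return params[name]

    return PLACEHOLDER.sub(substitute, template)


DDL_FRAGMENT = (
    "CREATE TABLE {{ NETWORK }}_logs (/* columns */) WITH ("
    "'topic' = '{{ NETWORK }}.logs', "
    "'properties.bootstrap.servers' = '{{ KAFKA_BROKER }}')"
)

# One rendered statement per monitored network (broker address is made up).
for network in ("ethereum", "bsc"):
    print(render(DDL_FRAGMENT, {"NETWORK": network, "KAFKA_BROKER": "kafka:9092"}))
```

Raising on an unknown placeholder keeps a half-rendered statement from reaching the SQL engine.
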
  12. What can we do now?
      • Join streams, e.g. traces with logs
      • Aggregate data using tumble, hop and session windows
      • Continuously calculate top N
      • Detect sequences of events (CEP)
      • and more
      • Track withdrawals from Tornado, Fixed Float, Railgun, etc.
      • Track new contract deployments by these addresses
      • Track any intermediate addresses in between
      • Calculate total transfer sums within a transaction or some period of time
      • Match specific patterns in call traces
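
The tracking bullets above amount to keeping state across events: which addresses were funded from an anonymous source, directly or through intermediate hops, and whether any of them deploys a contract. In the talk this state lives in Flink; the sketch below is a plain in-memory illustration, and the class name, method names and the placeholder address `0xmixer` are all hypothetical.

```python
from dataclasses import dataclass, field

# Placeholder standing in for a mixer or instant-exchange contract address.
ANONYMOUS_SOURCES = {"0xmixer"}


@dataclass
class FundingTracker:
    """Keeps the set of addresses whose funds trace back to an anonymous source."""

    tainted: set = field(default_factory=set)

    def on_transfer(self, from_address: str, to_address: str) -> None:
        # Direct withdrawals and hops through intermediate addresses both propagate.
        if from_address in ANONYMOUS_SOURCES or from_address in self.tainted:
            self.tainted.add(to_address)

    def on_deployment(self, deployer: str, contract: str) -> bool:
        """Return True when a tracked address deploys a contract."""
        if deployer in self.tainted:
            self.tainted.add(contract)
            return True
        return False


tracker = FundingTracker()
tracker.on_transfer("0xmixer", "0xa")  # withdrawal from the anonymous source
tracker.on_transfer("0xa", "0xb")      # intermediate hop
print(tracker.on_deployment("0xb", "0xc"))
```

Unbounded propagation would eventually mark a large share of the network, so a real rule would presumably limit the number of hops or the time window; that limit is left out here.
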
  13. Exploit detection example: selfdestruct after proxy upgraded
      SELECT
        '{{ NETWORK }}' AS network,
        'CRITICAL' AS severity,
        'selfdestruct_after_upgraded' AS attack_type,
        /* ... */
      FROM {{ NETWORK }}_logs AS l
      JOIN {{ NETWORK }}_traces AS t
        ON t.transaction_hash = l.transaction_hash
      /* keccak('Upgraded(address)') */
      WHERE l.topics[1] = '0xbc7cd75a20ee27fd9adebab32041f755214dbc6bffa90cc0225b39da2e5c2d3b'
        AND t.trace_type = 'suicide'
        AND l.address = t.from_address
        AND l.event_time = t.event_time
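
The query joins each `Upgraded` log to a self-destruct trace of the same contract in the same transaction. The sketch below expresses that join in plain Python, assuming rows are dicts with the column names used in the query; the function name is hypothetical. Flink SQL arrays are 1-indexed, so `topics[1]` in the query corresponds to `topics[0]` here. The `event_time` equality is omitted because rows from the same transaction share a block timestamp.

```python
UPGRADED_TOPIC = "0xbc7cd75a20ee27fd9adebab32041f755214dbc6bffa90cc0225b39da2e5c2d3b"


def selfdestruct_after_upgraded(logs: list, traces: list) -> list:
    """Return hashes of transactions where a contract emits Upgraded and self-destructs."""
    # Index self-destruct traces by (transaction, destroyed contract).
    destroyed = {
        (t["transaction_hash"], t["from_address"])
        for t in traces
        if t["trace_type"] == "suicide"
    }
    return [
        log["transaction_hash"]
        for log in logs
        if log["topics"]
        and log["topics"][0] == UPGRADED_TOPIC
        and (log["transaction_hash"], log["address"]) in destroyed
    ]


# Made-up rows for illustration only.
logs = [{"transaction_hash": "0xtx1", "address": "0xproxy", "topics": [UPGRADED_TOPIC]}]
traces = [{"transaction_hash": "0xtx1", "from_address": "0xproxy", "trace_type": "suicide"}]
print(selfdestruct_after_upgraded(logs, traces))
```
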
  14. Exploit detection example: reentrancy
      MATCH_RECOGNIZE (
        PARTITION BY transaction_hash
        PATTERN (ERC_CALLBACK ANY_CALL*? REENTER_CALL ANY_CALL_AGAIN*? ERC_CALLBACK_AGAIN)
        WITHIN INTERVAL '2' SECOND
        DEFINE
          /* 1) ERC callback call */
          ERC_CALLBACK AS call_type = 'call' AND (
            SUBSTRING(input FROM 1 FOR 10) = '0x150b7a02' OR /* ERC721: onERC721Received */
            SUBSTRING(input FROM 1 FOR 10) = '0xf23a6e61' OR /* ERC1155: onERC1155Received */
            SUBSTRING(input FROM 1 FOR 10) = '0xb124c41b' OR /* ERC677: callAfterTransfer */
            SUBSTRING(input FROM 1 FOR 10) = '0x0023de29'    /* ERC777: tokensReceived */
          ),
          /* 2) any contract calls in between */
          ANY_CALL AS POSITION(ERC_CALLBACK.trace_address IN ANY_CALL.trace_address) = 1,
          /* 3) exploit contract re-enters from ERC callback */
          REENTER_CALL AS call_type = 'call'
            AND REENTER_CALL.to_address = ERC_CALLBACK.from_address
            AND POSITION(ERC_CALLBACK.trace_address IN REENTER_CALL.trace_address) = 1,
          /* 4) any contract calls in between */
          ANY_CALL_AGAIN AS POSITION(REENTER_CALL.trace_address IN ANY_CALL_AGAIN.trace_address) = 1,
          /* 5) vulnerable contract calls ERC callback again */
          ERC_CALLBACK_AGAIN AS call_type = 'call'
            AND SUBSTRING(input FROM 1 FOR 10) = SUBSTRING(ERC_CALLBACK.input FROM 1 FOR 10)
            AND POSITION(REENTER_CALL.trace_address IN ERC_CALLBACK_AGAIN.trace_address) = 1
            AND ERC_CALLBACK_AGAIN.to_address = ERC_CALLBACK.to_address
      )
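
The pattern reads as three anchored steps: an ERC callback, a call nested under it that re-enters the callback's caller, and the same callback fired again from inside the re-entering call. The sketch below is a simplified Python rendering over the ordered traces of a single transaction, not a translation of the Flink rule. It assumes `trace_address` is a tuple of integers (the SQL compares string prefixes with `POSITION`), it does not constrain the rows between the anchors the way `ANY_CALL` does, and all function names are hypothetical.

```python
CALLBACK_SELECTORS = {
    "0x150b7a02",  # ERC721: onERC721Received
    "0xf23a6e61",  # ERC1155: onERC1155Received
    "0xb124c41b",  # ERC677: callAfterTransfer
    "0x0023de29",  # ERC777: tokensReceived
}


def selector(call_input: str) -> str:
    """'0x' plus the first 4 bytes of calldata, like SUBSTRING(input FROM 1 FOR 10)."""
    return call_input[:10]


def is_nested(outer: tuple, inner: tuple) -> bool:
    """True when `inner` lies at or below `outer` in the call tree."""
    return inner[: len(outer)] == outer


def has_reentrancy(traces: list) -> bool:
    """Scan one transaction's traces, in execution order, for the three anchors."""
    calls = [t for t in traces if t.get("call_type") == "call"]
    for i, callback in enumerate(calls):
        if selector(callback["input"]) not in CALLBACK_SELECTORS:
            continue
        for j in range(i + 1, len(calls)):
            reenter = calls[j]
            if reenter["to_address"] != callback["from_address"]:
                continue
            if not is_nested(callback["trace_address"], reenter["trace_address"]):
                continue
            for again in calls[j + 1 :]:
                if (
                    selector(again["input"]) == selector(callback["input"])
                    and again["to_address"] == callback["to_address"]
                    and is_nested(reenter["trace_address"], again["trace_address"])
                ):
                    return True
    return False
```

Because it ignores the in-between rows, this sketch matches a superset of what the `MATCH_RECOGNIZE` pattern would match.
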
  16. Results
      • Scalable pipeline deployed in AWS
      • Real-time alerts in dashboard and Telegram
      • Extracting and analyzing all transactions from Ethereum and BSC networks
      • Detected >50 exploits since May including Curve 0-day