How a Hedge Fund uses MongoDB

Covers: MongoDB for message passing, BSON logging, Real-time Monitoring

Roman Shtylman

June 13, 2011

Transcript

  1. Agenda • About Athena Capital Research • 3 uses of MongoDB at Athena ◦ Dropcopy ◦ BSON Logging ◦ Realtime Monitoring • Wrap-Up • Questions
  2. Athena Capital Research • Strong focus on technical talent and technology ◦ 90% of employees come from engineering, math, or hard science backgrounds • Quantitative investment manager ◦ math • Automated trading ◦ robots • C++ ◦ speed • Open source stack ◦ freedom
  3. MongoDB at Athena • Lots of unstructured data • Many sources of data • Want to be able to query quickly • Not everything goes into a database • Avoid creating schema after schema
  4. Dropcopy • Third parties require near-real-time reporting of trading activity ◦ Accounting ◦ Risk management ◦ Compliance • Exchanges provide a "drop-copy" ◦ FIX protocol • Scrub the messages and forward to said third party ◦ MongoDB for message passing
  5. FIX Protocol • Financial Information eXchange • Key/value based ASCII ◦ Header + body + trailer ◦ Key is numeric (maps to some "standard" name) ◦ Value is string • Good fit for MongoDB ◦ Key / value ◦ Flexible document sizes ◦ Easier to query than SQL alternatives
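The key/value structure described on this slide can be sketched in a few lines. This is a minimal illustration, not the code used at Athena: it assumes fields are separated by the SOH byte (0x01), and `TAG_NAMES` is a small hand-picked subset of standard FIX tag numbers where a real system would load a full data dictionary. Repeating groups are ignored, so a repeated tag would simply overwrite the earlier value.

```python
SOH = "\x01"  # FIX field delimiter

# Hand-picked subset of standard FIX tags, for illustration only.
TAG_NAMES = {
    "8": "BeginString",
    "9": "BodyLength",
    "35": "MsgType",
    "49": "SenderCompID",
    "56": "TargetCompID",
    "55": "Symbol",
    "54": "Side",
    "38": "OrderQty",
    "10": "CheckSum",
}


def parse_fix(raw):
    """Split a raw FIX message into a {name: value} document.

    Unknown tags keep their numeric key; all values stay strings.
    """
    doc = {}
    for field in raw.split(SOH):
        if not field:
            continue  # skip the empty piece after the trailing delimiter
        tag, _, value = field.partition("=")
        doc[TAG_NAMES.get(tag, tag)] = value
    return doc
```

Calling `parse_fix` on a message yields a flat dictionary that maps directly onto a MongoDB document, which is the "good fit" the slide refers to.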
  6. Architecture • We have an incoming FIX session (drop copy) • Need to have an outgoing FIX session • MongoDB acts as the glue (message passing layer) 1. Incoming drop copy -> FIX log file 2. fix2json 3. MongoDB 4. Tail cursor 5. Client
  7. Drop side • C++ client application for the drop copy connection ◦ Known system and can be kept database free ◦ QuickFix • fix2json ◦ Tail reading of output FIX log files ◦ Easy to represent FIX as JSON and subsequently BSON ◦ Keep db inserts independent of FIX connection • Downsides of combining ◦ Re-population ◦ Data will not be resent
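The "tail reading" step can be sketched as follows. The talk does not show fix2json itself, so `read_new_lines` and `fix_line_to_json` are hypothetical names, and the sketch assumes each log line holds exactly one raw SOH-delimited message with no timestamp prefix (real FIX engine logs may differ). Tracking a byte offset is what keeps the database inserts independent of the FIX connection: the reader can stop and resume without the session noticing.

```python
import json


def read_new_lines(path, offset=0):
    """Return (complete lines appended since `offset`, new offset).

    A partially written last line is left unread so it can be picked up
    on the next call once its newline has arrived.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    end = chunk.rfind(b"\n") + 1  # 0 when no complete line is available
    lines = [raw.decode("ascii", errors="replace")
             for raw in chunk[:end].split(b"\n")[:-1]]
    return lines, offset + end


def fix_line_to_json(line):
    """Render one SOH-delimited tag=value line as a JSON object string."""
    pairs = (field.partition("=") for field in line.split("\x01") if field)
    return json.dumps({tag: value for tag, _, value in pairs})
```

A polling loop would call `read_new_lines` repeatedly, convert each line, insert the result, and persist the returned offset so that a restart does not re-insert old messages.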
  8. MongoDB setup • Capped collection ◦ Natural index • Data is purged daily using a simple MongoDB shell script • Important to keep tabs on the data size if your data requirements change often ◦ Mitigated intraday if you are constantly reading ◦ Critical if you want full replay • Easy to reconcile with Drop FIX logs
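Why data size matters for replay can be shown with a toy model. The class below is an in-memory stand-in, not the MongoDB API: `CappedLog` is a hypothetical name, and it ignores the per-document overhead and preallocation of a real capped collection. It only captures the two properties the slide relies on: documents come back in insertion order, and the oldest ones are discarded once the size cap is reached.

```python
from collections import deque


class CappedLog:
    """Toy model of a size-capped, insertion-ordered collection."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.docs = deque()  # oldest on the left, newest on the right
        self.dropped = 0     # documents evicted to make room

    def insert(self, doc_bytes):
        size = len(doc_bytes)
        if size > self.max_bytes:
            raise ValueError("document larger than the whole collection")
        # Evict from the oldest end until the new document fits.
        while self.used + size > self.max_bytes:
            self.used -= len(self.docs.popleft())
            self.dropped += 1
        self.docs.append(doc_bytes)
        self.used += size

    def replay(self):
        """Documents in natural (insertion) order, oldest first."""
        return list(self.docs)
```

A reader that keeps up intraday never notices evictions, but a full replay only works while `dropped` stays at zero, which is why the cap has to be revisited whenever message volume or size changes.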
  9. Outgoing side • C++ FIX application ◦ QuickFix • Tail cursor ◦ Handling restarts • Select only required fields • Filter and alter any field before sending • Outgoing message log in FIX • Easily handle different clients
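The outgoing steps (resume after a restart, select fields, alter fields) might look roughly like this. It is a sketch under assumptions: `scrub`, `to_fix`, and `forward` are hypothetical names, the sequence number stands in for whatever position marker the tail cursor provides, and the output contains body fields only, since a FIX engine such as QuickFix would add the header, BodyLength, and CheckSum.

```python
def scrub(doc, keep, overrides=None):
    """Project `doc` down to the `keep` tags, then apply `overrides`."""
    out = {tag: doc[tag] for tag in keep if tag in doc}
    out.update(overrides or {})
    return out


def to_fix(doc):
    """Serialize tag/value pairs as SOH-delimited FIX body fields."""
    return "".join(f"{tag}={value}\x01" for tag, value in doc.items())


def forward(messages, last_seq, keep, overrides=None):
    """Yield (seq, fix_text) for every message newer than `last_seq`.

    `messages` is any iterable of (seq, doc) pairs in insertion order.
    Persisting the last yielded seq lets a restarted sender resume
    where it stopped instead of resending everything.
    """
    for seq, doc in messages:
        if seq <= last_seq:
            continue
        yield seq, to_fix(scrub(doc, keep, overrides))
```

Handling a different client then comes down to passing a different `keep` list and `overrides` mapping over the same stored messages.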
  10. Benefits • Full copy of incoming data for querying ◦ Aggregation queries • Easy replay ◦ Client disconnects • Easy verification
  11. BSON Logging • Event logging ◦ Independent of std::cout • Relevant for tracking down problems and keeping records • Logging time is "wasted" time • Previous logging solution was slow ◦ XML based ◦ String conversions • XML is easy to read after logging
  12. BSON Benefits • Binary with loose document format ◦ Defined by the app during logging • Internal data format for MongoDB ◦ mongorestore • Exists sequentially in flat files • Easily rendered as JSON • Numbers: ◦ original XML implementation: 1k ops/s ◦ improved XML implementation: 3k ops/s ◦ first pass BSON implementation: ~20k ops/s ◦ current BSON implementation: ~30k ops/s
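The "exists sequentially in flat files" point follows from BSON's layout: every document starts with its own total length as a little-endian int32, so documents can be appended back to back and read again without any index. The encoder below is a deliberately small sketch covering only flat documents with bool, int, float, and string values. It is not the logger from the talk, which used the MongoDB C++ driver, and it says nothing about the throughput figures quoted on the slide.

```python
import struct


def encode_element(name, value):
    """Encode one key/value pair for a few common BSON types."""
    key = name.encode("utf-8") + b"\x00"
    if isinstance(value, bool):  # check bool first: bool is an int subclass
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # int32
        return b"\x12" + key + struct.pack("<q", value)      # int64
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)      # double
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_document(doc):
    """Encode a flat dict: int32 total length, elements, trailing NUL."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def iter_raw_documents(stream):
    """Yield raw BSON documents stored back to back in a binary stream."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # clean end of file (or a truncated header)
        (length,) = struct.unpack("<i", header)
        yield header + stream.read(length - 4)
```

Logging then reduces to appending `encode_document(...)` to an open binary file, and a post-processing tool can walk the file with `iter_raw_documents`.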
  13. BSON Gotchas • BSON timestamp type is int64_t milliseconds • BSON not a standalone library ◦ Highly coupled to MongoDB C++ driver • Like MongoDB, schema-less ◦ Just something to remember if creating post-processing tools
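The millisecond gotcha is easy to demonstrate. Assuming the slide refers to BSON's UTC datetime type (an int64 count of milliseconds since the Unix epoch), any finer-grained part of a timestamp is lost when it is stored. The helper names below are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_bson_millis(dt):
    """Whole milliseconds since the Unix epoch (integer floor division)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_bson_millis(ms):
    """Rebuild a timezone-aware datetime from a millisecond count."""
    return EPOCH + timedelta(milliseconds=ms)
```

Round-tripping a timestamp with microsecond 123456 returns microsecond 123000. An application that needs finer resolution would have to log it in a separate numeric field.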
  14. Realtime Monitoring • Log entries are similar to one another ◦ Some can have extra fields • Each machine contains independent logs ◦ Each log could be a different format ◦ Daemon to read and insert into MongoDB ▪ Central location, no hunting when problems happen • Real-time monitoring and alerting ◦ Human intervention required • Web based tools to "tail" view log entries ◦ WebSockets
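Because log entries share most fields but not all, alerting rules have to tolerate missing keys. A minimal sketch of that idea follows. The rule format and field names are made up for illustration and are not taken from the talk.

```python
def matches(entry, rule):
    """True when every field named in `rule` is present in `entry` and equal.

    Missing fields count as a mismatch rather than an error, since
    entries from different logs need not share a schema.
    """
    missing = object()  # sentinel that never equals a wanted value
    return all(entry.get(field, missing) == wanted
               for field, wanted in rule.items())


def alerts(entries, rules):
    """Yield (rule_name, entry) for each log entry that triggers a rule."""
    for entry in entries:
        for name, rule in rules.items():
            if matches(entry, rule):
                yield name, entry
```

The same predicate can run inside the daemon that inserts entries or against the central collection, and the output would feed whatever channel notifies the human who has to intervene.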
  15. Wrap-Up • "Realtime" is relative ◦ Benchmark to meet your needs • Disjoint pieces can be less prone to failure • Other MongoDB uses ◦ Contribute to LuaMongo driver ◦ BSON code contributions ◦ Bugfixes