Building responsive Symbology & Suggest web service | Andrei Palchys, Alex Kosau

Andrei Palchys, Alex Kosau
Minsk MongoDB User Group, Meetup #2

Transcript

  1. Building responsive Symbology & Suggest web service with MongoDB
     Andrei Palchys, Alex Kosau
  2. Introduction
     • Customer: Thomson Reuters
     • Business domain: Financial markets
     • Goal: Implement next-gen financial web services
     • Project start: July 2011
     • Team: 1 team lead, 5+1 developers, 2 QA
  3. Architecture
     [Diagram] Old: Sources, ETL, Search Engine, Web services, Front End, Desktop
     [Diagram] New: Sources, ETL, The New Web Services, Desktop
  4. Reasons to write the new web services
     • Bad performance
     • Expensive to scale or extend
     • Some types of data are not easy to manage
  5. Requirements for the new web services
     • Performance
       95% of Symbology requests should complete within 50 ms.
       95% of Suggest requests should complete within 25 ms.
     • Use normalized data
     • Use as little memory as possible
     • Fast data loading into the DB
     • Windows environment and .NET platform
  6. What we considered from commercial databases
     • Microsoft SQL Server
       • 13 ms, too slow
     • Oracle TimesTen
       • Relational
       • Completely in-memory: guaranteed latency but slow startup
       • Expensive
     • McObject's eXtremeDB
       • Object DB
       • Native C interface: designed for performance
       • Ultra-high reliability
       • Still expensive
  7. What we considered from free databases
     • Redis
     • HBase
     • CouchDB
     • RavenDB
     Each of these databases fails to meet at least one of the requirements.
  8. MongoDB
     • Document-oriented
     • Simple to use (decent .NET interface available)
     • Simple maintenance (monitoring, replication, sharding)
     • Data stays in memory once it has been used
     • 1 ms average response time
     • Cross-platform (native Windows support)
  9. Web services
     • Symbology Web Service
       Provides reference data about financial instruments, via symbols, codes or instrument names
     • Suggest Web Service
  10. Databases
      • Symbology DB: about 30 GB of data
      • Suggest DB: about 22 GB of data
      [Diagram] Symbology WS, Suggest WS, Symbology DB, Suggest DB
  11. Hardware setup (planned)
      • 6 clusters around the world, in a replica set configuration
      • 2 of them are also used to load data
      • 128 GB of memory per server
  12. Symbology DB: challenge
      • Fast search by full key
      • Minimize the space taken by the data, since we need it to fit into RAM
      • Data is text only (no pictures etc.)
      • The full document is always required
      • Only some fields are used to query data, and these fields are short (3 to 10 characters)
      • New fields should be easy to add to the "queryable" list
      • Composite queries are sometimes needed
        • AB and CD and not EF or GH
      • Fast data loading
  13. Symbology DB: solution
      Map the names of the document fields to ints
      RIC -> 1
      Name -> 2
      {
        "1": "GOOG.O",
        "2": "Google"
      }
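
The field-name mapping on this slide can be sketched as a pair of lookup tables. This is a minimal illustration, not the team's code: `FIELD_IDS`, `shorten` and `expand` are hypothetical names, and only the two ids `RIC -> 1` and `Name -> 2` come from the slide. Keys stay strings because document field names are strings.

```python
# Hypothetical field dictionary; only RIC -> 1 and Name -> 2 appear in the slide.
FIELD_IDS = {"RIC": 1, "Name": 2}
FIELD_NAMES = {field_id: name for name, field_id in FIELD_IDS.items()}


def shorten(doc):
    """Replace readable field names with their integer ids, written as strings."""
    return {str(FIELD_IDS[name]): value for name, value in doc.items()}


def expand(doc):
    """Restore readable field names from the integer ids."""
    return {FIELD_NAMES[int(key)]: value for key, value in doc.items()}


stored = shorten({"RIC": "GOOG.O", "Name": "Google"})
restored = expand(stored)
print(stored)
print(restored)
```

Every stored document repeats its field names, so shorter keys reduce the size of each document, which matters when the whole data set has to fit into RAM.
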
  14. Symbology DB: solution
      Unite all queryable fields into arrays
      • Query syntax is the same
      • Single index: less space occupied
      • Easy to add new searchable data
      "queryablefields": [
        { "k": 1, "v": "MSFT.O" },
        { "k": 2, "v": "Microsoft Inc." }
      ]
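
A rough sketch of how such a key/value array might be built and queried follows. The slides do not show the actual query or index definition, so the `$elemMatch` filter is an assumption about one way to match `k` and `v` on the same array element. The helper names (`to_queryable`, `by_field`, `matches`) are hypothetical, and `matches` is only an in-memory stand-in for the check the server would perform.

```python
# Only these two ids appear in the slides; the mapping itself is illustrative.
FIELD_IDS = {"RIC": 1, "Name": 2}


def to_queryable(fields):
    """Turn {field name: value} into the k/v array layout from the slide."""
    return [{"k": FIELD_IDS[name], "v": value} for name, value in fields.items()]


def by_field(name, value):
    """Build a filter in MongoDB query syntax: one array element must match both k and v."""
    return {"queryablefields": {"$elemMatch": {"k": FIELD_IDS[name], "v": value}}}


def matches(doc, flt):
    """In-memory stand-in for the server-side $elemMatch check (illustration only)."""
    wanted = flt["queryablefields"]["$elemMatch"]
    return any(
        all(item.get(key) == value for key, value in wanted.items())
        for item in doc["queryablefields"]
    )


doc = {"queryablefields": to_queryable({"RIC": "MSFT.O", "Name": "Microsoft Inc."})}
print(matches(doc, by_field("RIC", "MSFT.O")))
print(matches(doc, by_field("RIC", "Microsoft Inc.")))
```

With this layout, making a new field searchable means appending one more `{k, v}` element to the array. No new index is needed.
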
  15. Symbology DB: solution
      Compress non-queryable data and store it as a single field (binary data)
      • Encode with Protocol Buffers or MsgPack
        – In our case, MsgPack was 2x faster than Protobuf
      • Compress with Snappy
        – Very fast compression; favors speed over compression ratio
      {
        "b": BinData(0,"CgcxMDkwMzcwEgZ1cztJQk0xAAAAAAAA8D86A05ZU0IXTmV3IFlvcmsgU3RvY2sgRXhjaGFuZ2VZAAAAAAAA8D9gAXABeAGJAQAAAAAAAPA/ogEFNDc0MU6qAQU0NzQxTrI…")
      }
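
The encode-then-compress step can be sketched with the standard library alone. Note the substitution: `json` stands in for MsgPack and `zlib` stands in for Snappy, purely so the example runs without third-party packages. The talk used MsgPack and Snappy. The payload fields and the `pack`/`unpack` helpers are hypothetical.

```python
import json
import zlib


def pack(payload):
    """Encode the non-queryable fields, then compress them into one binary blob."""
    encoded = json.dumps(payload, separators=(",", ":")).encode("utf-8")  # stand-in for MsgPack
    return zlib.compress(encoded)  # stand-in for Snappy


def unpack(blob):
    """Reverse of pack: decompress, then decode."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical non-queryable fields of one instrument.
payload = {"exchange": "New York Stock Exchange", "lot_size": 1.0}

# A single binary field "b", mirroring the document shape on the slide.
document = {"b": pack(payload)}
print(unpack(document["b"]))
```

Because the full document is always required and these fields are never queried, the server can treat them as one opaque blob. Only the client needs to decode it.
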
  16. Symbology DB: solution
      Change the ETL output format to JSON and insert directly into MongoDB
      This decreased loading time from 9 h to 1 h.
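
The talk does not say how the JSON output was inserted. As one possible shape, the sketch below writes one JSON document per line, a format that bulk-import tools such as `mongoimport` can read. The `write_json_lines` helper and the sample records are hypothetical.

```python
import io
import json


def write_json_lines(records, out):
    """Write each record as one JSON document per line and return the count."""
    count = 0
    for record in records:
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# Hypothetical ETL output, already in the shortened field-name form.
records = [
    {"1": "GOOG.O", "2": "Google"},
    {"1": "MSFT.O", "2": "Microsoft Inc."},
]

buffer = io.StringIO()  # a real ETL job would write to a file instead
written = write_json_lines(records, buffer)
print(buffer.getvalue())
```
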
  17. Suggest DB: challenge
      • Fast search by partial text
      • Keep only the top 50 entities per term
      • Generate the Suggest DB from the existing Symbology DB
  18. Suggest DB: solution
      Use an "inverted" index for fast search by partial text
      {"term": "g", "references": […]},
      {"term": "go", "references": […]},
      {"term": "goo", "references": […]},
      {"term": "goog", "references": […]},
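
The prefix-based inverted index can be sketched as follows. This is an illustration under assumptions: the slides do not say how the "top 50" are ranked, so this version keeps the first `limit` references it sees for each term. The instrument names and the helpers `prefixes` and `build_index` are hypothetical.

```python
from collections import defaultdict


def prefixes(text):
    """Return every leading substring of the lower-cased text: g, go, goo, ..."""
    lowered = text.lower()
    return [lowered[:n] for n in range(1, len(lowered) + 1)]


def build_index(names, limit=50):
    """Map each prefix term to at most `limit` instrument references."""
    index = defaultdict(list)
    for ref, name in names.items():
        for term in prefixes(name):
            if len(index[term]) < limit:
                index[term].append(ref)
    return [{"term": term, "references": refs} for term, refs in sorted(index.items())]


# Hypothetical instruments: reference -> display name.
index = build_index({"GOOG.O": "Google", "GS.N": "Goldman Sachs"})
lookup = {entry["term"]: entry["references"] for entry in index}
print(lookup["go"])
print(lookup["goo"])
```

A suggest request then becomes a single exact-match lookup on `term`. That is the same "fast search by full key" access pattern the Symbology DB relies on.
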
  19. Suggest DB: solution
      Generate the Suggest DB from the existing Symbology DB
      • About 750 million temporary documents
      • MongoDB Map Reduce is too slow
      • All MongoDB-based algorithms take a lot of time
      Use Amazon Elastic MapReduce!
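
The generation job might decompose into map and reduce phases roughly as sketched below. This runs in a single process for illustration. On Elastic MapReduce the two functions would run as distributed tasks, and the talk does not show the real job. The `score` used for ranking, the sample instruments, and all function names are hypothetical.

```python
import heapq
from collections import defaultdict


def map_instrument(ref, name, score):
    """Map phase: emit (prefix term, (score, ref)) for every prefix of the name."""
    lowered = name.lower()
    for n in range(1, len(lowered) + 1):
        yield lowered[:n], (score, ref)


def reduce_term(term, scored_refs, limit=50):
    """Reduce phase: keep only the `limit` highest-scored references for a term."""
    best = heapq.nlargest(limit, scored_refs)
    return {"term": term, "references": [ref for _, ref in best]}


def run(instruments, limit=50):
    """Group mapper output by term (the shuffle step), then reduce each group."""
    grouped = defaultdict(list)
    for ref, name, score in instruments:
        for term, value in map_instrument(ref, name, score):
            grouped[term].append(value)
    return {term: reduce_term(term, values, limit) for term, values in grouped.items()}


# Hypothetical instruments: (reference, name, ranking score).
result = run(
    [("GOOG.O", "Google", 0.9), ("GS.N", "Goldman Sachs", 0.7), ("GOLD.N", "Gold Corp", 0.8)],
    limit=2,
)
print(result["go"])
```

The intermediate (term, reference) pairs emitted by the map phase correspond to the "temporary documents" on the slide. Their number grows with the total length of all indexed names, which is why the job is large.
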
  20. .NET MongoDB driver: performance tips & tricks
      - Use the IBsonSerializer interface instead of BsonElement attributes
      - TBD
      - TBD
      - TBD