Elasticsearch in Equity Finance

December 16, 2015

Ian Maclean | Goldman Sachs | Vice President, Technology | Tokyo | December 16, 2015


  1. Goldman Sachs Engineering (GS.com/Engineering)

    Goldman Sachs: Elastic in Equity Finance
    Asia Equities Engineering
    Ian Maclean, Vice President
    December 16, 2015

    Elastic @ GS: Grass Roots Usage of Open Source
    - Developer-driven culture that promotes the use of open source
    - Significant usage by late 2014: 700+ nodes
    - Centralized Elasticsearch engineering team formed mid-2014
    - Diverse use cases
      - Search: workflow search, trade / order search …
      - Document search: legal documents, candidate resume, source code
      - Metrics: JVM, network, app usage, alerts, transaction volumes …
      - Making real-time transaction data queryable
      - Data analytics: order flow dashboards, analysis

  3. Center of Excellence Model: Standardized Software Packaging

    - Built internally from the open source code on GitHub
    - Enhanced with GS plugins for security and backup
    - Includes other open source plugins and tools

    Center of Excellence Model: Support
    - Elasticsearch install inventory
    - Centralized monitoring and metrics
    - Governance on proper usage
    - Elastic vendor support
      - Global support
      - Design review
      - Performance tuning
      - Patching
    - Integration with the internal code base using custom language wrappers

  4. Equities Engineering Use Cases: High Performance Order Search (Problem)

    - Order transaction data is currently persisted in Sybase databases
    - Total transaction volume is so large that DB instances must be split into many stripes
    - Queries over longer time ranges and aggregated queries are very difficult and slow, taking hours in some cases
    - As a result, extracting meaningful analytics from the data is difficult
    - Different sources for historical and real-time data mean no code sharing

    Equities Engineering Use Cases: High Performance Order Search (Solution)
    - Extract denormalized views of historical data into Elasticsearch
    - Index intraday data from the live transaction feed
    - Unified schema: query historical and live data from a single source
    - Utilize ES aggregations for fast analytic queries

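The aggregation approach on the Solution slide can be pictured with a small sketch. The slides do not show the actual queries, so everything here is an assumption for illustration: the field names (`order_time`, `market`, `quantity`) are hypothetical, and the function only builds a request body in the Elasticsearch query DSL without contacting a cluster.

```python
import json


def order_volume_by_market(start, end, interval="day"):
    """Build a search body that buckets orders by market, then by time.

    Field names are hypothetical; they are not taken from the slides.
    """
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "query": {
            "range": {"order_time": {"gte": start, "lt": end}}
        },
        "aggs": {
            "by_market": {
                "terms": {"field": "market"},
                "aggs": {
                    "over_time": {
                        # "interval" is the parameter name used by the
                        # Elasticsearch releases current in 2015; later
                        # releases renamed it (calendar_interval / fixed_interval).
                        "date_histogram": {
                            "field": "order_time",
                            "interval": interval,
                        },
                        "aggs": {
                            "total_quantity": {"sum": {"field": "quantity"}}
                        },
                    }
                },
            }
        },
    }


body = order_volume_by_market("2015-01-01", "2015-12-01")
print(json.dumps(body, indent=2))
```

With a unified schema, a body like this could be sent to the `_search` endpoint of an index pattern that covers both the historical and the intraday indices, so one query serves both data sources.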
  5. Equities Engineering Use Cases: Management Analytics (Sharp-X)

    - High-level dashboard showing per-market analysis of order flow data
    - Replaced and greatly expanded upon a legacy equivalent
    - Aggregated queries across both historical and live-updating data
    - The ability to query the latest transaction state is critical
    - Previous implementations relied on real-time transaction callbacks to perform the aggregations, which required lots of custom code
    - Utilizing the real-time feed to ES and aggregations for querying resulted in a drastically simplified architecture and code base

    Equities Engineering Use Cases: Management Analytics (Sharp-X, continued)
    Equity Order Flow Dashboards

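One way to picture why feeding the index and aggregating at query time simplifies the code: if each order is stored under its order ID, every update from the feed overwrites the previous state, and dashboard figures are computed on demand instead of being maintained inside callbacks. The following is a plain-Python sketch of that idea, not the Sharp-X implementation; it uses a dictionary in place of Elasticsearch, and the order IDs, market codes, and `filled_qty` field are hypothetical.

```python
from collections import defaultdict


def apply_feed(index, events):
    """Upsert each event by order ID so the index holds only the latest state."""
    for event in events:
        index[event["order_id"]] = event
    return index


def quantity_by_market(index):
    """Aggregate at query time over the latest state of every order."""
    totals = defaultdict(int)
    for order in index.values():
        totals[order["market"]] += order["filled_qty"]
    return dict(totals)


# Hypothetical sample data: a historical extract and a live feed.
historical = [
    {"order_id": "A1", "market": "TSE", "filled_qty": 100},
    {"order_id": "B7", "market": "HKEX", "filled_qty": 250},
]
live = [
    {"order_id": "C3", "market": "TSE", "filled_qty": 0},
    # A later update for the same order replaces the earlier state.
    {"order_id": "C3", "market": "TSE", "filled_qty": 40},
]

index = apply_feed({}, historical)
index = apply_feed(index, live)
print(quantity_by_market(index))
```

The aggregation function never needs to know whether an order came from the historical extract or the live feed, which mirrors the single-source querying described on the slide.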
  6. Equities Engineering Use Cases: Lessons Learnt / Next Steps

    - Ease of deployment and horizontal scaling are game changers
    - Moving from the relational mental model to NoSQL and a completely new query language takes some adjustment
    - Living without easy joins means thinking more about the data model up front
    - Working with a fast-moving technology comes with risks and challenges
    - Elastic's auto-schema feature is useful for development but can cause problems in a production system
    - Indexes are low cost and easy to re-create; types can't easily be re-created without re-creating the index
    - Expanded use of Elasticsearch in other problem domains
    - Plans to replace the relational data sources with Hadoop, retaining Elasticsearch as the high-speed query engine on top
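The auto-schema lesson is commonly addressed by declaring the mapping explicitly and setting `dynamic` to `strict`, so that documents with unexpected fields are rejected instead of silently extending the schema. The slides do not say which approach the team took, so the sketch below is only an illustration under assumptions: the field names are hypothetical, and the body is written in the typeless form of recent Elasticsearch releases (versions current in 2015 nested the mapping under a type name and used not-analyzed string fields where `keyword` appears here).

```python
import json


def strict_order_mapping():
    """Index-creation body that rejects documents containing unmapped fields."""
    return {
        "mappings": {
            "dynamic": "strict",  # unmapped fields cause indexing to fail
            "properties": {
                # Hypothetical fields, not taken from the slides.
                "order_id": {"type": "keyword"},
                "market": {"type": "keyword"},
                "quantity": {"type": "long"},
                "order_time": {"type": "date"},
            },
        }
    }


def unmapped_fields(doc, body):
    """Local illustration only: list the fields strict mode would object to."""
    known = body["mappings"]["properties"]
    return sorted(set(doc) - set(known))


body = strict_order_mapping()
print(json.dumps(body, indent=2))
print(unmapped_fields({"order_id": "A1", "trader": "x"}, body))
```

Because indexes are cheap to re-create, a mapping change under this scheme would typically mean creating a new index with the revised body and reloading the data into it.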
