
Keynote: collaboration between Opening.io and the Irish Centre for High-End Computing (ICHEC)

A perspective on successful embedding of industry and academic research within the Irish technology ecosystem. Irish Supercomputing SMEs, Thu 26 January 2017

Adrian Mihai - opening.io

January 25, 2017

Transcript

1. Irish Supercomputing SMEs
—
Keynote: collaboration between Opening.io and the Irish Centre for High-End Computing (ICHEC). A perspective on successful embedding of industry and academic research within the Irish technology ecosystem.
2. Data science meets recruitment
—
We enable ideal candidate discovery for any given set of job requirements within large numbers of applicants by deploying natural language understanding of resumes, context and industry data points, and we explain each decision via supportive briefs, infographics and various fact-checks.

Opening: technology background
—
Parallel computing clusters enabling high-volume recruitment workflows. Opening deploys a high-throughput, end-to-end resume processing pipeline enabling parsing, analysis and auto-classification of resumes in real time; a minimal sketch of such a staged design follows below. We operate stand-alone or augment job boards, recruitment agencies, enterprise workflows and applicant tracking system platforms. A high-level overview of our platform is here:
https://speakerdeck.com/amorroxic/opening-dot-io-system-architecture
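To make the pipeline description concrete, here is a minimal Python sketch of one possible staged design; the Resume record, the stage names and the toy skill vocabulary are illustrative assumptions, not our production code.

```python
from dataclasses import dataclass, field

@dataclass
class Resume:
    """Carries one document through the pipeline, accumulating results."""
    raw_text: str
    tokens: list = field(default_factory=list)
    skills: list = field(default_factory=list)
    category: str = ""

def parse(resume: Resume) -> Resume:
    # Tokenize raw text (a real parser would also handle PDF/DOCX layout and sections).
    resume.tokens = resume.raw_text.lower().split()
    return resume

def analyze(resume: Resume) -> Resume:
    # Match tokens against a skill vocabulary (a stand-in for NLU models).
    vocabulary = {"python", "java", "sql", "recruitment"}
    resume.skills = [t for t in resume.tokens if t in vocabulary]
    return resume

def classify(resume: Resume) -> Resume:
    # Assign a coarse category (a real system would use a trained classifier).
    resume.category = "engineering" if resume.skills else "general"
    return resume

def process(raw_text: str) -> Resume:
    # Compose the stages end to end; each stage is independently replaceable.
    resume = Resume(raw_text)
    for stage in (parse, analyze, classify):
        resume = stage(resume)
    return resume

print(process("Senior Python engineer with SQL experience").category)  # engineering
```

Because every stage has the same signature, a stage can be swapped or appended without touching the others, which is what "hot-pluggable" means in practice.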
3. Main objectives
—
accurate | industry agnostic | scalable | fault tolerant | hot-pluggable
Data mining — Reactivity — API — Decoupling — AI

Context
—
2015: a young Irish company of three, working on a product catering to the recruitment space, with a focus on the areas defining recruitment at its core: positions (monitoring job boards) and talent (resume analysis).

It is a rather hard problem to tackle programmatically, as we do: research in AI was only recently boosted by computing power, so talent is consequently scarce. While bootstrapping our way to the market we had the opportunity of an EI Innovation Grant, which allowed us to connect with ICHEC. The interfacing was incredibly smooth, a process managed through all stages, from discovery, understanding and analysis to implementation, data modeling and audit reporting, by a really dedicated team to whom we are truly grateful. Thank you for all your support!
4. Collaboration with ICHEC
—
accurate | industry agnostic | scalable | fault tolerant | hot-pluggable

Over a period of six months* we audited our research and constantly optimized our intelligence processes, methodology and algorithms. The ICHEC team added an unparalleled amount of expertise: we ended up prototyping together, graphing and modeling data, discussing accuracy and exploring competing alternatives. The process ended with an audit report, accompanied by supporting analysis data, code improvements and observations, which today enables us to offer a product that lets the market at large level the playing field in a space dominated by a select few major players.

*Aug 2015 - Feb 2016
5. Benefits and outcome
—
Successful? Live product, enabling major agencies in Dublin.

Market: the same assumptions we validated at the time turned into products defining their categories in various other markets: Alexa, Tesla, Google Translate.

Technology: compliant with the new EU Right-to-Explain directive (explaining AI behavior). Scales both horizontally and vertically under actual real-life load, across computing cores and I/O boundaries. Runs on OS X, Windows or the web, powered by a private cloud infrastructure.

Powering CareerZoo Dublin (careerzoo.ie) in March 2017. NDRC Catalyzer company, launched in Sep 2016; the team has grown to six. In direct contact with major international entities (US / UK / Germany / Spain / Finland / Eastern Europe).
6. Key enablers and competitive advantage
—
Faster iterations / market responsiveness
—
The ability to ship fast is crucial. HPC enables faster prototyping and a deeper understanding of the problem domain, allowing for a great degree of flexibility in exploring various approaches while tuning a product in response to changing industry requirements.

Better performing software
—
By mitigating the risks of maintaining legacy infrastructure we are free to focus on optimizing performance while scaling computation to levels until recently accessible mostly to large entities, at a fraction of the cost; a sketch of fanning work out across cores follows below.

Autonomy
—
The ability to avoid lock-in and reliance on third-party ecosystems: compute grids are now available on demand, as platform-as-a-service offerings.

Lower constraints
—
Big data analytical processes very often require specialized hardware and technological prowess. Augmenting such processes with machine processing power allows faster convergence of results, with lower constraints on practical implementation details.
7. Thoughts on High Performance Computing (HPC)
—
Distributed processing
—
There are various approaches to tackling a typically difficult problem; in my opinion two will have enormous value in rapidly shipping production-ready, incredibly stable deployments: immutability and functional programming / monadic composition (a minimal sketch follows below). Big data does not necessarily mean "big" as in storage: it can also mean smaller chunks, processed orders of magnitude faster or with unspecified constraints on resource allocation. Concurrency deals with resource contention, while parallelism distributes state for parallel processing; there is no magic "high-performance all-purpose cluster" yet, it all depends on the problem domain.
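A minimal sketch of what immutability plus monadic composition can look like in Python, assuming a toy Result type; the names Result, validate and enrich are illustrative, not a specific library.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)  # immutable: stages can never mutate shared state
class Result:
    """A minimal Result/Either: a value on success, an error message on failure."""
    value: Optional[dict] = None
    error: Optional[str] = None

    def bind(self, fn: Callable[[dict], "Result"]) -> "Result":
        # Monadic bind: once an error occurs, all later stages are skipped.
        return self if self.error else fn(self.value)

def validate(doc: dict) -> Result:
    return Result(value=doc) if doc.get("text") else Result(error="empty document")

def enrich(doc: dict) -> Result:
    return Result(value={**doc, "length": len(doc["text"])})  # new dict, no mutation

outcome = Result(value={"text": "ml engineer"}).bind(validate).bind(enrich)
print(outcome)  # Result(value={'text': 'ml engineer', 'length': 11}, error=None)
```

Because Result is frozen and bind short-circuits on the first failure, stages share no mutable state and errors propagate without exceptions, which is what makes such pipelines predictable under heavy parallel load.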
Patterns in data
—
AI identifies and learns patterns in data and predicts output upon detecting those patterns in new, previously unseen data. Stripped of myth, deep learning is just a way of asking millions of non-linear questions of a training set in unison, grouping the answers in certain ways and continuously adapting the questions until the system generalizes and understands the latent features in the data that most closely influence the observed effect. A toy sketch of this loop follows below.
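As a toy-scale illustration of that description, the following NumPy sketch trains a tiny network on XOR; each hidden unit is one "non-linear question", and gradient descent is the mechanism that continuously adapts the questions. The architecture and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: XOR, a pattern no single linear question can capture.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Each of the 8 hidden units asks one "non-linear question" of every example at once.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

for step in range(5000):
    h = np.tanh(X @ W1 + b1)               # ask the questions
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))   # group the answers into a prediction
    grad_logits = p - y                    # how wrong each grouped answer is
    grad_h = (grad_logits @ W2.T) * (1 - h ** 2)
    # Adapt the questions: nudge every weight against its error gradient.
    W2 -= 0.1 * h.T @ grad_logits
    b2 -= 0.1 * grad_logits.sum(axis=0)
    W1 -= 0.1 * X.T @ grad_h
    b1 -= 0.1 * grad_h.sum(axis=0)

print(np.round(p.ravel(), 2))  # should approach [0, 1, 1, 0]
```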