Speaker: Louise McCluskey, kdb+ Engineer
Description: "Big Data Analytics with Time-Series Database kdb+", followed by a Kx for Dashboards demo – watch cool visualizations of huge data sets.
About us
• Subsidiary of First Derivatives plc
• 13 global offices incl. NYC, Singapore, London & Tokyo
• Large user community
• Widely adopted in financial services over two decades
• Now in hi-tech manufacturing, utilities, telco, energy, life sciences, earth observation
• Software & industry solutions, consulting and implementation services
Map regions: North America, Africa, Australia & NZ, Asia, UK & Europe
Core technology
kdb+: column-based time-series database with in-built programming language q
• World's fastest time-series columnar database
• Streaming, real-time and historical data in one platform
• Runs on Linux, Windows, Solaris, and macOS
• Runs on commodity hardware, cloud, edge devices/appliances
• Expressive query (qsql) and programming language (q)
• In-memory compute engine for Complex Event Processing
• Column-level compression
• Integrates easily into legacy systems for performance augmentation
• Multi-core / multi-processor / multi-thread / multi-server
Known for
• Processing & analysis of large volumes of real-time and historical time-series data
• Extreme performance, low latency
• Scalability without requiring significant infrastructure change
• Provide the fastest, most efficient, most flexible tools and dashboards
• Worldwide leader in high-volume, high-performance databases
Sample q query
select open: first price, high: max price, low: min price, close: last price
  from trade where date = 2013.05.01, sym=`VOD.L

open  high low   close
83.85 85.9 83.28 85.45
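To make the semantics of the sample query concrete, here is a minimal plain-Python sketch of the same open/high/low/close aggregation. It is not q and not how kdb+ executes the query. The `trades` rows and the `ohlc` helper are hypothetical: only the four VOD.L prices are taken from the result shown on the slide, and the other rows are invented so that the filter has something to exclude.

```python
from datetime import date

# Hypothetical (date, sym, price) rows in arrival order.
# The four VOD.L prices for 2013.05.01 come from the slide's result; the rest are invented.
trades = [
    (date(2013, 5, 1), "VOD.L", 83.85),
    (date(2013, 5, 1), "VOD.L", 85.90),
    (date(2013, 5, 1), "BP.L", 47.10),
    (date(2013, 5, 1), "VOD.L", 83.28),
    (date(2013, 5, 1), "VOD.L", 85.45),
    (date(2013, 5, 2), "VOD.L", 86.00),
]


def ohlc(rows, day, sym):
    """Return open/high/low/close for one symbol on one day, keeping row order."""
    prices = [p for d, s, p in rows if d == day and s == sym]
    if not prices:
        return None
    return {
        "open": prices[0],    # first price
        "high": max(prices),  # max price
        "low": min(prices),   # min price
        "close": prices[-1],  # last price
    }


result = ohlc(trades, date(2013, 5, 1), "VOD.L")
print(result)
```

As in the q query, "open" and "close" depend on row order, so the rows are assumed to be stored in time order.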
Kx Architecture (diagram)
• Sources: File & DB Sources, Real-Time Sources
• Ingestion: Batch Loader, Stream Feed Handler
• Stream and Ingestion Engine: Stream Engine, Ticker Plant, Complex Event Processing
• Core (kdb+): In Memory Database, Historical Database, q language & qsql scripting
• Native Lambda/HTAP Architecture; Scalability, High Availability, and Fault Tolerance
• Stream for Kx: Application Framework
• Control for Kx: Develop, configure, deploy, and manage solutions
• Monitor for Kx: Scan, monitor and alerting of issues in software and hardware
• Analyst for Kx: Query, explore, transform, and import without programming
• Dashboards for Kx: Build real-time visualizations for multiple devices
• Third-Party Interoperability: Pub/sub, SOA, ODBC, JDBC, web sockets; R, Python, MatLab, Java, C#, C/C++
• Vertical Market Solutions: Kx for Flow, Kx for Surveillance, Kx for Algo, Kx for DaaS, Kx for Utilities, Kx for Cyber, Kx for Pharma, Kx for Sensors, Kx for Telco
• Queries • Transforms • Alerts • Control signals • Notifications • Micro-services
Kx Performance Snippets
• Trusted by 19/20 of the world's top investment banks
• 500 KB profile (L1/L2 cache)
• Process & store 4.5 million events/second/core
• Ingest data at 10 million records/second/core
• Streaming 1.6 TB of data daily
• Search in-memory tables at 4 billion records/second/core
Transitive Comparisons
• InfluxData published public benchmarks (data and software) against:
  • MongoDB
  • Cassandra
  • ElasticSearch
  • OpenTSDB
• Kx applied identical methodologies and ran the tests to generate its own performance measurements
Query Definitions
# | Definition | Kdb+ vs InfluxDB vs… | Data Spanning
1 | Return maximum value, by minute, in a 1-hour time frame, for 1 host | Cassandra | 1 day
2 | Return maximum value, by minute, in a 12-hour time frame, for 1 host | Cassandra | 1 day
3 | Return maximum value, by minute, in a 12-hour time frame, for 8 hosts | Cassandra | 1 day
4 | Return maximum value, by minute, in a 1-hour time frame, for 1 host (4 days) | ElasticSearch | 4 days
5 | Return maximum value, by minute, in a 1-hour time frame, for 1 host | MongoDB | 6 hours
6 | Return maximum value, by minute, in a 1-hour time frame, for 8 hosts | OpenTSDB | 4 hours
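All six benchmark queries share one shape: the maximum value per minute, within a time window, for a set of hosts. The following plain-Python sketch illustrates that shape only. The `(timestamp, host, value)` schema, the sample rows and the `max_by_minute` helper are assumptions for illustration; they are not the benchmark's actual data or code.

```python
from datetime import datetime, timedelta

# Hypothetical (timestamp, host, value) samples; schema and values are invented.
samples = [
    (datetime(2016, 1, 1, 0, 0, 5), "host_0", 10.0),
    (datetime(2016, 1, 1, 0, 0, 45), "host_0", 30.0),
    (datetime(2016, 1, 1, 0, 1, 10), "host_0", 20.0),
    (datetime(2016, 1, 1, 0, 1, 20), "host_1", 99.0),  # different host
    (datetime(2016, 1, 1, 1, 0, 0), "host_0", 50.0),   # outside the 1-hour window
]


def max_by_minute(rows, hosts, start, span):
    """Maximum value per (host, minute) for the given hosts in [start, start + span)."""
    end = start + span
    out = {}
    for ts, host, value in rows:
        if host in hosts and start <= ts < end:
            # Truncate the timestamp to its minute to form the grouping key.
            key = (host, ts.replace(second=0, microsecond=0))
            out[key] = max(value, out.get(key, value))
    return out


result = max_by_minute(samples, {"host_0"}, datetime(2016, 1, 1), timedelta(hours=1))
for (host, minute), peak in sorted(result.items()):
    print(host, minute.isoformat(), peak)
```

Changing `span` to 12 hours or passing eight host names gives the shape of the other query variants in the table.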
Query Rate: Kdb+ vs InfluxDB vs MongoDB
[Bar chart. Y-axis: queries per second (0 to 130,000). X-axis: Raspberry Pi, MacBook, Server 1-Core, Server 4-Cores, Server 8-Cores. Series: Kdb+, InfluxDB, MongoDB. Data labels as extracted: 2850, 2614, 122666, 107810, 56649, 63138, 7693.]
Query Rate: Kdb+ vs InfluxDB vs ElasticSearch
[Bar chart. Y-axis: queries per second (0 to 60,000). X-axis: Raspberry Pi, MacBook, Server 1-Core, Server 4-Cores, Server 8-Cores. Series: Kdb+, InfluxDB, ElasticSearch. Data labels as extracted: 79, 3600, 53682, 34905, 12455, 24266, 1333.]
Demo 1 – Geographical Data
• The NASA MERRA-2 data is available at: https://disc.sci.gsfc.nasa.gov/uui/datasets?keywords=%22MERRA-2%22
• Roughly 250 TB of data
• Data divided across 100 datasets
• Measurements from worldwide grid points from 1980 to 2016
• .nc4 (NetCDF-4, Network Common Data Form) file type
Example Data Sets
• inst1_2d_lfo_Nx – Land Surface Forcings:
  • 9 variables, measured at almost 5 million grid points
  • 1 year's (2005) worth of daily data
  • This takes up 97 GB on disk and is over 1.82 billion rows in table form
Land Surface Forcings Demo
Let's run a big query across all 1.82 billion rows: for a given location, find the change in pressure at every point in time for the year, and extract the rows where the magnitude of that change exceeds a specified threshold (here, outside the range -500 to 500). There are roughly 150 million points per month.
select from
  (select month,time,PS,SPEEDLML,delta:{0,1_deltas x}PS from lfo1
    where month within(2005.01m;2005.12m),lat=30,lon=-90)
  where not delta within(-500;500)
(This query takes around 8 seconds.)
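The core of this query is the `{0,1_deltas x}` step, which takes consecutive differences of the pressure column and replaces the first element with 0, followed by a filter that keeps rows whose delta falls outside an inclusive range. A minimal plain-Python sketch of those two steps is below. The sample readings and the helper names `deltas_zero_first` and `large_changes` are hypothetical, and the sketch says nothing about how kdb+ evaluates the query.

```python
def deltas_zero_first(values):
    """Consecutive differences with the first element set to 0, like {0,1_deltas x}."""
    return [0] + [b - a for a, b in zip(values, values[1:])]


def large_changes(times, ps, speed, threshold=500):
    """Keep rows whose pressure delta lies outside [-threshold, threshold] (inclusive bounds)."""
    rows = zip(times, ps, speed, deltas_zero_first(ps))
    return [row for row in rows if not -threshold <= row[3] <= threshold]


# Hypothetical readings for one grid point; all values are invented for illustration.
times = ["00:00", "01:00", "02:00", "03:00", "04:00"]
ps = [101300, 101250, 100600, 100550, 101200]  # stands in for the PS column
speed = [3.1, 3.4, 9.8, 10.2, 6.0]             # stands in for the SPEEDLML column

for row in large_changes(times, ps, speed):
    print(row)
```

Forcing the first delta to 0 prevents the first row from being flagged merely because it has no predecessor.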
Land Surface Forcings Demo
Using colour to show the intensity of SPEEDLML, you can see that when the pressure drops dramatically, the wind speed picks up sharply.
Demo 2 – NYC Motor Vehicle Collisions
• Data is available at: https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95
• Over 1 million rows of historical data between 2012 and 2017
• Weather data available at: https://www7.ncdc.noaa.gov/CDO/dataproduct (up to 2013)
• Daily summary data for a station located in Central Park
Resources
Benchmarks
o STAC benchmarks: https://stacresearch.com/kx; includes independently verified benchmarks of the technology using common capital markets use cases.
o Intel solution brief: http://www.intel.com/content/www/us/en/processors/xeon/real-time-financial-analysis-with-kx-systems-brief.html
o Gartner paper on Kx technology: https://kx.com/gartner-download.php
Community
o Kx Wiki: http://code.kx.com/wiki/Main_Page
o Kx Community: http://kxcommunity.com/
o Kx Github: http://kxsystems.github.io/