Speaker: Louise McCluskey, kdb+ Engineer
Description: “Big Data Analytics with Time-Series Database kdb+”, followed by a Kx for Dashboards demo – watch cool visualizations of HUGE data sets.
About us
• … Offices incl. NYC, Singapore, London & Tokyo
• Large user community
• Widely adopted in financial services over two decades
• Now in: Hi-Tech Manufacturing, Utilities, Telco, Energy, Life Sciences, Earth Observation
• Software & industry solutions, consulting and implementation services
[Map regions: North America, Africa, Australia & NZ, Asia, UK & Europe]
Core technology
kdb+: column-based time-series database with in-built programming language q
• … and historical data in one platform
• Runs on Linux, Windows, Solaris, and MacOS
• Runs on commodity hardware, cloud, edge devices/appliances
• Expressive query (qsql) and programming language (q)
• In-memory compute engine for Complex Event Processing
• Column-level compression
• Integrates easily into legacy systems for performance augmentation
• Multi-core / Multi-processor / Multi-thread / Multi-server
Known for
• … and historical time series data
• Extreme performance - low latency
• Scalability without requiring significant infrastructure change
• Provide the fastest, most efficient, most flexible tools and dashboards
• Worldwide leader in high-volume, high-performance databases
[Architecture diagram; labels recovered from the slide]
• Vertical Market Solutions: Kx for Flow, Kx for Surveillance, Kx for Algo, Kx for DaaS, Kx for Utilities, Kx for Cyber, Kx for Pharma, Kx for Sensors, Kx for Telco
• Stream for Kx (Application Framework)
  o Scalability, High Availability, and Fault Tolerance
  o Native Lambda/HTAP Architecture
• Core (kdb+)
  o In Memory Database
  o Historical Database
  o q language & qsql scripting
• Control for Kx: Develop, configure, deploy, and manage solutions
• Monitor for Kx: Scan, monitor and alerting of issues in software and hardware
• Analyst for Kx: Query, explore, transform, and import without programming
• Dashboards for Kx: Build real-time visualizations for multiple devices
• Third-Party Interoperability
  o Pub/sub, SOA, ODBC, JDBC, web sockets
  o R, Python, MatLab, Java, C#, C/C++
• Inputs
  o Real-Time Sources
  o File & DB Sources
  o Batch Loader
  o Stream Feed Handler
• Stream Engine
  o Ticker Plant
  o Stream and Ingestion Engine
  o Complex Event Processing
• Outputs: Queries, Transforms, Alerts, Control signals, Notifications, Micro-services
… Banks
• 500 KB profile (L1/L2 Cache)
• Process & store 4.5 million events/second/core
• Ingest data at 10 million records/second/core
• Streaming 1.6 TB of data daily
• Search in-memory tables at 4 billion records/second/core
Transitive Comparisons
• MongoDB
• Cassandra
• ElasticSearch
• OpenTSDB
• Kx applied identical methodologies and ran tests to generate its own performance measurements
# | Query | Database | … Spanning
1 | Return maximum value, by minute, in a 1-hour time frame, for 1 host | Cassandra | 1 day
2 | Return maximum value, by minute, in a 12-hour time frame, for 1 host | Cassandra | 1 day
3 | Return maximum value, by minute, in a 12-hour time frame, for 8 hosts | Cassandra | 1 day
4 | Return maximum value, by minute, in a 1-hour time frame, for 1 host (4 days) | ElasticSearch | 4 days
5 | Return maximum value, by minute, in a 1-hour time frame, for 1 host | MongoDB | 6 hours
6 | Return maximum value, by minute, in a 1-hour time frame, for 8 hosts | OpenTSDB | 4 hours
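The benchmark queries above all share one shape: filter by host and time window, bucket by minute, and take the maximum per bucket. A minimal Python sketch of that logic follows. It is not the benchmark code or a kdb+ implementation; the function name `max_by_minute`, the row layout `(timestamp, host, value)`, and the sample data are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def max_by_minute(rows, host, start, end):
    """Return {minute_start: max value} for one host within [start, end).

    rows is an iterable of (timestamp, host, value) tuples (assumed layout).
    """
    buckets = {}
    for ts, row_host, value in rows:
        if row_host != host or not (start <= ts < end):
            continue
        # Truncate the timestamp to the start of its minute.
        minute = ts.replace(second=0, microsecond=0)
        if minute not in buckets or value > buckets[minute]:
            buckets[minute] = value
    return dict(sorted(buckets.items()))


# Hypothetical sample data: two hosts, readings spread over two hours.
t0 = datetime(2017, 1, 1, 0, 0, 0)
rows = [
    (t0 + timedelta(seconds=5), "host_0", 10.0),
    (t0 + timedelta(seconds=40), "host_0", 12.5),
    (t0 + timedelta(seconds=70), "host_0", 7.0),
    (t0 + timedelta(seconds=20), "host_1", 99.0),  # other host, ignored
    (t0 + timedelta(hours=2), "host_0", 50.0),     # outside the 1-hour window
]

result = max_by_minute(rows, "host_0", t0, t0 + timedelta(hours=1))
for minute, value in result.items():
    print(minute.isoformat(), value)
```

The 8-host variants of the query would repeat the same grouping with the host added to the bucket key.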
[Bar chart: "… second", axis from 0 to 130,000. Platforms: Raspberry Pi, MacBook, Server 1-Core, Server 4-Cores, Server 8-Cores. Series: InfluxDB, MongoDB, Kdb+. Data labels as extracted, not mapped to bars: 2850, 2614, 122666, 107810, 56649, 63138, 7693.]
Demo 1 – Geographical Data
… datasets?keywords=%22MERRA-2%22.
• Roughly 250 TB of data
• Data divided across 100 datasets
• Measurements from worldwide grid points from 1980 to 2016
• .nc4 (NetCDF-4, Network Common Data Form) file type
Example Data Sets
• … measured at almost 5 million grid points
• 1 year's (2005) worth of daily data
• This takes up 97 GB on disk and is over 1.82 billion rows in table form
Land Surface Forcings Demo
… rows: For a given location, find the change in pressure at every point in time over the year, and extract the rows where the change in pressure exceeds a specified threshold.
• There are roughly 150 million points per month
• Query (takes around 8 seconds):
select from (select month,time,PS,SPEEDLML,delta:{0,1_deltas x}PS from lfo1 where month within(2005.01m;2005.12m),lat=30,lon=-90) where not delta within(-500;500)
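The query above works in two steps: the inner select computes the change in PS between consecutive rows (with 0 for the first row), and the outer select keeps only rows whose change falls outside the inclusive range -500 to 500. A minimal Python sketch of the same logic follows, assuming the rows are already filtered to one location and ordered by time. The function names and the sample values are hypothetical; only the column names PS and delta come from the slide.

```python
def deltas_zero_first(values):
    """Differences between consecutive items, with 0 for the first item.

    Mirrors the intent of {0,1_deltas x} in the q query.
    """
    if not values:
        return []
    return [0] + [b - a for a, b in zip(values, values[1:])]


def pressure_jumps(rows, threshold=500):
    """Keep rows whose change in PS lies outside [-threshold, threshold].

    rows is a time-ordered list of dicts with a "PS" key (assumed layout).
    """
    ps = [row["PS"] for row in rows]
    jumps = []
    for row, delta in zip(rows, deltas_zero_first(ps)):
        # The q "within" test is inclusive, so exactly +/-threshold is kept out.
        if not (-threshold <= delta <= threshold):
            jumps.append({**row, "delta": delta})
    return jumps


# Hypothetical readings for a single grid point.
rows = [
    {"time": "2005-01-01", "PS": 101000},
    {"time": "2005-01-02", "PS": 101200},
    {"time": "2005-01-03", "PS": 100600},
    {"time": "2005-01-04", "PS": 100650},
    {"time": "2005-01-05", "PS": 101300},
]

jumps = pressure_jumps(rows)
for row in jumps:
    print(row["time"], row["PS"], row["delta"])
```

Setting the first delta to 0 keeps the first row from being flagged just because it has no predecessor.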
Demo 2 – NYC Motor Vehicle Collisions
• … 1 million rows of historical data between 2012 and 2017
• Weather data available at: https://www7.ncdc.noaa.gov/CDO/dataproduct (up to 2013)
• Daily summary data for a station located in Central Park
Resources
… of the technology using common capital markets use cases.
o Intel solution brief: http://www.intel.com/content/www/us/en/processors/xeon/real-time-financial-analysis-with-kx-systems-brief.html
o Gartner paper on Kx technology: https://kx.com/gartner-download.php
o Community
  o Kx Wiki: http://code.kx.com/wiki/Main_Page
  o Kx Community: http://kxcommunity.com/
  o Kx Github: http://kxsystems.github.io/