of hours
• Orchestration increases uptime and lowers O&S costs
• Hundreds of users using Elastic without any performance issues
• We have actionable visualization through the use of Kibana
• Control cloud costs by keeping 6 months hot and 5 years searchable
Key Takeaways
Be specific.
• Engineer in reverse
Understand Constraints
• Chase bottlenecks
• “Instrument everything”
• Don’t be afraid to go deep
Scale Up
• Only after optimization
• Get parallel: Network, Compute, Storage
Elastic Engineering Core Principles
Works 60% of the time, every time
Use tools at your disposal:
- Node Stats
- Hot Threads
- htop, nload, iostat
- VisualVM to profile heap
Slow JDBC transfers
JSON serialization was expensive
Let’s write our own app
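One way to put Node Stats to work when chasing bottlenecks is to look for thread pool rejections, which usually mean a node cannot keep up with indexing or search load. The sketch below is not from the talk. It assumes you have already fetched and parsed the JSON from `GET /_nodes/stats/thread_pool`, and the helper name `rejected_by_pool` is hypothetical.

```python
from typing import List, Tuple


def rejected_by_pool(node_stats: dict) -> List[Tuple[str, str, int]]:
    """Return (node_name, pool_name, rejected) for every pool with rejections.

    `node_stats` is the parsed JSON body of GET /_nodes/stats/thread_pool.
    Results are sorted with the largest rejection count first.
    """
    hits = []
    for node_id, node in node_stats.get("nodes", {}).items():
        # Fall back to the node id if the node name is missing.
        name = node.get("name", node_id)
        for pool, stats in node.get("thread_pool", {}).items():
            rejected = stats.get("rejected", 0)
            if rejected > 0:
                hits.append((name, pool, rejected))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

A non-empty result for the `write` pool (named `bulk` on older versions) is a hint to inspect Hot Threads on that node next.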
}
Results were impressive:
1 Core: ~10k EPS
40 Cores, multiproc’d, threaded: ~100,000 EPS
cx_Oracle
great JSON support
complete, granular control
now scale up!
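The talk's own application is not shown here, so the following is only a minimal sketch of the general idea: split rows into batches and serialize each batch to a bulk (NDJSON) body in parallel workers. The names `batches`, `to_bulk_body`, and `serialize_parallel` are hypothetical. Rows are assumed to arrive as dictionaries (for example, built from a cx_Oracle cursor), and older Elasticsearch versions may also require a `_type` in the action line.

```python
import json
from concurrent.futures import Executor
from functools import partial
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batches(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def to_bulk_body(batch: List[Dict], index: str) -> str:
    """Serialize one batch into a newline-delimited bulk request body."""
    action = json.dumps({"index": {"_index": index}})
    lines = []
    for row in batch:
        lines.append(action)
        # default=str keeps dates and decimals from raising TypeError.
        lines.append(json.dumps(row, default=str))
    return "\n".join(lines) + "\n"


def serialize_parallel(rows: Iterable[Dict], index: str,
                       executor: Executor, batch_size: int = 1000) -> List[str]:
    """Serialize batches concurrently using the supplied executor."""
    work = partial(to_bulk_body, index=index)
    return list(executor.map(work, batches(rows, batch_size)))
```

Because JSON serialization is CPU-bound, passing a `concurrent.futures.ProcessPoolExecutor` is the closer match to the multi-process setup described above. A `ThreadPoolExecutor` is more useful for the network-bound step of sending each body to the cluster.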
Be careful!
- impact infrastructure
- degrade networks
- disrupt storage
- cascading compute
- stress hardware*
*Trust us, pay attention to high/critical temperature warning sensors on bare metal boxes
Coordinating Nodes
JSON { }
High-level challenges in the cloud:
- Network bottlenecks
- Reliability not guaranteed - many outages
- Getting data *into* cloud storage
Absolutely use an orchestration tool!
"best_compression", "refresh_interval": "-1", // disable refresh "number_of_shards": "10", // shoot for 40-60gb shards "number_of_replicas": "0", // danger! danger! "translog.durability": “async”, // fsync commit in the background "index.merge.scheduler.max_thread_count": “1" // spinning disks } } High indexing rates require optimizing settings and mappings Getting fast ingest speeds index settings for archive data: Caution! these settings are for short-term indexing! Don’t use for 30, 60, 90 day production data.
contains such a rich and wonderful data set
Cyber Analysts:
- When was the first time we ever saw communication with an IP address?
- Approximate time and frequency of unusual port activity?
Network Operations:
- When was the last time a network device was updated or patched?
- When was a box last seen on the network?
Secret Squirrels / Information Assurance:
- Requests for user-attributable data
- Validate security controls for systems
The usual response: We can only go back X days
enterprise logging online and searchable. “hot”/“warm”.
After X days, daily indices are snapshotted to “cold” storage
Most often, analysts want to search since the beginning of time
IOI allows us to quickly search metadata about documents stored in offline indices.
Serves as a starting point for further investigation and rehydration
as a list of valuable, low-cardinality fields.
- source and dest IP
- source and dest ports
- user names
- host, network (enclave) names
- email addresses
- event IDs
Note: ‘event name’ is a poor choice, high cardinality
Run IOI aggs, snapshot old index, alias new index
Iterate through your list of “high-value” fields:
- run a terms aggregation
- get the values and counts
- store them back into a new index for that day, with a note in the doc that the data is archived
Add the newly created summary index to the enterprise logging alias
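The loop above can be sketched roughly as follows. This is an assumption-laden illustration rather than the presenters' implementation. The function names, the summary document field names (`field`, `value`, `count`, `source_index`, `archived`), and the bucket `size` of 1000 are all hypothetical. The request body uses the standard terms aggregation, and the response is assumed to be the parsed JSON of a `_search` call against the day's index.

```python
from typing import Dict, Iterable, List


def build_terms_aggs(fields: Iterable[str], size: int = 1000) -> Dict:
    """Build one search body with a terms aggregation per high-value field."""
    return {
        "size": 0,  # we only want the aggregations, not the hits
        "aggs": {
            field: {"terms": {"field": field, "size": size}}
            for field in fields
        },
    }


def summary_docs(agg_response: Dict, source_index: str, day: str) -> List[Dict]:
    """Turn terms buckets into summary documents for that day's summary index."""
    docs = []
    for field, result in agg_response.get("aggregations", {}).items():
        for bucket in result.get("buckets", []):
            docs.append({
                "@timestamp": day,
                "field": field,
                "value": bucket["key"],
                "count": bucket["doc_count"],
                "source_index": source_index,  # where to rehydrate from
                "archived": True,              # note that the data is offline
            })
    return docs
```

The resulting documents would then be bulk-indexed into a new summary index for that day, which is added to the enterprise logging alias. Terms aggregations return only the top `size` buckets, which is one more reason to stick to low-cardinality fields.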
Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders. Please attribute Elastic with a link to elastic.co