A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB machines are also common.
• Give at most 32 GB to the Elasticsearch heap and let Lucene use the rest of memory via the OS filesystem cache (roughly 50% for the heap, 50% for Lucene). All that cached memory holds segments and leads to blisteringly fast full-text search.
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
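As a rough sketch of what that looks like in practice (assuming the 1.x/2.x startup scripts described in the cited heap-sizing page; newer releases set -Xms/-Xmx in jvm.options instead):

# ~50% of a 64 GB box, kept just under the 32 GB compressed-oops threshold
export ES_HEAP_SIZE=31g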
• Disable _source – the actual JSON that was used as the indexed document
• Disable _all – a catch-all field that includes the text of one or more other fields within the document
• Disable the analyzer (not_analyzed) and norms – they are only needed to compute the relevance score of a document
• Use doc_values – they live on disk instead of in heap memory
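A minimal sketch of such a mapping, assuming the elasticsearch-ruby client and 1.x/2.x-style mapping syntax; the index name, type name and fields are illustrative, not the ones from the original project:

require 'elasticsearch'

client = Elasticsearch::Client.new(url: ENV['ELASTICSEARCH_URL'])

# No _source, no _all, a not_analyzed field without norms, doc_values on disk.
client.indices.create(
  index: 'readings',
  body: {
    mappings: {
      reading: {
        _source: { enabled: false },
        _all:    { enabled: false },
        properties: {
          user_gender: {
            type:       'string',
            index:      'not_analyzed',
            norms:      { enabled: false },
            doc_values: true
          },
          read_at: { type: 'date', doc_values: true }
        }
      }
    }
  }
)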
• Disable refresh_interval and refresh manually once the bulk load finishes (see the sketch below)
• Temporarily disable replication
• Delay or disable flushes
• Increase the thread pool size for index and bulk operations
• Use templates for creating new indices
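A hedged sketch of the refresh/replica settings around a bulk load, again assuming the elasticsearch-ruby client; the index and template names are illustrative:

index = 'readings-2016.01.01'

# Turn refresh and replication off for the duration of the bulk load.
client.indices.put_settings(
  index: index,
  body:  { index: { refresh_interval: -1, number_of_replicas: 0 } }
)

# ... bulk indexing happens here ...

# Restore normal settings and make the new documents searchable.
client.indices.put_settings(
  index: index,
  body:  { index: { refresh_interval: '1s', number_of_replicas: 1 } }
)
client.indices.refresh(index: index)

# A template can bake the bulk-friendly settings into every new daily index.
client.indices.put_template(
  name: 'readings',
  body: {
    template: 'readings-*',
    settings: { refresh_interval: -1, number_of_replicas: 0 }
  }
)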
}/#{ file_mask }").sort day_filepaths.each_slice(MAX_PROCESS_COUNT) do |day_filepaths| Parallel.each(day_filepaths, in_processes: day_filepaths.size) do |day_filepath| index(day_filepath) end end https://github.com/grosser/parallel
# Preload all user attributes in one query and key them by the first attribute
# in USER_ATTR_NAMES (expected to be the id), so serialization never hits the
# database per reading. (The def line is truncated in the original.)
def user_attrs_by_id
  @user_attrs_by_id ||= begin
    users_attr_values = User.pluck(*USER_ATTR_NAMES)
    user_attrs_by_id = users_attr_values.inject({}) do |result, user_attr_values|
      result[user_attr_values.first] = Hash[USER_ATTR_NAMES.zip(user_attr_values)]
      result
    end
  end
end

# Convert a raw reading row into the document sent to Elasticsearch.
def serialized_reading(reading)
  user = user_attrs_by_id[reading["user_id"]]
  {
    from:             reading["from"].to_f,
    to:               reading["to"].to_f,
    size:             reading["size"].to_i,
    read_at:          Time.at(reading["read_at"].to_i / 1_000).utc,
    user_id:          user["id"],
    user_birthday_at: user["birthday_at"],
    user_gender:      user["gender"].downcase,
    # ...
  }
end