10 Key Performance Indicators - Christian Kvalheim, 10gen

Key Performance Indicators Christian Kvalheim 10gen

Agenda •The Players •Tools •Performance Indicators

Speed MongoDB is a high-performance database, but how do I
know that I’m getting the best performance?

Players • Memory – Memory-mapped files, OS memory handling
• Locks – Global write lock, going db level in 2.2 • Disk IO – IOPS, latency • Network – Bandwidth

mongostat

The stats • Flushes • Mapped memory • Virtual memory
size (Vsize) • Resident memory • Faults • Locked percentage

db.serverStatus > db.serverStatus(); { "host" : "MacBook.local",
"version" : "2.0.1", "process" : "mongod", "uptime" : 619052, // Lots more stats... }

Profiler > db.setProfilingLevel(2); { "was" : 0, "slowms" : 100,
"ok" : 1 }

Profiling • 3 levels – 0: off – 1: operations slower than x
ms – 2: all operations • Writes to a capped collection, 1MB by default • Some performance overhead, but minimal
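Once the profiler is on, each entry in system.profile reports how many documents were scanned (nscanned) and how many were returned (nreturned). A minimal sketch of one way to read those two numbers together follows; the helper name scanRatio is hypothetical, and the sample entry reuses values from the profiler slide in this deck.

```javascript
// Hypothetical helper (not part of the mongo shell): documents scanned
// per document returned for one system.profile entry. A large ratio
// usually suggests the query is not using a suitable index.
function scanRatio(profileEntry) {
  var returned = Math.max(profileEntry.nreturned, 1); // avoid dividing by zero
  return profileEntry.nscanned / returned;
}

// Sample entry with the field values shown on the profiler slide.
var entry = {
  op: "query",
  ns: "docs.spreadsheets",
  nscanned: 20001,
  nreturned: 1,
  millis: 1407
};

var ratio = scanRatio(entry); // 20001 documents scanned for 1 returned
```

In the shell, the same function could be applied to each document returned by db.system.profile.find().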

Profiler > db.system.profile.find() {
"ts" : ISODate("2011-09-30T02:07:11.370Z"), "op" : "query", "ns" : "docs.spreadsheets", "nscanned" : 20001, "nreturned" : 1, "responseLength" : 241, "millis" : 1407, "client" : "127.0.0.1", "user" : "" }

Monitoring Service • MMS: 10gen.com/try-mms • Nagios • Munin

INDICATORS

Slow Operations Sun May 22 19:01:47 [conn10] query docs.spreadsheets ntoreturn:100
reslen: 510436 nscanned:19976 { username: "Hackett, Bernie" } nreturned:100 147ms

Replication lag • Replication lag is the difference in time between
the primary's last operation and the last operation the secondary committed
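The definition above can be sketched as a small calculation over an rs.status()-shaped document. This is an illustration only: the helper name replicationLagSeconds and the host names are made up, and the sketch assumes each member carries the stateStr and optimeDate fields that rs.status() reports.

```javascript
// Hypothetical helper: lag in seconds of every secondary behind the
// primary, given a document shaped like the output of rs.status().
function replicationLagSeconds(status) {
  var primary = null;
  var i;
  for (i = 0; i < status.members.length; i++) {
    if (status.members[i].stateStr === "PRIMARY") {
      primary = status.members[i];
    }
  }
  if (primary === null) {
    return null; // no primary visible, so lag cannot be computed
  }
  var lags = {};
  for (i = 0; i < status.members.length; i++) {
    var member = status.members[i];
    if (member.stateStr === "SECONDARY") {
      // Subtracting two Date objects yields milliseconds.
      lags[member.name] = (primary.optimeDate - member.optimeDate) / 1000;
    }
  }
  return lags;
}

// Made-up sample data for illustration.
var sampleStatus = {
  members: [
    { name: "db1.example.com:27017", stateStr: "PRIMARY",
      optimeDate: new Date("2011-09-30T02:07:11Z") },
    { name: "db2.example.com:27017", stateStr: "SECONDARY",
      optimeDate: new Date("2011-09-30T02:07:06Z") }
  ]
};

var lags = replicationLagSeconds(sampleStatus);
```

In the shell this could be called as replicationLagSeconds(rs.status()).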

2.Replication Lag

PRIMARY> rs.status()

Virtual memory

Resident Memory > db.serverStatus().mem {
"bits" : 64, // Need 64, not 32 "resident" : 7151, // Physical memory "virtual" : 14248, // Files + heap "mapped" : 6942 // Data files }

Resident Memory > db.stats() {
"db" : "docs", "collections" : 3, "objects" : 805543, "avgObjSize" : 5107.312096312674, "dataSize" : 4114159508, // ~4GB "storageSize" : 4282908160, // ~4GB "numExtents" : 33, "indexes" : 3, "indexSize" : 126984192, // ~126MB "fileSize" : 8519680000, // ~8.5GB "ok" : 1 }

Optimal Resident Memory indexSize + storageSize <= RAM
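The rule of thumb above can be checked with a few lines of arithmetic. This is a sketch, not a sizing tool: the helper name fitsInRam is hypothetical, the RAM sizes are assumptions, and the sample figures are the storageSize and indexSize values from the db.stats() slide.

```javascript
// Hypothetical helper: does indexSize + storageSize fit in RAM?
// Takes a db.stats()-shaped document and the machine's RAM in bytes.
function fitsInRam(stats, ramBytes) {
  return stats.indexSize + stats.storageSize <= ramBytes;
}

// Values taken from the db.stats() output in this deck.
var stats = { storageSize: 4282908160, indexSize: 126984192 };

var GIB = 1024 * 1024 * 1024;
var fitsIn8 = fitsInRam(stats, 8 * GIB); // data plus indexes fit in 8 GiB
var fitsIn4 = fitsInRam(stats, 4 * GIB); // but not in 4 GiB
```

In the shell the first argument could simply be db.stats(). The check ignores memory needed by the OS and by connections, so real headroom should be larger.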

Page Faults > db.serverStatus().extra_info { "note" : "fields
vary by platform", "heap_usage_bytes" : 210656, "page_faults" : 2381 }

Page Faults • The number of times the OS needs
to read a page of data from disk into memory because it was not already resident • A very high number indicates thrashing – the OS spends more time reading/writing data to disk than doing work
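Because page_faults is a running counter, a rate is easier to judge than the raw number. The sketch below takes two samples and divides the difference by the elapsed time. The helper name pageFaultsPerSecond is hypothetical, and the second sample and the 10-second interval are made up for illustration.

```javascript
// Hypothetical helper: page faults per second between two samples of
// db.serverStatus().extra_info taken elapsedSeconds apart.
function pageFaultsPerSecond(before, after, elapsedSeconds) {
  return (after.page_faults - before.page_faults) / elapsedSeconds;
}

var first = { page_faults: 2381 };  // value from the slide
var second = { page_faults: 2481 }; // made-up later sample

var rate = pageFaultsPerSecond(first, second, 10); // 100 faults over 10 s
```

As the slide notes, the fields in extra_info vary by platform, so page_faults may not be present everywhere.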

Write Lock Percentage > db.serverStatus().globalLock {
"totalTime" : 2809217799, "lockTime" : 13416655, "ratio" : 0.004775939766854653 }

Write lock percentage • The percentage of time the
server spent in the global write lock during the last sample period (one second)
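The lockTime and totalTime counters in globalLock are cumulative, so a per-period percentage comes from the difference between two samples. A minimal sketch, assuming both counters use the same time unit; the helper name lockPercent is hypothetical and the second sample is made up.

```javascript
// Hypothetical helper: percentage of time spent in the write lock
// between two samples of db.serverStatus().globalLock.
function lockPercent(before, after) {
  var total = after.totalTime - before.totalTime;
  if (total <= 0) {
    return 0; // samples are identical or out of order
  }
  return 100 * (after.lockTime - before.lockTime) / total;
}

var sampleA = { totalTime: 2809217799, lockTime: 13416655 }; // from the slide
var sampleB = { totalTime: 2810217799, lockTime: 13466655 }; // made-up later sample

var periodPercent = lockPercent(sampleA, sampleB);

// Since-startup percentage: compare against a zeroed starting point.
var lifetimePercent = lockPercent({ totalTime: 0, lockTime: 0 }, sampleA);
```

The lifetime figure corresponds to the ratio field on the slide multiplied by 100, a little under half a percent.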

Concurrency • One writer or many readers • Global RW
Lock • Yields on long-running ops and if we’re likely to go to disk.

High Lock Percentage? You’re Probably Paging!

Queues • Tells you how many connections are waiting to
read or write

6.Reader and Writer Queues

> db.serverStatus().globalLock

Current Op • db.currentOp() lets you see the operations currently
executing • db.killOp(id) lets you kill a blocking, long-running operation
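A sketch of how the two calls might be combined: pick out long-running operations from a db.currentOp()-shaped document and collect their ids as candidates for db.killOp. The helper name longRunningOps and the sample data are made up, and the sketch assumes each entry in inprog carries opid and secs_running.

```javascript
// Hypothetical helper: opids of operations that have been running for
// at least thresholdSeconds, given a db.currentOp()-shaped document.
function longRunningOps(currentOp, thresholdSeconds) {
  var slow = [];
  for (var i = 0; i < currentOp.inprog.length; i++) {
    var op = currentOp.inprog[i];
    if (op.secs_running >= thresholdSeconds) {
      slow.push(op.opid);
    }
  }
  return slow;
}

// Made-up sample data for illustration.
var sampleOps = {
  inprog: [
    { opid: 101, op: "query", ns: "docs.spreadsheets", secs_running: 42 },
    { opid: 102, op: "update", ns: "docs.spreadsheets", secs_running: 1 }
  ]
};

var candidates = longRunningOps(sampleOps, 30); // only opid 101 qualifies
```

In the shell, each returned id could be reviewed and then passed to db.killOp(id); killing operations blindly is rarely a good idea.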

> db.currentOp()

Background Flushing • Tells you how often the data is
written to disk and how long each flush takes • A high flush time might indicate an IO performance issue – Might happen with network-attached storage • Shorten the interval between flushes so that each flush writes less data, more often
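One way to read the flushing counters is to compare the most recent flush with the long-run average. The sketch below is an illustration only: the helper name flushSummary is hypothetical, and the sample values are the flushes, total_ms and last_ms figures reported by db.serverStatus().backgroundFlushing in this deck.

```javascript
// Hypothetical helper: average flush time and whether the last flush
// was slower than that average, from a backgroundFlushing-shaped document.
function flushSummary(flushing) {
  var average = flushing.flushes > 0 ? flushing.total_ms / flushing.flushes : 0;
  return {
    averageMs: average,
    lastSlowerThanAverage: flushing.last_ms > average
  };
}

// Values from the backgroundFlushing output in this deck.
var flushing = { flushes: 5634, total_ms: 83556, last_ms: 4 };

var summary = flushSummary(flushing); // averageMs is a little under 15 ms
```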

Background Flushing > db.serverStatus().backgroundFlushing {
"flushes" : 5634, "total_ms" : 83556, "average_ms" : 14.830670926517572, "last_ms" : 4, "last_finished" : ISODate("2011-09-30T03:30:59.052Z") }

Disk Considerations • RAID • SSD • SAN?

Connections > db.serverStatus().connections { "current" : 7, "available" : 19993
}

Connections • Each connection takes up heap space • The
more connections, the more context switching for the CPU • Clean up your connections after use
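To watch for a flood of connections, the current and available counters can be turned into a utilisation figure. A minimal sketch: the helper name connectionUtilisation is hypothetical, and the sample values are the ones from the db.serverStatus().connections slide.

```javascript
// Hypothetical helper: fraction of the connection limit in use,
// from a db.serverStatus().connections-shaped document.
function connectionUtilisation(connections) {
  var limit = connections.current + connections.available;
  return limit > 0 ? connections.current / limit : 0;
}

// Values from the connections output in this deck.
var connections = { current: 7, available: 19993 };

var used = connectionUtilisation(connections); // 7 of 20000 connections in use
```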

Network Speed > db.serverStatus().network { "bytesIn" : 877291, "bytesOut" :
846300, "numRequests" : 9186 }

Network Speed • Application might saturate the connection, leaving little bandwidth
for replication • Slow interconnect between app and db servers might limit your performance – Measure available bandwidth between servers; scp can be used as a sanity check • If bandwidth is a problem, bond connections or get 10 Gbps cards • Control the write speed doing

Fragmentation db.spreadsheets.stats() {
"ns" : "docs.spreadsheets", "size" : 8200046932, // ~8GB "storageSize" : 11807223808, // ~11GB "paddingFactor" : 1.4302, "totalIndexSize" : 345964544, // ~345MB "indexSizes" : { "_id_" : 66772992, "username_1_filename_1" : 146079744, "username_1_updated_at_1" : 133111808 }, "ok" : 1 }

Fragmentation • Padding factor is the extra space MongoDB allocates
for each document when saving it, to leave room for growth – doc is 1000 bytes – padding factor 1.5 – total space allocated for doc is 1500 bytes
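The arithmetic in the example above is a single multiplication. The sketch below is a simplified model of it, not the server's actual allocation logic; the helper name allocatedBytes is hypothetical.

```javascript
// Hypothetical helper: space set aside for a document under the
// simplified model "document size times padding factor".
function allocatedBytes(docBytes, paddingFactor) {
  return Math.ceil(docBytes * paddingFactor);
}

var allocated = allocatedBytes(1000, 1.5); // the example from the slide
var slack = allocated - 1000;              // room left for the document to grow
```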

Fragmentation Padding factor >2 is the Horror Number

storageSize / size > 2 • Might not be reclaiming
free space fast enough • Padding factor might not be correctly calibrated db.spreadsheets.runCommand("compact")
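The storageSize / size check can be sketched as follows. The helper name fragmentationRatio is hypothetical, and the sample values are the size and storageSize figures from the db.spreadsheets.stats() slide; the threshold of 2 is the one given above.

```javascript
// Hypothetical helper: ratio of allocated storage to actual data size,
// from a collection stats()-shaped document.
function fragmentationRatio(collStats) {
  return collStats.storageSize / collStats.size;
}

// Values from the db.spreadsheets.stats() output in this deck.
var collStats = { size: 8200046932, storageSize: 11807223808 };

var fragRatio = fragmentationRatio(collStats); // roughly 1.44
var worthCompacting = fragRatio > 2;           // below the threshold here
```

With these numbers the collection stays under the threshold, so the compact command shown above would not be called for on this basis.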

paddingFactor > 2 • You might have the wrong data
model • You might be growing documents too much – embedded documents • Should review Schema Design

Summary • Keeping the dataset in memory is important – Avoid
page faults • Find slow queries – Minimize time spent in write lock • Make sure you don’t flood Mongo with connections • Ensure your padding factor is < 2 – Check your schema design

Weʼre Hiring Engineers, Sales, Evangelist, Marketing, Support, Developers @mongodb_jobs http://linkd.in/joinmongo

We're Always Around For Conferences, Appearances and Meetups 10gen.com/events @mongodb
http://bit.ly/mongo8
