Indexing and Performance

Indexing fundamentals and considerations

The importance of proper indexing

Indexes in MongoDB are B-tree data structures that dramatically reduce the resource (i.e., RAM, CPU, and I/O) requirements of querying. Proper indexing is critical; even a single unindexed query is enough to cause significant performance degradation.

When a query is not indexed, MongoDB performs a full collection scan, iterating through every single document in the collection to find results. In contrast, an indexed query performs and scales much better because it inspects far fewer documents.

Without sufficient indexing, the hardware costs for a performant database increase significantly. Importantly, because un-indexed queries use more resources than needed, they also impact the performance of other operations–even if those operations are indexed.

Limiting the number of documents to scan using indexes

Single-field indexes

A query like {"name": "Adam"} may seem straightforward, but if it’s not indexed, it could perform quite badly (the larger the collection, the worse the query will perform).

Rather than examining all the documents in a collection, create a Single Field Index on this “name” field:

db.collection.createIndex( {“name”: 1} )

Now, no matter how large the collection, only documents matching {"name": "Adam"} will enter into memory.

Compound indexes

When a query includes multiple filters, it is best to include all filters in a compound index. For example, {"name": "Adam", "year": {"$gte": 2016}} can use the following Compound Index:

db.collection.createIndex( {“name”: 1, "year": 1} )

This is preferable to a {"year": 1, "name": 1} index because an equivalency check on “name” is more selective than a range check on “stars”.

Sorting efficiently using indexes

The sort() method returns documents in the requested ascending (1) or descending (-1) sort order. For example, the results of .sort( {"lastName":1} ) are in ascending, alphabetical order.

For a query with sort that isn’t covered by an index, MongoDB sorts results at query time, using precious CPU time and delaying the response to the app.

However, if that same query is covered by an index which includes, in proper order, all fields used to filter and sort the result set, MongoDB does not sort results at query time. Instead, results are already sorted in the index and can be returned immediately.

Determining the order of fields

A good rule of thumb for queries with sort is to order the indexed fields in this order:

The reason one $in filter can precede sort fields is because MongoDB can break the query up according to that $in and merge sort the results.

An example of using this rule of thumb is in the section on “Sorting the results of a complex query on a range of values” below, including a link to further reading. Note that these ordering recommendations are good for most use-cases but may not be the best for your data. The best index in a given scenario is determined through testing using .explain.

Range operators

Query fields(s) which use the following operators should generally be included after the field(s) on which you will sort:

Determining sort orders

MongoDB can traverse an index in either direction. The {"lastName":1} index therefore satisfies both .sort( {"lastName":1} ) and .sort( {"lastName":-1} ). Therefore, for single-field indexes, the sort order of keys does not matter.

However, for compound indexes, sort order can matter in determining whether the index can support a sort operation - the combination of index field order and each field’s sort order determines whether the index can support a sort operation.

If you are using sort() with compound indexes, more information can be found in the “Sort Order” section of MongoDB’s documentation on Compound Indexes.

Sorting the results of a simple query on an exact value

Example of an exact value query with sort

db.collection.find( {"firstName":1}).sort({"lastName":1} )

Recommended index for this query

{"firstName":1, "lastName":1}

As described in MongoDB’s documentation on using indexes to sort query results, an index with the sort direction reversed on all fields would work as well:

{"firstName":-1, "lastName":-1}

Sorting the results of a complex query on a range of values

Sorting a query result with an index is not always straightforward. Consider the following query on a collection that contains baby name statistics per year by gender.

We created this example using a publicly available dataset of baby names from US Social Security Card Applications. Here’s an sample document:

{
	"_id" : ObjectId("577462ff0d1d34fcb64ba7d4"),
	"year" : 1880,
	"name" : "John",
	"sex" : "F",
	"count" : 46
}

This data is available at Data.gov under Baby Names from Social Security Card Applications-National Level Data if you would like to test these behaviors on your own. If you’d like to view database server log lines, use the “Logs” tab to access them (for-pay plans only).

Example of a range query with sort

 db.names.find( {"sex": "M", "count": {"$gt": 10000}} ).sort( {"year": 1} )

Non-optimal index for this query

 {"sex": 1, "count": 1, "year": 1} 

Resulting log message when using non-optimal index

2016-08-05T11:37:10.140-0700 I QUERY    [conn39] query data.names query: { query: { sex: "M", count: { $gt: 10000 } }, orderby: { year: 1 } } planSummary: IXSCAN { sex: 1.0, count: 1.0, year: 1.0 } ntoreturn:10000 ntoskip:0 nscanned:6480 nscannedObjects:6480 scanAndOrder:1 keyUpdates:0 writeConflicts:0 numYields:101 nreturned:3240 reslen:229840 locks:{ Global: { acquireCount: { r: 204} }, MMAPV1Journal: { acquireCount: { r: 102 } }, Database: { acquireCount: { r: 102 } }, Collection: { acquireCount: { R: 102 } } } 38ms

Even though the query is using an index that includes the sort field, the scanAndOrder: 1 is evidence that an in-memory sort took place. This is because of the range criteria on count and the structure of the index. To avoid scanAndOrder in these situations, place sort fields before range fields in the index.

Recommended index for this query

 {"sex": 1, "year": 1, "count": 1} 

Resulting log message when using recommended index

2016-08-05T11:38:21.780-0700 I QUERY    [conn39] query data.names query: { query: { sex: "M", count: { $gt: 10000 } }, orderby: { year: 1 } } planSummary: IXSCAN { sex: 1.0, year: 1.0, count: 1.0 } ntoreturn:10000 ntoskip:0 nscanned:3376 nscannedObjects:3240 keyUpdates:0 writeConflicts:0 numYields:29 nreturned:3240 reslen:229840 locks:{ Global: { acquireCount: { r: 60 } }, MMAPV1Journal: { acquireCount: { r: 30 } }, Database: { acquireCount: { r: 30 } }, Collection: { acquireCount: { R: 30 } } } 9ms

The omission of “scanAndOrder:1” confirms that the query is completed without an in-memory sort.

To better understand the log lines above, read the next section below. Further reading on this topic of sorting the results of complex queries is available at our blog post on sorting the results of queries on ranges.

Performing efficient $or queries

In general, index for $or queries as though each $or clause was its own query. Be sure to consider the sort clause as part of each $or clause.

Example $or query

db.names.find({"$or": [ {"firstName": "Linda"}, {"middleName": "Linda"} ]}).sort({"count": -1})

Recommended indexes for this query

{"firstName": 1, "count": 1}
{"middleName": 1, "count": 1}

SERVER-13732

Certain $or queries are affected by a known bug, SERVER-13732, which prevents efficient index selection. To avoid this bug when composing $or queries, ensure that $or is the only top-level field in the query. For example:

Example of an $or query affected by SERVER-13732

db.names.find({"year": 1972, "$or": [ {"name": "Eric", "sex": "M"}, {"name": "Erica", "sex": "F"} ]}).sort({"count": -1})

Example of an $or query not affected by SERVER-13732

db.names.find({"$or": [ {"year": 1972, "name": "Eric", "sex": "M"}, {"year": 1972, "name": "Erica", "sex": "F"} ]}).sort({"count": -1})

Recommended index for this query

Since both of these $or clauses use the same query filters, each clause can use the same index:

{"year": 1, "name": 1, "count": 1}

Comparing a log line for an unindexed query vs. an indexed one

This example uses the same baby name statistics data as above and compares two log lines for the same query.

The first operation takes almost 2 seconds because it does not use an index, but the second takes 1 millisecond with an index. The key differences between these log messages lie in the planSummary,nscannedObjects, and scanAndOrder values.

Starting in MongoDB 3.2, the following key words in MongoDB log lines changed:

  • scanAndOrder —> hasSortStage
  • nscanned —> keysExamined
  • nscannedObjects —> docsExamined

Example query with sort

We want to find the number of babies named “Arya” each year in ascending year order. Our database query is:

db.names.find( {"name": "Arya"} ).sort( {"year": 1} )

Resulting log message when unindexed

2016-06-30T12:21:05.153-0700 I QUERY    [conn120] query data.names query: { query: { name: "Arya" }, orderby: { year: 1.0 } } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:1858689 scanAndOrder:1 keyUpdates:0 writeConflicts:0 numYields:14562 nreturned:58 reslen:4022 locks:{ Global: { acquireCount: { r: 29126 } }, MMAPV1Journal: { acquireCount: { r: 14563 } }, Database: { acquireCount: { r: 14563 } }, Collection: { acquireCount: { R: 14563 } } } 1795ms

The following information observed in the log line above shows a lot of room for improvement:

Recommended index

We can improve performance by:

  1. Reducing the number of documents being inspected in memory; and
  2. Removing the need to perform in-memory sorts.

We now build the following index:

 { "name": 1, "year": 1 }

Resulting log message when using recommended index

Once the index is created, the query runs orders of magnitude faster:

2016-06-30T12:21:24.883-0700 I QUERY    [conn120] query data.names query: { query: { name: "Arya" }, orderby: { year: 1.0 } } planSummary: IXSCAN { name: 1.0, year: 1.0 } ntoreturn:0 ntoskip:0 nscanned:58 nscannedObjects:58 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:58 reslen:4022 locks:{ Global: { acquireCount: { r: 2 } }, MMAPV1Journal: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { R: 1 } } } 1ms

The following information observed in the log line above shows an efficient query:

Results were not sorted in memory, and the number of documents inspected is equal to the number of documents returned. This is an efficient query.

Special considerations and further reading

Using embedded documents as _id values

Embedded document _ids allow you to encode complex values that aren’t included in the default ObjectId-based _ids generated by MongoDB and MongoDB drivers. However, we recommend using an ObjectId whenever possible.

If you must use embedded documents as _ids, there are important considerations to be aware of:

mLab’s management portal does not support embedded document _ids when viewing single documents.

Further reading

One of MongoDB’s strengths is its flexible data model. Flexible data modeling is powerful, but dangerous without proper indexing. If your data contains complex fields such as arrays, embedded documents, or geospatial data, read the following for important information about index behavior:

Index management

Viewing existing indexes

To view existing indexes for a particular collection, follow these steps:

  1. Log in to the mLab management portal
  2. From your account’s Home page, navigate to the deployment with the database whose indexes you want to view
    • If necessary, navigate to the desired database after navigating to the deployment
  3. Navigate to a collection to view its indexes
  4. Click the “Indexes” tab to display all indexes created for the collection img-indexes

Alternately, you can use the db.collection.getIndexes() method via the mongo shell to view all indexes for a particular collection.

Creating a new index

Best practices when creating indexes

The process of creating an index can require a lot of system resources and can be long-running (hours if not days).

If uptime and performance are important for your application:

  1. Build the index in the background (our management portal does this by default; see next section)
  2. Build the index during a low traffic time
  3. Avoid simultaneous index builds by waiting for one to complete before starting another

Index builds take significantly longer when there is insufficient RAM. In particular, if the index you’re building is close to or exceeds the RAM available, it could be impossible to build the index within a reasonable amount of time.

Background vs. foreground index builds

Always consider whether you want to build your indexes in the background or foreground. While building them in the foreground is faster, a foreground build blocks all other operations to the database. On a production system, you almost always want to build your indexes in the background to minimally impact your live database.

If you build your index via the mLab management portal, it will always be built in the background. If you build your index via the mongo shell or your driver, include the {"background": true} option to create the index in the background. If it is not specified, the index will be created in the foreground by default.

db.collection.createIndex( {"fieldName": 1}, {"background": true} )

Creating an index through the mLab management portal

Simple, compound, and geospatial indexes can all be created through the mLab management portal. To add an index for a specific collection:

  1. Log in to the mLab management portal
  2. From your account’s Home page, navigate to the deployment with the database where an index will be added
    • If necessary, navigate to the desired database after navigating to the deployment
  3. Navigate to the collection where the index(es) will be added
  4. Click the “Indexes” tab
  5. Click the “Add” button to open the “Add new index” window
    img-addindex
  6. Add your index in the editor window
  7. Adjust the options and custom name if desired. Note that custom names are not required; it is usually best to let MongoDB auto-generate a name.
  8. Click “Create in background” to build your index

Alternately, you can use the db.collection.createIndex() method via the mongo shell to add an index to your collection. If you would like for your deployment to remain available during the index building operation, specify ({"background": true}). Indexes created through mLab’s management portal are always built in the background.

Killing a running index creation operation

If you find yourself in the position where you want to kill an in-progress index build for one or both of the following reasons:

… be sure to proceed with caution. Killing a running index creation operation will often lead to replica sets with members having different sets of indexes as well as other unexpected issues.

Therefore, unless database performance has degraded to an emergency state, we strongly encourage you to allow the index build to complete and then optionally remove it once it’s done building.

Restarting the deployment to kill an index build won’t stop the index build and remove the index as you would expect - instead the process will start up again in an offline mode and build the index in the foreground as described in MongoDB’s documentation on interrupted index builds. This can lead to a loss of high availability or possibly downtime.

Identifying and removing unnecessary indexes

Overview

While it is very important that all queries are indexed, updating indexes slows the speed of inserts and updates. As such, unnecessary indexes may cause unnecessary use of CPU and RAM.

This section details how to identify redundant indexes and identify indexes that aren’t being used.

Be sure to review the checklist in the “Removing indexes” section below before dropping any indexes!

Identifying redundant indexes

An index may be redundant if it is:

In summary, an index on {"name": 1, "year": -1} supports:

… but does not support:

Redundant simple indexes

Example simple index

{"year": 1}

Example redundant index of the above index

The following index may be dropped if the above index exists:

{"year": -1}
Redundant compound indexes

Example compound index

{"year": 1, "name": 1, "count": 1}

Example redundant indexes of the above index

Depending on the sort clauses applied to queries, the following indexes may be redundant with the above index:

{"year": -1, "name": -1, "count": -1}
{"year": 1, "name": 1}
{"year": -1, "name": -1}
{"year": 1}
{"year": -1}

Be particularly careful if you are using unique, sparse, and/or partial indexes.

Identifying unused indexes

Similar to redundant indexes, unused indexes impact write performance and consume valuable resources. Periodic index review and maintenance is recommended to ensure good database performance.

On MongoDB versions prior to 3.2, manual inspection and consideration is necessary to infer unused indexes.

For MongoDB version 3.21, we recommend using the $indexStats operator. Guidance on how to use this operator is available on our blog:

Removing unnecessary indexes

Proceed with caution

As with all delete operations on the database, always err on the side of caution when removing an index.

In addition, here are some checks to perform before removing an unused index:

If after reviewing these considerations you have questions about specific indexes, email support@mlab.com for guidance.

Dropping an index through the mLab management portal

To drop an index for a specific collection:

  1. Log in to the mLab management portal
  2. From your account’s Home page, navigate to the deployment with the database where an index will be removed
    • If necessary, navigate to the desired database after navigating to the deployment
  3. Navigate to the collection where the index(es) will be removed
  4. Click the “Indexes” tab
  5. Click the delete icon (a blue “X”) that appears at the end of the row for the index that you want to delete

Alternately, you can use the db.collection.dropIndex(<index name>) method via the mongo shell to drop an index from your collection.

Managing indexes across environments

There are many ways to manage the process by which indexes are created and defined in your various environments and MongoDB deployments. Therefore you should carefully consider how you manage this process.

For databases that have gone into production, we recommend that you build your indexes in the background and at times that make the most sense for your app (e.g., during off-peak activity times). You can then maintain a separate script of createIndex calls to ensure these indexes are built in your development and staging environments as well.

If you decide to put createIndex calls in your code, they should be called when your application initializes. However, if you go this route you must be careful about how this fits with your architecture and your deployment processes or else you might end up with indexes being built at unplanned (and undesirable) times.

Performance analysis

Key performance tools

Not available for Sandbox databases

Indicators of poor performance / Assessing headroom

We will be augmenting this portion of our documentation in the coming weeks. In the meantime, contact support@mlab.com if you would like advice in this area.

  1. The $indexStats operator will be available on Sandbox and Shared plans once the fix for SERVER-26734 is available for supported versions.