SNMP Support
SNMP support is available for Enterprise licenses only.
- This page explains how to use SNMP to monitor RavenDB and what metrics can be accessed.
Overview
- Simple Network Management Protocol (SNMP) is an Internet-standard protocol for collecting and organizing information about managed devices on IP networks. It is used primarily for monitoring network services. SNMP exposes management data in the form of variables (metrics) that describe the system status and configuration. These metrics can then be remotely queried (and, in some circumstances, manipulated) by managing applications.
- RavenDB supports SNMP, which gives monitoring tools such as Zabbix, PRTG, and Datadog direct access to RavenDB's internal details. It exposes a long list of metrics: CPU and memory usage, total server requests, the loaded databases, and database-specific metrics such as the number of indexed items per second, document writes per second, the storage space each database takes, and more.
- You can still monitor RavenDB directly from the Studio or with one of our monitoring tools. However, using SNMP might be easier in some cases. As users start running large numbers of RavenDB instances, it becomes impractical to deal with each of them individually, and a monitoring system that can watch many servers becomes advisable.
Enabling SNMP in RavenDB
- To monitor RavenDB using SNMP you must first set the Monitoring.Snmp.Enabled configuration key to true.
- To learn how to modify a configuration key, refer to the Configuration Overview article, which outlines all available options.
- For example, add this key to your settings.json file and restart the server.
{
...
"Monitoring.Snmp.Enabled": true
...
}
SNMP configuration options
There are several configurable SNMP properties in RavenDB:
For all SNMP versions:
- Monitoring.Snmp.Port
The SNMP port.
Default: 161
- Monitoring.Snmp.SupportedVersions
List of supported SNMP versions.
Default: "V2C;V3"
For SNMPv2c:
- Monitoring.Snmp.Community
The community string is used as a password. It is sent with each SNMP GET request and allows or denies access to the monitored device.
Default: "ravendb"
For SNMPv3:
- Monitoring.Snmp.AuthenticationProtocol
Authentication protocol.
Default: "SHA1"
- Monitoring.Snmp.AuthenticationUser
The user for authentication.
Default: "ravendb"
- Monitoring.Snmp.AuthenticationPassword
The authentication password. When set to null, the community string is used instead.
Default: null
- Monitoring.Snmp.PrivacyProtocol
Privacy protocol.
Default: None
- Monitoring.Snmp.PrivacyPassword
Privacy password.
Default: "ravendb"
- See article Monitoring Options for the full list of SNMP configuration keys.
- To learn how to modify a configuration key, refer to the Configuration Overview article, which outlines all available options.
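The keys listed above can be combined in a single settings.json file. The following is a minimal Python sketch, not an official tool: it merges the documented SNMP keys into an existing settings dictionary and prints the resulting JSON. The helper name `with_snmp_settings`, the sample `ServerUrl` entry, and writing the port as a JSON number are assumptions made for illustration.

```python
import json


def with_snmp_settings(settings, port=161, versions="V2C;V3", community="ravendb"):
    """Return a copy of `settings` with SNMP monitoring enabled.

    The defaults mirror the documented defaults. This helper is hypothetical
    and is not part of RavenDB.
    """
    merged = dict(settings)  # leave the caller's dictionary untouched
    merged.update({
        "Monitoring.Snmp.Enabled": True,
        "Monitoring.Snmp.Port": port,
        "Monitoring.Snmp.SupportedVersions": versions,
        "Monitoring.Snmp.Community": community,
    })
    return merged


# Sample existing settings; the ServerUrl value is only a placeholder.
existing = {"ServerUrl": "http://127.0.0.1:8080"}
print(json.dumps(with_snmp_settings(existing, community="change-me"), indent=4))
```

As with the single-key example earlier, the server has to be restarted after settings.json is changed.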
The Metrics
Access metrics via monitoring tools
- Querying the exposed metrics using a monitoring tool is typically straightforward (see this Zabbix example).
- For a simplified setup, we have provided a few templates which can be found here. These templates include the metrics and their associated OIDs.
Access metrics via SNMP agents
- The metrics can be accessed directly using any SNMP agent such as Net-SNMP. Each metric has a unique object identifier (OID) and can be accessed individually.
- The most basic SNMP commands are snmpget, snmpset, and snmpwalk.
- For example, you can execute the following snmpget commands to retrieve the server's up-time metric.
For SNMPv2c:
# Request:
snmpget -v 2c -c ravendb live-test.ravendb.net 1.3.6.1.4.1.45751.1.1.1.3
# Result:
iso.3.6.1.4.1.45751.1.1.1.3 = Timeticks: (29543973) 3 days, 10:03:59.73
ravendb - the community string (set via the Monitoring.Snmp.Community configuration key).
live-test.ravendb.net - the host.
For SNMPv3:
snmpget -v 3 -l authNoPriv -u ravendb -a SHA \
    -A ravendb live-test.ravendb.net 1.3.6.1.4.1.45751.1.1.1.3
-l authNoPriv - sets the security level to use authentication but no privacy.
-u ravendb - sets the user for authentication purposes to "ravendb".
-a SHA - sets the authentication protocol to SHA.
-A ravendb - sets the authentication password to "ravendb".
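The two snmpget invocations above can also be assembled from a script. The sketch below is illustrative only: the function names are made up for this example, and `run_snmpget` assumes the Net-SNMP `snmpget` binary is installed and on the PATH. It also shows how the Timeticks value in the SNMPv2c result, which counts hundredths of a second, maps to the human-readable up-time.

```python
import shlex
import subprocess
from datetime import timedelta


def snmpget_v2c_args(host, oid, community="ravendb"):
    """Argument list for an SNMPv2c snmpget call."""
    return ["snmpget", "-v", "2c", "-c", community, host, oid]


def snmpget_v3_args(host, oid, user="ravendb", password="ravendb", auth_protocol="SHA"):
    """Argument list for an SNMPv3 call with authentication but no privacy."""
    return [
        "snmpget", "-v", "3",
        "-l", "authNoPriv",
        "-u", user,
        "-a", auth_protocol,
        "-A", password,
        host, oid,
    ]


def run_snmpget(args, timeout=10):
    """Run snmpget and return its output (assumes Net-SNMP is installed)."""
    completed = subprocess.run(
        args, capture_output=True, text=True, check=True, timeout=timeout
    )
    return completed.stdout.strip()


def timeticks_to_timedelta(ticks):
    """Convert SNMP TimeTicks (hundredths of a second) to a timedelta."""
    # 1 tick = 10 ms; integer milliseconds avoid floating-point rounding.
    return timedelta(milliseconds=ticks * 10)


uptime_oid = "1.3.6.1.4.1.45751.1.1.1.3"
print(shlex.join(snmpget_v2c_args("live-test.ravendb.net", uptime_oid)))
print(shlex.join(snmpget_v3_args("live-test.ravendb.net", uptime_oid)))
print(timeticks_to_timedelta(29543973))  # 3 days, 10:03:59.730000
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a community string or password contains special characters.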
Access metrics via HTTP
Access single OID value:
- An individual OID value can be retrieved via the HTTP GET endpoint:
<serverUrl>/monitoring/snmp?oid=<oid>
- For example, a cURL request for the server up-time metric:
# Request:
curl -X GET "http://live-test.ravendb.net/monitoring/snmp?oid=1.3.6.1.4.1.45751.1.1.1.3"
# Result:
{ "Value" : "4.21:32:56.0700000" }
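The same request can be made from code. This is a minimal sketch using only the Python standard library; the function names are illustrative, and it assumes the endpoint is reachable without authentication, as in the cURL example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def oid_url(server_url, oid):
    """Build the URL of the single-OID endpoint."""
    base = server_url.rstrip("/")
    query = urlencode({"oid": oid})
    return f"{base}/monitoring/snmp?{query}"


def parse_oid_value(body):
    """Extract the "Value" field from the endpoint's JSON response."""
    return json.loads(body)["Value"]


def fetch_oid_value(server_url, oid, timeout=10):
    """Perform the GET request and return the metric value."""
    with urlopen(oid_url(server_url, oid), timeout=timeout) as response:
        return parse_oid_value(response.read().decode("utf-8"))


# Offline demonstration using the URL and response shown above.
print(oid_url("http://live-test.ravendb.net", "1.3.6.1.4.1.45751.1.1.1.3"))
print(parse_oid_value('{ "Value" : "4.21:32:56.0700000" }'))
```

Calling `fetch_oid_value("http://live-test.ravendb.net", "1.3.6.1.4.1.45751.1.1.1.3")` would perform the actual request.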
Access multiple OID values:
- Multiple OID values can be retrieved by making either a GET or a POST request to the following HTTP endpoint:
<serverUrl>/monitoring/snmp/bulk
- For example, cURL requests for the server managed memory and unmanaged memory metrics:
# GET request:
curl -X GET "http://live-test.ravendb.net/monitoring/snmp/bulk?oid=1.3.6.1.4.1.45751.1.1.1.6.7&oid=1.3.6.1.4.1.45751.1.1.1.6.8"
# POST request:
curl -X POST \
    -H "Content-Type: application/json" \
    -d '{ "OIDs": ["1.3.6.1.4.1.45751.1.1.1.6.7", "1.3.6.1.4.1.45751.1.1.1.6.8"]}' \
    http://localhost:8080/monitoring/snmp/bulk
# Result:
{
    "Results": [
        { "OID": "1.3.6.1.4.1.45751.1.1.1.6.7", "Value": "410" },
        { "OID": "1.3.6.1.4.1.45751.1.1.1.6.8", "Value": "4" }
    ]
}
- You can get a list of all OIDs along with their descriptions via this HTTP GET endpoint:
<serverUrl>/monitoring/snmp/oids
- For example: http://live-test.ravendb.net/monitoring/snmp/oids
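The bulk GET and POST requests described above can be sketched in Python as follows. The function names are illustrative, and the response parsing assumes the "Results" shape shown in the example output; nothing here is an official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def bulk_get_url(server_url, oids):
    """URL for the bulk endpoint with one `oid` query parameter per OID."""
    base = server_url.rstrip("/")
    query = urlencode([("oid", oid) for oid in oids])
    return f"{base}/monitoring/snmp/bulk?{query}"


def bulk_post_request(server_url, oids):
    """POST request carrying the OIDs as a JSON body, as in the cURL example."""
    base = server_url.rstrip("/")
    body = json.dumps({"OIDs": list(oids)}).encode("utf-8")
    return Request(
        f"{base}/monitoring/snmp/bulk",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_bulk_results(body):
    """Map each OID in the response to its value."""
    return {item["OID"]: item["Value"] for item in json.loads(body)["Results"]}


def fetch_bulk(server_url, oids, timeout=10):
    """Send the POST request and return a {oid: value} dictionary."""
    with urlopen(bulk_post_request(server_url, oids), timeout=timeout) as response:
        return parse_bulk_results(response.read().decode("utf-8"))


# Offline demonstration with the managed and unmanaged memory OIDs.
memory_oids = ["1.3.6.1.4.1.45751.1.1.1.6.7", "1.3.6.1.4.1.45751.1.1.1.6.8"]
print(bulk_get_url("http://live-test.ravendb.net", memory_oids))
```

Returning a dictionary keyed by OID makes it easy to look up each value regardless of the order in which the server lists the results.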
List of OIDs
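- RavenDB's root OID is 1.3.6.1.4.1.45751.1.1, and the OIDs in the tables below are relative to it. For example, the server up-time entry 1.3 has the full OID 1.3.6.1.4.1.45751.1.1.1.3. The global up-time entry 1.3.6.1.2.1.1.3.0 is the exception: it is already a full OID.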
- Values represented by X, D, or I in the OIDs list below will be:
X:
0 - any kind of collection
1 - a generation-0 or generation-1 collection
2 - a blocking generation-2 collection
3 - a background collection (this is always a generation 2 collection)
D - Database number
I - Index number
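To turn an entry from the tables into an OID that can be queried, the relative OID is appended to the root and any X, D, or I placeholder is replaced with a number. A small Python sketch of that expansion follows; `full_oid` is a hypothetical helper, and it does not apply to the global up-time entry, which is already a full OID.

```python
ROOT_OID = "1.3.6.1.4.1.45751.1.1."


def full_oid(relative, **placeholders):
    """Expand a relative OID from the tables into a full OID.

    Placeholders such as X, D, or I are passed as keyword arguments,
    for example full_oid("1.6.11.X.2", X=2).
    """
    parts = []
    for part in relative.split("."):
        part = part.strip()
        if part in placeholders:
            parts.append(str(placeholders[part]))
        elif part.isdigit():
            parts.append(part)
        else:
            raise ValueError(f"no value given for placeholder {part!r}")
    return ROOT_OID + ".".join(parts)


print(full_oid("1.3"))              # 1.3.6.1.4.1.45751.1.1.1.3
print(full_oid("1.6.11.X.2", X=2))  # 1.3.6.1.4.1.45751.1.1.1.6.11.2.2
```

The first call reproduces the server up-time OID used in the snmpget examples; the second selects the "concurrent GC" flag for a blocking generation-2 collection.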
OID | Metric (Server) |
---|---|
1.1.1 | Server URL |
1.1.2 | Server Public URL |
1.1.3 | Server TCP URL |
1.1.4 | Server Public TCP URL |
1.2.1 | Server version |
1.2.2 | Server full version |
1.3 | Server up-time |
1.3.6.1.2.1.1.3.0 | Server up-time (global) |
1.4 | Server process ID |
1.5.1 | Process CPU usage in % |
1.5.2 | Machine CPU usage in % |
1.5.3.1 | CPU Credits Base |
1.5.3.2 | CPU Credits Max |
1.5.3.3 | CPU Credits Remaining |
1.5.3.4 | CPU Credits Gained Per Second |
1.5.3.5 | CPU Credits Background Tasks Alert Raised |
1.5.3.6 | CPU Credits Failover Alert Raised |
1.5.3.7 | CPU Credits Any Alert Raised |
1.5.4 | IO wait in % |
1.6.1 | Server allocated memory in MB |
1.6.2 | Server low memory flag value |
1.6.3 | Server total swap size in MB |
1.6.4 | Server total swap usage in MB |
1.6.5 | Server working set swap usage in MB |
1.6.6 | Dirty Memory that is used by the scratch buffers in MB |
1.6.7 | Server managed memory size in MB |
1.6.8 | Server unmanaged memory size in MB |
1.6.9 | Server encryption buffers memory in use, in MB |
1.6.10 | Server encryption buffers memory held in the pool, in MB |
1.6.11.X.2 | GC info for X. Specifies if this is a concurrent GC or not. |
1.6.11.X.3 | GC info for X. Gets the number of objects ready for finalization this GC observed. |
1.6.11.X.4 | GC info for X. Gets the total fragmentation (in MB) when the last garbage collection occurred. |
1.6.11.X.5 | GC info for X. Gets the generation this GC collected. |
1.6.11.X.6 | GC info for X. Gets the total heap size (in MB) when the last garbage collection occurred. |
1.6.11.X.7 | GC info for X. Gets the high memory load threshold (in MB) when the last garbage collection occurred. |
1.6.11.X.8 | GC info for X. The index of this GC. |
1.6.11.X.9 | GC info for X. Gets the memory load (in MB) when the last garbage collection occurred. |
1.6.11.X.10.1 | GC info for X. Gets the pause durations. First item in the array. |
1.6.11.X.10.2 | GC info for X. Gets the pause durations. Second item in the array. |
1.6.11.X.11 | GC info for X. Gets the pause time percentage in the GC so far. |
1.6.11.X.12 | GC info for X. Gets the number of pinned objects this GC observed. |
1.6.11.X.13 | GC info for X. Gets the promoted MB for this GC. |
1.6.11.X.14 | GC info for X. Gets the total available memory (in MB) for the garbage collector to use when the last garbage collection occurred. |
1.6.11.X.15 | GC info for X. Gets the total committed MB of the managed heap. |
1.6.11.X.16.3 | GC info for X. Gets the large object heap size (in MB) after the last garbage collection of given kind occurred. |
1.6.12.{0} | Monitor /proc/meminfo/ metrics (unix/linux). The description of each metric is available via the endpoint <serverUrl>/monitoring/snmp/oids. See Get all OIDs. |
1.6.13 | Available memory for processing (in MB) |
1.7.1 | Number of concurrent requests |
1.7.2 | Total number of requests since server startup |
1.7.3 | Number of requests per second (one minute rate) |
1.7.3.1 | Number of requests per second (five second rate) |
1.7.4 | Average request time in milliseconds |
1.8 | Server last request time |
1.8.1 | Server last authorized non cluster admin request time |
1.9.1 | Server license type |
1.9.2 | Server license expiration date |
1.9.3 | Time left until server license expiration |
1.9.4 | Server license utilized CPU cores |
1.9.5 | Server license max CPU cores |
1.10.1 | Server storage used size in MB |
1.10.2 | Server storage total size in MB |
1.10.3 | Remaining server storage disk space in MB |
1.10.4 | Remaining server storage disk space in % |
1.10.5 | IO read operations per second |
1.10.6 | IO write operations per second |
1.10.7 | Read throughput in kilobytes per second |
1.10.8 | Write throughput in kilobytes per second |
1.10.9 | Queue length |
1.11.1 | Server certificate expiration date |
1.11.2 | Time left until server certificate expiration |
1.11.3 | List of well known admin certificate thumbprints |
1.11.4 | List of well known admin certificate issuers |
1.11.5 | Number of expiring certificates |
1.11.6 | Number of expired certificates |
1.12.1 | Number of processors on the machine |
1.12.2 | Number of assigned processors on the machine |
1.13.1 | Number of backups currently running |
1.13.2 | Max number of backups that can run concurrently |
1.14.1 | Number of available worker threads in the thread pool |
1.14.2 | Number of available completion port threads in the thread pool |
1.15.1 | Number of active TCP connections |
1.16.1 | Indicates if any experimental features are used |
1.17.1 | Value of the '/proc/sys/vm/max_map_count' parameter |
1.17.2 | Number of current map files in '/proc/self/maps' |
1.17.3 | Value of the '/proc/sys/kernel/threads-max' parameter |
1.17.4 | Number of current threads |
OID | Metric (Cluster) |
---|---|
3.1.1 | Current node tag |
3.1.2 | Current node state |
3.2.1 | Cluster term |
3.2.2 | Cluster index |
3.2.3 | Cluster ID |