Performance Dashboards Step 2 – Installing and Configuring Graphite

Now that OnCommand Performance Manager (OPM) is ready to send data, we need to configure something to catch that data. Graphite was originally written by the operations team at Orbitz.com way back in 2006 or so. Since then it has become the most widespread graphing solution of its type and is in use at many enterprises. The Graphite data format has become as close to a standard as there is in this space, so many tools work with it. One of the reasons for the format’s popularity is its simplicity: a metric name, a value, and a timestamp are the only things in the datastream.
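To make that simplicity concrete, here is a minimal Python sketch of Graphite's plaintext protocol: one line per datapoint in the form "metric.path value timestamp", sent over TCP to Carbon, which listens on port 2003 by default. The metric name below is a made-up placeholder (not something OPM emits), and the host name graphite.local is simply the one used later in this post.

```python
import socket
import time


def format_datapoint(metric, value, timestamp=None):
    """Build one line of the plaintext protocol: 'path value timestamp' plus newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return "{} {} {}\n".format(metric, value, int(timestamp))


def send_datapoint(line, host="graphite.local", port=2003):
    """Send one formatted line to Carbon's plaintext listener (assumed host and port)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric name, used only to show the wire format.
line = format_datapoint("test.demo.read_latency", 1.25, timestamp=1420070400)
print(line, end="")
# send_datapoint(line)  # uncomment once Carbon is listening
```

Anything that can write a line like this to a socket can feed Graphite, which is a large part of why so many tools support the format.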

Graphite runs on most flavors of Unix or Linux, but the most common platforms are Ubuntu and CentOS or RHEL. There are many good articles out there on how to install Graphite on these platforms. Your search engine of choice can find some for you, but I’ll put links at the end of this post pointing to the ones I’ve found to be the best or most helpful. There are also Docker images to download and use if you like. It also helps to look at some other pages to understand the architecture of Graphite, because you’ll need a high-level understanding of the Carbon and Whisper components. Check out http://graphite.wikidot.com/high-level-diagram for exactly what the URL says it is.

Things to think about before installation

What I really want to talk about are some of the configuration choices that you’ll need to make. One of the first things to think about is your retention policy. If you choose incorrectly, you run the risk of exhausting all of the disk space on the server within minutes of receiving your first measurements. So before we make our choice on retention policy, let’s talk about how Graphite stores data.

Important Configuration Choice #1 – Configure your retention periods for collected measurements before you start to receive data, but do so wisely.

Whisper is the database component of Graphite that stores our measurements. It is similar to RRD (used by MRTG) but overcomes some of the limitations of RRD. (See http://graphite.wikidot.com/whisper for more information.) Whisper stores each metric in its own file. When the first measurement for a metric arrives, Graphite looks for a match in the retention policy file, storage-schemas.conf. The matches are found using regex, so you can specify different retention periods for different metrics. In this file you specify how often datapoints should be stored and for how long. Graphite then pre-allocates all of the space for that metric’s database file. Each datapoint stored is 12 bytes. The default policy is to store a measurement every 10 seconds and keep those measurements for 90 days. Simple math (90 days is 7,776,000 seconds, divided by 10 seconds) gives us 777,600 datapoints using that policy. That means about 8.9 MB of disk space per metric is used the first time Graphite receives a measurement of that metric.

8.9 MB doesn’t seem like much, but consider how many metrics you are collecting. At the highest detail level that OPM exports, you are getting 8 metrics per volume, 11 metrics per LUN, 6 metrics per NAS LIF, etc. It is easy to get to over 500 metrics gathered per cluster. At 500 metrics, that is about 4.35 GB used in the first hour that this is running. Add a volume, and 71 more MB is used. Add a LUN, and just shy of 100 MB is used the first time it is measured.
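The sizing arithmetic is easy to rerun for your own intervals and metric counts. The following is a minimal Python sketch that assumes 12 bytes per datapoint, as stated above, and ignores the small header Whisper adds to each file. Note that 90 days at one point every 10 seconds works out to 777,600 datapoints. The function names are mine and are not part of Graphite.

```python
BYTES_PER_POINT = 12  # size of one Whisper datapoint, per the text above
MIB = 1024 ** 2
GIB = 1024 ** 3


def points_stored(step_seconds, retention_days):
    """Number of datapoints kept for one metric at a given interval and retention."""
    return retention_days * 24 * 60 * 60 // step_seconds


def bytes_per_metric(step_seconds, retention_days):
    """Approximate pre-allocated file size for one metric (file header ignored)."""
    return points_stored(step_seconds, retention_days) * BYTES_PER_POINT


per_metric = bytes_per_metric(step_seconds=10, retention_days=90)
print("datapoints per metric:", points_stored(10, 90))
print("MiB per metric: {:.1f}".format(per_metric / MIB))
print("GiB for 500 metrics: {:.2f}".format(500 * per_metric / GIB))
```

Swap in your own step, retention, and metric count before you size the virtual disk.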

Further, the way Whisper works is that changes to the retention period don’t take effect until you run a script that resizes the database files. This is why it is wise to set your retention policies before you receive any data. If you charge into this half-wise (as I did initially) and decide to change the default from 10 seconds:90 days to 10 seconds:5 years, you get about 180 MB per metric (15,768,000 datapoints at 12 bytes each). Tracking about 1,000 metrics for that period comes to over 176 GB of data (and I didn’t even factor in leap years), which doesn’t work too well on a VM that has an 80 GB disk.

And honestly, trying to keep 5 years of data at a 10 second granularity was massive overkill. Are most of us going to go back 4 and a half years to see what read latency on a particular LUN was between 9:36 AM and 9:47 AM? Probably not. Luckily, Whisper can set multiple retention times per measurement. This means we can keep data at different granularity levels for different periods. I would’ve been much smarter to create a retention policy that says something like 10 seconds:90 days, 5 minutes:1 year, 1 hour:5 years. That gives me 5 years of data (at a much lower granularity level) for just under 11 MB of space per metric (926,520 datapoints at 12 bytes each). Whisper doesn’t calculate the coarser values on retrieval; it actually stores a separate set of datapoints for each retention period, which is why more space is used. A simpler way to think of it is that the retention periods are additive per metric. Go to http://graphite.wikidot.com/getting-your-data-into-graphite for details on this.
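In storage-schemas.conf a multi-level policy like this one is written as a comma-separated list of step:duration pairs, for example 10s:90d,5m:1y,1h:5y. The Python sketch below parses such a string and adds up the datapoints across all archives. It is only an estimate under the same assumptions as before (12 bytes per datapoint, file header ignored, a year counted as 365 days), and the helper names are my own.

```python
BYTES_PER_POINT = 12
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec):
    """Convert a string such as '10s' or '5y' to seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def archive_points(retentions):
    """Return the datapoint count of each archive in a 'step:duration,...' string."""
    counts = []
    for archive in retentions.split(","):
        step, duration = archive.strip().split(":")
        counts.append(to_seconds(duration) // to_seconds(step))
    return counts


policy = "10s:90d,5m:1y,1h:5y"
counts = archive_points(policy)
total_bytes = sum(counts) * BYTES_PER_POINT
print(counts)
print("MiB per metric: {:.1f}".format(total_bytes / 1024 ** 2))
```

Because each archive is stored separately, the total is simply the sum of the individual archives, and the two coarse archives add very little on top of the 10 second one.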

Of course if you wanted to actually retain 10 second measurements for 5 years, NetApp storage is the place to do it. In the beginning our compression and deduplication rates would be phenomenal!

Important Configuration Choice #2 – Plan your Graphite architecture before receiving data in large environments.

Remember that OPM will forward data to exactly one external server. Carbon, the data receiver component of Graphite, can forward metrics that match a particular regex pattern to one or more other servers. This is done by configuring the Carbon Relay service. This can all be done later, but without careful planning, you may end up with gaps in the collected data.
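To picture what regex-based relaying means, here is a small Python sketch of the idea: each rule pairs a pattern with a list of destination servers, the first matching rule wins, and anything unmatched goes to a default. This is not Carbon's code or its configuration syntax, and every metric prefix and server name in it is hypothetical.

```python
import re

# Hypothetical rules: (regex pattern, destination servers). All names are made up.
RULES = [
    (re.compile(r"^netapp\.perf\.cluster1\."), ["graphite-a.local:2004"]),
    (re.compile(r"^netapp\.perf\.cluster2\."),
     ["graphite-b.local:2004", "graphite-c.local:2004"]),
]
DEFAULT_DESTINATIONS = ["graphite-a.local:2004"]


def route(metric):
    """Return the destinations of the first matching rule, else the default list."""
    for pattern, destinations in RULES:
        if pattern.search(metric):
            return destinations
    return DEFAULT_DESTINATIONS


print(route("netapp.perf.cluster2.vol1.read_latency"))
print(route("carbon.agents.host1.committedPoints"))
```

Deciding up front which metric prefixes belong on which server is the planning step; moving a prefix later means its history is split across servers.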

 

Actually Installing Graphite on Ubuntu

These are the steps that I use to install Graphite. I’ll give a reason for anything that is non-default, but don’t assume that I’ve necessarily made the best choices, although I did research things before I did them. If using a VM, plan the size of your virtual disk using the factors mentioned above. I did this on Ubuntu Server 14.04.1 without incident, and I only know enough Linux to play an admin on TV. The lines with the shaded background are console commands or lines from text files that you’ll be editing. For text file edits, the text in bold is what you add or change.

  1. Install Ubuntu Server and choose only OpenSSH and LAMP Server as the components installed. Also, set the hostname and configure DNS so that resolution is working both on the server and from the clients.
  2. After installation at the first boot of the server, run
    sudo aptitude update
    sudo aptitude safe-upgrade OR sudo aptitude full-upgrade
    Note that if you are like me and don’t like typing “sudo” before the real command, just run “sudo -s” after you log in and you’ll be root and not need to use sudo for the rest of the session.
  3. Install Graphite and the required Apache modules. Also install wget because we’ll use that later to install some other software.
    sudo apt-get install graphite-web graphite-carbon libapache2-mod-wsgi wget
    On my server it installed a lot of dependencies and suggested packages: 24 new packages using 45.9 MB of additional disk space.
    It will ask if you want to keep the Whisper files if you remove Graphite, in case you reinstall later. I chose yes, but this isn’t really that important either way.
  4. Initialize the Graphite database.
    sudo graphite-manage syncdb
    It will ask to create a superuser for the Django subsystem. Answer yes and keep track of the username and password that you use because you need it for the next step.
  5. Edit /etc/graphite/localsettings.py to reflect the credentials created in the last step. I used “root” for the username and “graphite” for the password. It should look something like this:
    DATABASES = {
        'default': {
            'NAME': '/var/lib/graphite/graphite.db',
            'ENGINE': 'django.db.backends.sqlite3',
            'USER': 'root',
            'PASSWORD': 'graphite',
            'HOST': '',
            'PORT': ''
        }
    }
  6. Synchronize the database again and make sure there are no errors.
    sudo graphite-manage syncdb
  7. There are a few permissions that need to be changed for Graphite to run properly.
    sudo chmod 666 /var/lib/graphite/graphite.db
    sudo chmod 755 /usr/share/graphite-web/graphite.wsgi
  8. Edit the file “/etc/default/graphite-carbon” to enable the Carbon component of Graphite to start automatically when the server boots. Simply change the word “false” to “true” so that it looks like this –
    # Change to true, to enable carbon-cache on boot
    CARBON_CACHE_ENABLED=true
  9. Now start the carbon service.
    sudo service carbon-cache restart
  10. Now we need to make a couple of changes to Apache to serve Graphite. We are just going to link a file included with Graphite into some Apache configuration directories.
    sudo ln -s /usr/share/graphite-web/apache2-graphite.conf /etc/apache2/sites-available/
    sudo ln -s /usr/share/graphite-web/apache2-graphite.conf /etc/apache2/sites-enabled/

  11. Since we are running Graphite and Grafana on the same server, and Grafana is the interface we’ll be in most of the time after setup, we should change Graphite to run on some port other than 80. I typically use 8080. That way the end users of Grafana won’t have to worry about browsing to a subdirectory. To do that we’ll change the /etc/apache2/ports.conf file and the /usr/share/graphite-web/apache2-graphite.conf file to use a different port for Graphite.
    In /etc/apache2/ports.conf add a line just under the line that says “Listen 80” that says “Listen 8080” so that it looks like this –
    # If you just change the port or add more ports here, you will likely also
    # have to change the VirtualHost statement in
    # /etc/apache2/sites-enabled/000-default.conf

    Listen 80
    Listen 8080

    <IfModule ssl_module>

    Now update /usr/share/graphite-web/apache2-graphite.conf so that the first line looks like this –
    <VirtualHost *:8080>

  12. Now restart Apache to activate the changes.
    sudo service apache2 restart
    Now if you browse to the server on port 8080 you should see the Graphite interface. Browse the tree under the carbon branch to make sure you are getting data. Browse to the end of the tree and pick “committedPoints” and make sure the value is non-zero.
  13. The next few steps will be to install and configure InfluxDB. InfluxDB is used to store the dashboards we create with Grafana. None of our metric data is stored in InfluxDB so it won’t require much space or memory. Use wget to download InfluxDB and then install it using dpkg.
    wget http://s3.amazonaws.com/influxdb/influxdb_latest_amd64.deb
    sudo dpkg -i influxdb_latest_amd64.deb
    sudo service influxdb start

  14. Create a user and database in InfluxDB to store the dashboards for Grafana. InfluxDB has an easy to use GUI that you access by browsing to the server on port 8083. The default username and password in InfluxDB are both “root”, so you probably want to change that.

    Next create a database and user for the dashboards. I’m very creative and original, so my database is named “dashboards” and I’m using “graphite” for both the username and password for this database.

    After you create the database, click on the database name and create a user. I made this user an admin user also. I’m not sure if that is required. When you create the user it will say admin is false, but wait a few seconds and refresh the page. Now we are finished with configuring InfluxDB, but remember the database name, username, and password that you just created for the next step.

  15. Download and install Grafana. Let’s back up the old index.html that came with Apache as the first step, then we download and install Grafana.
    cd /var/www/html
    sudo mv index.html index.html.old

    Then browse to http://grafana.org/download to see the latest version and download link. Copy the link address for the tar file and paste it into your terminal window and download it with wget. I’m getting version 1.9.0.
    sudo wget http://grafanarel.s3.amazonaws.com/grafana-1.9.0.tar.gz
    Extract the contents of the tar file.
    sudo tar xfzv grafana-1.9.0.tar.gz -C /var/www/html/ --strip-components 1
  16. Modify Apache to allow Grafana to connect to Graphite. We need to modify the Apache conf file, enable the headers module, and then restart Apache. Modify /etc/apache2/apache2.conf to add the three Header lines just before the end so it looks like this –

    # Include the virtual host configurations:
    IncludeOptional sites-enabled/*.conf

    Header set Access-Control-Allow-Origin "*"
    Header set Access-Control-Allow-Methods "GET, OPTIONS"
    Header set Access-Control-Allow-Headers "origin, authorization, accept"

    # vim: syntax=apache ts=4 sw=4 sts=4 sr noet

    Enable the headers module –
    sudo a2enmod headers

    Restart Apache once again –
    sudo service apache2 restart

  17. Modify the Grafana configuration to connect to the local Graphite installation. Start by copying the sample Grafana configuration file in the /var/www/html directory to the name of the file that Grafana actually uses.
    sudo cp config.sample.js config.js
    Now we need to modify the configuration to use the InfluxDB database we set up earlier. The sample configuration that we copied has blocks of examples that are commented out. We are going to use Graphite as the datasource and InfluxDB for dashboards, so we’ll be mashing together the first two example sections. The easiest way to do this is to copy the InfluxDB example and the Graphite/Elasticsearch example to a text editor such as Notepad, make the edits, and paste that section back into the config.js file. Make sure that the section you paste back in is not commented out with /* and */ before and after the block you edited. It should look like this –
    return new Settings({
    /* Data sources
    * ========================================================
    * Datasources are used to fetch metrics, annotations, and serve as dashboard storage
    * - You can have multiple of the same type.
    * - grafanaDB: true marks it for use for dashboard storage
    * - default: true marks the datasource as the default metric source (if you have multiple)
    * - basic authentication: use url syntax http://username:password@domain:port
    */

    datasources: {
      graphite: {
        type: 'graphite',
        url: "http://graphite.local:8080",
      },

      grafana: {
        type: 'influxdb',
        url: "http://graphite.local:8086/db/dashboards",
        username: 'graphite',
        password: 'graphite',
        grafanaDB: true
      },
    },

    // InfluxDB example setup (the InfluxDB databases specified need to exist)
    /*

     

    Make sure that the block you paste back in is not commented out.

  18. Test that Grafana is working. Just browse to your server’s address on the standard HTTP port.
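If you would rather script the check from step 12 than click through the tree, Graphite's render API can return a metric as JSON. The Python sketch below builds such a URL and tests whether any datapoint is non-zero. The base URL http://graphite.local:8080 matches the example configuration above, and the target carbon.agents.*.committedPoints is my assumption about where Carbon publishes its own statistics on a default install, so adjust both for your server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(base_url, target, window="-10min"):
    """Build a Graphite render API URL that returns JSON for one target."""
    query = urlencode({"target": target, "from": window, "format": "json"})
    return "{}/render?{}".format(base_url.rstrip("/"), query)


def has_nonzero_points(series_list):
    """True if any series has at least one datapoint that is neither null nor zero."""
    return any(
        value for series in series_list for value, _timestamp in series["datapoints"]
    )


def check(base_url="http://graphite.local:8080"):
    """Query Carbon's own committedPoints metric; needs a running server."""
    url = render_url(base_url, "carbon.agents.*.committedPoints")
    with urlopen(url, timeout=10) as response:
        return has_nonzero_points(json.loads(response.read().decode("utf-8")))


# Offline example using the JSON shape the render API returns.
sample = [{"target": "demo", "datapoints": [[None, 1420070400], [12.0, 1420070460]]}]
print(has_nonzero_points(sample))
```

Calling check() against the new server should return True once carbon-cache has been running for a minute or two.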

You now have a fully functional Graphite/InfluxDB/Grafana server ready to catch data. In my next post we’ll talk about what to do next and how to build a simple dashboard showing data exported from OnCommand Performance Manager.
