
What is Graylog

Graylog is a tool that provides simple (yet powerful) log capture and analysis. Being open-source, it is extremely flexible and can provide a good foundation for a centralized logging system.

Graylog is divided into a web front-end and a back-end server. In this example we will create a Docker container that includes the front-end, the back-end and the required prerequisites (MongoDB & ElasticSearch); we will also use Supervisord to ensure that all services remain running.

What is Docker

Docker is a platform for wrapping everything required by a piece of software (files, binaries, specific versions of programming languages and so on) into a single container; this container can then be run on any Docker host, even one running a different distribution.
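For example, the two commands below (assuming Docker is already installed) pull the CentOS 6.6 image we will later use as a base and start a throwaway container from it; they behave the same on any host that runs Docker, whatever its own distribution:

sudo docker pull centos:6.6
sudo docker run --rm centos:6.6 cat /etc/redhat-release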

File structure

The basic file structure for this project looks like:

-project root
  -Dockerfile
  -elasticsearch.repo
  -mongodb-org-3.0.repo
  -graylog-server.sh
  -conf/
    -elasticsearch.yml
    -mongod.conf
    -server.conf
    -supervisord.conf
    -web.conf

We will go through each part of it in greater detail below.
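If you want to follow along, the whole skeleton can be created in one go (the graylog-docker folder name is just an example):

mkdir -p graylog-docker/conf
cd graylog-docker
touch Dockerfile elasticsearch.repo mongodb-org-3.0.repo graylog-server.sh
touch conf/elasticsearch.yml conf/mongod.conf conf/server.conf conf/supervisord.conf conf/web.conf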

The Dockerfile

The first step is creating the Dockerfile; this is what tells Docker how to build the image that will run everything.

The basic Dockerfile commands we will use are:

FROM – choose what baseline image to start from; this can be any Docker image, and https://hub.docker.com is a great place to find one

MAINTAINER – who maintains the Docker image

RUN – run a command during the image build process, equivalent to running a bash command on a normal Linux server

COPY – copy a file from the host into the Docker image

EXPOSE – open a container port to the host, much like opening a firewall port

CMD – what command to run when a container of the Docker image starts

First we need to tell Docker what base image to use and who the image maintainer is. I chose CentOS 6.6 as I wanted to stay compatible with the existing, non-dockerized Graylog servers we have.

## Set the base image

FROM centos:6.6

## File maintainer

MAINTAINER Naor Livne

Now that Docker knows which baseline image to start from, we need to install the required packages. First we install Java, which is a requirement of both ElasticSearch and Graylog:

# install java

RUN yum -y install java-1.7.0-openjdk

Then we install ElasticSearch; this is where Graylog stores the logs and what allows it to search them so fast. The first two lines set up the ElasticSearch repository, the third installs the package and the fourth creates the configuration folder:

# install ElasticSearch

RUN rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

COPY elasticsearch.repo /etc/yum.repos.d/elasticsearch.repo

RUN yum -y install elasticsearch

RUN mkdir -p /usr/share/elasticsearch/config

After that we install MongoDB, which is where all the web front-end data (users, preferences, etc.) is stored. As with ElasticSearch, the first line sets up the repository, the second installs the package and the third creates the DB storage folder:

# install MongoDB

COPY mongodb-org-3.0.repo /etc/yum.repos.d/mongodb-org-3.0.repo

RUN yum install -y mongodb-org

RUN mkdir -p /data/db

After that we install Graylog:

# install graylog

RUN rpm -Uvh https://packages.graylog2.org/repo/packages/graylog-1.0-repository-el6_latest.rpm

RUN yum -y install graylog-server graylog-web

COPY graylog-server.sh /usr/bin/graylog-server.sh

RUN chmod 1777 /usr/bin/graylog-server.sh

And to finish up we install Supervisord, which we will later use to ensure all of the "services" remain running:

# install supervisord

RUN yum -y install python-setuptools

RUN easy_install supervisor

We now have a Dockerfile that installs all the needed software; all that remains is to configure the services to work with each other. We will accomplish this by creating a new config file for each service and using the Docker COPY command to overwrite the default one:

#copy config files

COPY conf/server.conf /etc/graylog/server/server.conf

COPY conf/elasticsearch.yml /usr/share/elasticsearch/config/elasticsearch.yml

COPY conf/web.conf /usr/share/graylog-web/conf/graylog-web-interface.conf

COPY conf/supervisord.conf /etc/supervisord.conf

COPY conf/mongod.conf /etc/mongod.conf

We now expose the ports; this is the container equivalent of opening ports in a firewall. Port 9000 is used by the web interface and 12900 by the REST API (both appear in the config files below), while 12201 and 12202 are left open for the log inputs you will define later (GELF, for example, defaults to 12201):

#open ports

EXPOSE 9000

EXPOSE 12202

EXPOSE 12201

EXPOSE 12900

Unlike a server, a Docker container only runs for as long as there is a task running in the foreground. Since we need multiple processes, we will use Supervisord as the main command and have it orchestrate the running state of everything else:

#run supervisord

CMD ["/usr/bin/supervisord"]

The finished Dockerfile should look something like this:

############################################################

# Dockerfile to build graylog container images

#

# Based on: Centos Official image

#

# Created On: May 27, 2015

# Author: Naor Livne <naor.livne@naturalint.com>

############################################################

## Set the base image

FROM centos:6.6

## File maintainer

MAINTAINER Naor Livne

## Update OS packages

#RUN yum -y update

############################################################

#

# INSTALLATION

# ----------------------------------------------------------

#

#

############################################################

# install java

RUN yum -y install java-1.7.0-openjdk

# install ElasticSearch

RUN rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

COPY elasticsearch.repo /etc/yum.repos.d/elasticsearch.repo

RUN yum -y install elasticsearch

RUN mkdir -p /usr/share/elasticsearch/config

# install MongoDB

COPY mongodb-org-3.0.repo /etc/yum.repos.d/mongodb-org-3.0.repo

RUN yum install -y mongodb-org

RUN mkdir -p /data/db

# install graylog

RUN rpm -Uvh https://packages.graylog2.org/repo/packages/graylog-1.0-repository-el6_latest.rpm

RUN yum -y install graylog-server graylog-web

COPY graylog-server.sh /usr/bin/graylog-server.sh

RUN chmod 1777 /usr/bin/graylog-server.sh

# install supervisord

RUN yum -y install python-setuptools

RUN easy_install supervisor

############################################################

#

# CONFIGURATION

# ----------------------------------------------------------

#

############################################################

#copy config files

COPY conf/server.conf /etc/graylog/server/server.conf

COPY conf/elasticsearch.yml /usr/share/elasticsearch/config/elasticsearch.yml

COPY conf/web.conf /usr/share/graylog-web/conf/graylog-web-interface.conf

COPY conf/supervisord.conf /etc/supervisord.conf

COPY conf/mongod.conf /etc/mongod.conf

#create log files

RUN mkdir -p /var/log/supervisor

RUN mkdir -p /var/log/graylog

RUN mkdir -p /var/log/mongodb

RUN mkdir -p /var/log/elasticsearch

############################################################

#

# RUNNING

# ----------------------------------------------------------

#

############################################################

#open ports

EXPOSE 9000

EXPOSE 12202

EXPOSE 12201

EXPOSE 12900

#run supervisord

CMD ["/usr/bin/supervisord"]

elasticsearch.repo & mongodb-org-3.0.repo

As CentOS 6.x includes neither the ElasticSearch nor the MongoDB repositories by default, we have to add them manually. We do this by creating two files (creatively named elasticsearch.repo and mongodb-org-3.0.repo); the COPY commands used in the Dockerfile above then place them in the container's /etc/yum.repos.d folder:

elasticsearch.repo:

[elasticsearch-1.5]

name=Elasticsearch repository for 1.5.x packages

baseurl=http://packages.elastic.co/elasticsearch/1.5/centos

gpgcheck=1

gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch

enabled=1

mongodb-org-3.0.repo:

[mongodb-org-3.0]

name=MongoDB Repository

baseurl=http://repo.mongodb.org/yum/redhat/6/mongodb-org/3.0/x86_64/

gpgcheck=0

enabled=1

graylog-server.sh

For reasons unclear to me, the installed version of Graylog doesn't include a way to run graylog-server in the foreground (i.e. not daemonized), and since it is considered best practice with Supervisord to run everything non-daemonized, I created the script below to run it in the foreground:

#!/bin/sh

# run graylog-server in the foreground using the packaged jar and the default config file
java -jar -Djava.library.path=/usr/share/graylog-server/lib/sigar -Dlog4j.configuration=file:///etc/graylog/server/log4j.xml /usr/share/graylog-server/graylog.jar server -f /etc/graylog/server/server.conf
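If you later tweak this script, a quick shell syntax check (it parses the file without executing anything) can save an image rebuild:

sh -n graylog-server.sh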

conf/elasticsearch.yml

This file configures ElasticSearch; there are a lot of possible configuration options, but for this guide we only need one to get the basics working:

cluster.name: graylog2
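graylog-server finds ElasticSearch by cluster name, so this value has to match the elasticsearch_cluster_name setting on the Graylog side (not set explicitly in the server.conf below, so its default is used). Once the container is running you can check what ElasticSearch actually reports; the container ID is a placeholder and curl is assumed to be present in the image:

sudo docker exec <container_id> curl -s http://127.0.0.1:9200/_cluster/health?pretty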

conf/mongod.conf

This file configures MongoDB:

# mongod.conf

#where to log

logpath=/var/log/mongodb/mongod.log

logappend=true

# fork and run in background

fork=true

#port=27017

dbpath=/var/lib/mongo

# location of pidfile

pidfilepath=/var/run/mongodb/mongod.pid

# Listen to local interface only. Comment out to listen on all interfaces.

bind_ip=127.0.0.1
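A similar sanity check for MongoDB once the container is running (the mongo shell comes with the mongodb-org package; the container ID is a placeholder):

sudo docker exec <container_id> mongo --eval "db.adminCommand('ping')"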

conf/server.conf

This file configures the graylog-server back-end. Two secrets have to be generated and placed in the file: the first is the result of running "pwgen -N 1 -s 96", which goes into password_secret, and the second is the result of "echo -n yourpassword | shasum -a 256", which goes into root_password_sha2. Remember to change yourpassword to something more secure, as this will be the password of the root user (admin by default).
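For example, on the machine where you edit the config (pwgen may need to be installed first, and on systems without shasum the sha256sum command produces the same hash):

pwgen -N 1 -s 96
echo -n yourpassword | shasum -a 256

Copy only the hex digest from the second command (ignore the trailing "-") and put each value into server.conf as a plain, unquoted string; the quoted placeholders in the listing below just mark where they go: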

# If you are running more than one instances of graylog2-server you have to select one of these

# instances as master. The master will perform some periodical tasks that non-masters won’t perform.

is_master = true

# The auto-generated node ID will be stored in this file and read after restarts. It is a good idea

# to use an absolute file path here if you are starting graylog2-server from init scripts or similar.

node_id_file = /etc/graylog/server/node-id

# You MUST set a secret to secure/pepper the stored user passwords here. Use at least 64 characters.

# Generate one by using for example: pwgen -N 1 -s 96

password_secret = "Insert the result of pwgen -N 1 -s 96 here"

# The default root user is named ‘admin’

#root_username = admin

# You MUST specify a hash password for the root user (which you only need to initially set up the

# system and in case you lose connectivity to your authentication back-end)

# This password cannot be changed using the API or via the web interface. If you need to change it,

# modify it in this file.

# Create one by using for example: echo -n yourpassword | shasum -a 256

# and put the resulting hash value into the following line

root_password_sha2 = "insert the hash of your password here"

# Set plugin directory here (relative or absolute)

plugin_dir = /usr/share/graylog-server/plugin

# REST API listen URI. Must be reachable by other graylog2-server nodes if you run a cluster.

rest_listen_uri = http://127.0.0.1:12900/

# Enable CORS headers for REST API. This is necessary for JS-clients accessing the server directly.

# If these are disabled, modern browsers will not be able to retrieve resources from the server.

# This is disabled by default. Uncomment the next line to enable it.

#rest_enable_cors = true

# Analyzer (tokenizer) to use for message and full_message field. The “standard” filter usually is a good idea.

# All supported analyzers are: standard, simple, whitespace, stop, keyword, pattern, language, snowball, custom

# Elasticsearch documentation: http://www.elasticsearch.org/guide/reference/index-modules/analysis/

# Note that this setting only takes effect on newly created indices.

elasticsearch_analyzer = standard

# Batch size for the Elasticsearch output. This is the maximum (!) number of messages the Elasticsearch output

# module will get at once and write to Elasticsearch in a batch call. If the configured batch size has not been

# reached within output_flush_interval seconds, everything that is available will be flushed at once. Remember

# that every outputbuffer processor manages its own batch and performs its own batch write calls.

# (“outputbuffer_processors” variable)

output_batch_size = 500

# Flush interval (in seconds) for the Elasticsearch output. This is the maximum amount of time between two

# batches of messages written to Elasticsearch. It is only effective at all if your minimum number of messages

# for this time period is less than output_batch_size * outputbuffer_processors.

output_flush_interval = 1

# As stream outputs are loaded only on demand, an output which is failing to initialize will be tried over and

# over again. To prevent this, the following configuration options define after how many faults an output will

# not be tried again for an also configurable amount of seconds.

output_fault_count_threshold = 5

output_fault_penalty_seconds = 30

# The number of parallel running processors.

# Raise this number if your buffers are filling up.

processbuffer_processors = 5

outputbuffer_processors = 3

# Wait strategy describing how buffer processors wait on a cursor sequence. (default: sleeping)

# Possible types:

# - yielding

# Compromise between performance and CPU usage.

# - sleeping

# Compromise between performance and CPU usage. Latency spikes can occur after quiet periods.

# - blocking

# High throughput, low latency, higher CPU usage.

# - busy_spinning

# Avoids syscalls which could introduce latency jitter. Best when threads can be bound to specific CPU cores.

processor_wait_strategy = blocking

# Size of internal ring buffers. Raise this if raising outputbuffer_processors does not help anymore.

# For optimum performance your LogMessage objects in the ring buffer should fit in your CPU L3 cache.

# Start server with --statistics flag to see buffer utilization.

# Must be a power of 2. (512, 1024, 2048, …)

ring_size = 65536

inputbuffer_ring_size = 65536

inputbuffer_processors = 2

inputbuffer_wait_strategy = blocking

# Enable the disk based message journal.

message_journal_enabled = true

# The directory which will be used to store the message journal. The directory must be exclusively used by Graylog and

# must not contain any other files than the ones created by Graylog itself.

message_journal_dir = /var/lib/graylog-server/journal

# MongoDB "dead_letters" collection to make sure that you never lose a message. The actual writing of dead

# letter should work fine already but it is not heavily tested yet and will get more features in future

# releases.

dead_letters_enabled = false

# How many seconds to wait between marking node as DEAD for possible load balancers and starting the actual

# shutdown process. Set to 0 if you have no status checking load balancers in front.

lb_recognition_period_seconds = 3

# MongoDB Configuration

mongodb_useauth = false

#mongodb_user = grayloguser

#mongodb_password = 123

mongodb_host = 127.0.0.1

#mongodb_replica_set = localhost:27017,localhost:27018,localhost:27019

mongodb_database = graylog2

mongodb_port = 27017

# Raise this according to the maximum connections your MongoDB server can handle if you encounter MongoDB connection problems.

mongodb_max_connections = 100

# Number of threads allowed to be blocked by MongoDB connections multiplier. Default: 5

# If mongodb_max_connections is 100, and mongodb_threads_allowed_to_block_multiplier is 5, then 500 threads can block. More than that and an exception will be thrown.

# http://api.mongodb.org/java/current/com/mongodb/MongoOptions.html#threadsAllowedToBlockForConnectionMultiplier

mongodb_threads_allowed_to_block_multiplier = 5

conf/web.conf

This file configures the graylog-web front-end. Once again a secret needs to be generated and placed in the file: the result of running "pwgen -N 1 -s 96" goes into application.secret:

# graylog2-server REST URIs (one or more, comma separated) For example: "http://127.0.0.1:12900/,http://127.0.0.1:12910/"

graylog2-server.uris="http://172.18.5.229:12900/,http://127.0.0.1:12900/,http://127.0.0.1:12910/"

# Learn how to configure custom logging in the documentation:

# https://www.graylog.org/documentation/setup/webinterface/

# Secret key

# ~~~~~

# The secret key is used to secure cryptographic functions. Set this to a long and randomly generated string.

# If you deploy your application to several instances be sure to use the same key!

# Generate for example with: pwgen -N 1 -s 96

application.secret="Insert the result of pwgen -N 1 -s 96 here"

# Web interface timezone

# Graylog stores all timestamps in UTC. To properly display times, set the default timezone of the interface.

# If you leave this out, Graylog will pick your system default as the timezone. Usually you will want to configure it explicitly.

# timezone="Europe/Berlin"

# Message field limit

# Your web interface can cause high load in your browser when you have a lot of different message fields. The default

# limit of message fields is 100. Set it to 0 if you always want to get all fields. They are for example used in the

# search result sidebar or for autocompletion of field names.

field_list_limit=100

# You usually do not want to change this.

application.global=lib.Global

conf/supervisord.conf

The following is the Supervisord configuration; it ensures all of the "services" keep running and redirects their stdout & stderr to log files for easier bug hunting in case of errors (a quick way to inspect those logs in a running container is shown right after the config):

[supervisord]

nodaemon = true

[program:elasticsearch]

command = /usr/share/elasticsearch/bin/elasticsearch -Des.default.path.data=/var/lib/elasticsearch

stdout_logfile = /var/log/supervisor/%(program_name)s.log

stderr_logfile = /var/log/supervisor/%(program_name)s.log

environment = JAVA_HOME="/usr/lib/jvm/jre"

autorestart = true

[program:MongoDB]

command = mongod

stdout_logfile = /var/log/supervisor/%(program_name)s.log

stderr_logfile = /var/log/supervisor/%(program_name)s.log

autorestart = true

[program:graylog-server]

command = /usr/bin/graylog-server.sh

stdout_logfile = /var/log/supervisor/%(program_name)s.log

stderr_logfile = /var/log/supervisor/%(program_name)s.log

autorestart = true

[program:graylog-web]

command = /usr/share/graylog-web/bin/graylog-web-interface

stdout_logfile = /var/log/supervisor/%(program_name)s.log

stderr_logfile = /var/log/supervisor/%(program_name)s.log

autorestart = true
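As mentioned above, once the container is running (see the next section) these per-program logs are the first place to look when something misbehaves; the container ID below is a placeholder for whatever docker ps reports:

sudo docker exec <container_id> ls /var/log/supervisor
sudo docker exec <container_id> tail -n 20 /var/log/supervisor/graylog-server.log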

Docker Build & Docker Run

Now that everything is set up, all that remains is building the image from the Dockerfile. From the project root folder run the following command:

sudo docker build -t graylog .

It will take a bit of time, but once it's done you should have a Docker image tagged graylog; you can confirm this by running "sudo docker images | grep graylog" before continuing to run the container:

sudo docker run -p=9000:9000 -p=12900:12900 -p=12901:12901 -d graylog

This command starts the container; the "-p" flags publish the container ports on the host to allow access from outside localhost, and the "-d" flag daemonizes the container.
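To confirm the container is actually up and to peek at its console output (Supervisord runs with nodaemon = true, so its messages should show up on the container's stdout), the standard Docker commands are enough; the container ID is whatever docker ps reports:

sudo docker ps
sudo docker logs <container_id>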

You should now be able to reach the Graylog web interface by pointing your browser to hostname:9000 or host_ip:9000 and logging in with the user "admin" and the password you set in the server.conf file.

Now that you have a working Graylog container, all that remains is sending it the log data you want to view. This can be achieved in multiple ways; we chose to use Logstash to aggregate the data from our servers into Graylog, but that is outside the scope of this post.
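One quick way to test an input (not the Logstash setup we use): assuming you create a GELF UDP input on port 12201 through the web interface (System -> Inputs) and publish that port when starting the container (for example by adding -p 12201:12201/udp to the docker run command above), a minimal GELF message can be sent from the host with netcat; the host and message values below are arbitrary:

echo -n '{"version":"1.1","host":"docker-test","short_message":"Hello Graylog","level":5}' | nc -u -w 1 127.0.0.1 12201

If everything is wired up, the message should show up in a search over the last few minutes in the web interface.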


  • Dansky

    Thanks for a great review and instructions.

    I gather Graylog is new on the market. And rather cheap. How does it rank when compared with existing alternatives, like Splunk?

  • disqus_K29vzian2O

    This is sort of wrong. ONE process per container, stitch them together with, say, docker-compose.