Graylog and ElasticSearch troubleshooting

Graylog field limit reached

Graylog shows the following indexer error message:

{
  "type": "illegal_argument_exception",
  "reason": "Limit of total fields [1000] in index [graylog_1] has been exceeded"
}

The second bracketed value is the affected index, here graylog_1; the first is the current field limit (1000).
You have two options to update the index field limit:

1) full JSON config object

curl -XPUT 'localhost:9200/graylog_1/_settings' -H 'Content-Type: application/json' -d'
{
  "index" : {
    "mapping" : {
      "total_fields" : {
        "limit" : "3000"
      }
    }
  }
}'

2) flattened JSON config

curl -XPUT 'localhost:9200/graylog_1/_settings' -H 'Content-Type: application/json' -d '{"index.mapping.total_fields.limit": 3000}' 

Either command should return:

{
  "acknowledged" : true
}

You can inspect the index settings with (response abridged to the relevant key):

curl -XGET 'localhost:9200/graylog_1/_settings?pretty'
{
  "graylog_1" : {
    "settings" : {
      "index" : {
        "mapping" : {
          "total_fields" : {
            "limit" : "3000"
          }
        }
      }
    }
  }
}
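To see how close an index is to the limit, you can approximate the mapped-field count from a mapping dump. Counting the "type" keys is only a rough proxy (multi-fields and object fields also count toward the limit), and the sample mapping below is a stand-in for a real dump of graylog_1:

```shell
# Save the mapping; against a live cluster you would use:
#   curl -s 'localhost:9200/graylog_1/_mapping' > mapping.json
# A small sample stands in here.
cat > mapping.json <<'EOF'
{"graylog_1":{"mappings":{"message":{"properties":{
  "level":{"type":"keyword"},
  "message":{"type":"text"},
  "timestamp":{"type":"date"}
}}}}}
EOF

# Every mapped leaf field carries a "type" key, so counting those
# occurrences gives a rough estimate of the fields in the index.
grep -o '"type"' mapping.json | wc -l
```

If the count is already near the configured limit, raise the limit before the next burst of new fields triggers the error again.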

Update custom field mappings

Occasionally Graylog stores a field under the wrong type. To fix this, we need to update the type mapping and force Graylog to rotate indices (type changes cannot be applied to an existing index). Errors like these can also fill up the index failure log, so we'll clean that up while we're at it.

In my case the active Graylog index had the field level mapped as a number, but it is expected to also contain plain words. To update the type mapping, I created the following JSON file with the type set to keyword (the type I want it to have):

graylog-custom-mapping.json:

{
  "template": "graylog_*",
  "mappings": {
    "message": {
      "properties": {
        "level": {
          "type": "keyword"
        }
      }
    }
  }
}
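Before PUTting the template, it's worth validating the JSON locally: a stray trailing comma would otherwise come back as an opaque parse error from ElasticSearch. A minimal check using Python's stdlib json.tool (assuming python3 is installed):

```shell
# Write the template file (same content as above) and validate it
# locally before sending it to ElasticSearch.
cat > graylog-custom-mapping.json <<'EOF'
{
  "template": "graylog_*",
  "mappings": {
    "message": {
      "properties": {
        "level": { "type": "keyword" }
      }
    }
  }
}
EOF

# json.tool exits non-zero on invalid JSON.
python3 -m json.tool graylog-custom-mapping.json > /dev/null && echo "JSON OK"
```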

When applying the type update I got a 406 error:

curl -X PUT -d @graylog-custom-mapping.json 'http://localhost:9200/_template/graylog-custom-mapping?pretty'
{
  "error" : "Content-Type header [application/x-www-form-urlencoded] is not supported",
  "status" : 406
}

This error is due to strict content-type checking introduced in ElasticSearch 6.0:

Starting from Elasticsearch 6.0, all REST requests that include a body must also provide the correct content-type for that body. Source: https://www.elastic.co/de/blog/strict-content-type-checking-for-elasticsearch-rest-requests

All we need to do to fix this is add the proper content type:

curl -H 'Content-Type: application/json' -X PUT -d @graylog-custom-mapping.json 'http://localhost:9200/_template/graylog-custom-mapping?pretty'
{
  "acknowledged" : true
}

Finally, we can rotate the index to apply our updated type mapping:

Go to: System > Indices > click on the Index Set name > Maintenance > Rotate active write index

Next, clean up the index failures log (mine contained 130,000 entries):

$ mongo

# switch to graylog db
> use graylog

# deleting all documents in the collection does not work...
> db.index_failures.remove({})
WriteResult({
	"nRemoved" : 0,
	"writeError" : {
		"code" : 20,
		"errmsg" : "cannot remove from a capped collection: graylog.index_failures"
	}
})

# so drop the entire collection instead
> db.index_failures.drop()
true

Restart Graylog with sudo systemctl restart graylog-server.

Work with ES indices

List your indices

curl 'localhost:9200/_cat/indices?v'

Delete indices from the first 9 days of September

curl -XDELETE 'localhost:9200/rsyslog-2018.09.0*?pretty'

Delete indices from August

curl -XDELETE 'localhost:9200/rsyslog-2018.08*?pretty'

Delete indices from 2017

curl -XDELETE 'localhost:9200/rsyslog-2017*?pretty'
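The wildcards in these DELETE URLs match index names much like shell globs, so before running a destructive delete you can preview what a pattern would catch. The index names below are samples; fetch your real list with curl -s 'localhost:9200/_cat/indices?h=index':

```shell
# Sample index names; in practice, substitute the output of
#   curl -s 'localhost:9200/_cat/indices?h=index'
indices="rsyslog-2018.09.01 rsyslog-2018.09.10 rsyslog-2018.08.15 rsyslog-2017.12.31"

# Preview which names the "first 9 days of September" pattern would hit.
for i in $indices; do
  case "$i" in
    rsyslog-2018.09.0*) echo "$i" ;;
  esac
done
```

Note that rsyslog-2018.09.0* deliberately excludes rsyslog-2018.09.10, since the pattern pins a zero in the day's tens digit.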

Logspout

Map your logspout port to a port on your docker host that is only reachable via localhost:

version: '3'

services:
  logspout:
    image: registry.example.com/devops/logspout-gelf:a4a9a71
    command: "gelf://loghost.example.com:12201"
    hostname: wolverine-01a
    restart: always
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    ports:
      - "127.0.0.1:40000:80"
    environment:
      BACKLOG: "false"
      INACTIVITY_TIMEOUT: "0"

This allows you to tail the events that logspout is registering:

curl http://localhost:40000/logs

Connectivity

Events from some of my hosts were missing in Graylog. To verify that UDP communication from the logspout forwarder to the Graylog server works, use netcat (sudo yum install -y nc):

# On the Graylog server, open a port (replace ${SOURCE_MACHINE_IP} with the source machine's IP):
firewall-cmd --zone=public --add-rich-rule 'rule family="ipv4" source address="${SOURCE_MACHINE_IP}" port port="13000" protocol="udp" accept' --permanent
firewall-cmd --reload

# then listen on that UDP port:
netcat -u -l 13000

On the source machine, run:

# -u = UDP, -w 10 = timeout seconds 
echo -e '{"version": "1.1","host":"example.org","short_message":"Short message","full_message":"Backtrace here\n\nmore stuff","level":1,"_user_id":9001,"_some_info":"foo","_some_env_var":"bar"}\0' | nc -u -w 10 log.example.com 13000
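If the message doesn't arrive, first rule out a malformed payload: the GELF body must be valid JSON (the trailing \0 is only a frame delimiter on the wire). A quick local check of the same payload, assuming python3 is available:

```shell
# The payload from above, without the trailing \0 delimiter; the \n
# sequences stay as JSON escapes because of the single quotes.
payload='{"version": "1.1","host":"example.org","short_message":"Short message","full_message":"Backtrace here\n\nmore stuff","level":1,"_user_id":9001,"_some_info":"foo","_some_env_var":"bar"}'

# json.tool exits non-zero on invalid JSON.
printf '%s' "$payload" | python3 -m json.tool > /dev/null && echo "payload OK"
```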

You should see the message appear on your Graylog server (remove -u on both sides to test via TCP instead of UDP). This confirms that the source machine can reach your Graylog server, so remove the firewall exception on the Graylog server again:

firewall-cmd --zone=public --remove-rich-rule 'rule family="ipv4" source address="${SOURCE_MACHINE_IP}" port port="13000" protocol="udp" accept' --permanent
firewall-cmd --complete-reload
firewall-cmd --zone=public --list-all