Mapping conflicts with ELK

I recently started upgrading to a newer version of ES (4.5), and found that it refused to start:

IllegalStateException: unable to upgrade the mappings for the index

In fact, this mapping conflict was one of the things I was hoping the upgrade would solve. After a bit of reading it became clear that I would have to make some changes.

The mapping in question was a field from the logs called “level”. In the postgres logs it was a string (e.g. “INFO”), and in our application logs (using bunyan) it was an integer (40 => “WARN”).

To allow me to search using a range (e.g. level:[40 TO 60]), I was using a mutate filter to convert the string “40” to an integer, and this was the cause of the conflict.
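
In other words, something like this (minus the conditionals around it):

mutate {
    convert => { "level" => "integer" }
}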

My first thought was to copy the field before converting:

mutate {
    add_field => { "level_int" => "%{level}" }
    convert => { "level_int" => "integer" }
}

But it turns out that’s not enough to avoid a conflict (possibly because ES guesses the type, and saw an int first?). It also has a subtler flaw: within a single mutate block, convert is applied before add_field, so level_int would still have ended up as a string. So I went with the nuclear option, and renamed the field:

mutate {
    rename => { "level" => "level_int" }
    convert => { "level_int" => "integer" }
}

Now my new documents were conflict-free. Unfortunately, the only solution provided for existing data is to export and re-import it, which I wasn’t really in the mood for.
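
If you do want to fix up existing data, one option is to use Logstash itself as the export/re-import tool: read from the old index with the elasticsearch input, apply the same rename, and write to a fresh index. A rough sketch (hosts and index names are placeholders, and you’d scope the mutate the same way as in the live pipeline):

input {
    elasticsearch {
        # pull every document out of the old index
        hosts   => ["localhost"]
        index   => "logstash-2016.01.01"
        docinfo => true
    }
}

filter {
    mutate {
        rename  => { "level" => "level_int" }
        convert => { "level_int" => "integer" }
    }
}

output {
    elasticsearch {
        # write the corrected documents to a new index,
        # keeping the original type and id from @metadata
        hosts         => ["localhost"]
        index         => "logstash-2016.01.01-fixed"
        document_type => "%{[@metadata][_type]}"
        document_id   => "%{[@metadata][_id]}"
    }
}

With docinfo enabled, the original _type and _id are available under @metadata, so the re-imported documents keep their identity in the new index.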

Luckily, I’m not in any rush to upgrade, and we close indices after 30 days. So I plan to wait for a month, and hope my data is clean by then!

Grokking postgresql logs with logstash

Logstash provides a grok pattern for postgresql logs. Unfortunately, it doesn’t seem to be compatible with our postgres version (9.4), and our messages were all tagged with “_grokparsefailure”.

Using the fantastic grok debugger, I was able to produce something that worked:

%{DATESTAMP:timestamp} %{TZ} %{DATA:user_id} %{GREEDYDATA:connection_id} %{DATA:level}:  %{GREEDYDATA:msg}
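
Dropped into a filter block, it looks something like this (the type check is just how we tag postgres logs on the shipper; yours may differ):

filter {
    if [type] == "postgres" {
        grok {
            match => { "message" => "%{DATESTAMP:timestamp} %{TZ} %{DATA:user_id} %{GREEDYDATA:connection_id} %{DATA:level}:  %{GREEDYDATA:msg}" }
        }
    }
}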

I’ve created an issue here to track it.

Parsing json from syslog entries with logstash

A consequence of moving to Debian 8 (and hence systemd) is that all our log data now goes to syslog. So long, logrotate!

It does, however, require a change to the way we filter them once they’ve been aggregated:

filter {
    if [type] == "syslog" {
        grok {
            match => { "message" => "%{SYSLOGBASE} %{GREEDYDATA:syslog_message}" }
        }
    }

    if [program] == "foo" {
        json {
            source => "syslog_message"
        }
        mutate {
            convert => [ "level", "integer" ]
            remove_field => [ "hostname" ]
        }
        date {
            match => [ "time", "ISO8601" ]
        }
    }
}

First, we parse the syslog entry, and put the free-form message into a field named “syslog_message”. We could overwrite the existing message, but keeping both makes it easier to investigate when something goes wrong. (The SYSLOGBASE pattern also extracts the program name and pid, which we rely on below.)

Then, if the “program” (set by the SyslogIdentifier in your systemd unit file) matches, we parse the message as json and tidy up a few fields.
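
For completeness, that identifier comes from the unit file of the service doing the logging; something like this (the name and path are placeholders):

[Service]
ExecStart=/usr/local/bin/foo
# journald uses this as the syslog tag, which grok exposes as "program"
SyslogIdentifier=foo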