Log Prometheus alerts to a file when monitoring OpenShift

I am becoming a big fan of Prometheus, especially when it is used to monitor OpenShift.

Here is a quick hack for those of you who want to log alerts to a file as they are processed by Alertmanager.

Just a small reminder of how Prometheus and Alertmanager work together:

  • Prometheus is responsible for scraping metrics from exporters. At the same time, you can define alerts on these metrics using the Prometheus alerting rule language; each alert has a condition, a duration, a summary and a severity. We usually define Prometheus alerts in .rules files located in /etc/prometheus-rules (an example rule is shown right after this list).

  • Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration, such as email or, more simply, webhooks. The Alertmanager configuration usually resides in /etc/alertmanager/config.yml and lets you define routes, which basically describe what happens to an alert when it fires.
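
Just to make the first point concrete, here is roughly what a rule for the PodRestartingTooOften alert shown at the end of this post could look like, written in the old-style .rules syntax; the FOR duration and the exact annotation wording below are illustrative, not copied from my actual rules file:

# e.g. /etc/prometheus-rules/pods.rules
ALERT PodRestartingTooOften
  IF rate(kube_pod_container_status_restarts[2h]) * 7200 > 1
  FOR 5m
  LABELS { severity = "page" }
  ANNOTATIONS {
    SUMMARY = "Pod {{ $labels.namespace }}/{{ $labels.pod }} restarting more than once during the last 2 hours.",
    DESCRIPTION = "Pod {{ $labels.namespace }}/{{ $labels.pod }} restarting more than once during the last 2 hours."
  }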

Alertmanager supports a few receivers which can process your alerts. A simple, yet generic, one is the webhook receiver. For a custom need, I had to write a simple file-webhook that only appends the request body to a file named alerts.log, so it can be used later for tracing or whatever else. The good thing is that it is a simple nodejs app (a minimal sketch of the idea is shown below the build command) and it uses the nodejs s2i image provided by OpenShift. So, you just have to:

oc project monitoring
oc new-build  openshift/nodejs~https://github.com/akram/prometheus-file-webhook.git \
              --name=file-webhook
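
If you are curious about what the file-webhook actually does, the whole idea fits in a few lines of Node.js. The snippet below is only a sketch of the approach, not a copy of the repository's code (the logs/ directory matches the mount path added in the next step, and port 8080 is what the nodejs s2i image expects):

// sketch: append every incoming Alertmanager payload to logs/alerts.log
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  let body = '';
  req.on('data', (chunk) => { body += chunk; });
  req.on('end', () => {
    // Alertmanager POSTs the whole alert group as one JSON document; log it as a single line
    fs.appendFile('logs/alerts.log', body + '\n', (err) => {
      res.writeHead(err ? 500 : 200);
      res.end();
    });
  });
}).listen(8080);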

Once the build is finished, you can deploy the file-webhook app and add a persistent volume to it.

oc new-app file-webhook
oc volume dc/file-webhook --add --name=alerts --type=persistentVolumeClaim \
          --claim-name=alerts --claim-size='1Gi' --mount-path=/opt/app-root/src/logs
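
Before plugging it into Alertmanager, you can quickly check that the claim has been bound and that the new deployment rolled out:

oc get pvc alerts
oc get pods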

Then, you will have a new service called file-webhook in your project, which you can plug into Alertmanager to write alerts to the file. To make Alertmanager use this service, edit your Alertmanager configuration with oc edit configmap alertmanager-configmap and change the value of the config.yml key to:

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://file-webhook.monitoring.svc.cluster.local:8080/'
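
Note that Alertmanager does not reload its configuration automatically, so once the ConfigMap is updated you will most likely have to trigger a new deployment (or otherwise restart Alertmanager) for the change to be picked up. Assuming the deployment config is simply named alertmanager, something like this should do:

oc rollout latest dc/alertmanager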

Then, every time your alerts fire, you will see a new line appended to alerts.log, looking something like this:

"receiver":"web\\.hook","status":"firing","alerts":[{"status":"resolved","labels":{"alertname":"PodRestartingTooOften","container":"alert-file-logger2","instance":"172.20.2.64:8080","job":"kubernetes-service-endpoints","kubernetes_name":"kube-state-metrics","namespace":"monitoring","pod":"alert-file-logger2-1-61h5p","severity":"page"},"annotations":{"DESCRIPTION":"Pod monitoring/alert-file-logger2-1-61h5p restarting more than once times during last 2 hours.","SUMMARY":"Pod monitoring/alert-file-logger2-1-61h5p restarting more than once times during last 2 hours."},"startsAt":"2017-11-14T09:41:37.803Z","endsAt":"2017-11-14T10:10:38.254Z","generatorURL":"http://prometheus-15-p4lbc:9090/graph?g0.expr=rate%28kube_pod_container_status_restarts%5B2h%5D%29+%2A+7200+%3E+1\u0026g0.tab=1"},{"status":"firing","labels":{"alertname":"PodRestartingTooOften","container":"prometheus","instance":"172.20.2.64:8080","job":"kubernetes-service-endpoints","kubernetes_name":"kube-state-metrics","namespace":"monitoring","pod":"prometheus-15-k2fj8","severity":"page"},"annotations":{"DESCRIPTION":"Pod monitoring/prometheus-15-k2fj8 restarting more than once times during last 2 hours.","SUMMARY":"Pod monitoring/prometheus-15-k2fj8 restarting more than once times during last 2 hours."},"startsAt":"2017-11-14T09:41:37.803Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-15-p4lbc:9090/graph?g0.expr=rate%28kube_pod_container_status_restarts%5B2h%5D%29+%2A+7200+%3E+1\u0026g0.tab=1"},{"status":"firing","labels":{"alertname":"PodRestartingTooOften","container":"hawkular-openshift-agent","instance":"172.20.2.64:8080","job":"kubernetes-service-endpoints","kubernetes_name":"kube-state-metrics","namespace":"monitoring","pod":"hawkular-openshift-agent-wqdgn","severity":"page"},"annotations":{"DESCRIPTION":"Pod monitoring/hawkular-openshift-agent-wqdgn restarting more than once times during last 2 hours.","SUMMARY":"Pod monitoring/hawkular-openshift-agent-wqdgn restarting more than once times during last 2 hours."},"startsAt":"2017-11-14T09:41:37.803Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-15-p4lbc:9090/graph?g0.expr=rate%28kube_pod_container_status_restarts%5B2h%5D%29+%2A+7200+%3E+1\u0026g0.tab=1"}],"groupLabels":{"alertname":"PodRestartingTooOften"},"commonLabels":{"alertname":"PodRestartingTooOften","instance":"172.20.2.64:8080","job":"kubernetes-service-endpoints","kubernetes_name":"kube-state-metrics","namespace":"monitoring","severity":"page"},"commonAnnotations":{},"externalURL":"http://alertmanager-16-zxx5j:9093","version":"4","groupKey":"{}:{alertname=\"PodRestartingTooOften\"}"}

Voilà, enjoy, and feel free to share.
