The samber/awesome-prometheus-alerts repository is a collection of production-ready Prometheus alerting rules covering a wide range of services and infrastructure components. Prometheus is a popular open-source monitoring and alerting toolkit, and the repository simplifies alerting setup by providing a curated set of YAML rules. Prometheus loads these as rule files and evaluates them, and Alertmanager routes the alerts that fire to notification channels.
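As a rough sketch of how such rules are wired in, the fragment below shows a rule file and the part of the Prometheus configuration that loads it. The file paths, group name, alert name, thresholds, and the Alertmanager address `alertmanager:9093` are illustrative assumptions, not values taken from the repository.

```yaml
# File: rules/example.yml (hypothetical path)
# A rule group holding a single alert that fires when a scrape target is down.
groups:
  - name: example            # hypothetical group name
    rules:
      - alert: TargetDown    # hypothetical alert name
        expr: up == 0        # "up" is set by Prometheus for every scrape target
        for: 5m              # condition must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "Target down (instance {{ $labels.instance }})"
---
# File: prometheus.yml (fragment)
rule_files:
  - "rules/*.yml"            # load every rule file in the assumed directory
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]   # assumed Alertmanager address
```

Rule files can be checked before a reload with `promtool check rules rules/example.yml`.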
The repository features over 940 alerting rules covering more than 90 different services, making it one of the most extensive resources available for Prometheus users. These rules are organized by category, including basic resource monitoring, databases, message brokers, proxies and load balancers, runtimes, data engineering platforms, orchestrators, CI/CD tools, network and security components, storage solutions, cloud providers, and observability tools. Each category contains specific alerting rules tailored to the unique metrics and operational concerns of the respective service or technology.
For basic resource monitoring, the repository includes alerts for Prometheus self-monitoring, host and hardware health, SMART disk status, IPMI, Docker containers, Windows servers, VMware, Proxmox VE, Netdata, eBPF, process exporters, and systemd. These rules help ensure the underlying infrastructure remains healthy and performant.
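A host-level rule in this style might look like the following sketch. It assumes node_exporter is being scraped (the two `node_memory_*` metrics come from it); the alert name, 10% threshold, and 2-minute duration are illustrative choices rather than the repository's exact values.

```yaml
groups:
  - name: host-example                 # hypothetical group name
    rules:
      - alert: HostLowAvailableMemory  # hypothetical alert name
        # Percentage of memory still available, from node_exporter metrics.
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Low available memory (instance {{ $labels.instance }})"
          description: "Available memory is at {{ $value | humanize }}% of total."
```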
Database alerting rules span popular systems such as MySQL, PostgreSQL, SQL Server, Oracle, Redis, Memcached, MongoDB, Elasticsearch, OpenSearch, Meilisearch, Cassandra, ClickHouse, CouchDB, and Solr. These rules are crafted to detect issues like replication lag, high query latency, resource exhaustion, and other common database problems.
Message broker alerts cover RabbitMQ, Zookeeper, Kafka, Pulsar, and NATS, focusing on queue health, broker availability, and message throughput. Proxies, load balancers, and service meshes such as Nginx, Apache, HAProxy, Traefik, Caddy, Envoy, Linkerd, and Istio are also included, with rules for monitoring traffic, error rates, and service mesh health.
Runtime alerts are available for PHP-FPM, JVM, Golang, Ruby, Python, and Sidekiq, addressing application-level metrics like memory usage, request latency, and error rates. Data engineering platforms like Apache Flink, Spark, and Hadoop have rules for job failures, resource utilization, and cluster health.
Orchestrators such as Kubernetes, Nomad, Consul, etcd, and OpenStack are covered with alerts for pod failures, node health, and cluster stability. CI/CD tools including Jenkins, ArgoCD, FluxCD, GitLab CI, and Spinnaker have rules for pipeline failures and job status.
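For the pod-failure case, a rule could be sketched as below. It assumes kube-state-metrics is deployed, since that is where `kube_pod_container_status_restarts_total` comes from; the alert name, window, and restart threshold are hypothetical.

```yaml
groups:
  - name: kubernetes-example             # hypothetical group name
    rules:
      - alert: PodRestartingFrequently   # hypothetical alert name
        # More than 3 container restarts within the last 10 minutes.
        expr: increase(kube_pod_container_status_restarts_total[10m]) > 3
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently"
```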
Network and security alerts include SpeedTest, SSL/TLS certificate expiration, cert-manager, Juniper devices, CoreDNS, FreeSWITCH, HashiCorp Vault, Keycloak, Cloudflare, SNMP, Cilium, and WireGuard. Storage solutions like Ceph, ZFS, OpenEBS, and MinIO are also supported.
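Certificate expiration is a good example of a rule that compares a metric against the current time. The sketch below assumes endpoints are probed with blackbox_exporter, which exposes `probe_ssl_earliest_cert_expiry` as a Unix timestamp; the alert name and 20-day window are illustrative.

```yaml
groups:
  - name: tls-example                        # hypothetical group name
    rules:
      - alert: TlsCertificateExpiringSoon    # hypothetical alert name
        # Seconds until expiry, compared against 20 days (86400 s per day).
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 20
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "TLS certificate expiring soon (instance {{ $labels.instance }})"
          description: "Certificate expires in {{ $value | humanizeDuration }}."
```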
Cloud provider alerts are available for AWS CloudWatch, Google Cloud Stackdriver, DigitalOcean, and Azure, while observability tools such as Thanos, Loki, Promtail, Cortex, Grafana Tempo, Grafana Mimir, Grafana Alloy, OpenTelemetry Collector, and Jaeger are included for monitoring distributed systems and tracing.
The repository encourages community contributions, inviting users to submit new rules, improve documentation, report issues, and discuss better error tracking. The alert rules and content are licensed under Creative Commons Attribution 4.0 (CC BY 4.0), while the site source code is MIT-licensed. Overall, samber/awesome-prometheus-alerts gives teams a ready starting point for building out alerting with Prometheus.