node-problem-detector
by
openshift

Description: This is a place for various problem detectors running on the Kubernetes nodes.

Summary Information

Updated 42 minutes ago

Added to GitGenius on June 23rd, 2023

Created on September 13th, 2017

Open Issues & Pull Requests: 1 (+0)

Number of forks: 10

Total Stargazers: 6 (+0)

Total Subscribers: 181 (+0)

Issue Activity (beta)

Open issues: 1

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 0 days

Stale 30+ days: 1

Stale 90+ days: 0

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

tide/merge-blocker (9)
lifecycle/rotten (1)

Most active issues this week

No issue events were indexed in the last 7 days.

Repository Insights (GitGenius)

Median issue/PR response: 21.2 hours

Mean response time: 14.3 days

90th percentile: 27.7 days

Tracked items: 2

Most active contributors

jmguzik - 1 event, 1 issue
sebrandon1 - 1 event, 1 issue

Related by overlapping contributors

Detailed Description

The node-problem-detector repository is an OpenShift project written in Go that implements a daemon for detecting and reporting hardware, software, and infrastructure problems on Kubernetes nodes. Its core purpose is to make node problems visible to the upper layers of the cluster management stack, so that they can respond appropriately instead of continuing to schedule pods onto degraded nodes.

The daemon runs on each node and detects problems in several categories: infrastructure issues such as NTP service failures, hardware problems such as a bad CPU or bad memory, kernel issues such as deadlocks or corrupted filesystems, and container runtime problems. It reports these problems to the Kubernetes API server through two mechanisms: a NodeCondition for permanent problems that make the node unavailable for pods, and an Event for temporary problems with limited impact.
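
To make the two reporting paths concrete, here is a minimal Go sketch of the routing decision. It is an illustration, not code from the repository: Severity, Problem, and Route are hypothetical names, and the sketch only chooses which Kubernetes object a problem would map to. It does not talk to the API server.

```go
package main

import "fmt"

// Severity is a hypothetical classification of a detected problem.
type Severity int

const (
	// Temporary problems have limited impact and map to a Kubernetes Event.
	Temporary Severity = iota
	// Permanent problems make the node unavailable and map to a NodeCondition.
	Permanent
)

// Problem is a hypothetical record produced by a problem daemon.
type Problem struct {
	Reason   string
	Message  string
	Severity Severity
}

// Route reports which Kubernetes object the problem would be surfaced as.
func Route(p Problem) string {
	if p.Severity == Permanent {
		return "NodeCondition"
	}
	return "Event"
}

func main() {
	// Illustrative problems only.
	problems := []Problem{
		{Reason: "KernelDeadlock", Message: "task blocked for more than 120 seconds", Severity: Permanent},
		{Reason: "OOMKilling", Message: "killed process 1234", Severity: Temporary},
	}
	for _, p := range problems {
		fmt.Printf("%s -> %s\n", p.Reason, Route(p))
	}
}
```

Running it prints `KernelDeadlock -> NodeCondition` followed by `OOMKilling -> Event`.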

The architecture is built around modular problem daemons, each monitoring one category of issue. The SystemLogMonitor watches system logs and reports problems such as kernel deadlocks, read-only filesystems, and frequent kubelet or container runtime restarts. The SystemStatsMonitor collects health-related system statistics as metrics. The CustomPluginMonitor lets users define their own check scripts for on-demand problem detection; NTP problem detection is provided as an example. The HealthChecker component monitors the health of the kubelet and the container runtime.
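
The SystemLogMonitor idea of matching log lines against known problem signatures can be sketched in Go with the standard regexp package. This is a simplified assumption about how such a monitor behaves, not the project's implementation. Rule, NewRule, Match, and the patterns are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule is a hypothetical log-matching rule: when Pattern matches a log line,
// the monitor reports Reason as a permanent condition or a temporary event.
type Rule struct {
	Permanent bool
	Reason    string
	Pattern   *regexp.Regexp
}

// NewRule compiles pattern and panics if it is not a valid regular expression.
func NewRule(permanent bool, reason, pattern string) Rule {
	return Rule{Permanent: permanent, Reason: reason, Pattern: regexp.MustCompile(pattern)}
}

// Match returns the first rule whose pattern matches line; the boolean is
// false if no rule matches.
func Match(rules []Rule, line string) (Rule, bool) {
	for _, r := range rules {
		if r.Pattern.MatchString(line) {
			return r, true
		}
	}
	return Rule{}, false
}

func main() {
	// Illustrative signatures only, not the project's shipped rules.
	rules := []Rule{
		NewRule(true, "KernelDeadlock", `task \S+ blocked for more than \d+ seconds`),
		NewRule(true, "ReadonlyFilesystem", `Remounting filesystem read-only`),
		NewRule(false, "OOMKilling", `Killed process \d+`),
	}
	lines := []string{
		"INFO: task dockerd:1234 blocked for more than 120 seconds.",
		"EXT4-fs error: Remounting filesystem read-only",
		"Out of memory: Killed process 4242 (stress)",
		"eth0: link up",
	}
	for _, line := range lines {
		if r, ok := Match(rules, line); ok {
			fmt.Printf("%s (permanent=%t): %s\n", r.Reason, r.Permanent, line)
		} else {
			fmt.Printf("no match: %s\n", line)
		}
	}
}
```

In this sketch the first matching rule wins. That ordering is a simplification chosen for brevity.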

The project supports multiple exporters for reporting detected problems and metrics to different backends: the Kubernetes exporter reports to the API server, the Prometheus exporter exposes metrics locally for scraping, and the Stackdriver exporter sends data to Stackdriver Monitoring. Each exporter and problem daemon type can be excluded at compile time with build tags, producing trimmed executables without unused dependencies or their background goroutines.
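
In Go, compile-time exclusion with build tags is commonly paired with a registration pattern. Each optional component registers itself from an init function in a file guarded by a build constraint, so excluding the file removes the component. The sketch below assumes that pattern rather than reproducing the repository's code. Exporter, Register, Names, and the disable_example_exporter tag are hypothetical, and both registrations live in one file only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sort"
)

// Exporter is a hypothetical interface for a reporting backend.
type Exporter interface {
	Export(problem string) string
}

// registry maps an exporter name to a constructor. Only exporters whose
// registering code was compiled in appear here.
var registry = map[string]func() Exporter{}

// Register records a constructor under name; it is meant to be called from init().
func Register(name string, ctor func() Exporter) {
	registry[name] = ctor
}

// Names returns the registered exporter names in sorted order.
func Names() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// logExporter is a stand-in backend that just prefixes the problem text.
type logExporter struct{ prefix string }

func (e logExporter) Export(problem string) string { return e.prefix + problem }

// In a real layout each init() below would sit in its own file whose first
// line is a build constraint such as "//go:build !disable_example_exporter"
// (a hypothetical tag). Building with -tags disable_example_exporter would
// then drop that file, its registration, and any dependencies only it imports.
func init() {
	Register("kubernetes", func() Exporter { return logExporter{prefix: "[k8s] "} })
}

func init() {
	Register("prometheus", func() Exporter { return logExporter{prefix: "[prom] "} })
}

func main() {
	for _, name := range Names() {
		fmt.Println(registry[name]().Export("KernelDeadlock"))
	}
}
```

Because the main program only iterates over whatever is in the registry, it needs no changes when a component is compiled out.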

Configuration is handled through JSON files for each monitor type, with command-line flags controlling behavior such as hostname override, API server connection details, and exporter endpoints. The daemon can run as a Kubernetes DaemonSet or standalone, and was originally integrated as a default addon in GCE clusters.
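
A per-monitor JSON file can be loaded with Go's standard encoding/json package. This sketch assumes a simple schema with a source name and a list of rules. The field names (source, rules, type, condition, reason, pattern) and the validation logic are illustrative assumptions, not a verified copy of the project's configuration format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MonitorRule is one hypothetical rule entry in a monitor's JSON config.
type MonitorRule struct {
	Type      string `json:"type"`                // "temporary" or "permanent"
	Condition string `json:"condition,omitempty"` // NodeCondition to set for permanent rules
	Reason    string `json:"reason"`
	Pattern   string `json:"pattern"`
}

// MonitorConfig is a hypothetical top-level config for one log monitor.
type MonitorConfig struct {
	Source string        `json:"source"`
	Rules  []MonitorRule `json:"rules"`
}

// ParseConfig decodes raw JSON and rejects rules with an unknown type or a
// permanent rule that names no condition.
func ParseConfig(raw []byte) (MonitorConfig, error) {
	var cfg MonitorConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return MonitorConfig{}, err
	}
	for i, r := range cfg.Rules {
		switch r.Type {
		case "temporary":
			// Temporary rules only produce events; nothing extra to check.
		case "permanent":
			if r.Condition == "" {
				return MonitorConfig{}, fmt.Errorf("rule %d: permanent rule needs a condition", i)
			}
		default:
			return MonitorConfig{}, fmt.Errorf("rule %d: unknown type %q", i, r.Type)
		}
	}
	return cfg, nil
}

// example is an illustrative config; JSON "\\d" decodes to the regex \d.
const example = `{
  "source": "kernel-monitor",
  "rules": [
    {"type": "temporary", "reason": "OOMKilling", "pattern": "Killed process \\d+"},
    {"type": "permanent", "condition": "KernelDeadlock", "reason": "TaskHung", "pattern": "task \\S+ blocked for more than \\d+ seconds"}
  ]
}`

func main() {
	cfg, err := ParseConfig([]byte(example))
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	for _, r := range cfg.Rules {
		fmt.Printf("%s: %s rule %s matches %q\n", cfg.Source, r.Type, r.Reason, r.Pattern)
	}
}
```

Validating at load time means a malformed rule is rejected when the daemon starts, rather than silently never matching.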

According to GitGenius activity tracking, the repository's median issue and pull request response latency is 21.2 hours, with a mean of 343.5 hours (about 14.3 days) across tracked items. The most frequently applied label is tide/merge-blocker, which Prow's Tide merge automation uses to block merges. The tracked contributors are jmguzik and sebrandon1. The repository shares contributors with openshift/installer, openshift/coredns, and openshift/descheduler, reflecting developers who work across several parts of the OpenShift ecosystem.

The project addresses a fundamental gap in Kubernetes cluster visibility by making node health problems explicit and actionable rather than invisible to the scheduler, enabling more intelligent pod placement decisions and cluster reliability improvements.
