radanalyticsio/openshift-spark

Description: The openshift-spark repository provides a collection of container image definitions and build tooling for deploying Apache Spark on OpenShift Origin.


Summary Information

Updated 2 hours ago

Added to GitGenius on May 28th, 2026

Created on August 26th, 2016

Open Issues & Pull Requests: 14 (+0)

Number of forks: 82

Total Stargazers: 73 (+0)

Total Subscribers: 21 (+0)

Issue Activity (beta)

Open issues: 0

New in 7 days: 0

Closed in 7 days: 0

Avg open age: N/A

Stale 30+ days: 0

Stale 90+ days: 0

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

No label distribution available yet.

Most active issues this week

No issue events were indexed in the last 7 days.


Detailed Description

The openshift-spark repository provides a collection of container image definitions and build tooling for deploying Apache Spark on OpenShift Origin. The project is written primarily in Shell and focuses on creating Docker images that integrate Spark with OpenShift's container orchestration capabilities. It addresses the need for cloud-native Spark deployments by offering pre-configured images that combine Apache Spark with OpenShift-specific tooling and Python 3.6 support.

The repository's core functionality centers on two types of image builds. The primary offering is the complete openshift-spark image, which includes a full Apache Spark distribution ready for immediate deployment on OpenShift. The project can also build an incomplete image, openshift-spark-inc, which contains the OpenShift tooling but omits the Spark distribution itself. Users can therefore either deploy a complete image as-is or complete the openshift-spark-inc image with a Spark version of their choice through a source-to-image workflow.

The build system depends on GNU Make and the Container Evolution Kit (cekit) version 3.7.0. The Makefile orchestrates image construction, while image.yaml serves as the cekit image descriptor. Running make builds all specified images and stores them in the local Docker image store. The make push target pushes the images to a designated registry, with a customizable image name and tag.
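The naming and tagging used when pushing images can be sketched in Python. This is a hypothetical helper, not part of the repository: it composes a reference in the conventional [REGISTRY_HOST[:PORT]/]NAME[:TAG] form and builds an argument list for make. The make variable name SPARK_IMAGE is an assumption; check the repository's Makefile for the variable it actually reads.

```python
from typing import List, Optional


def image_reference(name: str, registry: Optional[str] = None,
                    port: Optional[int] = None, tag: Optional[str] = None) -> str:
    """Compose an image reference of the form [REGISTRY_HOST[:PORT]/]NAME[:TAG]."""
    if port is not None and registry is None:
        raise ValueError("a registry port requires a registry host")
    ref = name
    if registry:
        host = f"{registry}:{port}" if port is not None else registry
        ref = f"{host}/{ref}"
    if tag:
        ref = f"{ref}:{tag}"
    return ref


def make_push_argv(ref: str, variable: str = "SPARK_IMAGE") -> List[str]:
    # ASSUMPTION: the Makefile reads the target image name from a make variable.
    # "SPARK_IMAGE" is a guess; confirm the real variable name in the Makefile.
    return ["make", "push", f"{variable}={ref}"]
```

Passing the returned list to subprocess.run(argv, check=True) from the repository root would run the push. Keeping argument construction separate from execution lets the naming logic be tested without a registry.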

For users requiring custom Spark distributions, the repository implements a source-to-image workflow that enables image completion without modifying the repository's build files. This workflow accepts Spark distributions as tarballs, either from local files or URLs, and can optionally include SHA512 checksums for verification. The build-input directory structure supports a modify-spark subdirectory that allows users to inject custom files into the Spark installation using rsync, enabling additions like custom JAR files without rebuilding the entire image.
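The checksum verification and the modify-spark overlay can be illustrated with a small Python sketch. It is not the image's own build logic, and the function names are hypothetical. It assumes the checksum file uses one of two common layouts, sha512sum style ("HEX  FILENAME") or gpg --print-md style ("FILENAME: HEX groups", possibly wrapped), and it approximates rsync's merge behavior with shutil.copytree (Python 3.8+ for dirs_exist_ok).

```python
import hashlib
import shutil
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large Spark tarball is never fully loaded into memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_sha512(text: str) -> str:
    """Extract a lowercase SHA-512 hex digest from a checksum file's contents."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty checksum file")
    if tokens[0].endswith(":"):
        # gpg --print-md layout: "FILE: HEX HEX ..." with the digest split into groups
        digest = "".join(tokens[1:])
    else:
        # sha512sum layout: "HEX  FILE"
        digest = tokens[0]
    digest = digest.lower()
    if len(digest) != 128 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("not a SHA-512 hex digest")
    return digest


def verify_tarball(tarball: Path, checksum_file: Path) -> bool:
    return sha512_of(tarball) == parse_sha512(checksum_file.read_text())


def overlay_modify_spark(build_input: Path, spark_home: Path) -> None:
    """Merge build_input/modify-spark into spark_home, roughly like
    `rsync -a modify-spark/ SPARK_HOME/`: same-path files are overwritten and
    files present only in spark_home are kept (no --delete semantics)."""
    src = build_input / "modify-spark"
    if src.is_dir():
        shutil.copytree(src, spark_home, dirs_exist_ok=True)
```

Dropping a JAR at build-input/modify-spark/jars/ would, under this sketch, place it in SPARK_HOME/jars/ alongside the JARs shipped with the distribution.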

The project supports two methods for completing a partial image: locally with the s2i tool, or through OpenShift's native build system with the oc command-line tool. The OpenShift method accepts either a local file or a URL as build input, and writes the completed image to an OpenShift imagestream so it is immediately available within the platform.
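The two completion paths differ mainly in the command that drives the build. The Python sketch below only assembles the argument lists. The builder image reference radanalyticsio/openshift-spark-inc, the output image name, and the BuildConfig name are assumptions, and the oc path assumes a binary BuildConfig for the openshift-spark-inc image already exists. s2i build takes a source directory, a builder image, and an output image name; oc start-build takes binary input through --from-file, and passing a URL there follows the description above and is worth confirming with oc start-build --help.

```python
from typing import List

# ASSUMPTION: illustrative builder image reference for the incomplete image.
DEFAULT_BUILDER = "radanalyticsio/openshift-spark-inc"


def s2i_complete_argv(build_input_dir: str, output_image: str,
                      builder: str = DEFAULT_BUILDER) -> List[str]:
    """Local completion: s2i build SOURCE_DIR BUILDER_IMAGE OUTPUT_IMAGE.

    build_input_dir holds the Spark tarball, an optional checksum file,
    and an optional modify-spark/ subdirectory.
    """
    return ["s2i", "build", build_input_dir, builder, output_image]


def oc_complete_argv(build_config: str, spark_input: str) -> List[str]:
    """OpenShift completion through an existing binary BuildConfig (hypothetical name).

    spark_input is a local tarball path or, per the description, a URL.
    """
    return ["oc", "start-build", build_config, f"--from-file={spark_input}"]
```

Either list can be handed to subprocess.run(argv, check=True). The s2i path leaves the result in the local image store, while the oc path hands the build to the cluster, which writes the result to an imagestream.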

Every image produced by this repository implements a usage command that prints reference documentation when executed. Users can run it against any image they build or pull to see that image's capabilities and options.
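A minimal sketch of invoking the usage command from Python, assuming the image accepts usage as its container command. The image name is a placeholder, and podman can be substituted for docker because it accepts the same run --rm form. The function names are hypothetical.

```python
import subprocess
from typing import List


def usage_argv(image: str, runtime: str = "docker") -> List[str]:
    # Arguments after the image name override the image's default CMD
    # (or are passed to its ENTRYPOINT if one is defined); --rm deletes
    # the container once the command exits.
    return [runtime, "run", "--rm", image, "usage"]


def show_usage(image: str, runtime: str = "docker") -> str:
    """Run the image's usage command and return what it printed to stdout."""
    result = subprocess.run(usage_argv(image, runtime),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

check=True raises CalledProcessError if the runtime or the command fails, which surfaces a missing image or a missing usage command instead of silently returning empty text.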

The repository is classified within the Apache Spark, Kubernetes integration, and cluster deployment domains, reflecting its role in enabling distributed computing and big data processing on cloud-native platforms. Its focus on OpenShift-specific integration distinguishes it from generic Spark containerization approaches, making it particularly valuable for organizations standardizing on OpenShift for their container orchestration infrastructure.
