openshift-spark by radanalyticsio

Description: No description available.

Summary Information

Updated 28 minutes ago
Added to GitGenius on May 28th, 2026
Created on August 26th, 2016
Open Issues & Pull Requests: 14 (+0)
Number of forks: 82
Total Stargazers: 73 (+0)
Total Subscribers: 21 (+0)

Issue Activity (beta)

Open issues: 0
New in 7 days: 0
Closed in 7 days: 0
Avg open age: N/A
Stale 30+ days: 0
Stale 90+ days: 0

Recent activity

Opened in 7 days: 0
Closed in 7 days: 0
Comments in 7 days: 0
Events in 7 days: 0

Detailed Description

The radanalyticsio/openshift-spark repository provides resources and instructions for building Apache Spark container images for deployment on OpenShift Origin, a Kubernetes-based platform. Its purpose is to simplify running Spark workloads on OpenShift by offering pre-built and customizable container images that bundle Spark with Python 3.6, so that data scientists and engineers can use Spark's distributed computing within OpenShift environments.

The repository's primary offering is the 'openshift-spark' image, which includes Apache Spark and Python 3.6. This image is built using CEKit (Container Evolution Kit), a tool designed for container specification and construction, and the build workflow is managed through GNU Make. The repository includes a Makefile that automates the build process, allowing users to create images and store them in their local Docker registry with a simple 'make' command. Once built, images can be tagged and pushed to a designated registry using the provided make targets, facilitating easy distribution and deployment.
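The build-and-push flow described above can be sketched as a small helper that assembles GNU Make invocations. This is only an illustration: the bare 'make' call comes from the description, while the 'push' target name and the 'SPARK_IMAGE' variable used in the usage note are assumptions that should be checked against the repository's Makefile. The helper builds argument lists and does not run anything.

```python
import shlex


def make_command(target=None, variables=None):
    """Build an argv list for GNU Make; nothing is executed here."""
    argv = ["make"]
    if target:
        # Target names are hypothetical unless confirmed in the Makefile.
        argv.append(target)
    # GNU Make accepts NAME=value overrides on the command line.
    for name, value in sorted((variables or {}).items()):
        argv.append(f"{name}={value}")
    return argv


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell line."""
    return shlex.join(argv)
```

For example, as_shell(make_command()) renders the plain build command, and make_command("push", {"SPARK_IMAGE": "registry.example.com/me/openshift-spark:latest"}) renders a push to a placeholder registry. The resulting list could be handed to subprocess.run once the target names are confirmed.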

A notable feature of the repository is its support for partial images, referred to as 'openshift-spark-inc'. These images contain the necessary tooling for OpenShift but do not include a Spark distribution. This design enables users to customize their Spark environment by installing a Spark version of their choice via a source-to-image (s2i) workflow. The s2i process can be executed either locally or within OpenShift, allowing users to provide a Spark distribution as input and generate a final, complete image. This flexibility is particularly useful for organizations or individuals who require specific Spark versions or custom configurations without modifying the repository's build files directly.
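As a rough sketch of the local s2i path, the helper below assembles an 's2i build' invocation that takes a build input containing a Spark distribution, the partial image as the builder, and a tag for the completed image. The general form 's2i build <source> <builder-image> <tag>' is the standard source-to-image CLI usage. The image names and input path in the usage note are placeholders, not values taken from the repository.

```python
def s2i_build_command(build_input, partial_image, output_image):
    """Return an argv list of the form: s2i build <source> <builder-image> <tag>.

    build_input   -- directory or URL holding the Spark distribution (assumed layout)
    partial_image -- the incomplete builder image, e.g. a local openshift-spark-inc build
    output_image  -- tag for the completed image
    """
    fields = (
        ("build_input", build_input),
        ("partial_image", partial_image),
        ("output_image", output_image),
    )
    for label, value in fields:
        if not value:
            raise ValueError(f"{label} must not be empty")
    return ["s2i", "build", build_input, partial_image, output_image]
```

A call such as s2i_build_command("./build-input", "openshift-spark-inc:latest", "my-spark:latest") yields the argument list for a local completion build. Running the same flow inside OpenShift would go through a build configuration instead, which this sketch does not cover.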

The repository also supports advanced customization during the image completion process. Users can include a 'modify-spark' directory within their build input, structured to mirror the Spark distribution's directory layout. This directory can contain additional files or modifications, such as custom JARs, which are integrated into the final Spark installation using rsync. This feature allows for seamless extension or alteration of the Spark environment to meet specific application requirements.
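The overlay behaviour can be approximated in Python to show what mirroring the Spark layout means in practice. The repository itself uses rsync; the sketch below substitutes shutil.copytree and is not the repository's implementation. The function name and the example paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def apply_modifications(modify_dir, spark_home):
    """Overlay the files under modify_dir onto spark_home.

    Files keep their relative paths, so modify-spark/jars/custom.jar ends up
    at <spark_home>/jars/custom.jar. Existing files with the same relative
    path are overwritten; all other files in spark_home are left untouched.
    Returns the relative paths that were copied.
    """
    modify_dir = Path(modify_dir)
    if not modify_dir.is_dir():
        # No modify-spark directory supplied: nothing to do.
        return []
    copied = sorted(
        p.relative_to(modify_dir).as_posix()
        for p in modify_dir.rglob("*")
        if p.is_file()
    )
    # dirs_exist_ok (Python 3.8+) merges into an existing destination tree.
    shutil.copytree(modify_dir, spark_home, dirs_exist_ok=True)
    return copied
```

Calling apply_modifications("build-input/modify-spark", "/opt/spark") with a custom JAR under jars/ would add that JAR alongside the ones shipped with the distribution, which matches the merge-into-place behaviour described above.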

For ease of use and reference, all images built from this repository respond to a 'usage' command, which provides information about the image's capabilities and intended usage. This helps users quickly understand how to interact with the images and integrate them into their OpenShift workflows.

In summary, radanalyticsio/openshift-spark provides the build tooling for running Apache Spark on OpenShift. It covers creating, customizing, and publishing Spark container images, supports both complete and partial images, and lets users supply their own Spark distribution and file modifications. Image builds rely on CEKit and GNU Make, and completing a partial image relies on s2i, so the same repository serves both standard deployments and heavily customized setups.
