
Apache Spark cluster

by Apache
Orchestration

Apache Spark is a unified analytics engine for large-scale data processing — known for its speed, ease of use, and sophisticated analytics capabilities. This Helm chart deploys a Spark 4.0 standalone cluster (one master plus N workers) on top of Managed Kubernetes, using upstream apache/spark container images straight from the Apache project. The master Web UI is exposed through a secure tunnel and protected by HTTP Basic Auth, so the cluster overview is reachable from your browser without exposing the cluster directly to the internet. Job submission stays on the cluster-internal port (spark://<release>-master-svc:7077), reachable from any pod in the namespace.

Key features

Upstream Apache images

Container images come straight from docker.io/apache/spark — not a third-party rebuild.

Master + worker topology

Submit jobs over the standalone protocol on port 7077 from any pod in the namespace; the master coordinates a fleet of long-running workers.
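As a rough illustration of submitting from a pod in the namespace, the sketch below builds the in-cluster master URL and a `spark-submit` command line. It assumes a hypothetical release name such as `my-spark` and uses client deploy mode, so the driver runs in the submitting pod. The optional `driver_host` argument (for example the pod IP) is a hypothetical helper parameter that sets Spark's `spark.driver.host`, since executors on the workers must be able to connect back to the driver.

```python
from typing import List, Optional

# Standalone protocol port stated on this page; the Service name pattern
# "<release>-master-svc" also comes from the page. The release name is up to you.
DEFAULT_PORT = 7077


def master_url(release: str, port: int = DEFAULT_PORT) -> str:
    """Build the cluster-internal standalone master URL for a release."""
    if not release:
        raise ValueError("release name must be non-empty")
    return f"spark://{release}-master-svc:{port}"


def spark_submit_argv(release: str, app: str, driver_host: Optional[str] = None) -> List[str]:
    """Assemble a spark-submit command line in client deploy mode.

    In client mode the driver runs inside the submitting pod, so executors on
    the workers connect back to it; driver_host (e.g. the pod IP) is passed as
    spark.driver.host when provided.
    """
    argv = [
        "spark-submit",
        "--master", master_url(release),
        "--deploy-mode", "client",
    ]
    if driver_host:
        argv += ["--conf", f"spark.driver.host={driver_host}"]
    argv.append(app)
    return argv
```

From a pod with the Spark client installed, `subprocess.run(spark_submit_argv("my-spark", "job.py"), check=True)` would launch the job; in PySpark code the same URL can be passed to `SparkSession.builder.master(...)`. Client mode is used here because standalone clusters do not support cluster deploy mode for Python applications.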

Web UI behind Basic Auth

Master Web UI exposed through a secure HTTP/2 tunnel with HTTP Basic Auth — set or generate a password at install time.
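A minimal sketch of the password and Basic Auth side, using only the Python standard library. It generates a random password you could supply at install time, builds an RFC 7617 `Authorization` header, and reads the master's JSON status page through the tunnel. The tunnel URL, the username, the `/json/` path, and the `status`/`aliveworkers`/`cores` fields are assumptions about the standalone master UI and this chart, not details confirmed on this page.

```python
import base64
import json
import secrets
import urllib.request
from typing import Any, Dict


def generate_password(nbytes: int = 24) -> str:
    """Generate a random URL-safe password (24 bytes -> 32 characters)."""
    return secrets.token_urlsafe(nbytes)


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Auth Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_master_status(ui_url: str, username: str, password: str,
                        timeout: float = 10.0) -> Dict[str, Any]:
    """GET the master's JSON status page (assumed at <ui_url>/json/) with Basic Auth."""
    req = urllib.request.Request(
        ui_url.rstrip("/") + "/json/",
        headers={"Authorization": basic_auth_header(username, password)},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize(status: Dict[str, Any]) -> str:
    """One-line summary from fields the JSON page is assumed to expose."""
    return (f"{status.get('status', '?')}: "
            f"{status.get('aliveworkers', '?')} alive workers, "
            f"{status.get('cores', '?')} cores")
```

With a hypothetical tunnel URL and username, `summarize(fetch_master_status("https://<tunnel-host>", "admin", password))` would print a short cluster overview; the browser performs the same Basic Auth exchange when you open the UI directly.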

Configurable autoscaling

Worker StatefulSet supports HorizontalPodAutoscaler so the cluster grows with workload pressure.
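To give a feel for how the HorizontalPodAutoscaler sizes the worker StatefulSet, the sketch below implements the documented core HPA rule for a single metric: desired = ceil(current replicas × current metric / target metric), with no change inside a tolerance band (0.1 by default), clamped to the configured minimum and maximum. Which metric and bounds this chart configures (for example CPU utilization) is an assumption; the function and parameter names are illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Estimate the worker count the HPA would choose for one metric.

    desired = ceil(current_replicas * current_metric / target_metric),
    left unchanged when the ratio is within `tolerance` of 1.0,
    then clamped to [min_replicas, max_replicas].
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 workers averaging 90% CPU against a 60% target give ceil(3 × 1.5) = 5 workers. The real controller also accounts for pod readiness and applies a scale-down stabilization window (5 minutes by default), which this sketch ignores.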


Pricing

Additional Nebius infrastructure costs may apply. Use the Nebius Pricing Page to estimate your infrastructure costs.

Self-managed

Apache Spark cluster on Kubernetes

Deploy a long-running Apache Spark standalone cluster (master + workers) on Kubernetes, with the master Web UI exposed through a secure tunnel and protected by HTTP Basic Auth.

Free
Charged for resources
Setup time: 15+ minutes
Scaling: Auto
Maintenance: Self-managed (cluster)
White-glove

Deploy with a solutions architect

Some applications are easier with a hand on the wheel. Talk to an architect who has deployed this in production.

  • Architecture review & sizing
  • Hands-on deploy session
  • 30 days of follow-up support

Security & compliance

Run an Apache Spark cluster on infrastructure built for AI workloads

Reliable AI infrastructure backed by top-tier NVIDIA GPUs, purpose-built for demanding inference workloads. Multiple deployment methods are available: virtual machines for full hardware control, Kubernetes for scalable cluster deployments, and managed serverless applications for teams that want inference running without infrastructure overhead.

Learn about Nebius AI Cloud

Security & compliance, out of the box

Nebius meets a broad set of security and compliance standards. Fine-grained IAM controls, audit logs, and encrypted storage are available out of the box — so teams can meet security requirements without additional tooling.

Explore the Trust center

Support

Application support

Provided by Apache. See the documentation and project links above.

Infrastructure support

Provided by Nebius for the underlying cloud infrastructure.