Ensure GCP Dataproc Clusters do not have public IPs

Error: GCP Dataproc Clusters have public IPs
Bridgecrew Policy ID: BC_GCP_GENERAL_37
Checkov Check ID: CKV_GCP_103
Severity: HIGH

Description

Dataproc is commonly used for data lake modernization, ETL, and data science workloads. A Dataproc cluster contains at least one "management" VM and one "compute" VM, which are deployed into a VPC network. A common misconfiguration is creating a Dataproc cluster with public IPs. This puts your data at risk of accidental exposure, because a public IP combined with an open firewall rule can allow unauthorized access to the underlying Dataproc VMs.

We recommend that you assign only private IPs to your Dataproc clusters.

Fix - Runtime

GCP Console

It is not currently possible to edit a running Dataproc cluster to remove its public IPs.

To create a Dataproc cluster with only private IPs:

  1. Log in to the GCP Console.
  2. Navigate to Dataproc.
  3. Select Customize Cluster to view Network Configuration settings.
  4. Locate the Internal IP Only section and select the checkbox next to Configure all instances to have only internal IP addresses.

CLI Command

It is not currently possible to edit a running Dataproc cluster to remove its public IPs.

To create a Dataproc cluster with only private IPs, specify the --no-address flag. For example:

gcloud beta dataproc clusters create my-cluster \
  --region=us-central1 \
  --no-address
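
To confirm the result after creation, one option is to read the cluster's internalIpOnly setting back with gcloud. The sketch below is illustrative only: the helper names classify_internal_ip_only and check_cluster are hypothetical, and it assumes the setting is exposed at config.gceClusterConfig.internalIpOnly in the describe output and printed as True when enabled. An empty or any other value is treated conservatively as not confirmed private.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map the value printed by gcloud to a simple verdict.
# Anything other than True/true (including an empty value) is treated as
# "not confirmed private".
classify_internal_ip_only() {
  case "$1" in
    True|true) echo "private-only" ;;
    *)         echo "not-confirmed-private" ;;
  esac
}

# Hypothetical wrapper: describe a cluster and classify its setting.
# Assumes gcloud is installed and authenticated, and that the field path
# below matches the describe output for your cluster.
check_cluster() {
  local name="$1" region="$2" value
  value="$(gcloud dataproc clusters describe "$name" \
    --region="$region" \
    --format='value(config.gceClusterConfig.internalIpOnly)')" || return 1
  classify_internal_ip_only "$value"
}
```

For example, check_cluster my-cluster us-central1 would print private-only for a cluster created with --no-address, under the assumptions stated above.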

Fix - Buildtime

Terraform

  • Resource: google_dataproc_cluster
  • Field: internal_ip_only

resource "google_dataproc_cluster" "accelerated_cluster" {
  name   = "my-cluster-with-gpu"
  region = "us-central1"

  cluster_config {
    gce_cluster_config {
      zone = "us-central1-a"
-     internal_ip_only = false
+     internal_ip_only = true
    }

    master_config {
      accelerators {
        accelerator_type  = "nvidia-tesla-k80"
        accelerator_count = 1
      }
    }
  }
}