Ensure GCP Dataproc cluster is not anonymously or publicly accessible

Error: GCP Dataproc cluster is anonymously or publicly accessible
Bridgecrew Policy ID: BC_GCP_GENERAL_20
Checkov Check ID: CKV_GCP_98
Severity: HIGH

Description

Dataproc is commonly used for data lake modernization, ETL, and data science workloads. A Dataproc cluster contains at least one "management" VM and one "compute" VM. Access to Dataproc clusters is controlled through IAM policies. If a policy binds a role to the allUsers or allAuthenticatedUsers principal, the cluster becomes accessible to anyone on the internet (allUsers) or to anyone signed in with a Google account (allAuthenticatedUsers), which can inadvertently expose your data to the public.

We recommend you ensure anonymous and public access to Dataproc clusters is not allowed.
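
To make the exposure concrete, the following is a minimal Python sketch of how a policy could be scanned for public principals. It assumes the policy has the usual IAM JSON shape (a "bindings" list of role/members pairs); the function name, the example policy, and the example.com principals are hypothetical, and this is not how Checkov itself implements the check.

```python
# Principals that make a binding public (the two named in the description).
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}


def find_public_bindings(policy: dict) -> list[tuple[str, str]]:
    """Return (role, member) pairs that grant access to a public principal."""
    findings = []
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            if member in PUBLIC_PRINCIPALS:
                findings.append((binding.get("role", ""), member))
    return findings


# Hypothetical policy for illustration only; the principals and etag are made up.
example_policy = {
    "bindings": [
        {
            "role": "roles/dataproc.viewer",
            "members": ["allUsers", "user:jane@example.com"],
        },
        {
            "role": "roles/dataproc.editor",
            "members": ["group:data-eng@example.com"],
        },
    ],
    "etag": "hypothetical-etag",
    "version": 1,
}

print(find_public_bindings(example_policy))
# → [('roles/dataproc.viewer', 'allUsers')]
```

An empty result means no binding in the policy names allUsers or allAuthenticatedUsers.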

Fix - Runtime

GCP Console

To remove anonymous or public access for Dataproc clusters:

  1. Log in to the GCP Console at https://console.cloud.google.com.
  2. Navigate to the Dataproc Clusters page.
  3. Select the target Dataproc cluster.
  4. Expand the Info Panel by selecting Show Info Panel.
  5. To remove a specific role assignment, select allUsers or allAuthenticatedUsers, and then click Remove member.

CLI Command

To remove access for allUsers and allAuthenticatedUsers, you need to first get the Dataproc cluster's existing IAM policy. To retrieve the existing policy and copy it to a local file:

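gcloud dataproc clusters get-iam-policy CLUSTER-ID \
  --region REGION \
  --format json > policy.json

Replace CLUSTER-ID with your Dataproc cluster ID and REGION with the cluster's region. You can omit --region if the dataproc/region property is already set in your gcloud configuration.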

Next, open policy.json and remove allUsers or allAuthenticatedUsers (whichever your Checkov error reports) from the members list of each binding. Delete a binding entirely only if it has no other members. After modifying the policy.json file, update the Dataproc cluster with the following command:

gcloud dataproc clusters set-iam-policy CLUSTER-ID policy.json \
  --region REGION

Replace CLUSTER-ID with your Dataproc cluster ID and REGION with the cluster's region.
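
The manual edit of policy.json between the two commands can also be scripted. The following is a minimal Python sketch, assuming policy.json has the standard IAM policy shape produced by get-iam-policy; the function names strip_public_members and clean_policy_file are hypothetical helpers, not part of gcloud.

```python
import json

# Principals to strip from every binding.
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}


def strip_public_members(policy: dict) -> dict:
    """Return a copy of the policy without public principals.

    Bindings left with no members are dropped. Other top-level keys,
    such as "etag", are carried over unchanged.
    """
    cleaned = []
    for binding in policy.get("bindings", []):
        members = [
            m for m in binding.get("members", []) if m not in PUBLIC_PRINCIPALS
        ]
        if members:
            cleaned.append({**binding, "members": members})
    return {**policy, "bindings": cleaned}


def clean_policy_file(path: str = "policy.json") -> None:
    """Rewrite the policy file in place with public principals removed."""
    with open(path) as f:
        policy = json.load(f)
    with open(path, "w") as f:
        json.dump(strip_public_members(policy), f, indent=2)
```

Calling clean_policy_file("policy.json") after get-iam-policy and before set-iam-policy would replace the hand edit. Review the resulting file before applying it, since set-iam-policy replaces the cluster's entire policy.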

Fix - Buildtime

Terraform

  • Resource: google_dataproc_cluster_iam_member

  • Field: member

  • Resource: google_dataproc_cluster_iam_binding

  • Field: members

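// Option 1: bind the role to a specific principal instead of a public one
resource "google_dataproc_cluster_iam_member" "editor" {
  cluster = "your-dataproc-cluster"
  role    = "roles/editor"
-  member  = "allUsers"              // or "allAuthenticatedUsers"
+  member  = "user:jane@example.com" // placeholder for a specific principal
}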

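// Option 2: keep only specific principals in the members list
resource "google_dataproc_cluster_iam_binding" "editor" {
  cluster = "your-dataproc-cluster"
  role    = "roles/editor"
  members = [
-    "allUsers",
-    "allAuthenticatedUsers"
+    "user:jane@example.com", // placeholder for a specific principal
  ]
}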