Ensure GCP Dataflow jobs are private

Error: GCP Dataflow jobs are not private
Bridgecrew Policy ID: BC_GCP_GENERAL_33
Checkov Check ID: CKV_GCP_94
Severity: HIGH

Description

Cloud Dataflow is a GCP service for streaming and batch data processing. A Dataflow job consists of at least one management node and one compute node, both of which are Compute Engine (GCE) VMs. By default, these nodes are assigned public IPs so that they can communicate with the public internet. Because the nodes are then publicly accessible, the public IPs also increase your potential attack surface.

We recommend you remove the public IPs for your Dataflow jobs. View the official Google documentation for the currently supported internet access configuration options.

Fix - Runtime

GCP Console

Making Dataflow jobs private via the console is not currently supported.

CLI Command

Making a running Dataflow job private via the gcloud CLI is not currently supported. Instead, you need to drain or cancel the job and then re-create it with the --disable-public-ips flag set.

# To cancel a Dataflow job
gcloud dataflow jobs cancel JOB_ID

Replace JOB_ID with your Dataflow job ID.

# To drain a Dataflow job
gcloud dataflow jobs drain JOB_ID

Replace JOB_ID with your Dataflow job ID.

# To create a new Dataflow job without public IPs
gcloud dataflow jobs run JOB_NAME \
  --disable-public-ips \
  --gcs-location=GCS_LOCATION

Replace JOB_NAME with the name for your new Dataflow job. Replace GCS_LOCATION with the Cloud Storage path where your job template lives; it must be a URL beginning with gs://.
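The drain and re-create steps above can be combined into one script. The following is a minimal sketch, not an official procedure: the function names run and recreate_private, the DRY_RUN variable, and the --region argument are assumptions added for illustration, and every value passed in is a placeholder.

```shell
# Sketch: drain an existing job, then launch a replacement without public IPs.
# "run", "recreate_private", and DRY_RUN are hypothetical helpers for this example.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: recreate_private JOB_ID JOB_NAME GCS_LOCATION REGION
recreate_private() {
  local job_id="$1" job_name="$2" gcs_location="$3" region="$4"

  # Drain lets in-flight data finish processing; swap in "cancel" to stop at once.
  # The command returns before draining completes, so the old job may still be
  # active when the next command runs. Pick a JOB_NAME that does not clash.
  run gcloud dataflow jobs drain "$job_id" --region="$region" || return 1

  # Launch the replacement from the template with public IPs disabled.
  run gcloud dataflow jobs run "$job_name" \
    --region="$region" \
    --gcs-location="$gcs_location" \
    --disable-public-ips
}
```

Calling it with DRY_RUN=1, for example DRY_RUN=1 recreate_private JOB_ID JOB_NAME GCS_LOCATION REGION, prints the two gcloud commands without running them, which is a cheap way to review them before touching a production job.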

Google also provides documentation on how to turn off external IP addresses for your Dataflow jobs, with examples for Java and Python.

Fix - Buildtime

Terraform

  • Resource: google_dataflow_job
  • Field: ip_configuration
resource "google_dataflow_job" "big_data_job" {
  name              = "dataflow-job"
  template_gcs_path = "gs://my-bucket/templates/template_file"
  temp_gcs_location = "gs://my-bucket/tmp_dir"
  parameters = {
    foo = "bar"
    baz = "qux"
  }

-  ip_configuration = "WORKER_IP_PUBLIC"
+  ip_configuration = "WORKER_IP_PRIVATE"
}
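To spot affected resources before running a full scan, a rough text check over a Terraform file can help. The sketch below is an illustration only: the function name find_public_dataflow_jobs is hypothetical, and it matches text line by line, so it does not resolve variables, modules, or comments the way Terraform or Checkov would.

```shell
# Usage: find_public_dataflow_jobs FILE
# Prints the name of each google_dataflow_job resource in FILE that does not
# set ip_configuration = "WORKER_IP_PRIVATE". Hypothetical helper; text match only.
find_public_dataflow_jobs() {
  awk '
    /^[ \t]*resource[ \t]+"google_dataflow_job"/ {
      in_job = 1; private = 0; depth = 0
      name = $3; gsub(/"/, "", name)      # third field is the quoted resource name
    }
    in_job {
      if ($0 ~ /ip_configuration[ \t]*=[ \t]*"WORKER_IP_PRIVATE"/) private = 1
      depth += gsub(/[{]/, "{")           # count opening braces on this line
      depth -= gsub(/[}]/, "}")           # count closing braces on this line
      if (depth == 0) {                   # the resource block has closed
        if (!private) print name
        in_job = 0
      }
    }
  ' "$1"
}
```

For an authoritative result, run Checkov itself against the same file, for example checkov -f main.tf --check CKV_GCP_94, where main.tf is a placeholder file name.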
