Signed NVIDIA drivers on Google Cloud Dataproc 2.2

Hello folks,

I’ve been working this year on better integrating NVIDIA hardware with the Google Cloud Dataproc product (Hadoop on Google Cloud) running the default cluster node image. We have an open bug[1] in the initialization-actions repo about cluster creation failures when secure boot is enabled. The cause is that with secure boot, the kernel verifies the signature on driver code before the module is loaded into kernel memory. Verification involves reading the trust root certificates from EFI variables and checking that the signature on the kernel driver was either a) made directly by one of those certificates or b) made by a certificate which chains up to one of them.
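To see where a given node stands, something like the following can help. It is only a rough sketch, not an official procedure: it assumes `mokutil` and `modinfo` are installed, uses `nvidia` purely as an example module name, and the function name `check_module_signature` is made up for illustration. Each check is skipped if its tool is missing.

```shell
#!/usr/bin/env bash
# Sketch: inspect Secure Boot state and a module's signer on a node.
# Assumptions: mokutil and kmod's modinfo are installed; "nvidia" is just
# an example module name and may not exist on your machine.

check_module_signature() {
  local module="${1:-nvidia}"

  if command -v mokutil >/dev/null 2>&1; then
    # Reports whether the firmware booted with Secure Boot enabled.
    mokutil --sb-state 2>/dev/null || echo "could not read Secure Boot state"
  else
    echo "mokutil not installed; skipping Secure Boot state"
  fi

  if command -v modinfo >/dev/null 2>&1; then
    # Prints the name of the key that signed the module, if any.
    local signer
    signer="$(modinfo -F signer "${module}" 2>/dev/null || true)"
    echo "signer of ${module}: ${signer:-<none or module not found>}"
  else
    echo "modinfo not installed; skipping signer lookup"
  fi
}

check_module_signature "${1:-nvidia}"
```

If the signer line is empty while secure boot is enabled, the module will most likely be rejected at load time.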

This means that Dataproc disk images must have a certificate installed into them. My work on the internals will likely lead to images that ship with certificates from Google. In the meantime, however, our users have no way to both enable secure boot and install out-of-tree kernel modules such as the NVIDIA GPU drivers. To that end, I’ve got PR #83[2] open against the GoogleCloudDataproc/custom-images GitHub repository. This PR introduces a new argument to the custom image creation script, `--trusted-cert`, whose value is the path to a DER-encoded certificate to be included in the certificate database in the EFI variables of the disk’s boot sector.
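If you don’t already have a DER-encoded certificate, one way to produce one is with `openssl`. This is a minimal sketch under my own assumptions, not necessarily what the PR’s scripts do: the subject name and the private key file name `tls/db.rsa` are placeholders, and `tls/db.der` is chosen only to match the path passed to `--trusted-cert` in this post.

```shell
#!/usr/bin/env bash
# Sketch: create a self-signed signing key and a DER-encoded certificate.
# Assumptions: openssl is installed; the subject name and the private key
# file name (tls/db.rsa) are placeholders chosen for illustration.
set -eu

mkdir -p tls

# -nodes leaves the private key unencrypted; protect it accordingly.
openssl req -new -x509 -newkey rsa:2048 -nodes \
  -subj "/CN=Example Secure Boot Signing Key/" \
  -days 3650 \
  -keyout tls/db.rsa \
  -outform DER -out tls/db.der

# Confirm the file parses as a DER certificate and show its subject.
openssl x509 -inform DER -in tls/db.der -noout -subject
```

The private key is what would sign the kernel modules, so it needs to be kept somewhere safe; only the certificate goes into the image.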

I’ve written up the instructions on creating a custom image with a trusted certificate here:

Here is a set of commands that can be used to create a Dataproc custom image with a certificate installed in the EFI db variable. Run them from the root directory of a checkout such as this:

git clone --branch secure-boot-custom-image --single-branch
pushd custom-images

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${PROJECT_NUMBER} \
gcloud config set project ${PROJECT_ID}

gcloud auth login

eval $(bash examples/secure-boot/

#image_name="nvidia-open-kernel-bullseye-$(date +%F)"
image_name="nvidia-open-kernel-bookworm-$(date +%F)"

python \
    --image-name ${image_name} \
    --dataproc-version ${dataproc_version} \
    --trusted-cert "tls/db.der" \
    --customization-script ${customization_script} \
    --metadata "${metadata}" \
    --zone "${custom_image_zone}" \
    --disk-size "${disk_size_gb}" \
    --no-smoke-test \
    --gcs-bucket "${my_bucket}"

I’d love to hear your feedback!

