{"id":2119,"date":"2026-01-29T01:08:12","date_gmt":"2026-01-29T09:08:12","guid":{"rendered":"https:\/\/wp.c9h.org\/cj\/?p=2119"},"modified":"2026-01-29T01:08:13","modified_gmt":"2026-01-29T09:08:13","slug":"part-3-building-the-keystone-dataproc-custom-images-for-secure-boot-gpus","status":"publish","type":"post","link":"https:\/\/wp.c9h.org\/cj\/?p=2119","title":{"rendered":"Part 3: Building the Keystone &#8211; Dataproc Custom Images for Secure Boot &amp; GPUs"},"content":{"rendered":"<h1\nid=\"part-3-building-the-keystone---dataproc-custom-images-for-secure-boot-gpus\">Part<br \/>\n3: Building the Keystone &#8211; Dataproc Custom Images for Secure Boot &amp;<br \/>\nGPUs<\/h1>\n<p>In Part 1, we established a secure, proxy-only network. In Part 2, we<br \/>\nexplored the enhanced <code>install_gpu_driver.sh<\/code> initialization<br \/>\naction. Now, in Part 3, we\u2019ll focus on using the <a\nhref=\"https:\/\/github.com\/LLC-Technologies-Collier\/custom-images\">LLC-Technologies-Collier\/custom-images<\/a><br \/>\nrepository (branch <code>proxy-exercise-2025-11<\/code>) to build the<br \/>\nactual custom Dataproc images embedded with NVIDIA drivers signed for<br \/>\nSecure Boot, all within our proxied environment.<\/p>\n<h2 id=\"why-custom-images\">Why Custom Images?<\/h2>\n<p>To run NVIDIA GPUs on Shielded VMs with Secure Boot enabled, the<br \/>\nNVIDIA kernel modules must be signed with a key trusted by the VM\u2019s EFI<br \/>\nfirmware. Since standard Dataproc images don\u2019t include these<br \/>\ncustom-signed modules, we need to build our own. 
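<\/p>
<p>For context, the trusted key is an ordinary X.509 key pair whose certificate is enrolled in the firmware db. Below is a minimal, hypothetical <code>openssl<\/code> sketch of generating such a pair (file names are illustrative; in this series the real keys live in Secret Manager and are handled by the repository tooling described below):<\/p>

```shell
# Illustrative only: generate a self-signed module-signing key pair.
# MOK.priv holds the private key used to sign kernel modules;
# MOK.der is the DER-encoded certificate to enroll in the firmware db.
openssl req -new -x509 -newkey rsa:2048 -nodes -keyout MOK.priv -outform DER -out MOK.der -days 365 -subj '/CN=Dataproc Secure Boot module signing demo/'
```

<p>On the build VM, each NVIDIA <code>.ko<\/code> is signed against a key pair like this one, so that the firmware will load the modules at boot.<\/p>
<p>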
This process also<br \/>\nallows us to pre-install a full stack of GPU-accelerated software.<\/p>\n<h2 id=\"the-custom-images-toolkit-examplessecure-boot\">The<br \/>\n<code>custom-images<\/code> Toolkit<br \/>\n(<code>examples\/secure-boot<\/code>)<\/h2>\n<p>The <code>examples\/secure-boot<\/code> directory within the<br \/>\n<code>custom-images<\/code> repository contains the necessary scripts and<br \/>\nconfigurations, refined through significant development to handle proxy<br \/>\nand Secure Boot challenges.<\/p>\n<p><strong>Key Components &amp; Development Insights:<\/strong><\/p>\n<ul>\n<li><strong><code>env.json<\/code>:<\/strong> The central configuration<br \/>\nfile (as used in Part 1) for project, network, proxy, and bucket<br \/>\ndetails. This became the single source of truth to avoid configuration<br \/>\ndrift.<\/li>\n<li><strong><code>create-key-pair.sh<\/code>:<\/strong> Manages the Secure<br \/>\nBoot signing keys (PK, KEK, DB) in Google Secret Manager, essential for<br \/>\nthe module signing.<\/li>\n<li><strong><code>build-and-run-podman.sh<\/code>:<\/strong> Orchestrates<br \/>\nthe image build process in an isolated Podman container. This was<br \/>\nintroduced to standardize the build environment and encapsulate<br \/>\ndependencies, simplifying what the user needs to install locally.<\/li>\n<li><strong><code>pre-init.sh<\/code>:<\/strong> Sets up the build<br \/>\nenvironment within the container and calls<br \/>\n<code>generate_custom_image.py<\/code>. It crucially passes metadata<br \/>\nderived from <code>env.json<\/code> (like proxy settings and Secure Boot<br \/>\nkey secret names) to the temporary build VM.<\/li>\n<li><strong><code>generate_custom_image.py<\/code>:<\/strong> The core<br \/>\nPython script that automates GCE VM creation, runs the customization<br \/>\nscript, and creates the final GCE image.<\/li>\n<li><strong><code>gce-proxy-setup.sh<\/code>:<\/strong> This script from<br \/>\n<code>startup_script\/<\/code> is vital. 
It\u2019s injected into the temporary<br \/>\n<em>build VM<\/em> and runs <em>first<\/em> to configure the OS, package<br \/>\nmanagers (apt, dnf), tools (curl, wget, GPG), Conda, and Java to use the<br \/>\nproxy settings passed in the metadata. This ensures the entire build<br \/>\nprocess is proxy-aware.<\/li>\n<li><strong><code>install_gpu_driver.sh<\/code>:<\/strong> Used as the<br \/>\n<code>--customization-script<\/code> within the build VM. As detailed in<br \/>\nPart 2, this script handles the driver\/CUDA\/ML stack installation and<br \/>\nsigning, now able to function correctly due to the proxy setup by<br \/>\n<code>gce-proxy-setup.sh<\/code>.<\/li>\n<\/ul>\n<p><strong>Layered Image Strategy:<\/strong><\/p>\n<p>The <code>pre-init.sh<\/code> script employs a layered approach:<\/p>\n<ol type=\"1\">\n<li><strong><code>secure-boot<\/code> Image:<\/strong> Base image with<br \/>\nSecure Boot certificates injected.<\/li>\n<li><strong><code>tf<\/code> Image:<\/strong> Based on<br \/>\n<code>secure-boot<\/code>, this image runs the full<br \/>\n<code>install_gpu_driver.sh<\/code> within the proxy-configured build VM<br \/>\nto install NVIDIA drivers, CUDA, ML libraries (TensorFlow, PyTorch,<br \/>\nRAPIDS), and sign the modules. 
This is the primary target image for our<br \/>\nuse case.<\/li>\n<\/ol>\n<p>(Note: <code>secure-proxy<\/code> and <code>proxy-tf<\/code> layers<br \/>\nwere experiments, but the <code>-tf<\/code> image combined with runtime<br \/>\nmetadata emerged as the most effective solution for 2.2-debian12.)<\/p>\n<p><strong>Build Steps:<\/strong><\/p>\n<ol type=\"1\">\n<li>\n<p><strong>Clone Repos &amp; Configure<br \/>\n<code>env.json<\/code>:<\/strong> Ensure you have the<br \/>\n<code>custom-images<\/code> and <code>cloud-dataproc<\/code> repos and a<br \/>\ncomplete <code>env.json<\/code> as described in Part 1.<\/p>\n<\/li>\n<li>\n<p><strong>Run the Build:<\/strong><\/p>\n<pre><code># Example: Build a 2.2-debian12 based image set\n# Run from the custom-images repository root\nbash examples\/secure-boot\/build-and-run-podman.sh 2.2-debian12<\/code><\/pre>\n<p>This command builds the layered images, leveraging the proxy<br \/>\nsettings from <code>env.json<\/code> via the metadata injected into the<br \/>\nbuild VM. Note the final image name produced (e.g.,<br \/>\n<code>dataproc-2-2-deb12-YYYYMMDD-HHMMSS-tf<\/code>).<\/p>\n<\/li>\n<\/ol>\n<h2 id=\"conclusion-of-part-3\">Conclusion of Part 3<\/h2>\n<p>Through an iterative process, we\u2019ve developed a robust workflow<br \/>\nwithin the <code>custom-images<\/code> repository to build Secure<br \/>\nBoot-compatible GPU images in a proxy-only environment. 
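<\/p>
<p>Before moving on, it is worth sanity-checking the build output, since a mistyped image name only surfaces later as a failed cluster deploy. A small sketch (the example image name is made up) that validates a reported name against the <code>-tf<\/code> naming scheme shown above:<\/p>

```shell
# Hypothetical check: does the reported name match the
# dataproc-2-2-deb12-YYYYMMDD-HHMMSS-tf pattern from the build step?
image_name='dataproc-2-2-deb12-20251115-010203-tf'  # example value
echo $image_name | grep -Eq '^dataproc-2-2-deb12-[0-9]{8}-[0-9]{6}-tf$' && echo 'image name looks right'
```

<p>The authoritative check is a <code>gcloud compute images describe<\/code> against your project, which also confirms the image is visible from your environment.<\/p>
<p>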
The key was<br \/>\nisolating the build in Podman, ensuring the build VM is fully<br \/>\nproxy-aware using <code>gce-proxy-setup.sh<\/code>, and leveraging the<br \/>\nenhanced <code>install_gpu_driver.sh<\/code> from Part 2.<\/p>\n<p>In Part 4, we\u2019ll bring it all together, deploying a Dataproc cluster<br \/>\nusing this custom <code>-tf<\/code> image within the secure network, and<br \/>\nverifying the end-to-end functionality.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Part 3: Building the Keystone &#8211; Dataproc Custom Images for Secure Boot &amp; GPUs In Part 1, we established a secure, proxy-only network. In Part 2, we explored the enhanced install_gpu_driver.sh initialization action. Now, in Part 3, we\u2019ll focus on using the LLC-Technologies-Collier\/custom-images repository (branch proxy-exercise-2025-11) to build the actual custom Dataproc images embedded with 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[322,17,79,316,330,171,47,323,102,50,18,125,101,166,86,184,100],"tags":[],"class_list":["post-2119","post","type-post","status-publish","format-standard","hentry","category-bookworm","category-debian","category-free-software","category-gcp","category-google-cloud-dataproc","category-hardware","category-linux","category-nvidia","category-open-source","category-performance","category-perl","category-pgp","category-security","category-software","category-tls","category-virtualization","category-x509"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p1YDIB-yb","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts\/2119","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2119"}],"version-history":[{"count":2,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts\/2119\/revisions"}],"predecessor-version":[{"id":2125,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts\/2119\/revisions\/2125"}],"wp:attachment":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2119"}],"wp:term"
:[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2119"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2119"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}