{"id":2114,"date":"2026-01-28T02:37:46","date_gmt":"2026-01-28T10:37:46","guid":{"rendered":"https:\/\/wp.c9h.org\/cj\/?p=2114"},"modified":"2026-01-29T01:02:46","modified_gmt":"2026-01-29T09:02:46","slug":"dataproc-gpus-secure-boot-proxies","status":"publish","type":"post","link":"https:\/\/wp.c9h.org\/cj\/?p=2114","title":{"rendered":"Dataproc GPUs, Secure Boot, &amp; Proxies"},"content":{"rendered":"<h1\nid=\"part-1-building-a-secure-network-foundation-for-dataproc-with-gpus-swp\">Part<br \/>\n1: Building a Secure Network Foundation for Dataproc with GPUs &amp;<br \/>\nSWP<\/h1>\n<p>Welcome to the first post in our series on running GPU-accelerated<br \/>\nDataproc workloads in secure, enterprise-grade environments. Many<br \/>\norganizations need to operate within VPCs that have no direct internet<br \/>\negress, instead routing all traffic through a Secure Web Proxy (SWP).<br \/>\nAdditionally, security mandates often require the use of Shielded VMs<br \/>\nwith Secure Boot enabled. 
This series will show you how to meet these<br \/>\nrequirements for your Dataproc GPU clusters.<\/p>\n<p>In this post, we\u2019ll focus on laying the network foundation using<br \/>\ntools from the <a\nhref=\"https:\/\/github.com\/LLC-Technologies-Collier\/cloud-dataproc\">LLC-Technologies-Collier\/cloud-dataproc<\/a><br \/>\nrepository (branch <code>proxy-sync-2026-01<\/code>).<\/p>\n<h2 id=\"the-challenge-network-isolation-control\">The Challenge: Network<br \/>\nIsolation &amp; Control<\/h2>\n<p>Before we can even think about custom images or GPU drivers, we need<br \/>\na network environment that:<\/p>\n<ol type=\"1\">\n<li>Prevents direct internet access from Dataproc cluster nodes.<\/li>\n<li>Forces all egress traffic through a manageable and auditable<br \/>\nSWP.<\/li>\n<li>Provides the necessary connectivity for Dataproc to function and for<br \/>\nus to build images later.<\/li>\n<li>Supports Secure Boot for all VMs.<\/li>\n<\/ol>\n<h2 id=\"the-toolkit-llc-technologies-colliercloud-dataproc\">The Toolkit:<br \/>\n<code>LLC-Technologies-Collier\/cloud-dataproc<\/code><\/h2>\n<p>To make setting up and tearing down these complex network<br \/>\nenvironments repeatable and consistent, we\u2019ve developed a set of bash<br \/>\nscripts within the <code>gcloud<\/code> directory of the<br \/>\n<code>cloud-dataproc<\/code> repository. These scripts handle the<br \/>\ncreation of VPCs, subnets, firewall rules, service accounts, and the<br \/>\nSecure Web Proxy itself.<\/p>\n<p><strong>Key Script:<br \/>\n<code>gcloud\/bin\/create-dpgce-private<\/code><\/strong><\/p>\n<p>This script is the cornerstone for creating the private, proxied<br \/>\nenvironment. 
It automates:<\/p>\n<ul>\n<li>VPC and Subnet creation (for the cluster, SWP, and management).<\/li>\n<li>Setup of Certificate Authority Service and Certificate Manager for<br \/>\nSWP TLS interception.<\/li>\n<li>Deployment of the SWP Gateway instance.<\/li>\n<li>Configuration of a Gateway Security Policy to control egress.<\/li>\n<li>Creation of necessary firewall rules.<\/li>\n<li><strong>Result:<\/strong> Cluster nodes in this VPC have NO default<br \/>\ninternet route and MUST use the SWP.<\/li>\n<\/ul>\n<p><strong>Configuration via <code>env.json<\/code><\/strong><\/p>\n<p>We use a single <code>env.json<\/code> file to drive the<br \/>\nconfiguration. This file will also be used by the<br \/>\n<code>custom-images<\/code> scripts in Part 3. This <code>env.json<\/code><br \/>\nshould reside in your <code>custom-images<\/code> repository clone, and<br \/>\nyou\u2019ll symlink it into the <code>cloud-dataproc\/gcloud<\/code><br \/>\ndirectory.<\/p>\n<p><strong>Running the Setup:<\/strong><\/p>\n<div class=\"sourceCode\" id=\"cb1\">\n<pre\nclass=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb1-1\"><a href=\"#cb1-1\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"co\"># Assuming you have cloud-dataproc and custom-images cloned side-by-side<\/span><\/span>\n<span id=\"cb1-2\"><a href=\"#cb1-2\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"co\"># And your env.json is in the custom-images root<\/span><\/span>\n<span id=\"cb1-3\"><a href=\"#cb1-3\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"bu\">cd<\/span> cloud-dataproc\/gcloud<\/span>\n<span id=\"cb1-4\"><a href=\"#cb1-4\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"co\"># Symlink to the env.json in custom-images<\/span><\/span>\n<span id=\"cb1-5\"><a href=\"#cb1-5\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"fu\">ln<\/span> <span class=\"at\">-sf<\/span> ..\/..\/custom-images\/env.json env.json<\/span>\n<span id=\"cb1-6\"><a href=\"#cb1-6\" 
aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"co\"># Run the creation script, but don&#39;t create a cluster yet<\/span><\/span>\n<span id=\"cb1-7\"><a href=\"#cb1-7\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"fu\">bash<\/span> bin\/create-dpgce-private <span class=\"at\">--no-create-cluster<\/span><\/span>\n<span id=\"cb1-8\"><a href=\"#cb1-8\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"bu\">cd<\/span> ..\/..\/custom-images<\/span><\/code><\/pre>\n<\/div>\n<h2 id=\"node-configuration-the-metadata-startup-script-for-runtime\">Node<br \/>\nConfiguration: The Metadata Startup Script for Runtime<\/h2>\n<p>For the Dataproc cluster nodes to function correctly in this proxied<br \/>\nenvironment, they need to be configured to use the SWP on boot. We<br \/>\nachieve this using a GCE metadata startup script.<\/p>\n<p>The script <code>startup_script\/gce-proxy-setup.sh<\/code> (from the<br \/>\n<code>custom-images<\/code> repository) is designed to be run on each<br \/>\ncluster node at boot. 
It reads metadata like <code>http-proxy<\/code> and<br \/>\n<code>http-proxy-pem-uri<\/code> (which our cluster creation scripts in<br \/>\nPart 4 will pass) to configure the OS environment, package managers, and<br \/>\nother tools to use the SWP.<\/p>\n<p><strong>Upload this script to your GCS bucket:<\/strong><\/p>\n<div class=\"sourceCode\" id=\"cb2\">\n<pre\nclass=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb2-1\"><a href=\"#cb2-1\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"co\"># Run from the custom-images repository root<\/span><\/span>\n<span id=\"cb2-2\"><a href=\"#cb2-2\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"ex\">gsutil<\/span> cp startup_script\/gce-proxy-setup.sh gs:\/\/<span class=\"va\">$(<\/span><span class=\"ex\">jq<\/span> <span class=\"at\">-r<\/span> .BUCKET env.json<span class=\"va\">)<\/span>\/custom-image-deps\/<\/span><\/code><\/pre>\n<\/div>\n<p>This script is essential for the <em>runtime<\/em> behavior of the<br \/>\ncluster nodes.<\/p>\n<h2 id=\"conclusion-of-part-1\">Conclusion of Part 1<\/h2>\n<p>With the <code>cloud-dataproc<\/code> scripts, we\u2019ve laid the<br \/>\ngroundwork by provisioning a secure VPC with controlled egress through<br \/>\nan SWP. 
We\u2019ve also prepared the essential node-level proxy configuration<br \/>\nscript (<code>gce-proxy-setup.sh<\/code>) in GCS, ready to be used by our<br \/>\nclusters.<\/p>\n<p>Stay tuned for Part 2, where we\u2019ll dive into the<br \/>\n<code>install_gpu_driver.sh<\/code> initialization action from the<br \/>\n<code>LLC-Technologies-Collier\/initialization-actions<\/code> repository<br \/>\n(branch <code>gpu-202601<\/code>) and how it\u2019s been adapted to install<br \/>\nall GPU-related software through the proxy during the image build<br \/>\nprocess.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Part 1: Building a Secure Network Foundation for Dataproc with GPUs &amp; SWP Welcome to the first post in our series on running GPU-accelerated Dataproc workloads in secure, enterprise-grade environments. Many organizations need to operate within VPCs that have no direct internet egress, instead routing all traffic through a Secure Web Proxy (SWP). 
Additionally, security [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[322,17,79,316,330,49,47,102,125,101,86,1,100],"tags":[],"class_list":["post-2114","post","type-post","status-publish","format-standard","hentry","category-bookworm","category-debian","category-free-software","category-gcp","category-google-cloud-dataproc","category-images","category-linux","category-open-source","category-pgp","category-security","category-tls","category-uncategorized","category-x509"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p1YDIB-y6","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts\/2114","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2114"}],"version-history":[{"count":4,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts\/2114\/revisions"}],"predecessor-version":[{"id":2124,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts\/2114\/revisions\/2124"}],"wp:attachment":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2114"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wp.c9h.org\/c
j\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2114"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2114"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}