Category: Apache Hadoop

  • Signed NVIDIA drivers on Google Cloud Dataproc 2.2

    Hello folks, I’ve been working this year on better integrating NVIDIA hardware with the Google Cloud Dataproc product (Hadoop on Google Cloud) running the default cluster node image. We have an open bug[1] in the initialization-actions repo regarding creation failures upon enabling secure boot. This is because with secure boot, kernel driver code has its…

  • Early Access: Inserting JSON data to BigQuery from Spark on Dataproc

    Hello folks! We recently received a case letting us know that Dataproc 2.1.1 was unable to write to a BigQuery table with a column of type JSON. Although the BigQuery connector for Spark has had support for JSON columns since 0.28.0, the Dataproc images on the 2.1 line still cannot create tables with JSON columns…