Accessing GCP BigQuery: a Golang kata
Table of contents
Recently I got a task of scraping some data from one of our BigQuery tables. It could be easily done through the GCP console, however, my personal account hasn't had the necessary access. On the other hand, a service account that was at my disposal, had it.
So, an obvious solution was to make a small Golang script that makes a query to the desired BQ table. No sooner said than done, behold!
I hate writing raw SQL queries, especially if a language does not support proper string templating syntax. In case of Go, it is either ugly concatenation of strings, or fmt.Sprintf(). That's why I've decided to make use of a query builder, such as the famous gocraft/dbr. So I make a fake connection, then I make the select statement, and finally I interpolate it for the Postgres dialect.
Also, I always clamp the results using the limit clause. It is automatically attached to the query at the end. This is necessary, because unlike CloudSQL, BigQuery charges you based on the amount of data processed. That's why you probably don't want to grab unnecessary data you are not gonna put into any use.
The table name is also auto-prefixed with the namespace, if any. The service account is loaded from the JSON file, placed locally.
package mainimport ("context""database/sql""fmt""log""os""cloud.google.com/go/bigquery""github.com/gocraft/dbr/v2""github.com/gocraft/dbr/v2/dialect""github.com/lib/pq""github.com/luna-duclos/instrumentedsql""google.golang.org/api/iterator""google.golang.org/api/option")const (QueryLimit = 10)type Record struct {SKU string `bigquery:"sku"`Title string `bigquery:"title"`}type DB struct {*dbr.Connection}func main() {projectName := os.Getenv("PROJECT_NAME")credentialsFile := os.Getenv("CREDENTIALS_FILE")ctx := context.Background()sess, closeConn, err := makeFakeSession()if err != nil {log.Fatalf("creation error: %v", err)}defer closeConn()query, err := interpolateBQQuery(sess.Select("sku, title").From(prefTable("products")),QueryLimit,)if err != nil {log.Fatalf("interpolation failure: %v", err)}fmt.Printf("EXECUTING: %s\n", query)client, err := bigquery.NewClient(ctx, projectName, option.WithCredentialsFile(credentialsFile))if err != nil {log.Fatalf("new bigquery client creation error: %v", err)}defer client.Close()result, err := client.Query(query).Read(ctx)if err != nil {log.Fatalf("bigquery read error: %v", err)}for {var row Recorderr := result.Next(&row)if err == iterator.Done {break}if err != nil {log.Fatalf("error iterating through results: %v", err)}fmt.Printf("%v\n", row)fmt.Println("--------==============================--------")}}func prefTable(sql string) string {tablePrefix := os.Getenv("TABLE_PREFIX")return fmt.Sprintf("%s%s", tablePrefix, sql)}func makeFakeSession() (*dbr.Session, func() error, error) {sql.Register("instrumented-postgres", instrumentedsql.WrapDriver(&pq.Driver{}))masterDB, err := sql.Open("instrumented-postgres", "host= port= user= password= dbname= sslmode=disable")if err != nil {return nil, nil, err}masterConn := &DB{Connection: &dbr.Connection{DB: masterDB,Dialect: dialect.PostgreSQL,EventReceiver: &dbr.NullEventReceiver{},},}sess := masterConn.NewSession(nil)return sess, masterDB.Close, nil}func interpolateSelectStatement(statement *dbr.SelectStmt) (string, error) {buf := dbr.NewBuffer()err := statement.Build(statement.Dialect, buf)if err != nil {return "", err}return dbr.InterpolateForDialect(buf.String(), buf.Value(), statement.Dialect)}func interpolateBQQuery(statement *dbr.SelectStmt, limit uint64) (string, error) {statement.Limit(limit)sqlQuery, err := interpolateSelectStatement(statement)if err != nil {return "", err}return sqlQuery, nil}
As you can see, the script is not rocket science at all, but I am glad I had a chance to work with BQ using Go.
And that's it. The code is available here.
Sergei Gannochenko
Golang, React, TypeScript, Docker, AWS, Jamstack.
20+ years in dev.