
Download a remote file and store to a GCP bucket: a Golang kata

Posted on 20.11.2022
Image by Maarten van den Heuvel on Unsplash
Refill!

Not so long ago I was assigned to build a feature for copying a remote file into our own GCP bucket. For somebody experienced in Golang that kind of task may seem quite trivial, but in my case I thought it could be helpful to somehow mark my progress with Go.

I also googled for a canned solution, but it seems nobody has covered this particular topic so far. So here I am with my 50 cents.

The GCP Storage emulator

First things first: I had to prepare a dev environment. While some engineers prefer having a dedicated dev environment in the cloud, I try to have everything emulated, mainly for two reasons:

  • I can save bits and pieces of the company's budget by not using the cloud infrastructure for development.
  • I can work completely offline, from a train or a plane, which I sometimes in fact do.

If you prefer otherwise, you can create a Google Cloud service account, download the credentials and proceed with the cloud environment. To each their own.

So, my local infra be like:

👉 📃  docker-compose.yml
version: "3.8"
services:
  storage:
    image: oittaa/gcp-storage-emulator
    env_file: ./.env.local
    environment:
      PORT: 9023
    ports:
      - "9023:9023"
    volumes:
      - ./.data/storage/:/storage
The code is licensed under the MIT license

Here I used the gcp-storage-emulator (kudos, author!), and it worked for me just fine.

A short .env.local file contains the essential variables:

👉 📃  .env.local
STORAGE_EMULATOR_HOST=http://localhost:9023
BUCKET_NAME=my-awesome-bucket
FILE_URL=https://file.url.here
OBJECT_PATH=my-objects

The Google Cloud Storage client library requires STORAGE_EMULATOR_HOST to be set when working with a local emulator.

Bootstrapping the local resources

Obviously, the bucket has to exist before we can use it. Not that I know Python, but I borrowed some code from Google's toolkit and wrote this script to help myself out:

👉 📃  ./scripts/bootstrap-storage.py
import os

from google.cloud import storage


def create_bucket(bucket_name: str) -> None:
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    bucket.location = 'eu'
    bucket.create()
    print(f"Bucket created: {bucket.name}")


bucket_name_env = os.getenv('BUCKET_NAME')
create_bucket(bucket_name_env)

By the way, I used pyenv to install Python.

The code

Right, done with the preparations. I assume that you have your Go environment already prepared.

Now for the code.

👉 📃  cmd/main.go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"os"
	"strings"
	"time"

	"cloud.google.com/go/storage"
	"github.com/pborman/uuid"
)

func main() {
	fileURL := os.Getenv("FILE_URL")
	bucketName := os.Getenv("BUCKET_NAME")
	objectPath := os.Getenv("OBJECT_PATH")

	fileName, err := extractFileName(fileURL)
	if err != nil {
		panic(err)
	}

	fileReader, closeReader, err := downloadFile(fileURL)
	if err != nil {
		panic(err)
	}
	defer func() {
		err := closeReader()
		if err != nil {
			panic(err)
		}
	}()

	err = uploadFile(fileName, fileReader, bucketName, objectPath)
	if err != nil {
		panic(err)
	}

	log.Println("Done!")
}

// downloadFile requests the file and returns the response body as a stream,
// together with a function that closes it.
func downloadFile(fileURL string) (reader io.ReadCloser, Close func() error, err error) {
	req, err := http.NewRequest("GET", fileURL, nil)
	if err != nil {
		return nil, nil, err
	}

	client := &http.Client{
		Transport: &http.Transport{},
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, nil, err
	}

	if resp.StatusCode != http.StatusOK {
		resp.Body.Close()
		return nil, nil, fmt.Errorf("could not load the file: status %d", resp.StatusCode)
	}

	return resp.Body, resp.Body.Close, nil
}

// uploadFile streams the reader into the bucket under a unique object name.
func uploadFile(fileName string, fileReader io.ReadCloser, bucketName string, objectPath string) error {
	// If we return early, the deferred cancel aborts the unfinished upload.
	uploaderCtx, cancel := context.WithTimeout(context.Background(), time.Second*50)
	defer cancel()

	targetObjectPath := objectPath + "/" + uuid.New() + "-" + fileName
	log.Println("Uploading to " + bucketName + "/" + targetObjectPath)

	client, err := storage.NewClient(uploaderCtx)
	if err != nil {
		return err
	}
	defer client.Close()

	object := client.Bucket(bucketName).Object(targetObjectPath)
	objectWriter := object.NewWriter(uploaderCtx)
	if _, err := io.Copy(objectWriter, fileReader); err != nil {
		return err
	}

	// Close finalizes the upload, so its error must be checked.
	return objectWriter.Close()
}

// extractFileName returns the last segment of the URL path.
func extractFileName(fileURL string) (string, error) {
	parsedURL, err := url.Parse(fileURL)
	if err != nil {
		return "", err
	}

	// strings.Split always returns at least one element,
	// so check the last segment rather than the slice length.
	splitPath := strings.Split(parsedURL.Path, "/")
	fileName := splitPath[len(splitPath)-1]
	if fileName == "" {
		return "", fmt.Errorf("no file name in URL %q", fileURL)
	}

	return fileName, nil
}
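The file name extraction above splits the URL path on "/" by hand. As an alternative, here is a small sketch built on path.Base from the standard library. The function name fileNameFromURL and the example URL are made up for illustration. Note one difference in behavior: path.Base strips trailing slashes first, so a URL ending in "/files/" yields "files" rather than an empty name.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"path"
)

// fileNameFromURL returns the last element of the URL path.
// Hypothetical alternative to extractFileName, not the original code.
func fileNameFromURL(rawURL string) (string, error) {
	parsed, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}

	// path.Base returns "." for an empty path and "/" for a path
	// made only of slashes; neither is a usable file name.
	name := path.Base(parsed.Path)
	if name == "." || name == "/" {
		return "", errors.New("URL path has no file name")
	}
	return name, nil
}

func main() {
	// The query string is not part of the path, so it is ignored.
	name, err := fileNameFromURL("https://example.com/files/report.pdf?version=2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name) // prints "report.pdf"
}
```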

A few important things to point out here:

  1. I use io.Copy() instead of ioutil.ReadAll() (io.ReadAll() since Go 1.16), so the file is streamed in chunks rather than loaded into memory as a whole. This matters when the file is large.

  2. I close both the reader and the writer. Closing the writer matters most: the upload is only finalized when objectWriter.Close() returns without an error.

  3. I have the main function split into several sub-functions for better structure. Even though the boilerplate grew slightly thicker because of that, I still think it's better to have things this way.

  4. I use a separate context with a timeout, and of course it should not be the same context as the application itself uses.

  5. Install dependencies via

    $ go get github.com/pborman/uuid cloud.google.com/go/storage

    or if you have my go.mod, then via

    $ go mod download

Yeah, a small step for mankind, one giant leap for me. Hope this was helpful for someone.

I wrote a simple makefile to keep everything in one place:

👉 📃  Makefile
install:
	@pip install google-api-python-client
	@pip install google.cloud.storage
	@pip install google-cloud
	@pip install google-cloud-vision
	@go mod download

create_resources:
	@godotenv -f ./.env.local python ./scripts/bootstrap-storage.py

run_infra:
	@docker-compose up

stop_infra:
	@docker-compose stop

run:
	@godotenv -f ./.env.local go run ./cmd/main.go

That's all for today, folks! As usual, the code is here.



Sergei Gannochenko

Business-oriented fullstack engineer, in ❤️ with Tech.
Golang, React, TypeScript, Docker, AWS, Jamstack.
15+ years in dev.