Download a remote file and store it in a GCP bucket: a Golang kata
Not so long ago I was assigned to build a feature for copying a remote file into our own GCP bucket. For somebody experienced in Golang that kind of task may seem quite trivial, but in my case I thought it could be helpful to somehow mark my progress with Go.
I also googled for some canned solution out there, but it seems nobody has covered this particular topic so far. So here I am with my 50 cents.
First things first: I had to prepare a dev environment. While some engineers prefer having a dedicated dev environment in the cloud, I try to have everything emulated, mainly for two reasons:
- I can save bits and pieces of the company's budget by not using the cloud infrastructure for development.
- I can work completely offline, from a train or a plane, which I sometimes in fact do.
If you prefer otherwise, you can create a Google Cloud service account, download the credentials and proceed with the cloud environment. To each their own.
So, my local infra be like:
    version: "3.8"
    services:
      storage:
        image: oittaa/gcp-storage-emulator
        env_file: ./.env.local
        environment:
          PORT: 9023
        ports:
          - "9023:9023"
        volumes:
          - ./.data/storage/:/storage
Here I used the gcp-storage-emulator (kudos, author!), and it worked for me just fine.
A short .env.local file contains four essential variables:
    STORAGE_EMULATOR_HOST=http://localhost:9023
    BUCKET_NAME=my-awesome-bucket
    FILE_URL=https://file.url.here
    OBJECT_PATH=my-objects
The Google Storage package requires STORAGE_EMULATOR_HOST to be defined when working with a local emulator: when it is set, the client talks to that address instead of the real Google Cloud Storage endpoint.
Before using the bucket, we obviously need to create it. Not that I know Python, but I borrowed some code from Google's toolkit and wrote this script to help myself out:
    import os

    from google.cloud import storage


    def create_bucket(bucket_name: str) -> None:
        client = storage.Client()
        bucket = client.bucket(bucket_name)
        bucket.location = 'eu'
        bucket.create()
        print(f"Bucket created: {bucket.name}")


    bucket_name_env = os.getenv('BUCKET_NAME')
    create_bucket(bucket_name_env)
I used pyenv to install Python, by the way.
Right, done with the preparations. I assume that you have your Go environment already prepared.
Now for the code.
    package main

    import (
    	"context"
    	"errors"
    	"fmt"
    	"io"
    	"log"
    	"net/http"
    	"net/url"
    	"os"
    	"strings"
    	"time"

    	"cloud.google.com/go/storage"
    	"github.com/pborman/uuid"
    )

    func main() {
    	fileURL := os.Getenv("FILE_URL")
    	bucketName := os.Getenv("BUCKET_NAME")
    	objectPath := os.Getenv("OBJECT_PATH")

    	fileName, err := extractFileName(fileURL)
    	if err != nil {
    		panic(err)
    	}

    	fileReader, closeReader, err := downloadFile(fileURL)
    	if err != nil {
    		panic(err)
    	}
    	defer func() {
    		err := closeReader()
    		if err != nil {
    			panic(err)
    		}
    	}()

    	err = uploadFile(fileName, fileReader, bucketName, objectPath)
    	if err != nil {
    		panic(err)
    	}

    	log.Println("Done!")
    }

    // downloadFile requests the URL and returns the response body as a stream,
    // together with a function that closes it.
    func downloadFile(url string) (reader io.ReadCloser, closeFn func() error, err error) {
    	req, err := http.NewRequest("GET", url, nil)
    	if err != nil {
    		return nil, nil, err
    	}

    	client := &http.Client{
    		Transport: &http.Transport{},
    	}

    	resp, err := client.Do(req)
    	if err != nil {
    		return nil, nil, err
    	}

    	if resp.StatusCode != http.StatusOK {
    		resp.Body.Close()
    		return nil, nil, errors.New("could not load the file " + fmt.Sprintf("%d", resp.StatusCode))
    	}

    	return resp.Body, resp.Body.Close, nil
    }

    // uploadFile streams the reader into a uniquely named object in the bucket.
    func uploadFile(fileName string, fileReader io.ReadCloser, bucketName string, objectPath string) error {
    	uploaderCtx := context.Background()
    	uploaderCtx, cancel := context.WithTimeout(uploaderCtx, time.Second*50)
    	defer cancel()

    	targetObjectPath := objectPath + "/" + uuid.New() + "-" + fileName
    	log.Println("Uploading to " + bucketName + "/" + targetObjectPath)

    	client, err := storage.NewClient(uploaderCtx)
    	if err != nil {
    		return err
    	}
    	// Release the client's connections once the upload is over.
    	defer client.Close()

    	object := client.Bucket(bucketName).Object(targetObjectPath)
    	objectWriter := object.NewWriter(uploaderCtx)

    	if _, err := io.Copy(objectWriter, fileReader); err != nil {
    		return err
    	}

    	// Close finalizes the upload, so its error must not be ignored.
    	if err := objectWriter.Close(); err != nil {
    		return err
    	}

    	return nil
    }

    // extractFileName returns the last segment of the URL path.
    func extractFileName(fileURL string) (string, error) {
    	parsedURL, err := url.Parse(fileURL)
    	if err != nil {
    		return "", err
    	}

    	splitPath := strings.Split(parsedURL.Path, "/")
    	if len(splitPath) == 0 {
    		return "", nil
    	}

    	return splitPath[len(splitPath)-1], nil
    }
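One edge case in extractFileName is worth a note: when the URL path is empty or ends with a slash, the last segment is an empty string, so the object name ends with a bare dash. Here is a possible variant with a fallback name, sketched with path.Base from the standard library. The function name fileNameFromURL and the fallback value are assumptions of mine, and the behavior differs slightly from the code above, as the comments point out.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// fileNameFromURL returns the last path segment of rawURL, or fallback when
// the path has no usable segment. Both the function name and the fallback
// idea are hypothetical.
func fileNameFromURL(rawURL, fallback string) (string, error) {
	parsed, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// path.Base strips trailing slashes first, so "/files/" yields "files".
	// It returns "." for an empty path and "/" for a root-only path.
	name := path.Base(parsed.Path)
	if name == "." || name == "/" {
		return fallback, nil
	}
	return name, nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/files/report.pdf?x=1",
		"https://example.com/",
		"https://example.com",
	} {
		name, err := fileNameFromURL(raw, "download")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", raw, name)
	}
}
```

Whether a directory-like URL should map to its last folder name or to the fallback is a judgment call, so treat this as one option rather than the right answer.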
A few important things to point out about the code:
- I use io.Copy() instead of ioutil.ReadAll() in order to save memory in case the file is large: the data is streamed through a small buffer instead of being loaded into memory as a whole.
- I properly close both the reader and the writer to avoid leaking resources. Closing the writer is also what finalizes the upload, so the error it returns has to be checked.
- I have the main function split into several sub-functions for better structure. Even though the boilerplate grew slightly thicker because of that, I still think it's better to have things this way.
- I use a separate context with a timeout, and of course it should not be the same context as the application itself uses.
Install dependencies via

    $ go get github.com/pborman/uuid cloud.google.com/go/storage

or, if you have my go.mod, then via

    $ go mod download

The code is licensed under the MIT license.
Yeah, a small step for mankind, one giant leap for me. Hope this was helpful for someone.
I wrote a simple Makefile to keep everything in one place:
    install:
    	@pip install google-api-python-client
    	@pip install google.cloud.storage
    	@pip install google-cloud
    	@pip install google-cloud-vision
    	@go mod download

    create_resources:
    	@godotenv -f ./.env.local python ./scripts/bootstrap-storage.py

    run_infra:
    	@docker-compose up

    stop_infra:
    	@docker-compose stop

    run:
    	@godotenv -f ./.env.local go run ./cmd/main.go
That's all for today, folks! As usual, the code is here.
Sergei Gannochenko
Golang, React, TypeScript, Docker, AWS, Jamstack.
19+ years in dev.