Avatar

Error handling in Go: a Golang kata

← Back to list
Posted on 28.02.2024
Image by AI on Dall-E (ChatGPT)
Refill!

In one of my application there was a total mess with error handling. Any wrong situation was treated as an internal error, so I could not set the alerting right. That's why every time there was an alert, I went to investigate, and it usually turned out to be some request for a non-existing ID or an inconsistent request body.

A more clever solution was needed.

The standard error is merely an interface, so it opens up endless possibilities for custom error creation, as the only constraint would be to implement the Error() method.

So, I've defined these requirements for the solution:

  • A custom error type must implement the error interface, because I have miles and miles of legacy code that only supports that standard interface.
  • An error should have a type (internal, bad input, not found), so it can become distinguishable at the top level in order to send distinct HTTP codes.
  • The error type should be also used to define a log level (error, warning, info) in order to fix the metrics.
  • An error should provide extensive information about the nature of the issue, including full stack trace.
  • The solution must take care of potential panics, as nil-pointer panics is not an unusual thing in legacy code.

Apart from that, the logger itself needed some rewiring:

  • The logger must support structured logging, so it generates JSON-formatted output that Grafana understands.
  • It must accept additional information in a form of fields.
  • It must also support an operation id. The id, once chosen, can combine several messages into a cluster, so the course of the entire operation could be captured in the logs in a better way.

The whole approach taken seems quite promising, the only concern is, as always, performance. I wouldn't use it for a highly loaded system, sticking to standard plain error instead.

But I don't have a highly loaded system here, so, full steam ahead!

The syserr package

There is a custom error implementation I've introduced to cover some requirements.

👉 📃  internal/syserr/syserr.go
package syserr
import (
"errors"
"fmt"
"strings"
pkgError "github.com/pkg/errors"
)
type Code string
const (
InternalCode Code = "internal"
BadInputCode Code = "bad_input"
NotFoundCode Code = "not_found"
)
type Field struct {
Key string
Value any
}
func NewFiled(key string, value any) *Field {
return &Field{
Key: key,
Value: value,
}
}
func Internal(message string, fields ...*Field) *Error {
return NewError(InternalCode, message, fields...)
}
func BadInput(message string, fields ...*Field) *Error {
return NewError(BadInputCode, message, fields...)
}
func NotFound(message string, fields ...*Field) *Error {
return NewError(NotFoundCode, message, fields...)
}
func NewError(code Code, message string, fields ...*Field) *Error {
stack := GetStack(pkgError.New(""))
return &Error{
Message: message,
Code: code,
Fields: fields,
Stack: stack,
}
}
type Error struct {
Message string
Code Code
Stack []*ErrorStackItem
Fields []*Field
}
type ErrorStackItem struct {
File string
Line string
Function string
}
func (e *Error) Error() string {
return e.Message
}
func (e *Error) GetCode() Code {
return e.Code
}
func (e *Error) GetStack() []*ErrorStackItem {
return e.Stack
}
func (e *Error) GetFields() []*Field {
return e.Fields
}
func WrapError(err error, message string, fields ...*Field) *Error {
wrappedMessage := err.Error()
if message != "" {
wrappedMessage = fmt.Sprintf("%s: %s", message, wrappedMessage)
}
var systemError *Error
ok := errors.As(err, &systemError)
if ok {
systemError.Message = wrappedMessage
for _, field := range fields {
systemError.Fields = append(systemError.Fields, field)
}
return systemError
} else {
return NewError(InternalCode, wrappedMessage, fields...)
}
}
type stackTracer interface {
StackTrace() pkgError.StackTrace
}
func GetStack(err error) []*ErrorStackItem {
var traceableError stackTracer
ok := errors.As(err, &traceableError)
if ok {
stackTrace := (traceableError).StackTrace()
result := make([]*ErrorStackItem, len(stackTrace))
for index, frame := range stackTrace {
result[index] = &ErrorStackItem{
File: getFrameFilePath(frame),
Line: fmt.Sprintf("%d", frame),
Function: fmt.Sprintf("%s", frame),
}
}
return result
}
return make([]*ErrorStackItem, 0)
}
func getFrameFilePath(frame pkgError.Frame) string {
frameString := strings.Split(fmt.Sprintf("%+s", frame), "\n\t")
return frameString[1]
}
The code is licensed under the MIT license

Let me break it down. So,

  • There is a custom error that holds a message, a code, a stack and a bunch of fields.
  • There are three factory functions to create Internal, BadInput or NotFound types of errors. It also accepts optional fields.
  • There is a NewError() function that creates an error with the github.com/pkg/errors package in order to obtain the call stack. I could have had this logic implemented, but why bother when the implementation is already available?
  • There is also a WrapError() function that allows adding more context to an already existing error.
  • There is a GetStack() helper that extracts call stack from an error created using the github.com/pkg/errors package.

👉 In many similar articles I've seen it was advised for the error structure to keep the HTTP code right from the start. I don't think this is a right approach, because the application can be exposed not only via REST, but also through gRPC or Kafka. Even the piece of code itself, the error is returned from, can be part of some microservice running on the background as a cron job. That's why the error code must be abstracted from the concrete transport or environment, and then essentially mapped into specific codes on the way out.

The logger package

Another key part is the logger. I've decided to put the log/slog logger into an additional wrapper in order to de-couple the rest of the code from the specific logger implementation.

👉 📃  internal/logger/logger.go
package logger
import (
"context"
"log/slog"
"os"
pkgContext "loggingerrorhandling/internal/context"
)
var (
logger = slog.New(slog.NewJSONHandler(os.Stdout, nil))
)
type Field struct {
key string
value any
}
func NewFiled(key string, value any) *Field {
return &Field{
key: key,
value: value,
}
}
func Warning(ctx context.Context, message string, fields ...*Field) {
logger.Warn(message, convertFields(addOperationID(ctx, fields))...)
}
func Error(ctx context.Context, message string, fields ...*Field) {
logger.Error(message, convertFields(addOperationID(ctx, fields))...)
}
func Info(ctx context.Context, message string, fields ...*Field) {
logger.Info(message, convertFields(addOperationID(ctx, fields))...)
}
func addOperationID(ctx context.Context, fields []*Field) []*Field {
return append(fields, NewFiled("operation_id", pkgContext.GetOperationID(ctx)))
}
func convertFields(fields []*Field) []any {
result := make([]any, 2*len(fields))
index := 0
for _, field := range fields {
result[index] = field.key
result[index+1] = field.value
index += 2
}
return result
}
The code is licensed under the MIT license

Essentially,

  • There are three functions for each of the log levels: info, warn, error, and they basically just wrap corresponding functions of the log/slog logger.
  • The logger also checks the context for the presence of the operation id. If found, it is appended to the list of fields.

The HTTP middleware

In order to set this all in motion, a wrapper is needed. In case of HTTP endpoint it will be a middleware.

👉 📃  internal/http/middleware.go
package http
import (
"context"
"encoding/json"
"errors"
"fmt"
"net/http"
pkgContext "loggingerrorhandling/internal/context"
"loggingerrorhandling/internal/logger"
"loggingerrorhandling/internal/syserr"
)
const (
OperationIDHeaderName = "X-Operation-Id"
)
func ResponseWriter(controllerFn func(w http.ResponseWriter, r *http.Request) ([]byte, error)) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
r = r.WithContext(pkgContext.WithOperationID(r.Context(), extractProvidedOperationID(r)))
w.Header().Set("Content-Type", "application/json")
w.Header().Set(OperationIDHeaderName, pkgContext.GetOperationID(r.Context()))
defer func() {
if rec := recover(); rec != nil {
httpStatus := http.StatusInternalServerError
err := syserr.Internal(fmt.Sprintf("PANIC: %v", rec))
writeError(httpStatus, extractPublicMessage(syserr.InternalCode, err), w)
logRequest(r, err, httpStatus)
}
}()
responseBody, err := controllerFn(w, r)
httpStatus := http.StatusOK
if err != nil {
errorCode := extractErrorCode(err)
httpStatus = mapErrorCodeToHTTPStatus(errorCode, err)
writeError(httpStatus, extractPublicMessage(errorCode, err), w)
} else {
w.WriteHeader(httpStatus)
if len(responseBody) > 0 {
_, _ = w.Write(responseBody)
}
}
logRequest(r, err, httpStatus)
})
}
func writeError(httpStatus int, publicErrorMessage string, w http.ResponseWriter) {
w.WriteHeader(httpStatus)
responseBody, _ := json.Marshal(&ErrorResponse{
Error: publicErrorMessage,
})
_, _ = w.Write(responseBody)
}
func logRequest(r *http.Request, err error, httpStatus int) {
loggerFn := logger.Info
fields := make([]*logger.Field, 3)
fields[0] = logger.NewFiled("method", r.Method)
fields[1] = logger.NewFiled("url", fmt.Sprintf("%s?%s", r.URL.Path, r.URL.RawQuery))
fields[2] = logger.NewFiled("status", httpStatus)
loggedMessage := "request handled"
// todo: we can also log the request body here, if needed
if err != nil {
errorCode := extractErrorCode(err)
loggedMessage = fmt.Sprintf("%s with error: %s", loggedMessage, err.Error())
loggerFn = mapErrorCodeToLoggerFunction(errorCode)
stack := make([]string, 0)
var systemError *syserr.Error
ok := errors.As(err, &systemError)
if ok {
for _, field := range systemError.GetFields() {
fields = append(fields, logger.NewFiled(field.Key, field.Value))
}
stack = formatErrorStack(systemError.GetStack())
} else {
stack = getErrorStackFormatted(err)
}
fields = append(fields, logger.NewFiled("stack", stack))
}
loggerFn(r.Context(), loggedMessage, fields...)
}
func mapErrorCodeToHTTPStatus(errorCode syserr.Code, err error) int {
switch errorCode {
case syserr.NotFoundCode:
return http.StatusNotFound
case syserr.BadInputCode:
return http.StatusBadRequest
default:
return http.StatusInternalServerError
}
}
func extractPublicMessage(errorCode syserr.Code, err error) string {
switch errorCode {
case syserr.InternalCode:
return "internal error occurred"
default:
return err.Error()
}
}
func mapErrorCodeToLoggerFunction(errorCode syserr.Code) func(ctx context.Context, message string, fields ...*logger.Field) {
switch errorCode {
case syserr.InternalCode:
return logger.Error
case syserr.NotFoundCode:
return logger.Warning
case syserr.BadInputCode:
return logger.Warning
default:
return logger.Error
}
}
func extractErrorCode(err error) syserr.Code {
var systemError *syserr.Error
ok := errors.As(err, &systemError)
if ok {
return systemError.GetCode()
}
return syserr.InternalCode
}
func getErrorStackFormatted(e error) []string {
return formatErrorStack(syserr.GetStack(e))
}
func formatErrorStack(stack []*syserr.ErrorStackItem) []string {
result := make([]string, len(stack))
for index, item := range stack {
result[index] = fmt.Sprintf("%s:%s %s", item.File, item.Line, item.Function)
}
return result
}
func extractProvidedOperationID(r *http.Request) string {
headers := r.Header
for name, value := range headers {
if name == OperationIDHeaderName {
return value[0]
}
}
return ""
}
The code is licensed under the MIT license

So, this middleware:

  • Injects the operation id to the request context,
  • Wraps the handler of an endpoint and executes it,
  • If the handler returned an error, it looks for the type of the error. If the error was a custom one, it gets the code, stack and fields.
  • The code is then mapped to an HTTP code using the mapErrorCodeToHTTPStatus() function.
  • In case if the error was internal, a generic neutral message is shown to a user through the API, saving them from reading technical mambo-jumbo they won't anyway understand.
  • Despite the fact the error message is obscured, there is the X-Operation-Id header returned in the response, so the corresponding record in the logs can be found using its value.
  • The same code is mapped to the logger function with mapErrorCodeToLoggerFunction().
  • Regardless of the fact whether there was an error or not, the request is logged, providing additional insights on the verb and path.
  • The middleware also takes care of panics that may happen inside of the endpoint handlers. All panics are converted into the internal errors, logged and sent to the client with HTTP status 500. What is especially cool is the fact that even for panics the call stack is recorded!

Applying the middleware

The middleware should wrap every endpoint as shown below:

👉 📃  cmd/main.go
package main
import (
"context"
"fmt"
"net"
"net/http"
"os"
"sync"
"loggingerrorhandling/internal/controller/book"
httpInternal "loggingerrorhandling/internal/http"
bookRepository "loggingerrorhandling/internal/repository/book"
bookService "loggingerrorhandling/internal/service/book"
"loggingerrorhandling/internal/util/db"
)
const (
keyServerAddress = "serverAddress"
)
func main() {
...
booksRepo := &bookRepository.Repository{
Session: session,
}
bookSvc := &bookService.Service{
BookRepository: booksRepo,
}
bookController := book.Controller{
BookService: bookSvc,
}
mux := http.NewServeMux()
mux.Handle("/books", httpInternal.ResponseWriter(bookController.GetBooks))
...
}
The code is licensed under the MIT license

Creating and wrapping

Here is how to spawn an error:

func (r *Repository) GetBookCount(ctx context.Context, filter string) (count int64, err error) {
...
if somethingIsntRight {
return 0, syserr.BadInput(
"could not retrieve book count",
syserr.NewFiled("term", filter),
syserr.NewFiled("foo", "bar"),
)
}
...
}
The code is licensed under the MIT license

And here is how to wrap it for better understanding of the context of the situation:

func (s *Service) GetBooks(ctx context.Context, filter string, page int32) (result *book.GetBooksResult, err error) {
logger.Info(ctx, "trying to execute the GetBooks() method")
...
bookCount, err := s.BookRepository.GetBookCount(ctx, filter)
if err != nil {
return nil, syserr.WrapError(err, "could not execute GetBooks()")
}
...
}
The code is licensed under the MIT license

Please note, that all functions return a simple error interface. It allows:

  • Gradual migration of legacy code from standard errors to the custom ones. Actually, I don't even have to migrate, I can just leave it as error and cast to a specific type using a middleware or another wrapper.
  • Saving some boilerplate.

The output

$
{"time":"2024-02-27T00:13:16.485015+01:00","level":"INFO","msg":"trying to execute the GetBooks() method","operation_id":"2bdc993d-5686-47e2-9c6a-16ff12bbeb03"}
{"time":"2024-02-27T00:13:16.485465+01:00","level":"WARN","msg":"request handled with error: could not execute GetBooks(): could not retrieve book count","method":"POST","url":"/books?","status":400,"term":"","foo":"bar","stack":["/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/syserr/syserr.go:44 syserr.go","/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/syserr/syserr.go:36 syserr.go","/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/repository/book/book.go:31 book.go","/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/service/book/book.go:29 book.go","/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/controller/book/book.go:33 book.go","/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/http/middleware.go:23 middleware.go","/Users/mustermann/.gvm/gos/go1.21/src/net/http/server.go:2136 server.go","/Users/mustermann/.gvm/gos/go1.21/src/net/http/server.go:2514 server.go","/Users/mustermann/.gvm/gos/go1.21/src/net/http/server.go:2938 server.go","/Users/mustermann/.gvm/gos/go1.21/src/net/http/server.go:2009 server.go","/Users/mustermann/.gvm/gos/go1.21/src/runtime/asm_amd64.s:1650 asm_amd64.s"],"operation_id":"2bdc993d-5686-47e2-9c6a-16ff12bbeb03"}
The code is licensed under the MIT license

As you can see, there are two calls of the logger function, the first one is just unobtrusive debug, and the second records an error happened during the endpoint execution, caused by an illegal input (hence the 400 HTTP status). Both of the calls provide the operation id, and they also expose the fields defined when the error was initially created.

One gotcha

It may happen so, that some fields passed to the error factory function may contain sensitive information. While it may be fine to log it, you might not want this info to flick in front of your client's eyes. Also, the error message of format "something happened: this happened: that happened" is not quite user-friendly. So there must be away to make it non-tech people friendly and secure. This can be partially solved by marking certain fields as "sensitive", so they won't be passed on through REST.

Food for thought.

Another gotcha

The proposed method makes errors difficult to extend. Right now, we have Internal, BadInput and NotFound error types. However, what if we need to introduce the "Filesystem error" type and process them in a special way? Will it still be internal? Kind of yes, but we already have the Internal type. Should we introduce subtypes then?

The solution quickly turns into a rabbit hole stuffed with questions difficult to answer.

In Golang there is a way to create custom errors using the errors package, similar to what I just did above, and there are methods to check if an error is actually an instance of a certain structure. These methods rely on a chain of wrapped errors. An internal error can wrap a file system error, and still there will be a way to detect that from the outside.

I need to think about it. Maybe I'll upgrade the solution later.

That's all for today, folks! As usual, the code is here.


Avatar

Sergei Gannochenko

Business-oriented fullstack engineer, in ❤️ with Tech.
Golang, React, TypeScript, Docker, AWS, Jamstack.
15+ years in dev.