Error handling in Go: a Golang kata
In one of my application there was a total mess with error handling. Any wrong situation was treated as an internal error, so I could not set the alerting right. That's why every time there was an alert, I went to investigate, and it usually turned out to be some request for a non-existing ID or an inconsistent request body.
A more clever solution was needed.
The standard error is merely an interface, so it opens up endless possibilities for custom error creation, as the only constraint would be to implement the Error() method.
So, I've defined these requirements for the solution:
- A custom error type must implement the error interface, because I have miles and miles of legacy code that only supports that standard interface.
- An error should have a type (internal, bad input, not found), so it can become distinguishable at the top level in order to send distinct HTTP codes.
- The error type should be also used to define a log level (error, warning, info) in order to fix the metrics.
- An error should provide extensive information about the nature of the issue, including full stack trace.
- The solution must take care of potential panics, as nil-pointer panics is not an unusual thing in legacy code.
Apart from that, the logger itself needed some rewiring:
- The logger must support structured logging, so it generates JSON-formatted output that Grafana understands.
- It must accept additional information in a form of fields.
- It must also support an operation id. The id, once chosen, can combine several messages into a cluster, so the course of the entire operation could be captured in the logs in a better way.
The whole approach taken seems quite promising, the only concern is, as always, performance. I wouldn't use it for a highly loaded system, sticking to standard plain error instead.
But I don't have a highly loaded system here, so, full steam ahead!
The syserr package
There is a custom error implementation I've introduced to cover some requirements.
package syserrimport ("errors""fmt""strings"pkgError "github.com/pkg/errors")type Code stringconst (InternalCode Code = "internal"BadInputCode Code = "bad_input"NotFoundCode Code = "not_found")type Field struct {Key stringValue any}func NewFiled(key string, value any) *Field {return &Field{Key: key,Value: value,}}func Internal(message string, fields ...*Field) *Error {return NewError(InternalCode, message, fields...)}func BadInput(message string, fields ...*Field) *Error {return NewError(BadInputCode, message, fields...)}func NotFound(message string, fields ...*Field) *Error {return NewError(NotFoundCode, message, fields...)}func NewError(code Code, message string, fields ...*Field) *Error {stack := GetStack(pkgError.New(""))return &Error{Message: message,Code: code,Fields: fields,Stack: stack,}}type Error struct {Message stringCode CodeStack []*ErrorStackItemFields []*Field}type ErrorStackItem struct {File stringLine stringFunction string}func (e *Error) Error() string {return e.Message}func (e *Error) GetCode() Code {return e.Code}func (e *Error) GetStack() []*ErrorStackItem {return e.Stack}func (e *Error) GetFields() []*Field {return e.Fields}func WrapError(err error, message string, fields ...*Field) *Error {wrappedMessage := err.Error()if message != "" {wrappedMessage = fmt.Sprintf("%s: %s", message, wrappedMessage)}var systemError *Errorok := errors.As(err, &systemError)if ok {systemError.Message = wrappedMessagefor _, field := range fields {systemError.Fields = append(systemError.Fields, field)}return systemError} else {return NewError(InternalCode, wrappedMessage, fields...)}}type stackTracer interface {StackTrace() pkgError.StackTrace}func GetStack(err error) []*ErrorStackItem {var traceableError stackTracerok := errors.As(err, &traceableError)if ok {stackTrace := (traceableError).StackTrace()result := make([]*ErrorStackItem, len(stackTrace))for index, frame := range stackTrace {result[index] = &ErrorStackItem{File: getFrameFilePath(frame),Line: fmt.Sprintf("%d", frame),Function: fmt.Sprintf("%s", frame),}}return result}return make([]*ErrorStackItem, 0)}func getFrameFilePath(frame pkgError.Frame) string {frameString := strings.Split(fmt.Sprintf("%+s", frame), "\n\t")return frameString[1]}
Let me break it down. So,
- There is a custom error that holds a message, a code, a stack and a bunch of fields.
- There are three factory functions to create Internal, BadInput or NotFound types of errors. It also accepts optional fields.
- There is a NewError() function that creates an error with the github.com/pkg/errors package in order to obtain the call stack. I could have had this logic implemented, but why bother when the implementation is already available?
- There is also a WrapError() function that allows adding more context to an already existing error.
- There is a GetStack() helper that extracts call stack from an error created using the github.com/pkg/errors package.
👉 In many similar articles I've seen it was advised for the error structure to keep the HTTP code right from the start. I don't think this is a right approach, because the application can be exposed not only via REST, but also through gRPC or Kafka. Even the piece of code itself, the error is returned from, can be part of some microservice running on the background as a cron job. That's why the error code must be abstracted from the concrete transport or environment, and then essentially mapped into specific codes on the way out.
The logger package
Another key part is the logger. I've decided to put the log/slog logger into an additional wrapper in order to de-couple the rest of the code from the specific logger implementation.
package loggerimport ("context""log/slog""os"pkgContext "loggingerrorhandling/internal/context")var (logger = slog.New(slog.NewJSONHandler(os.Stdout, nil)))type Field struct {key stringvalue any}func NewFiled(key string, value any) *Field {return &Field{key: key,value: value,}}func Warning(ctx context.Context, message string, fields ...*Field) {logger.Warn(message, convertFields(addOperationID(ctx, fields))...)}func Error(ctx context.Context, message string, fields ...*Field) {logger.Error(message, convertFields(addOperationID(ctx, fields))...)}func Info(ctx context.Context, message string, fields ...*Field) {logger.Info(message, convertFields(addOperationID(ctx, fields))...)}func addOperationID(ctx context.Context, fields []*Field) []*Field {return append(fields, NewFiled("operation_id", pkgContext.GetOperationID(ctx)))}func convertFields(fields []*Field) []any {result := make([]any, 2*len(fields))index := 0for _, field := range fields {result[index] = field.keyresult[index+1] = field.valueindex += 2}return result}
Essentially,
- There are three functions for each of the log levels: info, warn, error, and they basically just wrap corresponding functions of the log/slog logger.
- The logger also checks the context for the presence of the operation id. If found, it is appended to the list of fields.
The HTTP middleware
In order to set this all in motion, a wrapper is needed. In case of HTTP endpoint it will be a middleware.
package httpimport ("context""encoding/json""errors""fmt""net/http"pkgContext "loggingerrorhandling/internal/context""loggingerrorhandling/internal/logger""loggingerrorhandling/internal/syserr")const (OperationIDHeaderName = "X-Operation-Id")func ResponseWriter(controllerFn func(w http.ResponseWriter, r *http.Request) ([]byte, error)) http.Handler {return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {r = r.WithContext(pkgContext.WithOperationID(r.Context(), extractProvidedOperationID(r)))w.Header().Set("Content-Type", "application/json")w.Header().Set(OperationIDHeaderName, pkgContext.GetOperationID(r.Context()))defer func() {if rec := recover(); rec != nil {httpStatus := http.StatusInternalServerErrorerr := syserr.Internal(fmt.Sprintf("PANIC: %v", rec))writeError(httpStatus, extractPublicMessage(syserr.InternalCode, err), w)logRequest(r, err, httpStatus)}}()responseBody, err := controllerFn(w, r)httpStatus := http.StatusOKif err != nil {errorCode := extractErrorCode(err)httpStatus = mapErrorCodeToHTTPStatus(errorCode, err)writeError(httpStatus, extractPublicMessage(errorCode, err), w)} else {w.WriteHeader(httpStatus)if len(responseBody) > 0 {_, _ = w.Write(responseBody)}}logRequest(r, err, httpStatus)})}func writeError(httpStatus int, publicErrorMessage string, w http.ResponseWriter) {w.WriteHeader(httpStatus)responseBody, _ := json.Marshal(&ErrorResponse{Error: publicErrorMessage,})_, _ = w.Write(responseBody)}func logRequest(r *http.Request, err error, httpStatus int) {loggerFn := logger.Infofields := make([]*logger.Field, 3)fields[0] = logger.NewFiled("method", r.Method)fields[1] = logger.NewFiled("url", fmt.Sprintf("%s?%s", r.URL.Path, r.URL.RawQuery))fields[2] = logger.NewFiled("status", httpStatus)loggedMessage := "request handled"// todo: we can also log the request body here, if neededif err != nil {errorCode := extractErrorCode(err)loggedMessage = fmt.Sprintf("%s with error: %s", loggedMessage, err.Error())loggerFn = mapErrorCodeToLoggerFunction(errorCode)stack := make([]string, 0)var systemError *syserr.Errorok := errors.As(err, &systemError)if ok {for _, field := range systemError.GetFields() {fields = append(fields, logger.NewFiled(field.Key, field.Value))}stack = formatErrorStack(systemError.GetStack())} else {stack = getErrorStackFormatted(err)}fields = append(fields, logger.NewFiled("stack", stack))}loggerFn(r.Context(), loggedMessage, fields...)}func mapErrorCodeToHTTPStatus(errorCode syserr.Code, err error) int {switch errorCode {case syserr.NotFoundCode:return http.StatusNotFoundcase syserr.BadInputCode:return http.StatusBadRequestdefault:return http.StatusInternalServerError}}func extractPublicMessage(errorCode syserr.Code, err error) string {switch errorCode {case syserr.InternalCode:return "internal error occurred"default:return err.Error()}}func mapErrorCodeToLoggerFunction(errorCode syserr.Code) func(ctx context.Context, message string, fields ...*logger.Field) {switch errorCode {case syserr.InternalCode:return logger.Errorcase syserr.NotFoundCode:return logger.Warningcase syserr.BadInputCode:return logger.Warningdefault:return logger.Error}}func extractErrorCode(err error) syserr.Code {var systemError *syserr.Errorok := errors.As(err, &systemError)if ok {return systemError.GetCode()}return syserr.InternalCode}func getErrorStackFormatted(e error) []string {return formatErrorStack(syserr.GetStack(e))}func formatErrorStack(stack []*syserr.ErrorStackItem) []string {result := make([]string, len(stack))for index, item := range stack {result[index] = fmt.Sprintf("%s:%s %s", item.File, item.Line, item.Function)}return result}func extractProvidedOperationID(r *http.Request) string {headers := r.Headerfor name, value := range headers {if name == OperationIDHeaderName {return value[0]}}return ""}
So, this middleware:
- Injects the operation id to the request context,
- Wraps the handler of an endpoint and executes it,
- If the handler returned an error, it looks for the type of the error. If the error was a custom one, it gets the code, stack and fields.
- The code is then mapped to an HTTP code using the mapErrorCodeToHTTPStatus() function.
- In case if the error was internal, a generic neutral message is shown to a user through the API, saving them from reading technical mambo-jumbo they won't anyway understand.
- Despite the fact the error message is obscured, there is the X-Operation-Id header returned in the response, so the corresponding record in the logs can be found using its value.
- The same code is mapped to the logger function with mapErrorCodeToLoggerFunction().
- Regardless of the fact whether there was an error or not, the request is logged, providing additional insights on the verb and path.
- The middleware also takes care of panics that may happen inside of the endpoint handlers. All panics are converted into the internal errors, logged and sent to the client with HTTP status 500. What is especially cool is the fact that even for panics the call stack is recorded!
Applying the middleware
The middleware should wrap every endpoint as shown below:
package mainimport ("context""fmt""net""net/http""os""sync""loggingerrorhandling/internal/controller/book"httpInternal "loggingerrorhandling/internal/http"bookRepository "loggingerrorhandling/internal/repository/book"bookService "loggingerrorhandling/internal/service/book""loggingerrorhandling/internal/util/db")const (keyServerAddress = "serverAddress")func main() {...booksRepo := &bookRepository.Repository{Session: session,}bookSvc := &bookService.Service{BookRepository: booksRepo,}bookController := book.Controller{BookService: bookSvc,}mux := http.NewServeMux()mux.Handle("/books", httpInternal.ResponseWriter(bookController.GetBooks))...}
Creating and wrapping
Here is how to spawn an error:
func (r *Repository) GetBookCount(ctx context.Context, filter string) (count int64, err error) {...if somethingIsntRight {return 0, syserr.BadInput("could not retrieve book count",syserr.NewFiled("term", filter),syserr.NewFiled("foo", "bar"),)}...}
And here is how to wrap it for better understanding of the context of the situation:
func (s *Service) GetBooks(ctx context.Context, filter string, page int32) (result *book.GetBooksResult, err error) {logger.Info(ctx, "trying to execute the GetBooks() method")...bookCount, err := s.BookRepository.GetBookCount(ctx, filter)if err != nil {return nil, syserr.WrapError(err, "could not execute GetBooks()")}...}
Please note, that all functions return a simple error interface. It allows:
- Gradual migration of legacy code from standard errors to the custom ones. Actually, I don't even have to migrate, I can just leave it as error and cast to a specific type using a middleware or another wrapper.
- Saving some boilerplate.
The output
{"time":"2024-02-27T00:13:16.485015+01:00","level":"INFO","msg":"trying to execute the GetBooks() method","operation_id":"2bdc993d-5686-47e2-9c6a-16ff12bbeb03"}{"time":"2024-02-27T00:13:16.485465+01:00","level":"WARN","msg":"request handled with error: could not execute GetBooks(): could not retrieve book count","method":"POST","url":"/books?","status":400,"term":"","foo":"bar","stack":["/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/syserr/syserr.go:44 syserr.go","/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/syserr/syserr.go:36 syserr.go","/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/repository/book/book.go:31 book.go","/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/service/book/book.go:29 book.go","/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/controller/book/book.go:33 book.go","/Users/mustermann/proj/pocs-demos-katas/golang/logging-error-handling/internal/http/middleware.go:23 middleware.go","/Users/mustermann/.gvm/gos/go1.21/src/net/http/server.go:2136 server.go","/Users/mustermann/.gvm/gos/go1.21/src/net/http/server.go:2514 server.go","/Users/mustermann/.gvm/gos/go1.21/src/net/http/server.go:2938 server.go","/Users/mustermann/.gvm/gos/go1.21/src/net/http/server.go:2009 server.go","/Users/mustermann/.gvm/gos/go1.21/src/runtime/asm_amd64.s:1650 asm_amd64.s"],"operation_id":"2bdc993d-5686-47e2-9c6a-16ff12bbeb03"}
As you can see, there are two calls of the logger function, the first one is just unobtrusive debug, and the second records an error happened during the endpoint execution, caused by an illegal input (hence the 400 HTTP status). Both of the calls provide the operation id, and they also expose the fields defined when the error was initially created.
One gotcha
It may happen so, that some fields passed to the error factory function may contain sensitive information. While it may be fine to log it, you might not want this info to flick in front of your client's eyes. Also, the error message of format "something happened: this happened: that happened" is not quite user-friendly. So there must be away to make it non-tech people friendly and secure. This can be partially solved by marking certain fields as "sensitive", so they won't be passed on through REST.
Food for thought.
Another gotcha
The proposed method makes errors difficult to extend. Right now, we have Internal, BadInput and NotFound error types. However, what if we need to introduce the "Filesystem error" type and process them in a special way? Will it still be internal? Kind of yes, but we already have the Internal type. Should we introduce subtypes then?
The solution quickly turns into a rabbit hole stuffed with questions difficult to answer.
In Golang there is a way to create custom errors using the errors package, similar to what I just did above, and there are methods to check if an error is actually an instance of a certain structure. These methods rely on a chain of wrapped errors. An internal error can wrap a file system error, and still there will be a way to detect that from the outside.
I need to think about it. Maybe I'll upgrade the solution later.
That's all for today, folks! As usual, the code is here.
Sergei Gannochenko
Golang, React, TypeScript, Docker, AWS, Jamstack.
15+ years in dev.