Ultimate checklist for a newly joining fullstack software engineer
I had been working as a frontend engineer for quite a while, so when I went back to fullstack, it took some time to pick it up again. One of the challenges was gathering all the information about the setup of the project I joined, in order to be able to perform my duties at a professional level.
So here I've put together a checklist that every newly joined fullstack engineer should go through, to feel confident developing, maintaining and observing a service they are responsible for.
The first and most important thing is to know where to find logs. You need logs for every microservice, for both Staging and every Live environment in every region. Logs are often managed via Datadog, Loki or Loggly; if that's the case, go there, filter by the service name, set the log level to "Error" and choose a time span of the last day.
That way you can check the logs at any time and quickly see whether there are any recent errors.
Alerting helps SRE and the rest of the team react to possible outages, so it's essential to have it properly rigged. Alerting can be set up via Datadog as well, dumping all relevant information into dedicated Slack channels, so make sure you have joined them.
Metrics typically contain:
- Resource consumption (CPU, Memory, DB Size).
- Requests per second.
- Request duration.
- Query duration.
- Error rate.
- Cost of running the cloud-native application, per month, per quarter, per year.
- Any other custom metrics you might be interested in.
All of these may be rigged in Grafana, though Datadog can also be used for this purpose. Again, the dashboards should be available for every environment, and the quick links should be saved so they are at your disposal.
If a project uses something like OpenTelemetry, it may gather information about so-called spans. A span typically corresponds to a unit of work, such as a sub-routine in the code, and tracing lets you measure the duration of these. It's essentially like a profiler, but for the backend. Several tools can be used to view the spans, such as Jaeger or Datadog, so make sure you have access to one.
If the application is containerised, it is most likely running inside a K8s cluster. Typically there is a Staging cluster and several regional Live clusters. With K8s, a good option for deployments could be Spinnaker, which allows deploying new versions of Docker images into the cluster swiftly and without friction. Whenever you make a release that triggers an image build and, consequently, a Spinnaker pipeline, you might want to go to Spinnaker and see if there were any errors deploying that new image.
Again, the links must be saved for both Staging and Live environments.
GCP dashboard access
The very same thing applies to AWS. But if the infrastructure runs on GCP, at least two things should be done:
- Get access to the GCP panel to see all the resources.
- Set up the gcloud CLI tool in order to perform useful operations from the terminal.
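As a minimal sketch of the second step, assuming the Google Cloud SDK is already installed, the initial setup could look like the function below. The function name `setup_gcloud` and the `DRY_RUN` switch are illustrative helpers of my own, not part of gcloud.

```shell
# setup_gcloud <project_id>
# Hypothetical helper. Set DRY_RUN=1 to print the commands instead of running them.
setup_gcloud() (
  project_id="$1"
  # Authenticate the CLI with your Google account (opens a browser window).
  ${DRY_RUN:+echo} gcloud auth login || exit
  # Make the given project the default for subsequent commands.
  ${DRY_RUN:+echo} gcloud config set project "$project_id" || exit
  # Show the active account and project to confirm the setup.
  ${DRY_RUN:+echo} gcloud config list
)
```

Calling `DRY_RUN=1 setup_gcloud <project_id>` first is a cheap way to check what would be executed.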
Sometimes it's necessary to interact with the clusters directly. Typically, two tasks are quite frequent:
- Restart a misbehaving container.
- Obtain the logs of a failing container.
Assuming that gcloud is already installed and configured, you can get the credentials for a specific cluster and store them locally:
gcloud container clusters get-credentials <cluster_name> --zone <zone_name> --project <project_name>
To see the list of all stored credentials, you type:
kubectl config get-contexts
Then, to use a specific set of credentials, you switch the current context. Note that the choice is stored in your kubeconfig file, so it affects every terminal session, not only the current one:
kubectl config use-context <context_name>
Most of the time there are multiple cloud-native applications running on a single cluster, each under its own namespace. You list the namespaces and find the one that holds your project:
kubectl get namespace
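Once the namespace is known, it can be pinned on the context so that later commands do not need `-n` every time. The following is a minimal sketch; `use_cluster` and `DRY_RUN` are hypothetical helper names, while the kubectl subcommands themselves are standard.

```shell
# use_cluster <context_name> <namespace>
# Hypothetical helper. Set DRY_RUN=1 to print the commands instead of running them.
use_cluster() (
  # Switch kubectl to the given context (cluster credentials).
  ${DRY_RUN:+echo} kubectl config use-context "$1" || exit
  # Pin the namespace on the current context so later commands can omit -n.
  ${DRY_RUN:+echo} kubectl config set-context --current --namespace="$2"
)
```

For example, `use_cluster <context_name> <namespace>` followed by `kubectl get pods` would list the pods of your project only.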
Restart a misbehaving container
To restart a container, you scale its deployment down to zero and then back to the regular number of replicas:
kubectl -n <namespace> scale deploy <deployment_name> --replicas=0
# wait for some time
kubectl -n <namespace> scale deploy <deployment_name> --replicas=2
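As an alternative to scaling down and up (and not the approach described above), kubectl 1.15 and newer offer `rollout restart`, which replaces the pods gradually instead of dropping to zero replicas. A minimal sketch, where `restart_deployment` and `DRY_RUN` are hypothetical helper names:

```shell
# restart_deployment <namespace> <deployment_name>
# Hypothetical helper. Set DRY_RUN=1 to print the commands instead of running them.
restart_deployment() (
  namespace="$1"
  deployment="$2"
  # Trigger a rolling restart of every pod in the deployment.
  ${DRY_RUN:+echo} kubectl -n "$namespace" rollout restart "deployment/$deployment" || exit
  # Block until the new pods are ready, or report that the rollout failed.
  ${DRY_RUN:+echo} kubectl -n "$namespace" rollout status "deployment/$deployment"
)
```

The rolling variant keeps at least some pods serving traffic, which may or may not be what you want when the container is badly misbehaving; in that case the scale-to-zero approach is the blunter tool.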
Obtain the logs of a failing container
When you notice that Spinnaker failed to spin up a new version of an image, you go check the container logs:
kubectl -n <namespace> logs -f <pod_name> -c <container_name>
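When the container keeps crashing and restarting, the live log stream may be empty, and the pod name has to be looked up first. A minimal sketch of both steps, assuming the hypothetical helper names `crashed_container_logs` and `DRY_RUN`; the `--previous` and `--tail` flags are standard kubectl:

```shell
# crashed_container_logs <namespace> <pod_name> <container_name>
# Hypothetical helper. Set DRY_RUN=1 to print the commands instead of running them.
crashed_container_logs() (
  # List the pods first to spot the one that is crashing or restarting.
  ${DRY_RUN:+echo} kubectl -n "$1" get pods || exit
  # --previous prints the logs of the last terminated instance of the container.
  ${DRY_RUN:+echo} kubectl -n "$1" logs "$2" -c "$3" --previous --tail=200
)
```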
There is also an amazing CLI tool called k9s for managing K8s clusters.
Connecting to the databases
You must always have read/write access to the staging database. You're also going to need read-only access to all production databases. The best way to connect is to tunnel the connection to a local port.
With GCP this can be done via the cloud_sql_proxy tool: you pick a port that you want to be allocated locally and run the proxy against the database instance, bound to that port.
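A minimal sketch of opening such a tunnel, assuming the v1 `cloud_sql_proxy` binary (the newer v2 binary is named differently and takes different flags). The helper name `open_db_tunnel` and the `DRY_RUN` switch are hypothetical; the instance connection name has the form `project:region:instance`.

```shell
# open_db_tunnel <instance_connection_name> <local_port>
# Hypothetical helper. Set DRY_RUN=1 to print the command instead of running it.
open_db_tunnel() (
  instance="$1"   # e.g. project:region:instance
  port="$2"       # local port the database becomes reachable on
  # Expose the Cloud SQL instance on 127.0.0.1:<local_port> (v1 proxy syntax).
  ${DRY_RUN:+echo} cloud_sql_proxy -instances="${instance}=tcp:${port}"
)
```

While the proxy is running, a database client can connect to `127.0.0.1` on the chosen port.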
The best way to automate this is to create a script that connects to the database for an arbitrary country or region.
Make sure you have one. It is company-specific, so you'll have to write that script yourself.
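Such a script could be sketched as follows. Everything company-specific here is invented for illustration: the region codes, the instance connection names, the default port and the function names are placeholders, and the v1 `cloud_sql_proxy` syntax is assumed.

```shell
# Hypothetical mapping from a region code to a Cloud SQL instance connection
# name. Replace these placeholder values with your company's instances.
instance_for_region() {
  case "$1" in
    staging) echo "example-project:europe-west1:staging-db" ;;
    de)      echo "example-project:europe-west3:live-de-replica" ;;
    us)      echo "example-project:us-east1:live-us-replica" ;;
    *)       echo "unknown region: $1" >&2; return 1 ;;
  esac
}

# connect_db <region> [local_port]
# Set DRY_RUN=1 to print the command instead of running it.
connect_db() (
  # Abort early if the region is not in the mapping.
  instance="$(instance_for_region "$1")" || exit 1
  # Fall back to an arbitrary default port when none is given.
  port="${2:-5433}"
  ${DRY_RUN:+echo} cloud_sql_proxy -instances="${instance}=tcp:${port}"
)
```

Pointing the Live entries at read replicas, as in the placeholders above, keeps the script safe by default.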
You can use any client to access the database. It can be DataGrip, but projects like pgAdmin or phpMyAdmin can also do fine if you use Postgres or MySQL respectively. One important note: when connecting to the live database, always connect to the read replica. If you don't have a read replica (which you should), at least make the connection read-only, as mentioned before. You don't want to mess with the production data, do you?
Running the application locally
It's good practice to have the option of running the whole cloud-native app on your local machine. In my opinion, it's wise not to rely solely on unit tests, but also to be able to actually test new features before pushing to Staging.
Docker Desktop (on Mac or Windows) or Colima (on Mac) to the rescue. There is also a variety of projects that offer mocking of the most popular cloud services: LocalStack for AWS and a handful of projects like gcloud-pubsub-emulator or gcp-storage-emulator for GCP.
Dumping the staging database locally
One of the most frequent things that may happen to you is your QA engineer reporting an issue on Staging. Then, without further ado, you can dump the staging database locally and do the research in a local environment, which is far more transparent and safe than digging through the logs and trying to figure out the issue. You can even use a debugger if the situation calls for it.
There is always a dumping tool available for your database out there. Use pg_dump for Postgres, mongodump for MongoDB, etc. Make a tunnel to the read replica, dump and then restore locally. You can even make a script to automate this, exactly as I did.
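For Postgres, the dump-and-restore cycle could be sketched like this. It assumes a tunnel is already listening on a local port and that a local Postgres server is running; `dump_and_restore`, the dump file name and the `DRY_RUN` switch are hypothetical, while the `pg_dump`, `createdb` and `pg_restore` flags are standard.

```shell
# dump_and_restore <tunnel_port> <db_user> <remote_db> <local_db>
# Hypothetical helper. Set DRY_RUN=1 to print the commands instead of running them.
dump_and_restore() (
  tunnel_port="$1"; db_user="$2"; remote_db="$3"; local_db="$4"
  dump_file="${remote_db}.dump"
  # Dump through the local tunnel in pg_dump's custom format (-Fc).
  ${DRY_RUN:+echo} pg_dump -h 127.0.0.1 -p "$tunnel_port" -U "$db_user" -Fc -f "$dump_file" "$remote_db" || exit
  # Create an empty local database to restore into.
  ${DRY_RUN:+echo} createdb "$local_db" || exit
  # Restore without the original ownership, since local roles usually differ.
  ${DRY_RUN:+echo} pg_restore --no-owner -d "$local_db" "$dump_file"
)
```

The custom format is used here because `pg_restore` can read it directly; a plain SQL dump piped into `psql` would work as well.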
So yeah, as you can see, observability is the key.
This article is a work in progress, so as soon as I find new relevant information, I am gonna expand the post.
React, Node, Go, Docker, AWS, Jamstack.
15+ years in dev.