Kubernetes CronJobs, Docker, Elastic Container Registry, Python base image

Introduction

This post focuses on the deployment side of things. Remember that a fully automated solution requires two independent CronJobs. If you're unfamiliar with the term "cron job," it comes from the Unix world: the cron daemon is a background service that launches user-defined programs on a user-defined schedule. The classic example is log rotation, where a program renames the log file at the end of each day and then creates a new blank log file to hold the next day's logs.
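To make the schedule syntax concrete, here is a minimal, hypothetical crontab entry that does a crude midnight log rotation; the paths are placeholders:

# m h dom mon dow  command
0 0 * * * mv /var/log/myapp.log /var/log/myapp.$(date +\%F).log && touch /var/log/myapp.log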

A CronJob in Kubernetes is simply a Job on a schedule: the CronJob spawns a Job, the Job runs a pod, and the pod runs a Docker container, which runs our valuable program. Running this in Kubernetes may appear to be overkill at first, but keep in mind that once you set things up, Kubernetes takes care of managing things. Your server has less chance of crashing undetected, and it is difficult to accidentally break the cron job. Running k8s CronJobs is also pretty handy because each run creates a Job, and each Job keeps a log showing the console output of the program that ran.
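Once a CronJob is installed, that whole chain is easy to inspect with kubectl; the namespace and job name below are placeholders:

kubectl get cronjobs -n tooling
kubectl get jobs -n tooling
kubectl get pods -n tooling
# console output of the program that a given Job ran
kubectl logs job/<job-name> -n tooling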

Recall that we need to launch a program to achieve two distinct goals:

  1. Launch the 'tb' (trigger builds) command, which triggers all the Jenkins projects in a subfolder. Each Jenkins build launches the python Veracode scanner, which in turn packages the source, uploads it to Veracode, starts the scan, waits, retrieves the results and writes them to the mongo database. I will probably do this once per day, in the middle of the night, so that we do not consume build server resources during the day.
  2. Launch the publish-to-Confluence part. The same Veracode-scanner python project can be launched with the --confluence-publish flag. It will then look for the latest scan results, transform them into an HTML table, and publish the new document to the Confluence wiki. Note that it does not matter when we run this; even if the trigger-builds command is still busy doing analyses, we capture the state of the database at the time of the run, and the HTML will reflect that. The next run will then pick up the newer results. This can be triggered whenever, but I will probably set it up to run daily at 7 am, well before I would ever look at it. (The cron expressions for both schedules are sketched right after this list.)
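For reference, the two schedules translate into cron expressions like these; the 2 am slot for the build trigger is just my reading of "middle of the night":

# trigger all the Jenkins builds nightly (02:00 is an assumed slot)
0 2 * * *
# publish the scan results to Confluence at 07:00 every day
0 7 * * *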

To have a CronJob, we need a docker image, uploaded to a docker registry. In this post, we will be using ECR, the Elastic Container Registry from Amazon. The order goes like this:

  1. Write the program (in our case the python program) first.
  2. Test the program locally by running it in the debugger.
  3. Identify a base image you can use, one that will have the python runtime.  Create your own if need be, but make sure it’s available in your docker registry
  4. Create a Dockerfile with a set of instructions to package your code, one that will use the base image from step 3
  5. Build the Docker image by executing docker build -t <image> .
  6. Log into the ECR registry
  7. Tag the newly created build (this step is necessary if you are not using docker hub)
  8. Push the newly tagged image to ECR
  9. Test the image locally by running the container
  10. Create a CronJob yaml file — it references the image tag you have just created and has the scheduling instructions (when, how frequently to run)
  11. Apply the CronJob file to your cluster with kubectl apply -f cronjob.yaml
  12. Test with a frequent schedule to make sure your program starts and does what it is supposed to do.
  13. Reset to the desired schedule and update the cluster with your cronjob.yaml manifest.

We need to do the packaging steps (1-8) once for the 'tb' program (.net) and again for the scan-and-report.py program. Note that setting up a Kubernetes cluster is out of scope for this article. I will be deploying using straight Kubernetes manifests, as this is the simplest way to go; Argo CD, Kustomize or Helm would be overkill in this case.

Dockerizing the python program

We discussed the programs in previous posts, so step 1 is complete for both the 'tb' and the python program. We'll start with the Python program and work our way through the steps, beginning with step 3, the base image. We need an OS image that includes the Python runtime, and fortunately such images are available on Docker Hub.
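A quick, optional sanity check of the base image is to pull it and ask for the interpreter version:

docker pull python:3.9.15-alpine3.16
docker run --rm python:3.9.15-alpine3.16 python --version
# should print: Python 3.9.15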

Let’s now look at Step 4, creating the Dockerfile for our python project.

FROM python:3.9.15-alpine3.16

ADD . /home/app
RUN pip install -r /home/app/requirements.txt
WORKDIR /home/app

EXPOSE 80

CMD python scan-and-report.py --confluence-publish

The FROM line shows we are using Python 3.9.15 running on Alpine 3.16. The rest copies the project into /home/app, installs the requirements, sets the working directory, and declares the default command that publishes to Confluence.

For steps 5, 6, 7 and 8 we will create a shell script. The benefit is that we can re-run it at will, bumping the image version each time. It's best practice to version your image tags and never re-push a different program under an old tag. We'll call the script build_docker.sh:

#!/bin/bash
# remember to login to docker before running this script
# remember to set your AWS_PROFILE to the correct profile
# pass the next two parameters to this script:
AWS_ACCOUNT_ID=$1
AWS_REGION=$2
VERSION=1.0.2
if [ $# -lt 2 ]
  then
    echo "Missing arguments - run with AWS_ACCOUNT_ID and AWS_REGION as parameters"
    exit 1
fi
# build it
docker build -t veracode-scanner:$VERSION . --progress=plain --no-cache

# create the ECR repo if we need to
aws ecr create-repository --repository-name veracode-scanner --region $AWS_REGION > /dev/null || true

# tag the image for ECR
docker tag veracode-scanner:$VERSION "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/veracode-scanner:$VERSION"

# login to the repo
aws ecr get-login-password --region "$AWS_REGION" | docker login --username AWS --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com"

# and push the newly created image to the ECR registry
docker push "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/veracode-scanner:$VERSION"

Invoke this with bash build_docker.sh 123456789 us-east-1, where 123456789 is your AWS account number and the second parameter is your region. If things worked, after some push progress you should see a message like 1.0.2: digest: sha256:4405580b83d7f31e1c18182711d03267213f6405dd37266d459472f1c4d134a8 size: 1994.
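If you want to confirm the image really landed in ECR, something like the following should list the pushed tag (assuming the same repository name and region):

aws ecr describe-images --repository-name veracode-scanner --region us-east-1 \
    --query 'imageDetails[].imageTags' --output text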

Dockerizing the .net ‘tb’ program

The good news is dockerizing the .net program is very similar, except that the Dockerfile is more complex.

FROM mcr.microsoft.com/dotnet/runtime:6.0 AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY ["jenkins-trigger-all-builds/jenkins-trigger-all-builds.csproj", "jenkins-trigger-all-builds/"]
RUN dotnet restore "jenkins-trigger-all-builds/jenkins-trigger-all-builds.csproj"
COPY . .
WORKDIR "/src/jenkins-trigger-all-builds"
RUN dotnet build "jenkins-trigger-all-builds.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "jenkins-trigger-all-builds.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT [ "dotnet", 
  "jenkins-trigger-all-builds.dll", 
  "run", 
  "--url", 
  "https://jenkins.net", 
  "--path", 
  "job/oneomics/job/veracode", 
  "--wait",
  "--token",
  "958fbe72-0179-4819-8929-7066e50feac2",
  "--max-wait-delay",
  "30" ]

This Dockerfile may appear frightening, but it is an automatically generated Dockerfile, right down to the ENTRYPOINT. I won't go into too much detail: it is a multi-stage build that restores and builds with the full SDK image, then copies the published output into a much smaller runtime-only image, which keeps the final image lean and lets Docker cache the restore layer when only the code changes. The ENTRYPOINT is the interesting part; it is simply our program with all of its glorious command line parameters, as explained in previous posts. When the Docker container starts, so does the program. We simply need to ensure that we provide the environment variables required by the executable, and that we mount the required secrets for the job. Finally, we build and push the image using a slightly changed shell script (a sketch follows below).
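I won't reproduce the whole script, but the changes are confined to the image and repository names; a minimal sketch, assuming the ECR repository is called jenkins-trigger-all-builds and an arbitrary starting version, looks like this:

#!/bin/bash
# same parameters as before: AWS account id and region
AWS_ACCOUNT_ID=$1
AWS_REGION=$2
VERSION=1.0.0   # hypothetical version for the 'tb' image
IMAGE=jenkins-trigger-all-builds

docker build -t $IMAGE:$VERSION . --progress=plain --no-cache
aws ecr create-repository --repository-name $IMAGE --region $AWS_REGION > /dev/null || true
docker tag $IMAGE:$VERSION "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE:$VERSION"
aws ecr get-login-password --region "$AWS_REGION" | docker login --username AWS --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com"
docker push "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE:$VERSION"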

Testing your docker images locally

The program that was dockerized was hopefully tested before you built your docker images. Now comes the integration test: will the dockerized image work when I run it manually? This type of test flushes out any problems with the base image. So, how do we go about it? Simple: run the container locally and poke around inside it.

To run the program that publishes to Confluence we need the container image plus the BEARER_TOKEN and MONGO_CONNECTION_STRING environment variables. Recall that the connection string was MONGO_CONNECTION_STRING=mongodb://localhost:27017/, but in the context of the container, localhost refers to the container itself. The connection string must point to the machine hosting the Docker container, which in my case is a Mac workstation; inside the container that host is reachable as host.docker.internal. The mongo environment variable must thus be MONGO_CONNECTION_STRING=mongodb://host.docker.internal:27017. The container must then be started. I use:

IMAGE_ID=$(docker run \
           --env BEARER_TOKEN='Bearer NOT-THE-REAL-THING-vpMJytHf3R6MaGXVBLhDF2YPPP' \
           --env MONGO_CONNECTION_STRING=mongodb://host.docker.internal:27017 \
           -itd veracode-scanner:1.0.2 ash)

This runs the docker container, setting the needed env variables, and runs 'ash', the default command interpreter on Alpine-based Linux systems. Because of the -i it's interactive, -t allocates a tty, and -d runs the container in the background. docker run prints the new container's ID, which we capture in the IMAGE_ID variable (despite the name, it is really a container ID).

Next, we execute a separate shell in the running container:

docker exec -it $IMAGE_ID ash

This gets us a root-level command prompt inside the container, and in /home/app we find our python script and all the supporting files. We can now verify that the environment variables are set correctly; use env or set to do this.
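For example, a quick filter inside the container confirms both variables made it in:

# inside the container
env | grep -E 'BEARER_TOKEN|MONGO_CONNECTION_STRING'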

Time to run the program: from the shell inside the container, launch it as shown below.
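This is the same command the image's CMD would run; here we just run it by hand:

# inside the container, at the ash prompt
cd /home/app
python scan-and-report.py --confluence-publish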

The console output confirms that it worked, and a quick check of the Confluence page shows updated date stamps. Be sure to run docker kill $IMAGE_ID && docker rm $IMAGE_ID to clean up after yourself.

Now for the Kubernetes CronJob. The official Kubernetes documentation has a helpful page on CronJobs if you want more background. Here is the first version of the manifest:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: veracode-confluence-updater
  namespace: tooling
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: veracode-confluence-updater
            image: [REDACTED].dkr.ecr.us-east-1.amazonaws.com/veracode-scanner:1.0.3
            imagePullPolicy: IfNotPresent
            command:
            - python
            - scan-and-report.py
            - --confluence-publish
          restartPolicy: OnFailure

We're ready to go now that we've saved this as 'publish-cronjob.yaml', or are we? At this point, you must have kubectl installed, a cluster configured, and your image pushed, and we must remember the needed environment variables. We need MONGO_CONNECTION_STRING to point to a database with some actual records, and we will house this database as a pod within the cluster. The creation of the mongo database using helm (or other tools) is beyond the scope of this article, but a rough sketch follows below and the internet has plenty of detailed guides. We must also turn the BEARER_TOKEN environment variable into a secret, because it is our Confluence credential.
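For completeness, a database install with the Bitnami chart might look something like this; the release name, namespace and disabled auth are assumptions for a test setup, not production advice:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install mongodb bitnami/mongodb --namespace tooling --set auth.enabled=false

With the database accounted for, let's start with the secret.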

apiVersion: v1
data:
  bearer-token: Tk9fWU9VX0RPTl9UX0dFVF9NWV9TRUNSRVQ=
  mongo-connection-string: SVRTX0NPTkZJREVOVElBTF9ZT1VfS05PVw==
kind: Secret
metadata:
  name: veracode-publisher-secrets
  namespace: tooling
type: Opaque

Save this as secrets.yaml. The above contains fake secrets; remember that all secret values must be base64 encoded. To do the base64 encoding on a *nix system, use

printf "your-secret" | base64 in a terminal. This prints the secret in the correct format. Don't use echo, as it appends a trailing \n.

Next, we create the secret with

kubectl apply -f secrets.yaml

You should not store this secret in git, for obvious reasons. Once it's in git, it never goes away unless you rewrite history with something like the BFG Repo-Cleaner (the "big friendly giant"). So we create a .gitignore with a single line:

secrets.yaml

Secrets management is a topic beyond the scope of this article; in short, we use Mozilla SOPS to encrypt the secrets.yaml file into a file called secrets.enc.yaml using an AWS KMS key, and we commit secrets.enc.yaml to the repo.
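For the curious, the encrypt and decrypt steps look roughly like this; the KMS key ARN is a placeholder:

# encrypt secrets.yaml with a KMS key into a file that is safe to commit
sops --encrypt --kms arn:aws:kms:us-east-1:123456789012:key/REPLACE-ME secrets.yaml > secrets.enc.yaml
# decrypt and apply straight into the cluster
sops --decrypt secrets.enc.yaml | kubectl apply -f -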

In my experience, most of the 'not working' issues are related to config and secrets, so you must be careful when setting these up. Now that we have defined the secrets in the cluster's tooling namespace, the cronjob must reference them. The revised cronjob, which pulls the environment variables our program requires from the k8s secret store, goes something like this:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: veracode-confluence-updater
  namespace: tooling
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: veracode-confluence-updater
            image: [REDACTED].dkr.ecr.us-east-1.amazonaws.com/veracode-scanner:1.0.3
            imagePullPolicy: IfNotPresent
            command:
            - python
            - scan-and-report.py
            - --confluence-publish
            env:
              - name: BEARER_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: veracode-publisher-secrets
                    key: bearer-token
                    optional: false 
              - name: MONGO_CONNECTION_STRING
                valueFrom:
                  secretKeyRef:
                    name: veracode-publisher-secrets
                    key: mongo-connection-string
                    optional: false 
          restartPolicy: OnFailure

The */1 schedule means "run every minute", which is useful for testing. To stop the cronjob after it has been installed, use kubectl delete -f publish-cronjob.yaml. Before pressing the trigger to apply, consider the mongo database: it's empty. Before we run our cronjob, we need to run our fake DB population script to ensure that there is data in the database. The advantage of doing this from inside the cluster is that we can mount the secrets in an interactive container and use the mongo shell to check for authentication or server address (connection string) issues. How do we start an interactive container with our Python program? With a new yaml manifest!

apiVersion: v1
kind: Pod
metadata:
  name: veracode-test-pod
  namespace: tooling
spec:
  containers:
    - image: [REDACTED].dkr.ecr.us-east-1.amazonaws.com/veracode-scanner:1.0.3
      name: veracode-confluence-test
      command: [ "ash", "-c", "tail -f /dev/null" ]
      env:
        - name: BEARER_TOKEN
          valueFrom:
            secretKeyRef:
              name: veracode-publisher-secrets
              key: bearer-token
              optional: false 
        - name: MONGO_CONNECTION_STRING
          valueFrom:
            secretKeyRef:
              name: veracode-publisher-secrets
              key: mongo-connection-string
              optional: false 
  restartPolicy: Never

Save as pod.yaml.  Run kubectl apply -f pod.yaml

Next, we shell into the pod with kubectl exec -it veracode-test-pod -n tooling -- ash

From there, we can tweak the python code to call the routine that adds the fake DB records.

We are now ready to test the cronjob!

Notice the namespace here is tooling; that is where the cronjobs will start appearing.

kubectl apply -f publish-cronjob.yaml

You will now want to run kubectl get pods -n tooling for up to a minute to see your job's pod appear, as shown below. The Confluence page should now update every minute.
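The --watch flag is a convenient way to sit and wait for it; once a run completes, its pod sticks around so you can read the output:

kubectl get pods -n tooling --watch
# grab the console output of a completed run (pod name is an example placeholder)
kubectl logs <pod-name> -n tooling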

Run kubectl delete -f publish-cronjob.yaml to remove the cronjob. Then edit the schedule in publish-cronjob.yaml to run once per day, re-apply, and Bob's your uncle!

Setting up the second cronjob follows the same principles as explained above.

This concludes this series of articles about static code analysis. Feedback (comments) is always welcome.
