A python/boto tool to automatically enable logs on ELB load balancers – Part 2

This is the second of two posts dealing with writing a program to automatically configure ELB load balancers to log. If the reader has not read the first post, it is available here.

Kubernetes Job vs CronJob

As previously stated, a Job is a Kubernetes construct that allows me to run a container to completion. With a Job, Kubernetes keeps the information about the ‘run’ around, even after the Job finishes. A Job creates pods, so kubectl get pods will list the pods that were started via a Job. The CronJob is just an extension of the Job that gives me the ability to run a Job repeatedly on a schedule. I will examine the Job first.

The Job

apiVersion: batch/v1
kind: Job
metadata:
  namespace: tooling
  name: enablelogs
spec:
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: "arn:aws:iam::12345678:role/elb-logging-worker-role-8fmuw5"
    spec:
      containers:
      - name: apply-logging-job
        image: 12345678.dkr.ecr.us-east-1.amazonaws.com/enablelogs:v1
        imagePullPolicy: IfNotPresent
        command:
        - python3
        - /home/app/enable-logs.py
        - -r
        - us-east-1
        - -l
        - logs-bucket-12345678 
      restartPolicy: Never
  backoffLimit: 1

Let’s start with line 14, the one with the image reference. I need to package my Python program as a Docker image and upload it to a registry. My Kubernetes cluster will then pull the image down to a node, where it can create a pod and container to run my program.

The Dockerfile

FROM python:3.9.15-bullseye
ENV SERVICE_NAME=enablelogs
ENV PS1="\u@${SERVICE_NAME}:\w # "

WORKDIR /home/app
COPY . /home/app

RUN pip3 install --no-cache-dir -r requirements.txt

# Run as an unprivileged user; the base image does not provide one, so create it.
RUN useradd --create-home appuser
USER appuser

Next is the Dockerfile, which in this case is fairly simple. Starting from the Python base image on Docker Hub, I copy the program into the container and install the required libraries with pip3. The requirements.txt is very simple, and I show it below:

boto3==1.17.112
botocore==1.20.112

Creating and pushing the Docker image

Having written the Dockerfile and requirements.txt, I now want to build the Docker image and upload it to AWS ECR. ECR is Amazon’s container registry and is similar to Docker Hub. I am writing a shell script because with a manual procedure it is easy to miss a step or make mistakes. It is important to have a script such as this one right from the start of development; this is preaching to myself, as I usually write such a script only after having done the steps manually too many times. I use the shell script every time I want to build and push a new version of the container to the repository. One could pass in the version tag as a parameter ($1) instead of hard-coding it, but hard-coding has the benefit of tracking the current version in git.

#!/bin/bash

AWS_REGION=us-east-1
AWS_ACCOUNT=1234567890
CONTAINER_NAME=enablelogs
AWS_PROFILE=nonprod
VERSION_TAG=v1

aws ecr get-login-password --profile $AWS_PROFILE --region $AWS_REGION  | \
  docker login --username AWS --password-stdin $AWS_ACCOUNT.dkr.ecr.$AWS_REGION.amazonaws.com
docker build -t $CONTAINER_NAME:$VERSION_TAG . --progress plain --no-cache
docker tag $CONTAINER_NAME:$VERSION_TAG $AWS_ACCOUNT.dkr.ecr.$AWS_REGION.amazonaws.com/$CONTAINER_NAME:$VERSION_TAG
aws ecr create-repository --repository-name $CONTAINER_NAME --region $AWS_REGION || true
docker push $AWS_ACCOUNT.dkr.ecr.$AWS_REGION.amazonaws.com/$CONTAINER_NAME:$VERSION_TAG

I am not going to explain in detail what each line does, but I do need all of the lines above. Please adjust the variables in ALL CAPS before running this script. Running it builds the image and pushes it to ECR, creating the ECR repository if it does not yet exist. I bump VERSION_TAG every time I make changes. The account, region, and image name would otherwise be repeated many times in this short script, which is why I factor them out into variables.

Credentials for AWS – Instance Profiles, Roles, Policies

I now briefly wish to discuss credentials. If one reviews job.yaml, line 10 shows a role annotation. This one-liner may seem simple, but there is a lot of machinery behind it. In this section, I want to explain a role-based approach for giving my program permission to interact with the AWS API. Recall that the program I have written lists ELBs and makes configuration changes to them. For this to work, I need a way to authenticate with AWS and prove that my program has the rights to change the AWS infrastructure. Three methods for managing authentication/authorization come to mind:

  1. Using IAM roles for service accounts.
  2. Using a third party tool called Kube2Iam.
  3. Using AWS credentials stored on the pod. (Don’t do this!)

Storing AWS credentials in Docker images or in a pod/job is not a good idea. I am talking about the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY credentials generated in the IAM console. My credentials have an admin policy attached, and I don’t want those scattered across pods inside the cluster. The best way is probably #1, but I will discuss #2, as #1 has more moving parts. Once I have set up kube2iam and a set of assumable roles, assigning a role to a pod becomes a one-liner. Using kube2iam also allows me to explain the concepts better. But before that, let me go over AWS IAM roles and policies.

IAM roles and policies

Roles and policies are central to AWS access control. An IAM policy expresses something a user or an entity can do. If I create a policy with the permission s3:PutObject and attach it to a user, then that user has the right to write to the bucket specified in the policy. Policies are no good on their own; I need to attach them somewhere, usually to a user or, more frequently, to a role.

A role is an abstraction of a set of permissions for a user or for some other entity. One links a role with one or more policies, and one can then assign the role to a user. A program running on a node can also temporarily assume a role. And here be dragons: if I create a powerful role such as admin, who and what entities will be allowed to assume it?
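
To make the relationship between trust, roles, and policies concrete, here is a minimal boto3 sketch that creates a role and attaches an inline policy to it. The role name, policy name, and bucket are hypothetical; in practice I would create these with Terraform or the console, so treat this only as an illustration of the two halves of a role.

import json
import boto3

iam = boto3.client("iam")

# Trust policy: WHO may assume the role (here: the EC2 service).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy: WHAT the role may do (here: write to one bucket).
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::logs-bucket-12345678/*",
    }],
}

# Hypothetical names, for illustration only.
iam.create_role(
    RoleName="example-logging-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="example-logging-role",
    PolicyName="example-logging-policy",
    PolicyDocument=json.dumps(permissions_policy),
)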

A somewhat contrived analogy

Imagine a country with very archaic and conservative rules. Young people wishing to marry must obtain the consent of their parents before going ahead with the marriage. Let’s assume a man wishes to marry a woman. The man asks his parents for permission to marry his love interest. When his parents approve, he goes to his future wife’s house. Once arrived, there is a meeting with the woman’s parents, in which he asks for her hand. Only after he has received consent from the woman’s parents can the marriage proceed.

Roles and instance profiles

A similar thing happens when I assign an instance profile (think role) to an EC2 instance. I would like the EC2 instance to assume another role to perform some tasks involving AWS permissions, like listing ELBs. I think of an instance profile as an EC2 instance-specific wrapper around a role; each instance profile is associated with exactly one role. Unfortunately, there is no console UI for instance profiles, but one can see them with aws iam list-instance-profiles. The returned list shows the instance profiles and their linked roles.
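
The same lookup can be done with a small boto3 sketch, equivalent to the CLI call above, which prints every instance profile together with its linked roles:

import boto3

iam = boto3.client("iam")

# Walk all instance profiles and show which role(s) each one carries.
paginator = iam.get_paginator("list_instance_profiles")
for page in paginator.paginate():
    for profile in page["InstanceProfiles"]:
        roles = [role["RoleName"] for role in profile["Roles"]]
        print(profile["InstanceProfileName"], "->", roles)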

Below I show an IAM Kubernetes node-group role, which EKS assigned to my node group at node-group creation time. Every EKS cluster with worker nodes has node groups, and each node group defines a role (via an instance profile) that is used by all nodes in that node group. EKS also creates a launch template and an autoscaling group for each node group. As mentioned before, EKS assigns a unique IAM node-group role to each node group. Instances that are part of the node group then carry that role, in the form of an instance profile based on it. I like to think of each worker EC2 instance in the node group as ‘having’ that role. Below, I show the AWS console UI for the role associated with a specific node group.

Examining an EKS node-group role in a bit more detail

The UI above shows the Trust relationships tab. The JSON for the trust relationship determines who may assume a role; this is like the man getting permission from his parents to marry his love interest. The role also has attached permissions, which are shown on the Permissions tab.

As an example, one can see that each of the worker nodes has the AmazonEC2ContainerRegistryReadOnly policy attached to its role. Kubernetes needs this to be able to pull Docker images from the ECR registry.

The trusted entities a role needs in order to be assumed

Going back to the analogy, the man’s parents’ permission is insufficient; the couple needs the woman’s parents’ permission as well. I will now examine the role that a pod, running on an EKS worker node (an EC2 instance), may want to assume.

The key is to specify the correct node-group role ARN as a trusted principal in that role’s trust policy (line 16 in my trust relationship JSON).
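
To illustrate, here is a hedged sketch of what the trust policy on the assumable role might look like, applied with boto3 (the console works just as well). The node-group role ARN below is made up, and the Principal list would contain one entry per node group whose instances need to assume the role.

import json
import boto3

iam = boto3.client("iam")

# Trusted entities: the node-group role(s) whose EC2 instances host the pod.
# The ARN below is hypothetical; use the node-group role ARN(s) of your cluster.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "AWS": [
                "arn:aws:iam::12345678:role/my-eks-node-group-role"
            ]
        },
        "Action": "sts:AssumeRole",
    }],
}

iam.update_assume_role_policy(
    RoleName="elb-logging-worker-role-8fmuw5",
    PolicyDocument=json.dumps(trust_policy),
)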

How to find the role ARN for a node group?

How do I find the ARN of the node-group role? One way is to use the AWS CLI. aws eks list-nodegroups presents the list of node groups for the cluster of interest. One then calls aws eks describe-nodegroup, passing in the appropriate region, cluster name and node-group name; the output includes the node group’s IAM role ARN. One can also look at node groups via the AWS console: navigate to EKS->Clusters->YourCluster->Compute Tab->Node Groups->[your node group] and look for ‘Node IAM role ARN’.
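
The same lookup in boto3, assuming a hypothetical cluster name, looks roughly like this:

import boto3

eks = boto3.client("eks", region_name="us-east-1")
cluster = "my-cluster"  # hypothetical cluster name

# For every node group in the cluster, print its IAM role ARN.
for nodegroup in eks.list_nodegroups(clusterName=cluster)["nodegroups"]:
    details = eks.describe_nodegroup(clusterName=cluster, nodegroupName=nodegroup)
    print(nodegroup, details["nodegroup"]["nodeRole"])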

A fool-proof but more complex way of achieving the same for a specific node

The next method, described below, relies on being able to obtain a shell on a specific worker node. I use the krew node-shell plugin for this, but other applications, such as Lens, allow this as well. First, I get a terminal with a valid connection to the desired EKS cluster. Then I ask Kubernetes to show the nodes via kubectl get nodes; the system lists the nodes, whose names start with ip-xxx. If I need to know the associated EC2 instance id of a node, I can find it by running kubectl describe node [name]. After shelling into the node (kubectl node-shell [node-name]), I want to discover the node’s instance profile. I can do this with a curl to the EC2 metadata service: executing curl -s http://169.254.169.254/latest/meta-data/iam/info from the node shell returns the instance profile ARN. After this, I can run aws iam list-instance-profiles and search for that instance profile; the output shows the role associated with the instance profile, including the role name and its ARN.
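
The same discovery can be scripted. The sketch below assumes it runs somewhere with IMDSv1 access to the metadata endpoint (for example, from the node shell) and with permission to read IAM:

import json
import urllib.request

import boto3

# IMDSv1 call: returns JSON containing the InstanceProfileArn for this instance.
url = "http://169.254.169.254/latest/meta-data/iam/info"
with urllib.request.urlopen(url, timeout=2) as resp:
    info = json.load(resp)

profile_name = info["InstanceProfileArn"].split("/")[-1]

# Look up the role(s) linked to that instance profile.
iam = boto3.client("iam")
profile = iam.get_instance_profile(InstanceProfileName=profile_name)["InstanceProfile"]
for role in profile["Roles"]:
    print(role["RoleName"], role["Arn"])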

Beware of IMDSv2 when using kube2iam

The EC2 instance metadata service, and the differences between IMDSv1 and IMDSv2, are explained in the AWS documentation; you will find more information about the instance metadata service here. Why do I care? If IMDSv2 is enforced, the curl command above will not work without a session token, so I need IMDSv1 to be available. I acknowledge that IMDSv2 increases security, and I understand why Amazon has put it into place.

Note that kube2iam does not support IMDSv2; it needs IMDSv1. There is a pull request that adds IMDSv2 support to kube2iam, but it was not ready at the time of writing. I therefore enable IMDSv1 in my cluster-creation script, which uses the terraform-eks module. I simply add:

# this is critical for kube2iam to work without IMDSv2, ie otherwise
# the instance does NOT have access to http://169.254.169.254/latest/meta-data/
metadata_options = {
  http_endpoint               = "enabled"
  http_tokens                 = "optional"
  http_put_response_hop_limit = 2
}

It matters which node/nodegroup your Job/Cronjob pod runs on

If I have two node groups with two instances each, then two instances will have one role and the other two will have another. Unless the trusted entities of my assumable role list both node-group role ARNs, I may be in trouble: unless I specify a nodeSelector: block, I cannot know which node the job pod will run on, since the Kubernetes scheduler decides this.

Enter Kube2Iam

Following all of the above, it is now time to discuss kube2iam. I can’t really describe the inner workings of this program in detail; otherwise this would be a very long article. I install kube2iam with Helm in my cluster (whose nodes allow IMDSv1) and configure it according to the instructions.

apiVersion: batch/v1
kind: Job
metadata:
  namespace: tooling
  name: enablelogs
spec:
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: "arn:aws:iam::12345678:role/elb-logging-worker-role-8fmuw5"
    spec:
      containers:

Now, after all the above explanations, the annotation in line 10 should make more sense. Once Kubernetes deploys the pod, the kube2iam agent on the node kicks in: an iptables rule redirects the pod’s metadata (introspection) requests to kube2iam. It acts as a broker with IAM, returning credentials for the annotated role to the pod, tricking it into thinking it has a different role than the one inherited from its parent EC2 node.

Note that the role MUST list, in its trusted entities, the ARN of the node-group role of the EC2 worker node the pod runs on. Otherwise, the pod can’t assume the role, and I will get access denied errors when listing ELBs. The policies attached to the role the pod assumes must allow the Python program running in the pod to access the AWS resources it touches. The pod then won’t need AWS credential environment variables mounted from secrets. Note that I did not set a nodeSelector on the Job in the examples above.

{
    "Statement": [
        {
            "Action": "elasticloadbalancing:*",
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": [
                "ec2:DescribeAccountAttributes",
                "ec2:DescribeAddresses",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:DescribeVpcClassicLink",
                "ec2:DescribeInstances",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeClassicLinkInstances",
                "ec2:DescribeRouteTables",
                "ec2:DescribeCoipPools",
                "ec2:GetCoipPoolUsage",
                "ec2:DescribeVpcPeeringConnections",
                "cognito-idp:DescribeUserPoolClient"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": "iam:CreateServiceLinkedRole",
            "Condition": {
                "StringEquals": {
                    "iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
                }
            },
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "Stmt1632859759389"
        }
    ],
    "Version": "2012-10-17"
}

Above, I show the policy that is attached to the elb-logging-worker-role-8fmuw5 role. The assignment of the pod to the role happens in line 10 of the job manifest. Together, these give the program the authority it needs to do its job.
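
A quick way to verify, from inside the pod, that kube2iam handed out the intended role is to ask STS who the caller is. This minimal sketch could be dropped into enable-logs.py while debugging:

import boto3

# Prints the ARN of the assumed-role session, e.g.
# arn:aws:sts::12345678:assumed-role/elb-logging-worker-role-8fmuw5/...
sts = boto3.client("sts", region_name="us-east-1")
print(sts.get_caller_identity()["Arn"])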

The CronJob

To finish up, let me quickly describe cronjob.yaml, which simply wraps the Kubernetes Job shown earlier.

apiVersion: batch/v1
kind: CronJob
metadata:
  namespace: tooling
  name: apply-logging-for-s3-and-elb
spec:
  schedule: "*/2 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            iam.amazonaws.com/role: elb-logging-worker-role-8fmuw5
        spec:
          containers:
          - name: apply-logging-job
            image: 1234567890.dkr.ecr.us-east-1.amazonaws.com/enablelogs:v1
            imagePullPolicy: IfNotPresent
            command:
            - python3
            - /home/app/enable-logs.py
            - -r
            - us-east-1
            - -l
            - logs-bucket-1234567890
          restartPolicy: Never

I choose a frequent schedule for testing during development, and then change it to run at the desired interval. I can use any of the online tools to generate the ‘schedule’ statement, as it uses standard Unix crontab syntax.

Possible Enhancements

I have added two features related to logging: enabling access logging for S3 buckets, and enabling versioning on buckets. The Python implementation for these is left as an exercise for the reader, though a possible starting point is sketched below.
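
For anyone who wants a head start, a hedged boto3 sketch of those two calls might look like this (the bucket names are hypothetical, and the target log bucket must already allow the S3 log delivery service to write to it):

import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "my-data-bucket"            # hypothetical bucket to fix up
log_bucket = "logs-bucket-12345678"  # central bucket receiving access logs

# Enable server access logging, writing logs under a per-bucket prefix.
s3.put_bucket_logging(
    Bucket=bucket,
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": log_bucket,
            "TargetPrefix": f"{bucket}/",
        }
    },
)

# Enable object versioning on the same bucket.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)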

Conclusion

It is my hope that this post will help others who want to automate AWS-related tasks. Feedback via comments would be greatly appreciated.
