Posting the latest Veracode scan information to a Dashboard

Introduction

The next thing we want is a big visible chart: a view of our Veracode Static Code Analysis metrics. This article focuses on publishing to Atlassian Confluence, our server-based, company-internal wiki. The techniques shown here for creating HTML should also apply if you are publishing to a web server.

We need a helper that will do the dirty work. After some fiddling, I came up with this Python code, which can be adapted to other languages; I think I have a .NET version of this kicking around somewhere too. Here I present the complete class. It exposes just one useful method, write_data. Of note here is that the user agent is the curl user agent, because the server seemed not to like a normal browser user agent. Also take note of the authentication mechanism for Confluence, which is a Bearer token sent in the Authorization header of each request. The program reads the token from an environment variable called BEARER_TOKEN, which makes it easy to supply the token from a Kubernetes secret. We can get a Confluence personal access token by following these steps:

  1. Log into Confluence.
  2. Click on your user icon (the top-right icon) and select Settings.
  3. In the left sidebar, the last menu item at the bottom is Personal Access Tokens.
  4. Click the Create token button, give the token a name, pick an expiry (or no expiry), and click Create.
  5. The UI displays the personal access token; save it in your password manager.
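Once you have the token, Confluence expects it in a header of the form `Authorization: Bearer <token>`. As a minimal sketch (not part of the original program; the helper name bearer_auth_header is hypothetical), this is how that header can be built from the BEARER_TOKEN environment variable, assuming the variable holds the raw token exactly as copied in step 5:

```python
import os


def bearer_auth_header(env_var="BEARER_TOKEN"):
    """Build the Authorization header for a Confluence personal access token.

    Hypothetical helper: assumes env_var holds the raw token as copied from
    the Confluence UI. A value that already starts with "Bearer " is kept as-is.
    """
    # strip() drops stray whitespace, e.g. a trailing newline from a secret file
    token = (os.environ.get(env_var) or "").strip()
    if not token:
        raise RuntimeError(f"environment variable {env_var} is not set")
    if not token.startswith("Bearer "):
        token = f"Bearer {token}"
    return {"Authorization": token}
```

Accepting both forms means the Kubernetes secret can hold either the bare token or the full header value.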

import json
import os

import requests


class Confluence:
    BASE_URL = "https://YOUR-CONFLUENCE-URL/rest/api/content"
    VIEW_URL = "https://YOUR-CONFLUENCE-URL/pages/viewpage.action?pageId="
    # the server seemed to dislike a normal browser user agent, so we send curl's
    USER_AGENT = "curl/7.84.0"
    TIMEOUT = 30  # seconds per HTTP request

    def __init__(self, page_id):
        self.page_id = page_id
        # personal access token, e.g. injected from a Kubernetes secret
        self.bearer_token = os.environ.get("BEARER_TOKEN")

    def _headers(self, extra=None):
        # Confluence personal access tokens are sent as "Authorization: Bearer <token>"
        token = self.bearer_token or ""
        if not token.startswith("Bearer "):
            token = "Bearer " + token
        headers = {'Accept': '*/*',
                   'User-Agent': self.USER_AGENT,
                   'Authorization': token}
        if extra:
            headers.update(extra)
        return headers

    def _page_url(self):
        return '{base}/{page_id}'.format(base=self.BASE_URL, page_id=self.page_id)

    def get_page_info(self):
        r = requests.get(self._page_url(), headers=self._headers(), timeout=self.TIMEOUT)
        r.raise_for_status()
        return r.json()

    def get_page_ancestors(self):
        r = requests.get(self._page_url(), headers=self._headers(),
                         params={'expand': 'ancestors'}, timeout=self.TIMEOUT)
        r.raise_for_status()
        return r.json()['ancestors']

    def write_data(self, html, title=None):
        info = self.get_page_info()
        # every update must carry the next version number
        version = int(info["version"]['number']) + 1
        if title is not None:
            info['title'] = title

        data = {
            'id': str(self.page_id),
            'type': 'page',
            'title': info['title'],
            'version': {'number': version},
            'body': {
                'storage': {
                    'representation': 'storage',
                    'value': str(html),
                }
            }
        }

        # keep the page under its current parent (the last ancestor);
        # top-level pages have no ancestors, so skip the field for them
        ancestors = self.get_page_ancestors()
        if ancestors:
            parent = ancestors[-1]
            for key in ('_links', '_expandable', 'extensions'):
                parent.pop(key, None)
            data['ancestors'] = [parent]

        r = requests.put(
            self._page_url(),
            data=json.dumps(data),
            headers=self._headers({'Content-Type': 'application/json'}),
            timeout=self.TIMEOUT)
        r.raise_for_status()
        print("Wrote '%s' version %d" % (info['title'], version))

    @staticmethod
    def pprint(data):
        print(json.dumps(data, sort_keys=True, indent=4, separators=(', ', ' : ')))

The first thing to note is that this code does not create a page; it updates a page that already exists. You will need that page's ID, which you can find in the page URL when editing the page in Confluence. Next, we look at how to create the HTML.
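If you would rather not copy the ID by hand, a small helper can pull it out of a URL. This is a sketch with a hypothetical function name; it assumes the URL carries the ID as a pageId query parameter, as in the viewpage.action form used by VIEW_URL in the class. Other URL forms, such as /display/SPACE/Title, do not contain the ID, and the helper returns None for them.

```python
from urllib.parse import parse_qs, urlparse


def page_id_from_url(url):
    """Return the pageId query parameter of a Confluence URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get("pageId")
    return values[0] if values else None
```

For example, `page_id_from_url("https://YOUR-CONFLUENCE-URL/pages/viewpage.action?pageId=176872938")` returns the string "176872938", ready to pass to the Confluence constructor.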

I’m using a fairly ‘caveman’ approach (as my son Yannik would say) to generate the HTML, based on two HTML template files presented below. The first, which I call pg1.html, is the template for the ‘contents’ of the page. Since each Confluence page already has its own <html>, <head>, and <body>, we don’t need to worry about those. We do need to set up an HTML table and style it with Confluence’s own table classes. For now, the template includes only the header row; the table is left open so that rows can be appended to it.

<p>This page was auto-generated on $DATE$.</p>
<br />
<p>The following is a list of current Veracode applications showing Static Code Analysis results. This page is generated by an automatic analysis process; the data is based on Veracode scan results generated by an automatic Jenkins build. <strong>DO NOT EDIT THIS PAGE</strong>, as it will be automatically overwritten.</p><br />

<div class="table-wrap">
<table class="wrapped relative-table confluenceTable" style="width: 88.871%;">
<colgroup>
<col style="width: 40.0%;" />
<col style="width: 10.0%;" />
<col style="width: 10.0%;" />
<col style="width: 10.0%;" />
<col style="width: 30.0%;" />
</colgroup>
<tbody>
<tr>
<th class="confluenceTh">Application/Project Name</th>
<th class="confluenceTh">High Findings</th>
<th class="confluenceTh">Medium Findings</th>
<th class="confluenceTh">Low Findings</th>
<th class="confluenceTh">Last scan date</th>
</tr>

Next is pg2.html, which is just a template for a row.

<tr>
<td class="confluenceTd">$APP$</td>
<td class="confluenceTd">$HI$</td>
<td class="confluenceTd">$MEDIUM$</td>
<td class="confluenceTd">$LOW$</td>
<td class="confluenceTd">$DATE$</td>
</tr>
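Filling the templates is plain string replacement of $NAME$ placeholders. The following minimal sketch of a reusable helper (fill_template is a hypothetical name, not from the original program) also HTML-escapes each value, on the assumption that an application name may contain characters such as & or < that would otherwise break Confluence’s storage format:

```python
from html import escape


def fill_template(template, values):
    """Replace each $KEY$ placeholder in template with its HTML-escaped value.

    values maps placeholder names (without the dollar signs) to values;
    non-string values are converted with str() first. Placeholders that
    are not in values are left untouched.
    """
    result = template
    for key, value in values.items():
        result = result.replace(f"${key}$", escape(str(value)))
    return result
```

With a shortened row template, `fill_template('<tr><td class="confluenceTd">$APP$</td><td class="confluenceTd">$HI$</td></tr>', {"APP": "R&D portal", "HI": 3})` produces `<tr><td class="confluenceTd">R&amp;D portal</td><td class="confluenceTd">3</td></tr>`.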

With the templates in place, here is the code that retrieves the latest record for each project from the database, creates a table row for each project, and finally publishes the report.

import os
from datetime import datetime
from html import escape

import pymongo


def write_to_confluence():
    # pg1.html holds the page introduction and the table header row
    with open("pg1.html", "r") as f:
        header = f.read()
    time_now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    html = header.replace("$DATE$", time_now)

    # pg2.html holds the template for a single table row
    with open("pg2.html", "r") as f2:
        row_template = f2.read()

    with pymongo.MongoClient(os.environ.get('MONGO_CONNECTION_STRING')) as client:
        collection = client["veracode"]["veracode_reports"]
        # we want to report on all top level unique 'name' records
        projects = collection.distinct('name')

        for project in projects:
            # we want to report on the most recent record, representing the latest scan for each project
            document = collection.find_one(
                {"name": project},
                sort=[('value.generation_time', pymongo.DESCENDING)])
            if document is None:
                continue
            value = document['value']
            # aggregate very high and high to a single value; missing keys count as 0
            high_total = int(value.get('veryhigh', 0)) + int(value.get('high', 0))
            row = row_template
            # escape text so characters such as & or < cannot break the page markup
            row = row.replace('$APP$', escape(str(value['application_name'])))
            row = row.replace('$HI$', str(high_total))
            # str() because counts may be stored as numbers or as strings
            row = row.replace('$MEDIUM$', str(value.get('medium', 0)))
            row = row.replace('$LOW$', str(value.get('low', 0)))
            html += row.replace('$DATE$', escape(str(value['generation_time'])))

    # close the tags that pg1.html left open
    html += '</tbody></table></div>'
    # you can find out the confluence page id by editing the page and looking at the url
    confluence = Confluence("176872938")  # the Confluence class shown earlier
    confluence.write_data(html)

The call to collection.distinct('name') retrieves the array of unique project names, because we want to report on every project we have data for. Remember that there are potentially multiple records per project: each scan generates a record, and each record carries an ISO-format timestamp in value.generation_time. When reporting, we want the latest scan, so the query sorts descending on value.generation_time and takes the first record, with the database doing all the hard work. Note that the Python syntax is a bit different from the mongo syntax you would use in the console: the console takes a sort document such as {"value.generation_time": -1}, while pymongo takes the key and direction as separate arguments or as a list of (key, direction) pairs. Getting this right was only possible because I had generated multiple dummy scan records with different generation_time values.
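To check the ‘latest scan per project’ logic without a database (for instance in a unit test built from dummy records like those mentioned above), here is a plain-Python sketch of what the query computes. latest_per_project is a hypothetical helper, and it assumes every generation_time is an ISO 8601 string in one consistent format and time zone, which is what makes string comparison match chronological order.

```python
def latest_per_project(records):
    """Return {name: record}, keeping only the newest scan for each project.

    Mirrors the database query: group by 'name' and keep the record with the
    greatest value.generation_time. Relies on ISO 8601 strings in a single
    format and time zone, so that comparing strings compares timestamps.
    """
    latest = {}
    for record in records:
        name = record["name"]
        current = latest.get(name)
        if current is None or record["value"]["generation_time"] > current["value"]["generation_time"]:
            latest[name] = record
    return latest
```

Given two "app-a" records stamped "2023-01-05T10:00:00" and "2023-02-01T09:00:00", the result keeps the February one, just as the descending sort does.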

Note that write_to_confluence() aggregates Veracode’s Very High and High findings into a single High count. In reality, people only care about Highs and Mediums, so we count the Very High rated issues as High issues. The final html += '</tbody></table></div>' line closes the tags that pg1.html left open, so the page is well-formed HTML. Finally, the program calls the Confluence constructor with the Confluence page ID and writes the page with write_data().
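The severity rule can also be isolated in a small function, which keeps it testable on its own. This is a sketch with a hypothetical name (severity_counts); it assumes the scan record’s value sub-document uses the keys veryhigh, high, medium, and low seen in the code, that missing keys mean zero findings, and that counts may be stored as strings.

```python
def severity_counts(value):
    """Collapse a scan's 'value' sub-document into the three reported columns.

    Very High and High are reported together as High. Missing keys count as 0,
    and counts stored as strings (e.g. "3") are converted with int().
    Keys of the result match the row template placeholders.
    """
    high = int(value.get("veryhigh", 0)) + int(value.get("high", 0))
    return {
        "HI": high,
        "MEDIUM": int(value.get("medium", 0)),
        "LOW": int(value.get("low", 0)),
    }
```

For example, `severity_counts({"veryhigh": "2", "high": "3", "medium": "7"})` returns `{"HI": 5, "MEDIUM": 7, "LOW": 0}`.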

It makes sense to integrate this functionality into the Veracode-analysis Python program and give it a command-line option to run in ‘publish to Confluence’ mode instead of ‘Veracode scan’ mode. Doing this avoids creating and managing an extra Docker container and its image. This change adds the following code:

# BooleanOptionalAction (Python 3.9+) stores True/False itself, so no type= is needed
parser.add_argument("-c", "--confluence-publish",
                    help="Run in confluence publish mode (needs MONGO_CONNECTION_STRING and BEARER_TOKEN in env).",
                    required=False, default=False,
                    action=argparse.BooleanOptionalAction)
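To see how the flag behaves, here is a trimmed-down, hypothetical parser (parse_publish_arguments is not a function from the original program, and the -a/--application option is only an assumed stand-in for the scan-mode arguments). BooleanOptionalAction also generates a --no-confluence-publish form, while the short -c simply switches the mode on.

```python
import argparse


def parse_publish_arguments(argv=None):
    # Hypothetical, trimmed-down parser: only the publish flag and one
    # assumed scan-mode option are shown; the real program defines more.
    parser = argparse.ArgumentParser(description="Veracode scan / Confluence publish")
    parser.add_argument("-a", "--application", nargs=1, required=False,
                        help="Veracode application name (scan mode).")
    # creates both --confluence-publish and --no-confluence-publish
    parser.add_argument("-c", "--confluence-publish", default=False,
                        action=argparse.BooleanOptionalAction,
                        help="Run in confluence publish mode.")
    return parser.parse_args(argv)
```

argparse turns the dashes into underscores, so the value is read as args.confluence_publish, which is exactly the attribute main() checks.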

We then need to change main() to call write_to_confluence() when the --confluence-publish flag is given on the command line.

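# Note: scan mode needs your api credentials in the file ~/.veracode/credentials;
# publish mode instead needs MONGO_CONNECTION_STRING and BEARER_TOKEN in the environment
def main():
    args = check_arguments()
    if args.confluence_publish:
        # publish mode: write the latest results to Confluence and stop
        write_to_confluence()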
        sys.exit(0)
    application = args.application[0]
    file = args.file[0]
    use_json = args.json
    use_mongo = args.mongo
    app_info = get_application_info(application)
    upload_file(app_info['id'], file)
    build_id = begin_pre_scan(app_info['id'])
    wait_for_report_to_be_ready(app_info['id'], build_id)
    summary_report = get_summary_report(app_info['guid'])
    findings_report = get_findings_report(app_info['guid'])
    print_summary_analysis(summary_report, findings_report, use_json, use_mongo)
    if args.download:
        download_report(build_id, file)
    return 0

One last piece of refactoring is needed: the list of projects must not be hard-coded. The program needs to retrieve the list of projects dynamically (as collection.distinct('name') does in write_to_confluence()) and then retrieve each project’s latest record from the database.
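As an alternative to calling distinct() and then querying once per project, MongoDB can return every project’s latest record in a single round trip with an aggregation pipeline. This is a swapped-in technique, not the approach used above, and latest_scan_pipeline is a hypothetical name; it assumes the same name and value.generation_time fields.

```python
def latest_scan_pipeline():
    """Aggregation pipeline returning the newest record for every project name."""
    return [
        # newest scans first, so $first in the $group stage picks the latest one
        {"$sort": {"value.generation_time": -1}},
        # one output document per distinct 'name', holding that project's newest record
        {"$group": {"_id": "$name", "latest": {"$first": "$$ROOT"}}},
        # unwrap so each result looks like an ordinary veracode_reports document
        {"$replaceRoot": {"newRoot": "$latest"}},
        # alphabetical row order on the Confluence page
        {"$sort": {"name": 1}},
    ]
```

With pymongo, `for document in collection.aggregate(latest_scan_pipeline()):` would then replace both the distinct() call and the per-project query inside the loop.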

Once run, the program updates the Confluence page with the findings table, one row per project.

All of this is shaping up, but we now need to package and publish this program as a Docker image so that it can run as a Kubernetes CronJob. We will discuss this in the next blog post.

Next blog post: Cron Jobs
