COVID Updates using AWS Lambda and SMS

After completing my AWS Solutions Architect certification, I decided to put AWS to use and create a function that would send daily SMS updates of new COVID case counts in my area (BC, Alberta, and Canada-wide). A serverless AWS Lambda function seemed to be a good match, as the function only needed to run once daily. Spinning up a container or server just to execute a small function was overkill, and this would be a good use case to test Lambda in action.

The Initial Concept:

Application Model (Archimate)

Scheduled trigger: A CloudWatch/EventBridge event can be scheduled to kick off the Lambda function daily. The trigger options are a specific time of day or a file update in S3.

Lambda serverless function: The Lambda function has several language options and I chose Python as the preferred language in case I needed to screen scrape a webpage, do data filtering, or use other Python data libraries.

Destination: The primary output will be an SMS message to a small number of phones (friends and family). A second nice-to-have option is to also send an email. The message will be a brief sentence showing the new COVID case numbers.

Message format: “COVID Update: 07-15-2020 New BC Cases: 161, Tested: 7310. New Alberta Cases: 122, Tested: 2387. New Canada Cases: 2124, Total Tested: 67153”
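As a minimal sketch of how a message in this shape could be assembled: the function name `format_update` and its tuple-based input are illustrative assumptions, not the code used later in this post, and every region is labelled “Tested” for simplicity.

```python
from datetime import date


def format_update(report_date, stats):
    """Build the one-line update from (label, new_cases, tested) tuples."""
    sentences = [
        f"New {label} Cases: {new_cases}, Tested: {tested}."
        for label, new_cases, tested in stats
    ]
    # date objects accept strftime codes inside an f-string format spec.
    return f"COVID Update: {report_date:%m-%d-%Y} " + " ".join(sentences)


message = format_update(
    date(2020, 7, 15),
    [("BC", 161, 7310), ("Alberta", 122, 2387), ("Canada", 2124, 67153)],
)
print(message)
print(len(message), "characters")
```

Keeping the text short matters for SMS: a single plain-text segment holds 160 characters, and this sample fits within that.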

Data source: The ideal data source is a reliable, official government data feed with data for each province and a summary for Canada. A web service or a data file is the preferred interface. If not available, possible other options are a newspaper or open source data file. Finally, if no file or web API data source is available, check if the data can be screen scraped from a web page. The COVID data is available on provincial websites in different formats, but ideally the data should come from one common source to simplify the data gathering code required to parse the data.

Robustness: The application is not critical, as the same information can be accessed from existing websites.

Cost: This is a personal project, so low cost! AWS Lambda and SNS are extremely inexpensive for this volume and may be within the free tier.

The Search for Data Sources:

Note: This project was completed on August 5th 2020 and used the best available data source at the time. It’s possible that better data sources or even complete COVID SMS message services will be available in the future.

The Canadian government provides open data services to its citizens at no cost on several sites, so that’s where my search started.

1. Statistics Canada has a variety of data available as file downloads and via their web service interface. I started at the developer page (https://www.statcan.gc.ca/eng/developers) and then went back to their homepage to see if daily COVID data was available. They have raw data on every case identified in Canada, and it can be downloaded as a partial CSV, complete SDMX, or using their Web Services interface. Though this could be very useful on a future data science project, it did not show new cases per day, only the ‘episode week’ for each case. That doesn’t meet our goal of reporting new cases per day. The data also wasn’t in an ideal format and would require a library like Python pandas to summarize new cases by week and region.
Case identifier number | Region | Episode week | Gender | Age group | Occupation | Asymptomatic | Onset week of symptoms | Hospital status | Recovered | Recovery week | Death | Transmission
1 | 2 | 19 | 2 | 8 | 4 | 9 | 19 | 3 | 2 | 99 | 1 | 1
2 | 3 | 25 | 1 | 3 | 1 | 9 | 25 | 9 | 1 | 99 | 2 | 1
3 | 4 | 32 | 1 | 1 | 9 | 2 | 32 | 9 | 1 | 34 | 2 | 9
Table 13-10-0781-01 Detailed preliminary information on confirmed cases of COVID-19 (Revised), Public Health Agency of Canada

2. Canada.ca was the next Canada Government website to check.

A search on COVID brought me to the COVID-19 Epidemiological Summary page that has infographics and a table of the new cases by province. That’s what I need, but the page doesn’t provide a link to the raw data.

Location | New cases | New deaths | Number of people tested
Canada | 1,685 | 19 | 62,178
British Columbia | N/A | N/A | 5,160
Alberta | N/A | N/A | N/A
https://health-infobase.canada.ca/covid-19/epidemiological-summary-covid-19-cases.html

3. Various sites: I continued to search for the information I needed as a raw data file or web service, but did not find anything better than the Canada.ca page. The US CDC data source is US only, covid19tracker.ca was a University of Saskatchewan student project, covid19statistics.org did not have the detailed Canada data, and several Canadian newspaper websites had COVID-19 data but not in the format I wanted.

4. Johns Hopkins GitHub has multi-country data that is available in CSV format:

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

This has total cases by date, so the new cases could be calculated by subtracting the previous day’s count from the current count. The Canada total would have to be calculated by finding the delta for each province/territory and summing. The file does not have total tested. Close, but not ideal. A search on the homepage revealed that the Canada data is sourced from the Canadian government (likely canada.ca), and the file dates show there is a delay in updating the file. Going directly to the source is better.

5. The Alberta and BC governments have web pages with data for their province, but this would require code for each page and for all of Canada.
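The delta calculation described for the Johns Hopkins time-series file can be sketched as follows. This is only an illustration of the arithmetic, not code I deployed: the sample rows are made up to match the file’s layout (one row per province, one column per date, cumulative counts), the Lat/Long values are placeholders, and `new_cases_by_province` is a hypothetical helper name.

```python
import csv
import io

# Illustrative sample in the layout of the time-series file.
# Lat/Long are placeholders; counts are cumulative confirmed cases.
SAMPLE = """Province/State,Country/Region,Lat,Long,7/30/20,7/31/20
Alberta,Canada,0.0,0.0,10716,10843
British Columbia,Canada,0.0,0.0,3591,3641
Queensland,Australia,0.0,0.0,1083,1084
"""


def new_cases_by_province(csv_text, country, day, previous_day):
    """Return {province: cumulative(day) - cumulative(previous_day)}."""
    deltas = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Country/Region"] == country:
            deltas[row["Province/State"]] = int(row[day]) - int(row[previous_day])
    return deltas


deltas = new_cases_by_province(SAMPLE, "Canada", "7/31/20", "7/30/20")
# The national figure is the sum of the per-province deltas.
national_total = sum(deltas.values())
print(deltas, national_total)
```

Even in this small form, the extra step of summing provinces and the missing tested counts show why this source was a second choice.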

Deciding when to end a data search is not an exact science. If you miss a good data source, it can take longer to code for an imperfect data source and can cause the application to break if the data source changes.

Based on my searching, Canada.ca was the best data source. Pros: Official government site, exact data I needed, prompt updates, referenced by other sites as their Canadian data source. Cons: It’s a webpage that needs to be scraped instead of a web service or data file.

Decision: Pull data from the Canada.ca webpage. Screen scrape from the displayed table if needed using a Python package.

Examining the Data Source (Canada.ca web page)

Python has an excellent web scraping package called BeautifulSoup. The technique actually predates the web. I used HLLAPI with MS Basic to scrape mainframe 3270 screens in 1989, and throughout the 90s when designing client-server applications. The technique has been criticized because a change to the source webpage or screen often breaks the data extraction and the app. This can be overcome by coordinating testing with the source system developers, which we did when scraping mainframe screens. That is not usually possible for a third-party webpage, though.

Viewing the page source:

The Canada.ca COVID webpage has the following HTML table source:

        <table id="newCases" class="table table-bordered table-striped">
            <caption class="text-left">
                <!-- No date wrap -->
                Table 1. Daily* change in the number of cases, deaths and people tested, by location in Canada as of <span class="updateTime"></span>
            </caption>
            <thead>
                <tr class="bg-primary">
                    <th scope="col">Location</th>
                    <th scope="col">New cases</th>
                    <th scope="col">New deaths</th>
                    <th scope="col">Number of people tested</th>
                </tr>
            </thead>
            <tbody class="nationalNumbers-data">
            </tbody>
        </table>

This shows us the table is not populated by the backend, but likely by JavaScript on the frontend. That’s good because the JavaScript may be reading a raw data file that we can use. A further check on the webpage reveals that the interactive map has a ‘download CSV’ button with a link to a CSV file:
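The “empty table body” check can also be done in code rather than by eye. The sketch below uses the standard library’s html.parser instead of BeautifulSoup to stay dependency-free; the `TableRowCounter` class is my own illustrative helper, and the snippet is a trimmed copy of the table source above.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Count <tr> elements inside the <tbody> of the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_tbody = False
        self.body_rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif tag == "tbody" and self.in_table:
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.body_rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "table":
            self.in_table = False


PAGE_SNIPPET = """
<table id="newCases" class="table table-bordered table-striped">
  <thead>
    <tr class="bg-primary"><th scope="col">Location</th><th scope="col">New cases</th></tr>
  </thead>
  <tbody class="nationalNumbers-data">
  </tbody>
</table>
"""

counter = TableRowCounter("newCases")
counter.feed(PAGE_SNIPPET)
print(counter.body_rows)  # 0: the server sends no data rows
```

A count of zero in the raw HTML, while the browser shows rows, is a strong hint that the data arrives separately and is worth hunting for.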

https://health-infobase.canada.ca/src/data/covidLive/covid19.csv

pruid,prname,prnameFR,date,numconf,numprob,numdeaths,numtotal,numtested,numrecover,percentrecover,ratetested,numtoday,percentoday,ratetotal,ratedeaths,deathstoday,percentdeath,testedtoday,recoveredtoday,percentactive,numactive,rateactive,numtotal_last14,ratetotal_last14,numdeaths_last14,ratedeaths_last14

59,British Columbia,Colombie-Britannique,31-07-2020,3641,0,195,3641,224580,3168,87.01,44284,50,1.39,71.80,3.85,1,5.36,2853,13,7.64,278,5.48,443,8.74,6,0.12
48,Alberta,Alberta,31-07-2020,10843,0,196,10843,571839,9261,85.41,130816,127,1.19,248.05,4.48,1,1.81,5785,148,12.78,1386,31.71,1624,37.15,29,0.66
47,Saskatchewan,Saskatchewan,31-07-2020,1319,0,18,1319,85228,1008,76.42,72568,13,1.00,112.31,1.53,0,1.36,1490,24,22.21,293,24.95,383,32.61,3,0.26

That’s the exact raw data we need! We can forget the web scraping and go directly to the CSV file.
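Because the file has a header row, the columns can be read by name rather than by position. The following is a sketch using the standard csv module on the sample rows shown above; `daily_counts` is a hypothetical helper, and it is a name-based alternative to the index-based approach, which would also keep working if columns were reordered.

```python
import csv
import io

# Header and two data rows copied from the covid19.csv sample above.
SAMPLE_CSV = """\
pruid,prname,prnameFR,date,numconf,numprob,numdeaths,numtotal,numtested,numrecover,percentrecover,ratetested,numtoday,percentoday,ratetotal,ratedeaths,deathstoday,percentdeath,testedtoday,recoveredtoday,percentactive,numactive,rateactive,numtotal_last14,ratetotal_last14,numdeaths_last14,ratedeaths_last14
59,British Columbia,Colombie-Britannique,31-07-2020,3641,0,195,3641,224580,3168,87.01,44284,50,1.39,71.80,3.85,1,5.36,2853,13,7.64,278,5.48,443,8.74,6,0.12
48,Alberta,Alberta,31-07-2020,10843,0,196,10843,571839,9261,85.41,130816,127,1.19,248.05,4.48,1,1.81,5785,148,12.78,1386,31.71,1624,37.15,29,0.66
"""


def daily_counts(csv_text, report_date, pruids):
    """Return {prname: (numtoday, testedtoday)} for one date and a set of region ids."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Dates in this file are DD-MM-YYYY strings; values stay as strings.
        if row["date"] == report_date and row["pruid"] in pruids:
            counts[row["prname"]] = (row["numtoday"], row["testedtoday"])
    return counts


counts = daily_counts(SAMPLE_CSV, "31-07-2020", {"59", "48"})
print(counts)
```

Here `numtoday` and `testedtoday` are the two columns the daily message needs: new cases and tests reported for that day.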

Writing the Python Lambda function:

I started with a typical Python data science stack that included pandas to read the CSV file and parse the data. Lambda does not include the pandas library by default, so it either needed to be bundled into the deployment package (for example, installed with pip) or supplied by a Lambda layer that already had pandas installed. The Lambda layer is the better option, but after looking at the CSV, I decided to simplify even further and just use the standard Python io package.

import io
from datetime import date, datetime, timedelta

import boto3
import urllib3

print('Loading message function...')

def process_covid_stats():
    http = urllib3.PoolManager()
    r = http.request('GET', 'https://health-infobase.canada.ca/src/data/covidLive/covid19.csv', preload_content=False)
    r.auto_close = False
    resultsBC = ""
    resultsAB = ""
    resultsCA = ""
    #todayDMY = (date.today()).strftime("%d-%m-%Y") #UTC
    todayDMY = (datetime.now() - timedelta(hours=7)).strftime("%d-%m-%Y") #Pacific not UTC
    print("Today is:" , todayDMY)

    for line in io.TextIOWrapper(r):
        #print(line)
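        row = line.strip().split(",")
        # Skip blank or short lines. Column 3 is the date, 12 is new cases, 18 is tested today.
        if len(row) > 18 and row[3] == todayDMY: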
            #print("Record is for today " , row[3] , ". Process it.")
            if row[0] == '1':
                resultsCA = " New Canada Cases: " + row[12] + ", Total Tested: " + row[18] + "."
                print(resultsCA)
            elif row[0] == '48':
                resultsAB = " New Alberta Cases: " + row[12] + ", Tested: " + row[18] + "."
                print(resultsAB)
            elif row[0] == '59':
                resultsBC = " New BC Cases: " + row[12] + ", Tested: " + row[18] + "."
                print(resultsBC)

    print( "Results: " , resultsBC, resultsAB, resultsCA)
    r.close()
    results = resultsBC + resultsAB + resultsCA
    #Only return results if today's data is found. No return = no message sent.
    if( len(results) > 0 ):
        results = todayDMY + results
    return(results )
    
def send_covid_daily_stats(event, lambda_context):
    print('Invoking covid daily stats')
    message = process_covid_stats()
    # SNS rejects an empty Message, so skip publishing when today's data is missing.
    if not message:
        return ('No data for today yet. No message sent.')
    sns = boto3.client('sns')
    sns.publish(
        TopicArn="arn:aws:sns:us-west-2:<id removed for security reasons>:covid-sns-topic",
        Subject="COVID Update",  #note this doesn't work for SMS.
        Message=message
    )
    return ('Sent a message to an Amazon SNS topic.')

#Function to check environment libraries/packages. Invoke by changing Lambda handler.
def check_env(event, lambda_context):
    from pkg_resources import working_set
    for p in list(working_set):
        print(p.project_name, p.version)

Now two more steps are needed: A trigger event and an AWS SNS message definition.

AWS EventBridge/CloudWatch event: Create an event rule called ‘covid-daily-event’ and set it to run on a schedule using the cron expression ‘cron(59 23 ? * MON-FRI *)’, which is just before midnight UTC, Monday to Friday, or 4:59pm on the West Coast during daylight time. (The Canada.ca file is supposed to be updated at 4pm, but tends to be later.) The target is a Lambda function, and the function is ‘sns_covid_update’.
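I set the rule up in the console, but the same rule could be created from code. The sketch below assumes a client that behaves like boto3.client("events") and uses its put_rule and put_targets calls; the helper name `schedule_daily_run`, the target Id, and the placeholder ARN are my own illustrative choices.

```python
def schedule_daily_run(events_client, rule_name, cron_expression, lambda_arn):
    """Create or update a scheduled rule and point it at a Lambda function.

    events_client is expected to behave like boto3.client("events").
    """
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=cron_expression,  # evaluated in UTC
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "covid-lambda-target" is an arbitrary, hypothetical target Id.
        Targets=[{"Id": "covid-lambda-target", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]


# Example call (needs AWS credentials, so it is left commented out):
# import boto3
# schedule_daily_run(
#     boto3.client("events"),
#     "covid-daily-event",
#     "cron(59 23 ? * MON-FRI *)",
#     "<ARN of the sns_covid_update function>",
# )
```

When the rule is created outside the console, the Lambda function also needs a resource-based permission that lets events.amazonaws.com invoke it. Note that MON-FRI limits runs to weekdays; an expression such as cron(59 23 * * ? *) would run every day.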

AWS Simple Notification Service (SNS): Create a new SNS topic called ‘covid-sns-topic’, then create a subscription for each phone number you want to send an SMS to, and for each email address you want to send to.
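The topic and subscriptions can likewise be scripted. This is a sketch, assuming a client that behaves like boto3.client("sns") and using its create_topic and subscribe calls; `subscribe_recipients` is a hypothetical helper, and the phone number and email address in the example are placeholders.

```python
def subscribe_recipients(sns_client, topic_name, phone_numbers, email_addresses):
    """Create the topic if needed and subscribe each phone number and email.

    sns_client is expected to behave like boto3.client("sns").
    """
    # create_topic returns the existing topic's ARN if the name is already in use.
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    for number in phone_numbers:
        # Phone numbers are given in E.164 form, e.g. "+15555550100".
        sns_client.subscribe(TopicArn=topic_arn, Protocol="sms", Endpoint=number)
    for address in email_addresses:
        sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=address)
    return topic_arn


# Example call (needs AWS credentials, so it is left commented out):
# import boto3
# subscribe_recipients(
#     boto3.client("sns"),
#     "covid-sns-topic",
#     ["+15555550100"],
#     ["someone@example.com"],
# )
```

Email subscribers receive a confirmation message and must accept it before any updates are delivered to them.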

There are several security steps required for Lambda, EventBridge, and SNS that are covered in the AWS documentation and sample code. To test the Lambda function, open the Lambda ‘sns_covid_update’ page and use the dropdown beside the ‘Test’ button to create a new test event. Name it anything and leave the default json key1-3 example, since we aren’t passing any data and just want to kick off the Lambda function.

I currently have the SNS call in the code, but could configure SNS as a Lambda destination instead. Destinations can be configured separately for function success and failure.

The Lambda function has been running since August. The only issue I have seen is that the data file is not always updated by 4pm Pacific time as described on the Canada.ca website. A polling option to reschedule the function on failure would work, but I instead just changed the execution time to 4:59pm.
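For reference, the polling option I decided against could look roughly like the sketch below. It is not what I deployed: `fetch_with_retry` is a hypothetical helper that retries inside a single invocation rather than rescheduling the function, and the attempt count and wait time are arbitrary. The sleep function is passed in so the demo can run without actually waiting.

```python
import time


def fetch_with_retry(fetch, attempts=3, wait_seconds=240, sleep=time.sleep):
    """Call fetch() until it returns a non-empty result or attempts run out."""
    for attempt in range(1, attempts + 1):
        result = fetch()
        if result:
            return result
        if attempt < attempts:
            # Wait before polling again; no wait after the final attempt.
            sleep(wait_seconds)
    return ""


# Demo with simulated responses: the data "appears" on the third poll.
simulated = iter(["", "", "05-08-2020 New BC Cases: ..."])
waits = []
demo_result = fetch_with_retry(lambda: next(simulated), sleep=waits.append)
print(demo_result, waits)

# In the handler this might be used as:
# message = fetch_with_retry(process_covid_stats)
```

The total wait has to fit within the function’s configured timeout, which Lambda caps at 15 minutes, and the waiting time is billed. That is a large part of why simply moving the schedule later was the easier fix.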

That’s it! I plan to use Lambda again for a more complex solution in the future, with a deployment using CloudFormation. In the meantime, the daily SMS messages show that Lambda is a reliable and inexpensive cloud solution.
