Add Athena Partition for ELB Access Logs
@ Theo | Tuesday, Jul 31, 2018 | 3 minutes read | Update at Tuesday, Jul 31, 2018

If you’ve worked on a load balancer, then at some point you’ve witnessed the load balancer taking the blame for an application problem (a rite of passage). Exonerating it used to be difficult, but with AWS Elastic Load Balancing you can capture Access Logs (Classic and Application Load Balancers only) and very quickly identify whether the load balancer contributed to the problem.

Much like any log analysis, the volume of logs and the frequency of access are key to identifying the best log analysis solution. If you have a large store of logs but access them infrequently, then a low-cost option is Amazon Athena. Athena enables you to run SQL-based queries against your data in S3 without an ETL process. The data is durable, and you only pay for the volume of data scanned per query. AWS also includes documentation and templates for querying Classic Load Balancer logs and Application Load Balancer logs.

This is a great model, but with a potential flaw: as the data set grows in size, the queries become slower and more expensive. To remedy this, Amazon Athena allows you to partition your data, which restricts the amount of data scanned per query, lowering cost and increasing speed.

ELB Access Logs store the logs in S3 using the following format:

s3://bucket[/prefix]/AWSLogs/{{AccountId}}/elasticloadbalancing/{{region}}/{{yyyy}}/{{mm}}/{{dd}}/{{AccountId}}_elasticloadbalancing_{{region}}_{{load-balancer-name}}_{{end-time}}_{{ip-address}}_{{random-string}}.log

Since the prefix does not pre-define partitions, the partitions must be created manually. Instead of creating partitions ad-hoc, create a CloudWatch Scheduled Event that runs daily targeted at a Lambda function that adds the partition. To simplify the process, I created buzzsurfr/athena-add-partition.
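Under the hood, adding a partition for one day boils down to a single Athena DDL statement of this shape. The database, table, bucket, and account ID below are placeholders, not values from the project:

```sql
ALTER TABLE logs.elb_logs
ADD IF NOT EXISTS PARTITION (year='2018', month='07', day='31')
LOCATION 's3://bucket/prefix/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2018/07/31/';
```

The `IF NOT EXISTS` guard makes the statement safe to re-run, which matters for a scheduled job that may occasionally fire twice.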

This project contains both the Lambda function code and a CloudFormation template to deploy the Lambda function and the CloudWatch Scheduled Event. Logs are sent from the load balancer into an S3 bucket. Each day, the CloudWatch Scheduled Event invokes the Lambda function to add a partition to the Athena table.
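The daily partition-add step can be sketched as a small Lambda handler. This is only a sketch, not the actual code from buzzsurfr/athena-add-partition; the database, table, S3 prefix, and results location below are hypothetical placeholders you would replace with your own:

```python
import datetime

# Placeholder values -- substitute your own database, table, and bucket names.
DATABASE = "logs"
TABLE = "elb_logs"
S3_PREFIX = "s3://my-elb-logs-bucket/AWSLogs/123456789012/elasticloadbalancing/us-east-1"
RESULTS = "s3://my-athena-results-bucket/"

def partition_ddl(database, table, prefix, day):
    """Build the ALTER TABLE statement that registers one day's partition."""
    y, m, d = day.strftime("%Y"), day.strftime("%m"), day.strftime("%d")
    return (
        f"ALTER TABLE {database}.{table} "
        f"ADD IF NOT EXISTS PARTITION (year='{y}', month='{m}', day='{d}') "
        f"LOCATION '{prefix}/{y}/{m}/{d}/'"
    )

def handler(event, context):
    """Invoked daily by the CloudWatch Scheduled Event."""
    import boto3  # provided by the Lambda runtime

    query = partition_ddl(DATABASE, TABLE, S3_PREFIX, datetime.date.today())
    boto3.client("athena").start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": RESULTS},
    )
```

Keeping the DDL construction in its own function makes it easy to unit test without touching AWS, while the handler itself stays a thin wrapper around the Athena API call.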

Using the partitions requires modifying the SQL query used in the Athena console. Consider the basic query to return all records: SELECT * FROM logs.elb_logs. Append a WHERE clause that includes the partition keys and their values. For example, to query only the records for July 31, 2018, run:

SELECT *
FROM logs.elb_logs
WHERE
  (
    year = '2018' AND
    month = '07' AND
    day = '31'
  )

This query with partitions enabled restricts Athena to only scanning s3://bucket/prefix/AWSLogs/{{AccountId}}/elasticloadbalancing/{{region}}/2018/07/31/ instead of s3://bucket/prefix/AWSLogs/{{AccountId}}/elasticloadbalancing/{{region}}/, resulting in a significant reduction in cost and processing time.

Using partitions also makes it easier to enable other Storage Classes like Infrequent Access, where you pay less to store but pay more to access. Without partitions, every query would scan the bucket/prefix and potentially cost more due to the access cost for objects with Infrequent Access storage class.

This model can be applied to other logs stored in S3 that do not have pre-defined partitions, such as CloudTrail logs, CloudFront logs, or for other applications that export logs to S3, but don’t allow modifications to the organizational structure.

About Me

Self-described technology enthusiast working with containers, DevOps, networking, load balancing, etc.

Career

After college, I came back to the family business, this time to force-feed technology into the business instead of passing around a QuickBooks file and design templates on a Zip disk. This ended up as a good trade–I was able to both freely learn and implement new(er) technology and gain powerful business experience. I am fully capable of explaining any technical topic to a non-technical audience. I taught my mother about files/folders on a hard disk by showing her the files and folders in her file cabinets.

I spent a short time at a law firm doing more of the same, but wanted more. I joined a state-level government agency and began to specialize in networking. I quickly moved through the ranks from Junior to Senior status, and spent a few years as a Network Manager. I dove into “network service” technologies and tools like load balancing, name resolution, monitoring, logging, and analysis. My success there came from four principles:

  • Work with the customer–ensure your decisions are for their benefit.
  • The borders of your responsibility are soft–learn about how your department affects other departments. A little cross-team knowledge goes a long way.
  • Don’t waste time repeating processes–if you’ll repeat it, script it and let the system work for you.
  • Automate yourself out of a job–if you do, they’ll give you a better one.

College

I went to Florida Institute of Technology in Melbourne, FL, USA and received my Bachelor of Science in Computer Science. While the degree is a great résumé builder, the knowledge and experience gained were much more valuable.

We didn’t just focus on learning a programming language–we learned WHY a language was developed and what separates it from others. Concepts were more important, because that led to a language-independent programming skill. As a result, I can now write code in any language.

I also got a taste of other IT-related skills. The program provided enough electives for us to branch out and “test the waters” around different disciplines. As a result, I got a breadth of skills to help complement my degree: cryptography, computer vision, system administration, OS concepts, database design, etc.

My senior project was a collaboration between Aerospace, Mechanical, Computer, Electrical, and Software Engineers. We built a scale model of a V-22 “Osprey” with a design for mid-air transition while carrying heavy cargo. Since it was a scale model, we also used a wireless serial transmitter and ground interface to control the Osprey using a Radio Controller hooked up to a computer screen. My job was the GUI/software for the Flight Control System and interface, as well as the scripts to perform the advanced aeronautical calculations. It was a great team experience that further expanded my breadth of skills and abilities.

Moving

My family and I wanted to move from Tallahassee, FL, USA, to Charlotte, NC, USA, and we got the opportunity when I was offered a Network Engineer position with an insurance company that had a regional headquarters in Charlotte. I joined the Network Services team and found my passion for improving processes through orchestration/automation. I also got my first taste of cloud and cloud networking, which required a new education. For many years, I had watched other network professionals accelerate their knowledge and experience on networking to a point, then stick with that knowledge until otherwise forced to change. I realized that I’m not an “old school” networker, as I think being an expert in networking doesn’t mean knowing every command in a CLI. Cloud networking is different, and requires a new way of architecting: traditional networking tools only work until the cloud border. Ultimately, I spent a short time at the insurance company because I was recruited by Amazon.

The Early Years

Ask my mother, and I was always going to work in technology. At age 5, I set the clock on the VCR and programmed it to record my shows.

My family owns a swimming pool contracting business in Tallahassee, FL, USA and I spent my childhood and teenage years learning how to run a business. Technology was a hobby, and I had fun exploring building my own gaming rig, writing plugins for software, and begrudgingly providing free technical support to friends and family.