I had grand aspirations of maintaining a personal blog on a weekly basis, but sometimes that isn’t always possible. I’ve been using my iPad and Working Copy to write posts, but had to use my regular computer to build and publish. CI/CD pipelines help, but I couldn’t find the right security and cost optimizations for my use case…until this year.
My prior model had my blog stored on GitLab because it enabled a free private repository (mainly to hide drafts and future posts). I was building locally using a Docker container and then uploading to Amazon S3 via a script.
At the beginning of the year, GitHub announced free private repositories (for up to 3 contributors), and I promptly moved my repo to GitHub. (NOTE: I don’t use CodeCommit because it’s more difficult to plumb together with Working Copy.)
I was able to now plumb CodePipeline and CodeBuild to build my site, but fell short on deploying my blog to S3. I had to build a Lambda function to extract the artifact and upload to S3. The function is only 20 lines of Python, so it wasn’t difficult.
If you’ve worked on a load balancer, then at some point you’ve been witness to the load balancer taking the blame for an application problem (like a rite of passage). This used to be difficult to exonerate, but with AWS Elastic Load Balancing you can capture Access Logs (Classic and Application only) and very quickly identify whether the load balancer contributed to the problem.
Much like any log analysis, the volume of logs and frequency of access are key to identify the best log analysis solution. If you have a large store of logs but infrequently access them, then a low-cost option is Amazon Athena. Athena enables you to run SQL-based queries against your data in S3 without an ETL process. The data is durable and you only pay for the volume of data scanned per query. AWS also includes documentation and templates for querying Classic Load Balancer logs and Application Load Balancer logs.
This is a great model, but with a potential flaw–as the data set grows in size, the queries become slower and more expensive. To remediate, Amazon Athena allows you to partition your data. This restricts the amount of data scanned, thus lowering costs and increasing speed of the query.
ELB Access Logs store the logs in S3 using the following format:
Since the prefix does not pre-define partitions, the partitions must be created manually. Instead of creating partitions ad-hoc, create a CloudWatch Scheduled Event that runs daily targeted at a Lambda function that adds the partition. To simplify the process, I created buzzsurfr/athena-add-partition.
This project is both the Lambda function code and a CloudFormation template to deploy the Lambda function and the CloudWatch Scheduled Event. Logs are sent from the Load Balancer into a S3 bucket. Daily, the CloudWatch Scheduled Event will invoke the Lambda function to add a partition to the Athena table.
Using the partitions requires modifying the SQL query used in the Athena console. Consider the basic query to return all records: SELECT * FROM logs.elb_logs. Add/append to a WHERE clause including the partition keys with values. For example, to query only the records for July 31, 2018, run:
year = '2018' AND
month = '07' AND
day = '31'
This query with partitions enabled restricts Athena to only scanning
resulting in a significant reduction in cost and processing time.
Using partitions also makes it easier to enable other Storage Classes like Infrequent Access, where you pay less to store but pay more to access. Without partitions, every query would scan the bucket/prefix and potentially cost more due to the access cost for objects with Infrequent Access storage class.
This model can be applied to other logs stored in S3 that do not have pre-defined partitions, such as CloudTrail logs, CloudFront logs, or for other applications that export logs to S3, but don’t allow modifications to the organizational structure.
It’s been over 10 years since I had a blog, or at least maintained one. I want to promote my personal brand but have often not put forth the effort. I have a significant amount of experience, so it’s just a matter of putting my experiences down “on paper”…and having the right tool to publish.
Enter Hugo. I’ve been a fan of Markdown for awhile, and make avid use of it for projects on GitHub or written for mkdocs. I wanted something that could deploy to a static site since my actual code rarely changes, and to save overall costs. My current site is built on GitHub Pages, but does not allow me the necessary capabilities, and I wanted something similar to mkdocs but that I could easily deploy a scaffold and work.
While I often travel with a laptop, I’ve also been looking at my mobile productivity, and I feel that I could accomplish more by using my mobile device. When I have an idea, I want to commit quickly. My tablet is an easy way to do so since it takes up less room and has less time to boot, but has lacked a sufficient productivity tool.
I’ve typed up this post partially using Working Copy. For me, it has the right blend of git integration and file editor (with Markdown syntax highlighting). You can’t push without the in-app purchase, but the free version plus a 10-day trial lets you test before buying, which let me make sure it fits my workflow.
For my blog content, I plan to document my experiences through my IT journey in hopes that it will also help others. I’ve always embraced the IT community, and a blog is my latest way of giving back. I’ve always struggled with trying to get the best structure and methods before pushing something new, and that’s always led to me never launching. This time, I’m accepting that the blog may not be perfect, but it’s out there and functional. I’ll be able to make improvements over time and grow this into a resource for all.