Humble Fail: Is tech DIY worth it?

What happened?

I run a splash site to easily link to all my profile sites (think LinkTree but I own it). The site is built on Astro and uses tailwindcss and Astro Icon.

I was in the process of adding new profile links for D&D Beyond and Roll20. I could run the development server locally and everything worked, but the public site wasn’t updating. I went to check my GitHub Actions logs and found this message:

Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/setup-node@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.

I followed the link and read on; the change is as simple as updating my workflow to use v4 of each action and specifying Node.js 20. I pushed the change, only to find that the build broke again. This time it was the astro-icon module. I was running v0.8.1, so I upgraded to v1.1.0 (following the instructions), but it kept failing. I spent hours searching for a fix and was finally about to file a bug report on astro-icon when I came across their issue template, which reads:

✅ I am using the latest version of Astro Icon.
✅ Astro Icon has been added to my astro.config.mjs file as an integration.
✅ I have installed the corresponding @iconify-json/* packages.
✅ I am using the latest version of Astro and all plugins.
✅ I am using a version of Node that Astro supports (>=18.14.1)

Source: https://github.com/natemoo-re/astro-icon/blob/main/.github/ISSUE_TEMPLATE/bug.yml

I’m typically quick to dismiss these, but my days in support meant I had to run through each one. I got down to “the latest version of Astro”. I was running v2.9.7, which sounded like the latest version. Out of curiosity, what was the latest version?

v4.3.2 🤬

Sure enough, upgrading fixed it.
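
For anyone else in this spot: the check that would have saved me hours is a single npm command, and so is the fix (shown for the packages on my site; adjust to yours):

npm view astro version                       # latest version published to npm
npm ls astro astro-icon                      # versions actually installed
npm install astro@latest astro-icon@latest   # upgrade both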

What did you learn?

There were a few takeaways from my Saturday morning shenanigans:

  1. “Build over buy” still has consequences. I didn’t want to pay Linktree’s subscription, thinking I could maintain the site more cheaply. I still think that’s a good choice, but had I used Linktree, I wouldn’t have lost this Saturday morning. I also chose to build my own because of the level of customization I want and plan to add. I chose “build over buy”, and got burned 🔥 (a little).
  2. “Sharpening the saw” only works if you keep up with it. I chose Astro after seeing someone else’s site built with it (sorry, I don’t remember whose). I liked the simple use of tags to convey intent, with the design (which is repeated many times) living in a separate file. I don’t use Astro for anything else, and that directly contributed to how long I spent on the issue. I would have solved this almost immediately had I spent more than 20 minutes every 6 months using Astro.
  3. Corporations go through this on a much larger scale. My problem is IDENTICAL to that of major corporations who invest in agile development, then ignore the practices. Had I spent more time working on this splash site, I would’ve kept it updated and built up the experience to know to check for the latest version. Because I didn’t, I spent a lot of time trying to figure out why it broke.
  4. Good community hygiene works. I avoided filing a needless issue on a project because of Astro Icon’s issue template. I don’t see good issue templates often, but this one was concise and direct…and showed me the problem.

Ultimately, this problem got me thinking about whether DIY in tech is worth it. I don’t think I considered troubleshooting time when I decided to “build”, but I still like the end result and will continue to build my splash site. I’ve also gone back and forth on this blog between Hugo and WordPress (and different vendors). The key is knowing and understanding the tradeoffs, then being able to move when the need arises.

Bootstrapping pi-bernetes: including the wheels

In a previous post, I shared my journey through creating a repeatable build of my homelab cluster using ansible. I can now rebuild Kubernetes anytime I need/want to, but what should I do with it?

Finding my problem while eating humble pie

One idea is to have a locally-hosted all-in-one git service like Gitea. In previous builds, I started by installing Gitea with a helm chart. I could then forward the port to my local workstation and I had git!
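
That piece looked roughly like this (chart repo and service name per the upstream Gitea chart’s defaults, so double-check against the current docs):

helm repo add gitea-charts https://dl.gitea.com/charts/
helm install gitea gitea-charts/gitea --namespace gitea --create-namespace
kubectl -n gitea port-forward svc/gitea-http 3000:3000   # web UI on localhost:3000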

However, I’m not always at that workstation and need to access Gitea without necessarily using kubectl, so I opted to create a LoadBalancer service. K3s does include ServiceLB, but it lacks features and didn’t work out of the box on my network. MetalLB has the support and community, so I grabbed that helm chart and installed it. Presto! Now I can support load balancers.
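
At this stage, that install was just the upstream chart, something like:

helm repo add metallb https://metallb.github.io/metallb
helm install metallb metallb/metallb --namespace metallb-system --create-namespace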

Then I had to restart a pod, and I lost my Gitea installation: I hadn’t enabled persistent storage on my Gitea deployment. To do that, I needed to check the CSI drivers. There’s the default local-path provisioner, but that doesn’t allow my pods to move between nodes. Since Rancher makes both K3s and Longhorn, I fetched the Longhorn helm chart and had persistent storage.
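
Longhorn follows the same pattern (names per the upstream chart):

helm repo add longhorn https://charts.longhorn.io
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace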

Then I needed to customize Traefik (installed by default) and broke it…

…and I wanted to monitor everything, so I put Prometheus on, and broke it again…

…and there came a point where I questioned whether I was really experienced at Kubernetes at all!1

My problem wasn’t experience or knowledge, but rather how I had chosen to operate. Every time I rebuilt the cluster, I would tell myself “I should probably automate this; I’ll do it after I build it”…and never go back to it.

I realized that most of my IT career had been spent watching customers and clients install a package to a linux server, or build a new S3 bucket in the AWS console, or apply a schema patch to a database…

…and I had just done the same thing!

My proposed solution had always been the same: just automate it. So I did.

Now, I can completely wipe k3s off the SBCs and, with one command, get it running again.

Attaching the wheels to the frame

With a Kubernetes cluster, I have a frame(work) that I can put widgets on. Like a car can’t go anywhere without wheels (still waiting for my flying car, thanks Back to the Future Part II), my Kubernetes cluster needs some support before I can use it for my true goals. I need MetalLB, a CSI, a customized traefik, etc.

One reason I picked ansible for building the cluster was that I could use it to both deploy the cluster AND the Kubernetes resources. I also considered OpenTofu (not Terraform–here’s why) and had a few other suggestions (which I haven’t really looked at yet). I may go that direction in the future, but borrowing the leadership principle Bias for Action, I picked one and can always change it later.

Bias for Action
Speed matters in business. Many decisions and actions are reversible and do not need extensive study. We value calculated risk taking.

-Amazon Leadership Principles

I started with a basic playbook template to make sure I could query Kubernetes by listing the namespaces in the cluster.

---
- name: Kubernetes Components
  hosts: kubernetes
  gather_facts: false
  tasks:
    - kubernetes.core.k8s_info:
        context: k3s-ansible
        kind: Namespace
      register: ns
    - ansible.builtin.debug:
        var: ns.resources | map(attribute='metadata.name') | list

I have this host entry in my inventory.yaml file as well. This lets me specify kubernetes as the host above.

kubernetes:
  hosts:
    k8s-azeroth:
  vars:
    ansible_connection: local
    ansible_python_interpreter: "{{ansible_playbook_python}}"
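
Running this is a single command; components.yaml is a stand-in name here for the playbook above:

ansible-playbook -i inventory.yaml components.yaml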

As a quick test, I get this output.

PLAY [Kubernetes Components] ******************************************************************************

TASK [kubernetes.core.k8s_info] ***************************************************************************
ok: [k8s-azeroth]

TASK [ansible.builtin.debug] ******************************************************************************
ok: [k8s-azeroth] => {
    "ns.resources | map(attribute='metadata.name') | list": [
        "kube-system",
        "kube-public",
        "kube-node-lease",
        "default"
    ]
}

PLAY RECAP ************************************************************************************************
k8s-azeroth        : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

I now have an easy mechanism to call the Kubernetes API from within the same ansible structure!

Adding the first component – MetalLB

After looking at the MetalLB installation guide, I saw that it also supports kustomize, so I tried to set up kustomize through ansible. The task is still kubernetes.core.k8s, but there’s a lookup plugin specifically for kustomize. The task looks like this:

    - name: Network - MetalLB
      kubernetes.core.k8s:
        state: present
        namespace: metallb-system
        definition: "{{ lookup('kubernetes.core.kustomize', dir='github.com/metallb/metallb/config/native?ref=v0.13.12' ) }}"
      tags: network

It took some investigation, but the task above is the equivalent of this kubectl command2:

kubectl create -n metallb-system -k github.com/metallb/metallb/config/native?ref=v0.13.12

Each task supports tags, which I can use later to install only a certain type of component. In this case, I could limit a run to the network tag, as shown below. While it’s not necessary now, it becomes useful very fast.
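
For instance, once more components exist, a run can target just the network pieces (same stand-in playbook name as before):

ansible-playbook -i inventory.yaml components.yaml --tags network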

MetalLB also takes a little extra configuration, which is provided in the form of CustomResources. In my homelab, I have carved out a specific IP range for the load balancer, and I assign it to this cluster with this task:

    - name: Network - LoadBalancer IP addresses
      kubernetes.core.k8s:
        state: present
        src: ../manifests/metallb/ipaddresspool.yaml
      tags: network

For reference, ipaddresspool.yaml contains:

---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb-system
spec:
  addresses:
  - 10.20.40.10-10.20.40.99
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system

Alternatively, I can use the full power of the kubernetes.core.k8s module to rearrange and pull files or definitions as necessary. For example, I could replace that manifest with two ansible tasks, placing each resource definition verbatim under the definition: property.

    - name: IPAddressPool
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: metallb.io/v1beta1
          kind: IPAddressPool
          metadata:
            name: default
            namespace: metallb-system
          spec:
            addresses:
            - 10.20.40.10-10.20.40.99
      tags: network
    - name: L2Advertisement
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: metallb.io/v1beta1
          kind: L2Advertisement
          metadata:
            name: default
            namespace: metallb-system
      tags: network

This is the flexibility I was looking for, and I’m using the same tool for everything thus far!

Forward

I’m documenting my complete stack (“eventually”), but I can use this pattern to add different tasks and plays in the same way I’d manage helm charts, resource definitions, or kustomizations. I’d like to try the same setup with terraform or other tools (or to read someone else’s blog about it!), but first I have more components to install before I can put Gitea on my cluster!

  1. Imposter syndrome is real! After all these years, I still feel like an imposter, even if I’ve talked about a topic a hundred times before. You don’t have to know it all–but share what you do know and help someone else learn! ↩︎
  2. Ansible additionally creates the namespace if it does not exist, since it would be required for the state to succeed. ↩︎

Rebuilding pi-bernetes over and over again

While I use my homelab cluster for internal hosting and testing, I also spend significant time fixing and rebuilding it. Since I first posted about building the cluster, I’ve had to stop and rebuild it four or five times. I’ve made various improvements over time and kept them documented in git, but at this point I still don’t have a repeatable build for the homelab cluster.

At the same time, my new job has led me to dust off ansible as an operational tool. I’ve used it in the past (and even ran meetups on it), but I hadn’t actually written any playbooks in years. This seemed like a good time to solve both problems at once!

Reacquainting with ansible & building the playbook

I remembered the syntax and logic of ansible, but a few things had changed since I last used it. Fortunately, one of those changes was a vscode extension for ansible that includes a linter! Most of my past playbooks were for F5 and other network devices. Instead of trying to find a device, I just started building my inventory from the existing pi cluster and gathering facts about the hosts.

I found the official k3s-ansible playbook but didn’t want to start off using it. Ansible does a good job of abstracting away the mechanics and leaves the end user able to declare their intent–but that’s not great for learning. I decided to start from scratch (for now) and create my own playbook based on my current installation with k3sup1. Based on my many installations to this same group of hardware, my current installation script looks like:

k3sup install --host azeroth.local \
  --user pi \
  --ssh-key ~/.ssh/pi_cluster \
  --context azeroth \
  --cluster \
  --local-path ~/.kube/config \
  --merge \
  --k3s-extra-args '--flannel-backend=wireguard-native --disable=servicelb --disable=traefik' \
  --k3s-version=v1.28.2+k3s1
for host (brokenisles eastking kalimdor northrend pandaria)
  do k3sup join --host ${host}.local --server-host 10.20.40.100 --user pi --ssh-key ~/.ssh/pi_cluster --k3s-version=v1.28.2+k3s1
done

With ansible, you need both a playbook (which contains plays and tasks) and an inventory file. To keep it simple, I wanted an inventory where I just list the hosts and ansible determines which one takes the control plane role. (Yes, my theme this time is World of Warcraft worlds/continents–Lok’tar ogar!)

[k3s]
azeroth
eastking
kalimdor
brokenisles
northrend
pandaria

For the playbook, I used the same strategy: I moved the arguments into a new playbook that runs the same commands, selecting them based on the host’s position in the group.

- name: K3S control plane
  hosts: k3s[0]
  tasks:
    - name: Install K3S
      ansible.builtin.command:
        argv:
          - k3sup
          - install
          - --host={{ ansible_facts['hostname'] }}.local
          - --user
          - pi
          - --ssh-key
          - ~/.ssh/pi_cluster
          - --context
          - azeroth
          - --cluster
          - --local-path
          - ~/.kube/config
          - --merge
          - --k3s-extra-args
          - '--flannel-backend=wireguard-native --disable=servicelb --disable=traefik'
          - --k3s-version=v1.28.2+k3s1
      delegate_to: localhost
    - name: Record control plane IP
      ansible.builtin.set_fact:
        server_host: "{{ ansible_facts['default_ipv4']['address'] }}"
- name: K3S worker plane
  hosts: k3s[1:]
  tasks:
    - name: Host and IP (debug)
      ansible.builtin.debug:
        msg: "{{ ansible_facts['hostname'] }}: {{ ansible_facts['default_ipv4']['address'] }}"
    - name: Install K3S
      ansible.builtin.command:
        argv:
          - k3sup
          - join
          - --host={{ ansible_facts['hostname'] }}.local
          # server_host was set during the control plane play; read it from that host's vars
          - --server-host={{ hostvars[groups['k3s'][0]]['server_host'] }}
          - --user
          - pi
          - --ssh-key
          - ~/.ssh/pi_cluster
          - --k3s-version=v1.28.2+k3s1
      delegate_to: localhost

Breaking it down, this playbook repeats my custom installation but wraps it in ansible. It’s not ideal, but it gave me enough exposure to ansible (again) to move on to my goal: using the k3s-ansible playbook.

Adding k3s-ansible to the project

After nuking the cluster once again…I was able to clone the project, change my inventory to match the new format, and get the cluster up and running again pretty easily! I then tried moving the playbook into my homelab folder, ran it…and it broke!

I had copied the playbooks, but not the roles, and I had to get the directory structure in proper order. I also knew that by copying files from the project, I’d lose any updates made to the public repo. I wanted to pull updates down, so I instead imported the repo as a submodule and then symlinked the folders I needed to the right spot.

I wanted to hide the submodule(s) (anticipating more for this pattern) and be able to symlink the parts I need from a hidden folder. Thus, I created the folder .submodules and added the submodule to that folder.

mkdir .submodules
git submodule add https://github.com/k3s-io/k3s-ansible.git .submodules/k3s-ansible
git submodule init

For the playbooks, I wanted a place where I could pull in the submodule playbooks but also store and create my own. I anticipate needing to add a few things to the cluster immediately after it’s built (LoadBalancerClass, CSI, etc.) and I want a singular playbook folder at the root of the project.

mkdir playbooks
# symlink targets resolve relative to the link's directory, hence the ../
ln -s ../.submodules/k3s-ansible/playbooks playbooks/k3s-cluster

I needed the roles to make the playbooks work, but I wanted to carry them over individually in case I add roles of my own.

mkdir roles
for role in $(ls .submodules/k3s-ansible/roles/)
do
    # again, the ../ keeps each symlink valid from inside roles/
    ln -s ../.submodules/k3s-ansible/roles/$role roles/$role
done
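
After the symlinks, the project root looks roughly like this (abbreviated):

.
├── .submodules/
│   └── k3s-ansible/          # upstream repo, pinned as a submodule
├── playbooks/
│   └── k3s-cluster -> ../.submodules/k3s-ansible/playbooks
└── roles/
    └── <role> -> ../.submodules/k3s-ansible/roles/<role>   # one per upstream role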

In ansible.cfg, I redirected the role and inventory lookup to the root of the project. I also enabled fact caching; for what I do, it doesn’t hurt.

[defaults]
roles_path = ./roles
inventory  = ./inventory.yaml
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/cache

A repeatable working cluster

With all this done, I can run ansible-playbook playbooks/k3s-cluster/site.yaml and off we go!

PLAY [Cluster prep] ***************************************************************************************

TASK [Gathering Facts] ************************************************************************************
ok: [kalimdor]
ok: [northrend]
ok: [eastking]
ok: [brokenisles]
ok: [azeroth]
ok: [pandaria]

...

PLAY RECAP ************************************************************************************************
azeroth            : ok=31   changed=6    unreachable=0    failed=0    skipped=46   rescued=0    ignored=0
brokenisles        : ok=20   changed=3    unreachable=0    failed=0    skipped=38   rescued=0    ignored=0
eastking           : ok=20   changed=3    unreachable=0    failed=0    skipped=38   rescued=0    ignored=0
kalimdor           : ok=20   changed=3    unreachable=0    failed=0    skipped=38   rescued=0    ignored=0
northrend          : ok=20   changed=3    unreachable=0    failed=0    skipped=38   rescued=0    ignored=0
pandaria           : ok=20   changed=3    unreachable=0    failed=0    skipped=38   rescued=0    ignored=0

There’s still work to do. I need to add all the components and operators that I plan to use, and also to put my services back in a reusable (and backed-up) format. Stay tuned!2

  1. I used k3sup to build this cluster before (and during). I still think it’s a great project and makes it easy for someone playing around to get started. My needs have changed, and thus k3sup isn’t optimal for me right now. ↩︎
  2. …assuming I actually write those blog posts! Encouragement helps! ↩︎

What’s Your Exit Strategy?

Why are we afraid of “lock in”? Typically we hear the term and automatically assume it’s bad. It certainly can be, but that doesn’t mean that every situation you’re in is a bad one.

On February 8, 2019, I gave an Ignite talk regarding Exit Strategies and “lock in” at DevOpsDays Charlotte. We broke down “lock in” and the varying degrees of it, then talked about how you can use it to your advantage by having an Exit Strategy (which is exactly as it sounds).

“lock in” isn’t exclusive to technology–what about your current employer?