k3s Behind Nginx

2024-04-06

Here on illuminatedcomputing.com I’ve got a bunch of sites served by nginx, but I’d like to run a little k3s cluster as well. The main benefit would be isolation. That is always helpful, but it especially matters for staging sites for some customers who don’t update very often.

Instead of migrating everything all at once, I want to keep my host nginx but let it reverse proxy to k3s for sites running there. Then I will block direct traffic to k3s, so that there is only one way to get there. I realize this is not really a “correct” way to do k8s, but for a tiny setup like mine it makes sense. Maybe I should have just bought a separate box for k3s, but I find pushing tools a bit like this is a good way to learn how they really work, and that’s what happened here.

It was harder than I thought. I found one or two people online seeking to do the same thing, but there were no good answers. I had to figure it out on my own, and now maybe this post will help someone else.

The first step was to run k3s on other ports. I’m using the ingress-nginx ingress controller via a Helm chart. In my values.yaml I have it bind to 8080 and 8443 instead:

ingress-nginx:
  controller:
    enableHttp: true
    enableHttps: true
    service:
      ports:
        http: 8080
        https: 8443

Then I can see the Service is using those ports:

paul@tal:~/src/illuminatedcomputing/k8s$ k get services -A
NAMESPACE         NAME                                         TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                         AGE
ingress           ingress-ingress-nginx-controller             LoadBalancer   10.43.91.109    107.150.34.82   8080:31333/TCP,8443:30702/TCP   7d20h
...

Setting up nginx to reverse proxy was also no problem. For example here is a private docker registry I’m running:

server {
  listen 443 ssl;
  server_name docker.illuminatedcomputing.com;

  ssl_certificate ssl/docker.illuminatedcomputing.com.crt;
  ssl_certificate_key ssl/docker.illuminatedcomputing.com.key;

  location / {
    proxy_pass https://127.0.0.1:8443;
    proxy_set_header Host "docker.illuminatedcomputing.com";
  }
}

server {
  listen 80;
  server_name docker.illuminatedcomputing.com;

  location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_set_header Host "docker.illuminatedcomputing.com";
  }
}

The only tricky part is the ssl cert. I already had the cluster built to get certs from LetsEncrypt with cert-manager. So I have a little cron script that pulls out the k8s Secret and puts it where the host nginx can find it:

#!/bin/bash

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

# docker.illuminatedcomputing.com
kubectl get secret -n docker-registry docker-registry-tls -o json | jq -r '.data["tls.crt"] | @base64d' > /etc/nginx/ssl/docker.illuminatedcomputing.com.crt
kubectl get secret -n docker-registry docker-registry-tls -o json | jq -r '.data["tls.key"] | @base64d' > /etc/nginx/ssl/docker.illuminatedcomputing.com.key

Probably it would be easier to run certbot on the host and push the cert into k8s (or just terminate TLS), but using cert-manager is what I’d do for a customer, and I’m hopeful that eventually I’ll drop the reverse proxy altogether.

So at this point connecting works:

curl -v https://docker.illuminatedcomputing.com/v2/_catalog

(Of course it will be a 401 without the credentials, but you are still getting through to the service.)

The problem is that this works too:

curl -v https://docker.illuminatedcomputing.com:8443/v2/_catalog

So how can I block that port from everything but the host nginx? I tried making the controller bind to just 127.0.0.1, e.g. with this config:

ingress-nginx:
  controller:
    config:
      bind-address: "127.0.0.1"
    enableHttp: true
    enableHttps: true
    service:
      externalIPs:
        - "127.0.0.1"
      ports:
        http: 8080
        https: 8443

The bind-address line adds to a ConfigMap used to generate the nginx.conf. It doesn’t work though. The 127.0.0.1 is from the perspective of the controller pod, not the host 127.0.0.1.

Using externalIPs (with or without bind-address) also fails. When I add those two lines k3s gives this error:

Error: UPGRADE FAILED: cannot patch "ingress-ingress-nginx-controller" with kind Service: Service "ingress-ingress-nginx-controller" is invalid: spec.externalIPs[0]: Invalid value: "127.0.0.1": may not be in the loopback range (127.0.0.0/8, ::1/128)

So I gave up on that approach.

But what about using iptables to block 8443 and 8080 from the outside? That’s probably simpler anyway—although k3s adds a big pile of its own iptables rules, and diving into that was a bit intimidating.

The first thing I tried was putting a rule at the top of the INPUT chain. I tried all these:

iptables -I INPUT -p tcp \! -s 127.0.0.1 --dport 8443 -j DROP
iptables -I INPUT -p tcp \! -i lo --dport 8443 -j DROP
iptables -I INPUT -p tcp -i enp2s0 --dport 8443 -j DROP

But none of those worked. I could still get through.

At this point a friend asked ChatGPT for advice, but it wasn’t very helpful. It told me

Instead of having the ingress controller listen on an external IP or trying to make it listen only on 127.0.0.1, configure your host’s nginx to proxy_pass to your k3s services.

Yes, I had explained I was doing that. Also:

You could create a network policy that only allows traffic to the ingress-nginx pods from within the cluster itself.

But that will block the reverse proxy too.

So the cyber Pythia was not coming through for me. I was going to have to figure it out on my own. That meant coming to grips with all the rules k3s was installing.

I started with adding some logging, for example:

iptables -I INPUT -p tcp -d 107.150.34.82 -j LOG --log-prefix '[PJPJPJ] '

Tailing /var/log/syslog, I could see messages for 443 requests, but nothing for 8443!

So I took a closer look at the nat table (which is processed before the filter table), and I found some relevant rules:

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES

-A KUBE-EXT-2ZARXDYICCJUF4UZ -m comment --comment "masquerade traffic for ingress/ingress-ingress-nginx-controller:https external destinations" -j KUBE-MARK-MASQ
-A KUBE-EXT-2ZARXDYICCJUF4UZ -j KUBE-SVC-2ZARXDYICCJUF4UZ
-A KUBE-EXT-DBDMS67BVV2C2LTP -m comment --comment "masquerade traffic for ingress/ingress-ingress-nginx-controller:http external destinations" -j KUBE-MARK-MASQ
-A KUBE-EXT-DBDMS67BVV2C2LTP -j KUBE-SVC-DBDMS67BVV2C2LTP

-A KUBE-SEP-RQCBIXXO7M53R2WC -s 10.42.0.42/32 -m comment --comment "ingress/ingress-ingress-nginx-controller:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-RQCBIXXO7M53R2WC -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:https" -m tcp -j DNAT --to-destination 10.42.0.42:443
-A KUBE-SEP-TXLMBMTNQTOOKDI3 -s 10.42.0.42/32 -m comment --comment "ingress/ingress-ingress-nginx-controller:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-TXLMBMTNQTOOKDI3 -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:http" -m tcp -j DNAT --to-destination 10.42.0.42:80

-A KUBE-SERVICES -d 107.150.34.82/32 -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:https loadbalancer IP" -m tcp --dport 8443 -j KUBE-EXT-2ZARXDYICCJUF4UZ
-A KUBE-SERVICES -d 107.150.34.82/32 -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:http loadbalancer IP" -m tcp --dport 8080 -j KUBE-EXT-DBDMS67BVV2C2LTP

-A KUBE-SVC-2ZARXDYICCJUF4UZ ! -s 10.42.0.0/16 -d 10.43.91.109/32 -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:https cluster IP" -m tcp --dport 8443 -j KUBE-MARK-MASQ
-A KUBE-SVC-2ZARXDYICCJUF4UZ -m comment --comment "ingress/ingress-ingress-nginx-controller:https -> 10.42.0.42:443" -j KUBE-SEP-RQCBIXXO7M53R2WC
-A KUBE-SVC-DBDMS67BVV2C2LTP ! -s 10.42.0.0/16 -d 10.43.91.109/32 -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:http cluster IP" -m tcp --dport 8080 -j KUBE-MARK-MASQ
-A KUBE-SVC-DBDMS67BVV2C2LTP -m comment --comment "ingress/ingress-ingress-nginx-controller:http -> 10.42.0.42:80" -j KUBE-SEP-TXLMBMTNQTOOKDI3

If you follow how that bounces around, it eventually gets rerouted to 10.42.0.42, either :443 or :80. So that’s why a connection to 8443 never hits the INPUT chain.

So the solution was to drop the traffic in the nat table instead:

root@www:~# iptables -I PREROUTING -t nat -p tcp -i enp2s0 --dport 8443 -j DROP
iptables v1.8.4 (legacy):
The "nat" table is not intended for filtering, the use of DROP is therefore inhibited.

Oops, just kidding!

But instead I can just tell 8080 & 8443 to skip all the k3s rewriting:

iptables -I PREROUTING -t nat -p tcp -i enp2s0 --dport 8443 -j RETURN
iptables -I PREROUTING -t nat -p tcp -i enp2s0 --dport 8080 -j RETURN

Now those do show up on the INPUT chain, but I don’t even need to DROP them there. There is nothing actually listening on those ports. The controller is still binding to 443 and 80, and k3s is using iptables trickery to reroute connections to those ports. So those two lines above are sufficient, and someone connecting directly gets a Connection refused.

That’s it! I hope this is helpful or you at least enjoyed the story.

k3s Behind Nginx

Cozy Toes

2024-03-03

The kids and I love to play games on the weekend. ~~Our~~ My favorite is Agricola, a board game about farming in the Middle Ages. We also play a lot of matching games. Saturday morning Elsa invented her own matching game using the cards from Abandon All Artichokes. Then I made up a matching game too. I called it “Cozy Toes”.

cozy toes

It was a money game. Whoever got the most pairs of matching socks won, and everyone else owed that person five cents for each pair they were short.

I explained how you had to compare each sock to each other sock, which meant the work to find one sock’s match was as big as all the other socks. The work for all the socks—say there were n of them—was n times n. So if you let the pile get too big, you have a lot of work to do.

I suppose with several workers—say m of them—the work was less, but I don’t know for sure what it was. In practice it is a hard job for many workers to share. In our own game there was a lot of contention.

Later when they thought they had matched all the socks they could, and only odd singles remained, I asked how they could be sure. They decided to sort the socks from longest to shortest. Then they could see that there were no more matches. But they still compared every sock to every other.

unmatched socks

If we keep playing this game maybe I will teach them how to bubble sort.

Cozy Toes

Temporal PKs Merged!

2024-01-24

Today first thing in the morning I saw that the first part of my temporal tables work for Postgres got merged. It was two patches actually: a little one to add a new GiST support function and then the main patch adding support for temporal primary keys and unique constraints based on range types. The support for SQL:2011 PERIODs comes later; for now you must use ranges—although in my opinion that is better anyway. Also this patch allows multiranges or, keeping with Postgres’s long history of extensibility, any type with an overlaps operator. So unless some big problem appears, PKs and UNIQUE constraints are on track to be released in Postgres 17.

Probably I can get (basic) foreign keys into v17 too. Temporal update/delete, foreign keys with CASCADE, and PERIODs will more likely take ’til 18.

If you are interested in temporal features, early testing is always appreciated! :-)

Getting this into Postgres has been a ten-year journey, and the rest of this post is going to be a self-indulgent history of that work. You’ve been warned. :-)

It started in 2013 when I kept noticing my clients needed a better way to track the history of things that change over time, and I discovered Richard Snodgrass’s book Developing Time-Oriented Database Applications in SQL. He offered a rigorous, systematic approach, with working SQL solutions for everything. This was exactly what I needed. His approach was vastly better than the ad hoc history-tracking I’d seen so far. But no one had implemented any of it!

My first Postgres patch in 2015 was motivated by temporal databases: I added UUID support to the btree_gist extension. A temporal primary key is basically an exclusion constraint on (id WITH =, valid_at WITH &&), and I had a project with UUID ids. But that exclusion constraint requires a GiST index that knows how to perform equal comparisons against the id column and overlap comparisons against the valid_at column. Out-the-box GiST indexes can’t do that (unless your ids are something weird like range types). If your ids are integers, you can install btree_gist to create a GiST opclass that knows what integer = means, but at the time UUIDs were not supported. So I started there. I liked that temporal databases had a manageable feature set and a manageable body of literature, so that even a working programmer like me could break new ground (not like Machine Learning or even Time Series databases). Nonetheless that patch took a year and a half to get committed, and it was really other people like Chris Bandy who finished it.

I kept reading about temporal databases, and in 2017 I wrote a proof-of-concept for temporal foreign keys, mostly at AWS Re:Invent. I happened to be given a free registration & hotel room, but it was too late to register for any of the good talks. But all that time with nothing to do was fantastically productive, and I remember by the flight home I was adding tons of tests, trying to cover every feature permutation—ha, as if. A few days after I returned I also published my annotated bibliography, which I’ve updated many times since.

In Snodgrass a temporal foreign key is a page-and-a-half of SQL, mostly because a referencing row may need more than one referenced row to completely cover its time span. But I realized we could make the check much simpler if we used an aggregate function to combine all the relevant rows in the referenced table first. So I wrote range_agg, first as an extension, then as a core patch. Jeff Davis (who laid the foundation for temporal support with range types and exclusion constraints) said my function was too narrow and pushed me to implement multiranges, a huge improvement. Again it took a year and a half, and I had trouble making consistent progress. There was a lot of work at the end by Alvaro Herrera and Alexander Korotkov (and I’m sure others) to get it committed. That was a few days before Christmas 2020.

Although the Postgres review process can take a long time, I cherish how it pushes me to do better. As a consultant/freelancer I encounter codebases of, hmm, varying quality, and Postgres gives me an example of what high standards look like.

One thing I still remember from reading Programmers at Work many years ago was how many inteviewees said they tried to build things at a higher level of abstraction than they thought they’d need. I’ve seen enough over-engineered tangles and inner-platform effects that my own bias is much closer to YAGNI and keeping things concrete, but the advice in those interviews still prods me to discover good abstractions. The Postgres codebase is full of things like that, and really it’s such a huge project that strong organizing ideas are essential. Multiranges was a great example of how to take a concrete need and convert it into something more general-purpose. And I thought I was doing that already with range_agg! I think one thing that makes an abstraction good is a kind of definiteness, something opinionated. So it is not purely general, but really adds something new. It always requires an act of creation.

The coolest thing I’ve heard of someone doing with multiranges was using them in astronomy to search for neutrinos, gravitational waves, and gamma-ray bursts. By using multiranges, they were able to compare observations with maps of the night sky “orders of magnitude faster” than with other implementations. (Hopefully I’ve got that right: I read a pre-print of the paper but it was not all easy for me to understand!)

My first patch for an actual temporal feature was primary keys back in 2018. Then foreign keys followed in 2019, just a couple weeks before I gave a talk at PgCon about temporal databases. By the end of the year I had FOR PORTION OF as well. At first FOR PORTION OF was implemented in the Executor Phase, but when I gave a progress report for PgCon 2020 I was already working on a trigger-based reimplementation, though it wasn’t submitted until June 2021. I also pulled in work by Vik Fearing from 2018 to support ADD/DROP PERIOD.

Soon after that progress got harder: my wife and I had our sixth baby in August, and somehow he seemed to be more work than the others. I took over daily math lessons (we homeschool), and I had to let go my biggest client, who needed more hours than I could give. (I’m proud to have given them an orderly transition over several months though.) In January 2022 Peter Eisentraut gave me a thorough review, but I went silent. Still, I had a lot of encouragement from the community, especially Corey Huinker, and eventually doing Postgres got easier again. I had a talk accepted for PgCon 2023, and I worked hard to submit new patches, which I did only weeks before the conference.

The best part of PgCon was getting everyone who cared about my work together in the hallway to agree on the overall approach. I had worried for years about using ranges as well as PERIODs, since the standard doesn’t know anything about ranges. The second-best part was when someone told me I should stop calling myself a Postgres newbie.

At PgCon Peter asked me to re-organize the patches, essentially implementing PERIODs as GENERATED range columns. It made the code much nicer. I also went back to an Executor Phase approach for FOR PORTION OF. Using triggers had some problems around updateable views and READ COMMITTED transaction isolation.

Since May I’ve felt more consistent than during my other Postgres work. I’ve been kept busy by excellent feedback by a meticulous reviewer, Jian He, who has caught many bugs. Often as soon as I get caught up, before I’ve written the email with the new patch files, he finds more things!

Another thing that’s helped is going out once a week (for nearly a year now) to get early dinner then work on Postgres at a local bar. Somehow it’s much easier to do Postgres from somewhere besides my home office, where I do all my normal work. Getting dinner lets me read something related (lately Designing Data-Intensive Applications by Martin Klepmann and PostgreSQL 14 Internals by Egor Rogov), and it’s fun. Doing just a little every week helps me keep momentum, so that fitting in further progress here and there seems easy. I’m lucky to have a wife who has supported it so often, despite leaving her with the kids and dishes.

I think I have years more work of temporal features to add, first finishing SQL:2011 then going beyond (e.g. temporal outer joins, temporal aggregates, temporal upsert). It’s been a great pleasure!

Temporal PKs Merged!

Git for Postgres Hacking

2023-11-06

In Postgres development it’s normal for patch attempts to require many revisions and last a long time. I just sent in v17 of my SQL:2011 application time patch. The commitfest entry dates back to summer of 2021, but it’s really a continuation of this thread from 2018. And it’s not yet done.

My work on multiranges is a similar story: 1.5 years from first patch to committed.

Today I saw this post by Julia Evans about problems people have with git rebase (also see the hn discussion), and it reminded me of my struggles handling long-lived branches.

In my early days with git I avoided rebasing, because I wanted the history to be authentic. Nowaday I rebase pretty freely, both to move my commits on top of the latest master branch work and to interactively clean things up so the commits show logical progress (with generous commit messages explaining the motivation and broad design decisions: the “why”).

But in my paid client work, PRs get merged pretty fast. There is nothing like the multi-year wait of Postgres hacking. Often I’ve wished for more history there. It’s not my day job, so it’s hard to remember fine details about something from months or years ago. And I’ve changed direction a couple times, and sometimes I want a way to consult that old history.

But with Postgres you don’t have any choice but to rebase. You send your patch files to a mailing list, and if they don’t apply cleanly no one will look at them. I’ve spent hours and hours rebasing patches because the underlying systems changed before they could get committed.

With multiranges this was tough, but at least it was just one patch file. Application time is a series of five patches, which over time have changed order and evolved from four. When it’s time to send a new version, I run git format-patch, which turns each commit into a .patch file. So I need to wind up with five well-groomed commits rebased on the latest master.

My personal copy of the postgres repo on github has a bunch of silly-named branches for stashing work when I want to change direction, so the history isn’t totally lost. But for a long time I had no system. It feels like when you see a spreadsheet named Annual Report - Copy of Jan 7.bak - final - FINAL.xls. After all these years it’s unmanageable. (Okay at least I know not to name any Postgres submission “final”! ;-)

I think I finally found a way to keep history that works for me. On my main valid-time branch I keep a series of commits for each small change. I rebase to move them up and down, so that they will squash cleanly into the five commits I need at the end. You can see that I have one main commit for each of the five patches, but each is followed by many commits named fixup pks: fixed this or fixup fks: feedback from so-and-so. I rebase on master every so often. I force-push all the time, since no one else uses the repo. (I do work on both a laptop and a desktop though, so I have to remember to git fetch && git reset --hard origin/valid-time.)

When I’m ready to submit new patches, I take a snapshot with git checkout -b valid-time-v17-pre-squash and “make a backup” with git push -u. Then I make a branch to squash things (git checkout -b valid-time-v17). I do a git rebase -i HEAD~60, press * on pick, type cw fixup, then n.n.n.n.n.n., etc. ’til I have just the five commits. Then I have a script to do a clean build + test on each commit, since I want things to work at every point. While that’s running I write the email about the new patch, and hopefully send it in.

So now I’m capturing the fine-grained history that went into each submission, and that won’t change no matter how aggressively I rebase the current work. I’m pretty happy with this flow. I wish I had started years ago.

One git feature I could almost use is git rebase -i --autosquash. (Here are some articles about it.) If your commit messages are named fixup! foo, then git will automatically set those commits to fixup, not pick, and it will move them to just below whatever commit matches foo. I follow this pattern but with fixup not fixup!, to keep it all manual. At first I just didn’t trust it (or myself).

Now I’m ready to move to this workflow, but I’m not sure how to “match” one of my five main commits. I want a meaningful title (i.e. the first line of the commit message) for each little commit, so I use short abbreviations for the patch they target, e.g. fixup pks: Add documentation for pg_constraint.contemporal column. Git doesn’t know that it should match pks to Add temporal PRIMARY KEY and UNIQUE constraints and ignore everything after the colon. If there were a way to preserve tags after a rebase I think I could tag the main commit as pks and it might work (but maybe not with the extra stuff after the colon).

You can have git generate the new commit message for you with git commit --fixup $sha, but it just copies the whole title verbatim, which is not what I want. Also who wants to remember $sha for those five parent commits? And finally, I want to move these commits into place immediately, so I can build & test against each patch as I work. Git can’t move them for me without squashing them.

The Thoughtbot article linked above says you can use a regex, e.g. git commit --fixup :/pks, but: (1) The regex is used immediately to find the parent, but it gets replaced with that parent’s title. It doesn’t stay in your commit message. (2) If you give an additional commit message, it goes two lines below the fixup! line, so it’s not in the commit title. This only solves having to remember $sha.

What I really want is fixup! ^: blah blah blah where ^ means “the closest non-squashed parent”, and the ^ is resolved at rebase time, not commit time, and everything after the colon is not used for matching. (If it needs to be a regex then :/. is sufficient too.)

Anyway I’m using my manual process for now, since with vim I can change 60 picks to fixup in a few seconds. I’m not willing to lose meaningful titles to save a few seconds with fixup!.

Nonetheless it would be nice to have one less step I have to remember. Involuntarily I keep thinking about how I can make this feature work for me. If someone has a suggestion, please do let me know.

Another approach is “stacked commits”. I went as far as installing git branchless and reading the docs and some articles, but to be honest I never went beyond a few tests, and I haven’t thought about it for a few months. It’s in the back of my head to give it a more honest effort.

Git for Postgres Hacking

Rails ActionMailer Internals

2023-10-16

When it comes to sending email in Rails, I’ve wondered for years about the gap between this:

class UserMailer < ApplicationMailer
  def welcome(user)
    @user = user
    mail(to: user.email)
  end
end

and this:

class User
  def send_welcome_notification
    UserMailer.welcome(self).deliver_later
  end
end

We are defining an instance method, but we are calling a class method. What’s going on there? I finally decided to take a closer look.

Well naturally this is implemented by method_missing. When you call UserMailer.welcome, the class will call your instance method—sort of! Actually method_missing just returns a MessageDelivery object, which provides lazy evaluation. It’s like a promise (but not asynchronous). Your method doesn’t get called until you resolve the “promise,” which normally would happen when you say deliver_now. You can also call #message which must resolve the promise (and returns whatever your method returned—sort of!).

What if you say deliver_later? That still doesn’t call your method. Instead it queues up a job, and later that will say deliver_now to finally call your method.

But if you’re using Sidekiq (with config.active_job.queue_adapter = :sidekiq), you might wonder how that #welcome method works, since we’re passing a User class and Sidekiq can only serialize primitive types. But it does work! The trick is that Rails’ queue adapter for Sidekiq does its own serialization before handing off the job to Sidekiq, and it tells Sidekiq to run its own Worker subclass that will deserialize things correctly.

All this assumes that your mailer method returns a Mail::Message instance. That’s what #mail is giving you. But what if you don’t? What if you call mail but not as the last line of your method? What if you call it more than once?

Well actually #mail (linking to the source code this time) remembers the message it generated, so even if you don’t return that from your own method, Rails will still send it properly. In fact it doesn’t matter what your own method returns!

And if you call #mail multiple times, then Rails will return early and do nothing for the second and third calls—sort of! If you pass any arguments or a block, then Rails will evaluate it again. But it still only knows how to store one Message. So when you finally call deliver_now, only one email will go out (ask me how I know).

Btw it turns out this is pretty much all documented on the ActionMailer::Base class, but it’s not really covered in the Rails Guide, so I never came across it. I only found those docs when I decided to read the code. I don’t know if other Rails devs spend much time reading Rails’ own code, but I’ve found it helpful again and again. It’s not hard and totally worth it!

Another trick I’ve used for years is bundle show actionmailer (or in the old days cd $(bundle show actionmailer), before they broke that with a deprecation notice), and then you can add pp or binding.pry wherever you like. It’s a great way to test your understanding of what’s happening or discover the internals of something.

Rails ActionMailer Internals

Custom Postgres Ubuntu Style

2023-09-29

Ubuntu has a very nice way of organizing multiple versions of Postgres. They all get their own directories, and the commands dispatch to the latest version or something else if you set the PGCLUSTER envvar or give a --cluster option. For instance if you have installed Postgres 14, you will see files in /usr/lib/postgresql/14 and /usr/share/postgresql/14.

In Postgres a single installation is called a “cluster”. It has nothing to do with using multiple machines; it’s just the traditional term for the collection of configuration, data files, a postmaster process listening on a given port and its helper processes, etc.

Elsewhere in the postgres world you say initdb to create a cluster. In Ubuntu you say pg_createcluster. By default Ubuntu creates a cluster named main for each version you install. This gives you directories like /etc/postgresql/14/main (for configuration) and /var/lib/postgresql/14/main (for the data). The log file is /var/log/postgresql/postgresql-14-main.log.

If you want to run an old version of pg_dump, you can say PGCLUSTER=10/main pg_dump --version or pg_dump --cluster=10/main --version. Likewise for pg_restore, etc. (but—sidequest spolier alert—not psql or a couple other things: see the footnote for more).

One command that sadly doesn’t support this is pg_config, which is used to build custom extensions. Personally I just patch my local copy (or actually add a patched version earlier in the path, in my ~/bin), like this:

#!/bin/sh

# If postgresql-server-dev-* is installed, call pg_config from the latest
# available one. Otherwise fall back to libpq-dev's version.
#
# (C) 2011 Martin Pitt <mpitt@debian.org>
# (C) 2014-2016 Christoph Berg <myon@debian.org>
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.

set -e
PGBINROOT="/usr/lib/postgresql/"
#redhat# PGBINROOT="/usr/pgsql-"

# MY CHANGES START HERE
if [ -n "$PGCLUSTER" ]; then
  exec "$PGBINROOT/$PGCLUSTER/bin/pg_config" "$@"
fi
# MY CHANGES END HERE

LATEST_SERVER_DEV=`ls -v $PGBINROOT*/bin/pg_config 2>/dev/null|tail -n1`

if [ -n "$LATEST_SERVER_DEV" ]; then
    exec "$LATEST_SERVER_DEV" "$@"
else
    if [ -x /usr/bin/pg_config.libpq-dev ]; then
        exec /usr/bin/pg_config.libpq-dev "$@"
    else
        echo "You need to install postgresql-server-dev-X.Y for building a server-side extension or libpq-dev for building a client-side application." >&2
        exit 1
    fi
fi

Without those changes you can’t build custom C extensions against old versions of Postgres. I’ve mentioned this in the past in this Stackoverflow answer.

But that’s not what this post is about!

This post is about compiling your own Postgres that you can manage like other Postgres versions on Ubuntu. I want an install that includes my temporal patches, so I can convert my timetracking app to use real temporal features. I want the files to live in the normal places, and I want it to start/stop the normal way.

I’ve been hacking on Postgres for many years (Last May someone at PGCon told me I should stop calling myself a newbie. . . .), and I’ve always used ./configure --prefix=~/local ... to keep a dev installation. But I’ve never used it for anything durable. It’s just handy for make installcheck and psql’ing and attaching a debugger. I blow it away all the time with rm -rf ~/local/pgsql/data && ~/local/bin/initdb -D ~/local/pgsql/data. I crash it all the time because that’s how it goes when I’m writing C. ;-) That’s not where my timetracking data should live.

My first attempt was to build Postgres like this:

version=17devel
./configure \
  'CFLAGS=-ggdb -Og -g3 -fno-omit-frame-pointer' \
  --enable-tap-tests --enable-cassert --enable-debug \
  --prefix=/usr/lib/postgresql/${version} \
  --datarootdir=/usr/share/postgresql/${version}
make clean && make world && sudo make install-world

(I might as well keep some dev stuff in there in case I need it.)

Then as the postgres user I tried this:

postgres@tal:~$ pg_createcluster 17devel main
Error: invalid version '17devel'

Alas!

Ubuntu’s multi-version system is controlled by the postgresql-common package, so I got the source for it by running apt-get source postgresql-common. (You might need to uncomment a deb-src line in /etc/apt/sources.list and run sudo apt-get update.) Grepping for “invalid version” I found the message in pg_createcluster from these lines:

my ($version) = $ARGV[0] =~ /^(\d+\.?\d+)$/;
error "invalid version '$ARGV[0]'" unless defined $version;

Instead of fighting with the system I decided to call it version 30. It worked!

Except I had one last problem:

postgres@tal:~$ psql -p 5443
psql: error: connection to server on socket "/tmp/.s.PGSQL.5443" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?

The issue is that the postgresql-common infrastructure dispatches to the latest tools by default, and our “version 30” psql is looking in the wrong place for a socket file. In postgresql.conf you can see this line:

unix_socket_directories = '/var/run/postgresql' # comma-separated list of directories

And taking a peek we have:

paul@tal:~$ ls -A /var/run/postgresql/
10-main.pg_stat_tmp  13-main.pid           9.4-main.pg_stat_tmp  .s.PGSQL.5433.lock  .s.PGSQL.5437       .s.PGSQL.5440.lock
10-main.pid          14-main.pg_stat_tmp   9.4-main.pid          .s.PGSQL.5434       .s.PGSQL.5437.lock  .s.PGSQL.5441
11-main.pg_stat_tmp  14-main.pid           9.5-main.pg_stat_tmp  .s.PGSQL.5434.lock  .s.PGSQL.5438       .s.PGSQL.5441.lock
11-main.pid          15-main.pid           9.5-main.pid          .s.PGSQL.5435       .s.PGSQL.5438.lock  .s.PGSQL.5442
12-main.pg_stat_tmp  30-main.pid           9.6-main.pg_stat_tmp  .s.PGSQL.5435.lock  .s.PGSQL.5439       .s.PGSQL.5442.lock
12-main.pid          9.3-main.pg_stat_tmp  9.6-main.pid          .s.PGSQL.5436       .s.PGSQL.5439.lock  .s.PGSQL.5443
13-main.pg_stat_tmp  9.3-main.pid          .s.PGSQL.5433         .s.PGSQL.5436.lock  .s.PGSQL.5440       .s.PGSQL.5443.lock

(Yeah I run a lot of versions. :-)

This is one way to fix the problem:

postgres@tal:~$ PGCLUSTER=14/main psql -p 5443
psql (17devel)
Type "help" for help.

postgres=#

But that’s too annoying, and the \d commands are going to be broken because they won’t know how to query the latest pg_* tables. (And by the way, why does psql still say it’s 17devel? I haven’t looked into that yet but it’s suspicious.¹) And in fact even using PGCLUSTER=30/main psql still works!

I think it’s a bug in this Perl code from /usr/bin/psql:

# if only a port is specified, look for local cluster on specified port
if ($explicit_port and not $version and not $cluster and not $explicit_host and not $explicit_service) {
    LOOP: foreach my $v (reverse get_versions()) {
        foreach my $c (get_version_clusters $v) {
            my $p = get_cluster_port $v, $c;
            if ($p eq $explicit_port) {
                $version = $v;
                # set PGCLUSTER variable for information
                $ENV{PGCLUSTER} = "$version/$c";
                last LOOP;
            }
        }
    }
}

You can see that it sets $version but not $cluster (just $ENV{PGCLUSTER}). Later if $cluster is set then it will look up the correct socket dir, but it’s only set if we’re explicit. Personally I’m fixing this by adding $cluster = $c; right before the $version = $v line. Then we’ll call get_cluster_socketdir below. It might not be 100% correct but it is good enough for my purposes.

So now I have a custom-patched Postgres running on Ubuntu! I see its /etc files, its data files, and its log file. After systemctl daemon-reload I can start it etc. So I think I’m all set. I’d just better re-run ./configure --prefix=~/local before I forget and re-install something broken on top of it. :-)

If I run into more problems, I’ll update this post.

¹ Oh, the answer is simple. From /usr/bin/psql:

# if we have no version yet, use the latest version. If we were called as psql,
# pg_archivecleanup, or pg_isready, always use latest version
if (not $version or $cmdname =~ /^(psql|pg_archivecleanup|pg_isready)$/) {
    my $max_version;
    if ($version and $version < 9.2) { # psql 15 only supports PG 9.2+
        $max_version = 14;
    }
    $version = get_newest_version($cmdname, $max_version);
}

But that means most of the last paragraph was wrong. Since the non-self-compiled tools find the socket file just fine, there must be a better solution than patching psql (which is technically pg_wrapper btw). So we are not done. Stay tuned for the, ahem, sequel!

Custom Postgres Ubuntu Style

Next: Rails dirty methods

Illuminated Computing

k3s Behind Nginx

Cozy Toes

Temporal PKs Merged!

Git for Postgres Hacking

Rails ActionMailer Internals

Custom Postgres Ubuntu Style

Paul A. Jungwirth

Code

Writing

Talks