My shop has been using consul-template to rotate our Vault certificates each month, but this turned out not to be very reliable. Since we're actually replacing Vault with AWS SSM, the setup hasn't gotten much TLC lately, and there's really no reason to pour work into making it more resilient; plus it's mostly used by our legacy staging environments, which naturally don't get much TLC at any shop. My plan is to remove consul-template and switch to 1-year certificates, which gives us time to not worry about certificates expiring while we dismantle things over the next few months. The certs I'm talking about are just the ones the Vault servers use to communicate with each other.

So here I am on a Sunday evening (so I don't interrupt our development workflows during the week), attempting to rip out consul-template and manually rotate the certs to ones with a 1-year TTL. If you haven't done this before, it's actually quite simple.

Obviously it would be awkward to rotate the certs you're currently using, so on every Vault server in the cluster you disable TLS in /etc/vault.d/vault.hcl and restart everything, so the servers talk to each other over plain HTTP for now.

listener "tcp" {
  address = "0.0.0.0:8200"
-  tls_cert_file = "/etc/ssl/vault/vault.crt"
-  tls_key_file = "/etc/ssl/vault/vault.key"
+  tls_disable = 1
}

Once you've restarted the daemons, set export VAULT_ADDR='http://127.0.0.1:8200'. Okay, simple enough. Now go through the motions of unsealing: echo $UNSEAL_KEY1 | vault unseal as many times as you need, then finish with a good ole' vault status. Now we're in business to rotate some certs. I'm using my own domain name and made-up hostnames in the examples so I don't reveal anything. 🙈

$ vault write pki/issue/sudoaccess-dot-com common_name=vault-1.sudoaccess.com alt_names="localhost,*.sudoaccess.com,vault.consul.local" ip_sans="127.0.0.1,192.168.1.200" > vault-1.txt

Cool, let’s look at the file:

Key                 Value
---                 -----
lease_id            pki/issue/sudoaccess-dot-com/1234567890token1234
lease_duration      767h59m59s
lease_renewable     false
ca_chain            [-----BEGIN CERTIFICATE-----
MIIEpQIBAAKCAQEA0onHvatXo8X7Sr5ANkTEnn7ipjpL6z0pSc1uV6F1aLX1I94f

Wait… lease_duration of 767 hours. I don’t want 1 month. Let’s be explicit in our TTL.

$ vault write pki/issue/sudoaccess-dot-com common_name=vault-1.sudoaccess.com alt_names="localhost,*.sudoaccess.com,vault.consul.local" ip_sans="127.0.0.1,192.168.1.200" ttl="8760h" > vault-1.txt
Key                 Value
---                 -----
lease_id            pki/issue/sudoaccess-dot-com/1234567890token1234
lease_duration      767h59m59s
lease_renewable     false
ca_chain            [-----BEGIN CERTIFICATE-----
MIIEpQIBAAKCAQEA0onHvatXo8X7Sr5ANkTEnn7ipjpL6z0pSc1uV6F1aLX1I94f

🤔 well that doesn’t seem right.

And on I go with various ways of writing a 1-year TTL: ttl=8760h, ttl=31536000, and so on. Still nothing.

Alright fine, I’m going to force the max_ttl by doing:

$ export VAULT_TOKEN=<root_vault_token>

$ vault secrets tune -max-lease-ttl=8760h pki
Success! Tuned the secrets engine at: pki/

$ vault secrets tune -max-lease-ttl=8760h pki/issue/sudoaccess-dot-com
Success! Tuned the secrets engine at: pki/issue/sudoaccess-dot-com/

If you don’t have a root vault token, follow these directions to set one up.

Alright this should work. Well guess what? It didn’t. Same result. At this point I’m saying WTF out loud.

After digging around in the CLI and trying to override this in various ways, it became obvious that the sudoaccess-dot-com role itself was restricting the TTL to 1 month. The hint came from reading through this and seeing: "Note that individual roles can restrict this value to be shorter on a per-certificate basis."
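
To make that note concrete, here is a minimal sketch of how I picture the limits stacking up. It is my simplified reading of that sentence, not Vault's actual code: the function name effective_ttl is made up, and it assumes all three values are non-zero and given in seconds.

```shell
# Simplified model (an assumption, not Vault's implementation):
# the TTL you actually get is the smallest of the requested TTL,
# the role's max_ttl, and the mount's max lease TTL, all in seconds.
effective_ttl() {
  requested=$1
  role_max=$2
  mount_max=$3
  ttl=$requested
  if [ "$role_max" -lt "$ttl" ]; then ttl=$role_max; fi
  if [ "$mount_max" -lt "$ttl" ]; then ttl=$mount_max; fi
  echo "$ttl"
}

# Asking for 1 year on a mount tuned to 1 year, with a role capped at 768h:
effective_ttl 31536000 2764800 31536000   # prints 2764800
```

Under that model, tuning the mount alone can never help, because the role's cap is still the smallest number in the room.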

I ended up finding this gem of documentation. It’s the PKI API guide.

Let’s check out this role by running:

$ curl -s \
    --header "X-Vault-Token:1234567890token1234" \
    http://127.0.0.1:8200/v1/pki/roles/sudoaccess-dot-com | jq .
{
  "request_id": "bf389343-73d2-414a-96ff-df37ea15ec5d",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
.... truncated ....
    "key_bits": 2048,
    "key_type": "rsa",
    "key_usage": [
      "DigitalSignature",
      "KeyAgreement",
      "KeyEncipherment"
    ],
    "locality": null,
    "max_ttl": 2764800,
    "no_store": false,
    "not_before_duration": 0,
    "organization": null,
    "ou": null,
    "policy_identifiers": null,
    "postal_code": null,
    "province": null,
    "require_cn": false,
    "server_flag": true,
    "street_address": null,
    "ttl": 2764800,
    "use_csr_common_name": true,
    "use_csr_sans": false
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

There we go: "max_ttl": 2764800, which turns out to be 768 hours (2764800 / 3600). Let's save this entire output to a file named payload.json and change that value to 31536000 (8760 hours, or 365 days). Once that's done, let's post it by running:

$ curl -s \
    --header "X-Vault-Token:1234567890token1234" \
    --request POST \
    --data @payload.json \
    http://127.0.0.1:8200/v1/pki/roles/sudoaccess-dot-com
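
I made the payload.json edit by hand, but it could be scripted along these lines. This is only a sketch: the helper names hours_to_seconds and set_role_ttl are mine, it changes both "ttl" and "max_ttl", and it assumes the file is laid out exactly the way jq printed it above (one key per line).

```shell
# Convert a TTL in hours to the seconds value used in the role JSON.
hours_to_seconds() {
  echo $(( $1 * 3600 ))
}

# Rewrite the "ttl" and "max_ttl" values in a saved role file.
# $1 = file to edit, $2 = new TTL in seconds.
# Assumes jq-style formatting: "key": value, one pair per line.
set_role_ttl() {
  file=$1
  new_ttl=$2
  sed -E "s/\"(max_ttl|ttl)\": [0-9]+/\"\\1\": ${new_ttl}/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Example usage (run before posting the payload):
#   hours_to_seconds 8760                                  # prints 31536000
#   set_role_ttl payload.json "$(hours_to_seconds 8760)"
```

Keys such as "lease_duration" and "not_before_duration" are left alone because the pattern only matches the exact quoted names "ttl" and "max_ttl".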

Let’s give the vault write command a shot again and look at the cert output:

Key                 Value
---                 -----
lease_id            pki/issue/sudoaccess-dot-com/1234567890token1234
lease_duration      8759h59m59s
lease_renewable     false
ca_chain            [-----BEGIN CERTIFICATE-----
MIIEpQIBAAKCAQEA0onHvatXo8X7Sr5ANkTEnn7ipjpL6z0pSc1uV6F1aLX1I94f

8759h59m59s!

Woohoo, let’s chop this up into a vault.crt and a vault.key and re-enable TLS in /etc/vault.d/vault.hcl.

listener "tcp" {
  address = "0.0.0.0:8200"
+  tls_cert_file = "/etc/ssl/vault/vault.crt"
+  tls_key_file = "/etc/ssl/vault/vault.key"
-  tls_disable = 1
}
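
For the "chop this up" step, something like the following could pull the PEM blocks out of vault-1.txt. Treat it as a sketch: extract_pem is a made-up helper, and it assumes the table-style output shown above, with the certificate and private_key fields (not visible in my truncated output) each starting on their own line with the PEM body continuing below.

```shell
# Print the PEM block stored under one field of Vault's table output.
# $1 = field name (e.g. certificate, private_key), $2 = saved output file.
extract_pem() {
  awk -v field="$1" '
    # First line of the field: drop the field name and padding.
    !printing && $1 == field { sub(/^[^ ]+ +/, ""); printing = 1 }
    printing { print }
    # Stop after the closing PEM marker.
    printing && /-----END [A-Z ]+-----/ { exit }
  ' "$2"
}

# Example usage:
#   extract_pem certificate vault-1.txt > vault.crt
#   extract_pem private_key vault-1.txt > vault.key && chmod 600 vault.key
```

Matching on the exact field name keeps private_key from being confused with private_key_type, and certificate from being confused with ca_chain.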

After the daemons are restarted, export VAULT_ADDR='https://127.0.0.1:8200' and go through your vault unseal shenanigans again.

$ vault status
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    5
Threshold       3
Version         0.11.6
Cluster Name    vault-cluster-c917641f
Cluster ID      e9361a1d-6e41-4168-9fea-03600feaa035
HA Enabled      true
HA Cluster      https://192.168.1.115:8201
HA Mode         active

👏
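
If you're repeating this on several servers, a tiny helper for reading the Sealed line saves some squinting. This is just a sketch: vault_sealed_state is a made-up name, and it assumes vault status prints the table format shown above.

```shell
# Print the value of the "Sealed" row from `vault status` (true or false).
# Assumes the two-column table output shown above.
vault_sealed_state() {
  vault status 2>/dev/null | awk '$1 == "Sealed" { print $2 }'
}

# Example usage:
#   [ "$(vault_sealed_state)" = "false" ] && echo "unsealed, good to go"
```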

So why couldn't I override it with the CLI? Why did I have to hack away at the API? After more reading, my understanding is that, at least in version 0.11.6, the role keeps enforcing the TTL of the first certificate issued through it, no matter what you request later. Our consul-template service happened to be the first thing to use this specific role, and since it was requesting 1-month TTLs, that's what the role was stuck enforcing.

Full disclaimer: this was only my second time ever having to troubleshoot Vault. But this still seems to be a pretty unknown (and annoying) gotcha, even among my cohorts who are more knowledgeable about this system. As of this writing it wasn't clearly documented anywhere, I had to get a little creative, and none of my google-fu turned up anyone talking about this scenario. Hopefully this helps someone. Thanks for reading.