<mike_sw294503>
Hi. It's entirely possible I've done something wacky, but my Sandstorm server just stopped working because my sandcats certificate renewal failed.
<mike_sw294503>
I think this is the important bit of the error: Failed to renew certificate (will try again in 6 hours): Error: queryTxt ENODATA _acme-challenge.swierczek.sandcats.io
<mike_sw294503>
at QueryReqWrap.onresolve [as oncomplete] (dns.js:203:19)
<mike_sw294503>
sandstorm/gateway.c++:951: info: Loading TLS key into Gateway
<mike_sw294503>
Obviously 'swierczek' is my sandcats subdomain. To my knowledge, I haven't messed up the server config.
<kentonv>
mike_sw294503, hmm, the log looks like it actually did complete the exchange. The error probably came from a previous attempt, but then it was retried later. Or do you see several of those errors?
<kentonv>
the "Loading TLS key into Gateway" message indicates that a final key/cert was obtained successfully
<mike_sw294503>
I see several copies.
<kentonv>
well, hmm, I do see that you have an expired cert
<mike_sw294503>
Thanks for the quick response. I restarted the server, that didn't help.
<kentonv>
ok, the "Loading TLS key" message probably happened when restarting the server then
<kentonv>
which is normal
<kentonv>
ok, so, the error could be because the machine is having trouble with DNS queries
<kentonv>
What sandstorm does is, it talks to the sandcats API to ask that the _acme-challenge TXT record be set, and then it tries to do a regular DNS query to check that the record is present
<kentonv>
the error indicates that the check is showing no record
<kentonv>
is it possible that outgoing DNS queries are being blocked somehow on your server?
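
The check described above (set the TXT record, then look it up with an ordinary DNS query) can be reproduced by hand against one specific resolver, which helps separate "the record is missing" from "my local resolver is misbehaving". What follows is a minimal sketch using only the Python standard library, not Sandstorm's actual code. The function names, the default resolver `1.1.1.1`, and the timeout are illustrative assumptions. It reads only the response header, so it reports the response code and the answer count but not the TXT contents.

```python
import socket
import struct


def build_txt_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the TXT records of `name`."""
    # Header: ID, flags (only "recursion desired" set), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 16 is TXT, QCLASS 1 is IN (Internet).
    return header + qname + struct.pack("!HH", 16, 1)


def parse_header(response: bytes) -> tuple[int, int]:
    """Return (rcode, answer_count) from the 12-byte DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount


def txt_answer_count(name: str, resolver: str = "1.1.1.1",
                     timeout: float = 3.0) -> tuple[int, int]:
    """Send one UDP TXT query straight to `resolver`, bypassing resolv.conf."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_txt_query(name), (resolver, 53))
        response, _ = sock.recvfrom(4096)
    return parse_header(response)
```

Calling `txt_answer_count("_acme-challenge.swierczek.sandcats.io")` returns a pair such as `(0, 0)`: response code 0 (no error) with zero answers, which is the "no data" situation that the `ENODATA` error appears to describe. A timeout instead of a reply would point toward blocked outgoing DNS traffic.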
<isd>
kentonv: fwiw, I just ran dig -t TXT _acme-challenge.swierczek.sandcats.io and got no results, so it seems there in fact is no record -- though I don't know if sandcats clears them at a certain point.
<mike_sw294503>
Thanks for the explanation. I'll start digging into it. This server is running on my Linux workstation (logs going back to July 2016, by the way - thanks for making something wonderful). Maybe something Comcast is doing is messing with my DNS?
<kentonv>
isd, I think code in Sandstorm may actually request they be removed when it's done (even in the error case)
<kentonv>
mike_sw294503, I just tried a renewal to verify the server is working. Seems to be. So yeah I guess I'd try to figure out if Comcast is blocking TXT queries or something. Maybe try using 1.1.1.1 as your DNS. :)
<mike_sw294503>
I switched my DNS to 1.1.1.1,1.0.0.1 and restarted Sandstorm, still no dice.
<mike_sw294503>
Yeah, I hear Cloudflare employs some pretty smart people :)
<mike_sw294503>
Damn, I didn't think of this earlier - could IPv6 screw it up? I only enabled that earlier this year.
<kentonv>
I don't think ipv6 would affect this.
<kentonv>
the log is still showing the same error each time you restart?
<mike_sw294503>
Yes. I'm going to restart my workstation (which will drop me from IRC) just in case I've got something wacky going on that way. Be back in a bit.
mike_sw294503 has quit [Quit: Connection closed]
mike_sw294503 has joined #sandstorm
<mike_sw294503>
No dice. `nmcli device show bridge0 | grep IP4.DNS` shows that I'm using Cloudflare DNS, but my Sandstorm log has the error I printed and then a string of: sandstorm/gateway.c++:1057: error: exception = kj/compat/tls.c++:63: failed: OpenSSL error; message = error:10000415:SSL routines:OPENSSL_internal:SSLV3_ALERT_CERTIFICATE_EXPIRED
<kentonv>
I think that error is a consequence of the certificate fetch failing, but not a cause of it.
<kentonv>
what does your /etc/resolv.conf say you're using for DNS?
<mike_sw294503>
It refers me to `systemd-resolve --status` ...and the output of that isn't what I expected.
<kentonv>
err
<kentonv>
if your resolv.conf doesn't have a `nameserver 1.1.1.1` line in it then Sandstorm is not going to be able to make DNS queries
<kentonv>
what exactly are the contents of your resolv.conf?
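
To answer that question quickly, the `nameserver` lines can be pulled out of resolv.conf with a few lines of Python. This is a rough sketch: the function name is made up for illustration, and the sample text is an assumed example of what a systemd-resolved stub file typically looks like (a single `nameserver 127.0.0.53` entry pointing at the local resolver), not the actual file from this conversation.

```python
def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the addresses on `nameserver` lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (which start with '#' or ';').
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


# Assumed example of a systemd-resolved stub file, for illustration only.
SAMPLE = """\
# This file is managed by systemd-resolved.
nameserver 127.0.0.53
options edns0
"""

print(nameservers(SAMPLE))
```

On a real machine the input would come from `open("/etc/resolv.conf").read()`. If the only entry is a loopback address, queries go through the local resolver first, so the upstream servers it forwards to are what need checking.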
<mike_sw294503>
...and my wife just walked up and informed me we have plans tonight. I'm going to dig into this later, I'll have to learn how to read systemd-resolve output and get 1.1.1.1 into the list of things it's using.
<kentonv>
ah. Hmm well I think that should be OK, assuming the local resolver does actually respond to queries correctly
<mike_sw294503>
I'll come back to this later, thanks for your help! I do see 1.1.1.1 in the systemd-resolve. The output that confused me is related to the mini-Kubernetes cluster I was playing with. When I get back to this later, I'll uninstall that and disable IPv6 just to remove them as factors.