tiup-cluster(tls): support custom TLS certificates instead of the self-signed ones#2703
panda2134 wants to merge 5 commits into
Add TLSMode field to GlobalOptions with IsCustomTLS() as the single branching predicate. In custom mode, each component's setTLSConfig() validates user-provided security.*-path keys instead of overwriting them, and buildCertificateTasks()/loadCertificate() are skipped entirely. Manager.TLS() accepts CustomTLSOptions for cert validation, mode transitions (managed↔custom with --force), and client cert backup+copy via swapClientCertFiles(). SwapClientCert() guards standalone cert rotation to custom-mode clusters only. CLI adds --custom, --client-ca/cert/key flags and swap-client-cert as an action in the existing tls <cluster> <action> switch. Claude was used but code has been manually reviewed and polished. This should close pingcap#2693.
Before this change, blackbox_exporter is hardcoded to use self-signed TLS certificates, regardless of whether TLS mode is custom. Now we skip managed cert generation for blackbox_exporter when IsCustomTLS(), and add blackbox_ca/cert/key fields to MonitoredOptions so users can specify cert paths via edit-config.
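One way to confirm that the rendered blackbox_exporter config picked up the user-specified paths is a small grep-based check. This is a minimal sketch: the function name `config_references_paths` is hypothetical and not part of tiup, and it only tests for literal path strings without parsing the YAML.

```shell
#!/usr/bin/env bash
# Sketch only: config_references_paths is a hypothetical helper, not a tiup command.

# Succeed if every given certificate path appears literally in the config file.
config_references_paths() {
  local config_file=$1; shift
  local path
  for path in "$@"; do
    if ! grep -F -q -- "$path" "$config_file"; then
      echo "missing in $config_file: $path" >&2
      return 1
    fi
  done
}

# Example, run on a node against a local copy of the rendered config:
# config_references_paths blackbox.yml \
#   /etc/pki/tidb/ca.crt /etc/pki/tidb/tikv.crt /etc/pki/tidb/tikv.pem
```

A literal match is deliberately loose: it shows that the paths were written into the file, not that blackbox_exporter can actually load the certificates.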
…lobal config block
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @panda2134! It looks like this is your first PR to pingcap/tiup 🎉 |
Custom TLS Certificate Testing Steps

Environment: 3 VMs (172.18.0.8, 172.18.0.9, 172.18.0.10), user tidb

1. Baseline: Deploy cluster with managed TLS

Topology (no custom TLS):

# topo.yaml
global:
  user: tidb
pd_servers:
  - host: 172.18.0.8
tikv_servers:
  - host: 172.18.0.8
  - host: 172.18.0.9
  - host: 172.18.0.10
tidb_servers:
  - host: 172.18.0.8

Deploy and enable managed TLS:

tiup cluster deploy test-cluster v8.5.5 topo.yaml -u tidb -i ~/.ssh/tidb_deploy
tiup cluster start test-cluster
tiup cluster tls test-cluster enable

Verify managed certs:

openssl s_client -connect 172.18.0.8:2379 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
openssl s_client -connect 172.18.0.8:20160 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
openssl s_client -connect 172.18.0.9:20160 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
openssl s_client -connect 172.18.0.10:20160 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer

Result: all show

2. Generate custom certificates

Certs must be generated on Linux to avoid Windows line-ending corruption.

ssh tidb@172.18.0.8 "bash -s" << 'EOF'
mkdir -p ~/certs && cd ~/certs
# CA
openssl genrsa -out ca.pem 4096
openssl req -new -x509 -days 3650 -key ca.pem -out ca.crt \
  -subj "/CN=TiDB BYOC CA" \
  -addext "basicConstraints=critical,CA:TRUE" \
  -addext "keyUsage=critical,keyCertSign,cRLSign"

# Generate a key and a CA-signed certificate for one instance (name, IP SAN)
gen_cert() {
  local name=$1 ip=$2
  openssl genrsa -out ${name}.pem 2048
  openssl req -new -key ${name}.pem -out ${name}.csr -subj "/CN=${name}"
  openssl x509 -req -days 3650 -in ${name}.csr -CA ca.crt -CAkey ca.pem \
    -CAcreateserial -out ${name}.crt \
    -extfile <(printf "subjectAltName=IP:${ip}\nbasicConstraints=CA:FALSE\nkeyUsage=digitalSignature,keyEncipherment\nextendedKeyUsage=serverAuth,clientAuth")
  rm ${name}.csr
}

gen_cert pd 172.18.0.8
gen_cert pd-9 172.18.0.9
gen_cert tikv-8 172.18.0.8
gen_cert tikv-9 172.18.0.9
gen_cert tikv-10 172.18.0.10
gen_cert tidb 172.18.0.8
gen_cert client 127.0.0.1
# Install locally on .8
sudo cp ca.crt pd.crt pd.pem tidb.crt tidb.pem /etc/pki/tidb/
sudo cp tikv-8.crt /etc/pki/tidb/tikv.crt
sudo cp tikv-8.pem /etc/pki/tidb/tikv.pem
EOF

Distribute to other nodes:

# .9
ssh tidb@172.18.0.8 "cat ~/certs/ca.crt" | ssh tidb@172.18.0.9 "sudo tee /etc/pki/tidb/ca.crt > /dev/null"
ssh tidb@172.18.0.8 "cat ~/certs/tikv-9.crt" | ssh tidb@172.18.0.9 "sudo tee /etc/pki/tidb/tikv.crt > /dev/null"
ssh tidb@172.18.0.8 "cat ~/certs/tikv-9.pem" | ssh tidb@172.18.0.9 "sudo tee /etc/pki/tidb/tikv.pem > /dev/null"
ssh tidb@172.18.0.8 "cat ~/certs/pd-9.crt" | ssh tidb@172.18.0.9 "sudo tee /etc/pki/tidb/pd.crt > /dev/null"
ssh tidb@172.18.0.8 "cat ~/certs/pd-9.pem" | ssh tidb@172.18.0.9 "sudo tee /etc/pki/tidb/pd.pem > /dev/null"
# .10
ssh tidb@172.18.0.8 "cat ~/certs/ca.crt" | ssh tidb@172.18.0.10 "sudo tee /etc/pki/tidb/ca.crt > /dev/null"
ssh tidb@172.18.0.8 "cat ~/certs/tikv-10.crt" | ssh tidb@172.18.0.10 "sudo tee /etc/pki/tidb/tikv.crt > /dev/null"
ssh tidb@172.18.0.8 "cat ~/certs/tikv-10.pem" | ssh tidb@172.18.0.10 "sudo tee /etc/pki/tidb/tikv.pem > /dev/null"

Copy client cert to control machine:

scp tidb@172.18.0.8:~/certs/ca.crt ~/byoc-certs/ca.crt
scp tidb@172.18.0.8:~/certs/client.crt ~/byoc-certs/client.crt
scp tidb@172.18.0.8:~/certs/client.pem ~/byoc-certs/client.pem

3. Install patched tiup-cluster

# Build
make cluster
# Replace (adjust version as needed)
cp bin/tiup-cluster ~/.tiup/components/cluster/v1.16.5/tiup-cluster

4. Switch managed to custom TLS

Set cert paths for all instances:

tiup cluster edit-config test-cluster

Add config sections:

pd_servers:
  - host: 172.18.0.8
    config:
      security.cacert-path: /etc/pki/tidb/ca.crt
      security.cert-path: /etc/pki/tidb/pd.crt
      security.key-path: /etc/pki/tidb/pd.pem
tikv_servers:
  - host: 172.18.0.8
    config:
      security.ca-path: /etc/pki/tidb/ca.crt
      security.cert-path: /etc/pki/tidb/tikv.crt
      security.key-path: /etc/pki/tidb/tikv.pem
  - host: 172.18.0.9
    config:
      security.ca-path: /etc/pki/tidb/ca.crt
      security.cert-path: /etc/pki/tidb/tikv.crt
      security.key-path: /etc/pki/tidb/tikv.pem
  - host: 172.18.0.10
    config:
      security.ca-path: /etc/pki/tidb/ca.crt
      security.cert-path: /etc/pki/tidb/tikv.crt
      security.key-path: /etc/pki/tidb/tikv.pem
tidb_servers:
  - host: 172.18.0.8
    config:
      security.cluster-ssl-ca: /etc/pki/tidb/ca.crt
      security.cluster-ssl-cert: /etc/pki/tidb/tidb.crt
      security.cluster-ssl-key: /etc/pki/tidb/tidb.pem
monitored:
  blackbox_ca: /etc/pki/tidb/ca.crt
  blackbox_cert: /etc/pki/tidb/tikv.crt
  blackbox_key: /etc/pki/tidb/tikv.pem

Switch to custom mode:

tiup cluster tls test-cluster enable --custom --force \
  --client-ca=$HOME/byoc-certs/ca.crt \
  --client-cert=$HOME/byoc-certs/client.crt \
  --client-key=$HOME/byoc-certs/client.pem

Result: all 5 nodes came up successfully.

Check if

% echo "# swapped-marker" >> ~/byoc-certs/client.crt
% echo "# swapped-marker" >> ~/byoc-certs/client.pem
% echo "# swapped-marker" >> ~/byoc-certs/ca.crt
% tiup cluster tls test-cluster swap-client-cert --client-ca ca.crt --client-cert client.crt --client-key client.pem
% cd /home/jiangyi.liu/.tiup/storage/cluster/clusters/test-cluster/tls/
% grep -r "swapped-marker" # should give 3 files: ca.crt, client.crt, client.pem

5. Verify BYOC certs are active

openssl s_client -connect 172.18.0.8:2379 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
openssl s_client -connect 172.18.0.8:20160 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
openssl s_client -connect 172.18.0.9:20160 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
openssl s_client -connect 172.18.0.10:20160 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer

Result:

6. Verify blackbox_exporter with custom certs

Check process and health:

ssh tidb@172.18.0.8 "ps aux | grep blackbox_exporter"
ssh tidb@172.18.0.8 "curl -s http://localhost:9115/health"

Verify config has custom paths:

ssh tidb@172.18.0.8 "cat /home/tidb/deploy/monitor-9100/conf/blackbox.yml"

Result:

Probe TLS endpoints through blackbox:

ssh tidb@172.18.0.8 "curl -s 'http://localhost:9115/probe?target=172.18.0.8:2379&module=tls_connect' | grep probe_success"
ssh tidb@172.18.0.8 "curl -s 'http://localhost:9115/probe?target=172.18.0.8:20160&module=tls_connect' | grep probe_success"
ssh tidb@172.18.0.8 "curl -s 'http://localhost:9115/probe?target=172.18.0.9:20160&module=tls_connect' | grep probe_success"
ssh tidb@172.18.0.8 "curl -s 'http://localhost:9115/probe?target=172.18.0.10:20160&module=tls_connect' | grep probe_success"

Result: all

7. Display shows TLS mode

tiup cluster display test-cluster

Output includes:

8. Scale-out PD with custom TLS

PD certificate is already copied to 172.18.0.9. Scale-out topology:

# scale-out-pd.yaml
pd_servers:
  - host: 172.18.0.9
    config:
      security.cacert-path: /etc/pki/tidb/ca.crt
      security.cert-path: /etc/pki/tidb/pd.crt
      security.key-path: /etc/pki/tidb/pd.pem

tiup cluster scale-out test-cluster scale-out-pd.yaml

Verify:

tiup cluster display test-cluster
openssl s_client -connect 172.18.0.9:2379 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
# Expected: issuer=CN=TiDB BYOC CA

9. Backward Compat

Tested swapping

10. Custom -> Managed Switch

First, remove all but one PD node:

Then, perform the switch:

Verify the certificate is now managed.

11. Disable TLS

Disable TLS: |
|
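The repeated `openssl s_client` checks in the testing steps could be wrapped in a small helper that loops over endpoints and compares the issuer. This is a minimal sketch: the function names `show_issuer`, `issuer_is`, and `check_all` are hypothetical, only the `openssl` pipeline itself comes from the steps, and the expected CA name is used as an unescaped regular expression.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the openssl pipeline is the one used in the testing steps.

# Print the subject/issuer lines of the certificate served at host:port.
show_issuer() {
  local endpoint=$1
  openssl s_client -connect "$endpoint" -showcerts </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer
}

# Succeed if the given openssl output names the expected CA common name as issuer.
# Accepts both "CN = name" and "CN=name" spellings, which differ across OpenSSL versions.
issuer_is() {
  local output=$1 expected_cn=$2
  printf '%s\n' "$output" | grep -E -q "^issuer=.*CN ?= ?${expected_cn}\$"
}

# Check every endpoint and return non-zero if any issuer differs.
check_all() {
  local expected_cn=$1; shift
  local endpoint rc=0
  for endpoint in "$@"; do
    if issuer_is "$(show_issuer "$endpoint")" "$expected_cn"; then
      echo "OK   $endpoint"
    else
      echo "FAIL $endpoint"
      rc=1
    fi
  done
  return $rc
}

# Example:
# check_all "TiDB BYOC CA" 172.18.0.8:2379 172.18.0.8:20160 172.18.0.9:20160 172.18.0.10:20160
```

Keeping the string comparison in `issuer_is` separate from the network call makes the matching logic easy to exercise without a running cluster.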
/cc @kaaaaaaang Please have a look when possible, thanks! |
What problem does this PR solve?
Support custom TLS certificates in TiUP so users don't have to use the self-signed certificates. Close #2693.
What is changed and how it works?
A dedicated TLS mode, "custom", is added. The specs of all TiDB components are adjusted so that, in this mode, TiUP does not touch TLS-certificate-related settings (e.g., the [security] section's ca-cert, client-cert, and client-key in PD/TiKV) during config generation and keeps the user-specified certificate paths. TiUP itself is also configured to use a client TLS certificate in custom TLS mode. Users are expected to bring their own certificates in this mode (instead of relying on TiUP to generate them), preferably certificates signed and distributed by their organization's internal CA.
The CLI is modified to allow switching between the default TLS mode (called "managed") and the "custom" TLS mode.
DISCLAIMER: An LLM was used to generate some of the code, but all code was reviewed and polished by hand. The changes were also tested manually on a 3-node test cluster. See the testing steps for details.
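Because custom mode relies on certificates that users provide themselves, it can help to sanity-check each CA/cert/key triple before pointing the cluster at it. The following is a minimal sketch using standard `openssl` subcommands; `check_cert_bundle` is a hypothetical helper, not something this PR adds, and it does not replicate whatever validation tiup performs internally.

```shell
#!/usr/bin/env bash
# Sketch only: check_cert_bundle is a hypothetical helper, not a tiup command.

# Verify that cert validates against ca and that key is the private key for cert.
check_cert_bundle() {
  local ca=$1 cert=$2 key=$3
  local cert_pub key_pub

  if ! openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1; then
    echo "certificate $cert does not verify against $ca" >&2
    return 1
  fi

  # Compare the public key embedded in the certificate with the one derived from the private key.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 1
  key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 1
  if [ "$cert_pub" != "$key_pub" ]; then
    echo "private key $key does not match $cert" >&2
    return 1
  fi
}

# Example, on a node holding the files from the testing steps:
# check_cert_bundle /etc/pki/tidb/ca.crt /etc/pki/tidb/tikv.crt /etc/pki/tidb/tikv.pem
```

The check covers only chain validity and key/cert pairing; it says nothing about whether the SAN entries match the host a given instance listens on.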
Check List
Tests
Code changes
Side effects
Related changes
Release notes: