## I. Architecture Design

### High-Availability Architecture Diagram

*(diagram omitted: client traffic enters at the Keepalived VIP 192.168.1.100, passes through HAProxy on lb1/lb2, and is balanced across the control-plane and worker nodes)*
### Node Plan

| Hostname | IP Address | Role | Resources |
| --- | --- | --- | --- |
| lb1 | 192.168.1.10 | HAProxy + Keepalived | 2C/4G |
| lb2 | 192.168.1.11 | HAProxy + Keepalived | 2C/4G |
| master1 | 192.168.1.101 | Kubernetes control plane | 4C/8G |
| master2 | 192.168.1.102 | Kubernetes control plane | 4C/8G |
| worker1 | 192.168.1.201 | Kubernetes worker node | 8C/32G |
| worker2 | 192.168.1.202 | Kubernetes worker node | 8C/32G |
| worker3 | 192.168.1.203 | Kubernetes worker node | 8C/32G |
| storage | 192.168.1.50 | MinIO backup storage | 4C/16G |

Note: with only two control-plane nodes, etcd cannot maintain quorum if either one fails; add a third master if you need true control-plane fault tolerance.
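Before touching individual nodes, it can save time to confirm every host in the plan is reachable; a small sketch, run from any admin machine once the `/etc/hosts` entries from the next section are in place:

```bash
# Ping each planned node once; report unreachable hosts
for host in lb1 lb2 master1 master2 worker1 worker2 worker3 storage; do
  ping -c1 -W1 "$host" >/dev/null 2>&1 && echo "$host OK" || echo "$host UNREACHABLE"
done
```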
## II. Prerequisites

### 1. Base Configuration on All Nodes
```bash
# Switch SELinux to permissive mode
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# Stop and disable the firewall
sudo systemctl stop firewalld
sudo systemctl disable firewalld

# Disable swap (required by kubelet)
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Add hosts entries
sudo tee -a /etc/hosts <<EOF
192.168.1.10  lb1
192.168.1.11  lb2
192.168.1.101 master1
192.168.1.102 master2
192.168.1.201 worker1
192.168.1.202 worker2
192.168.1.203 worker3
192.168.1.50  storage
EOF
```
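kubeadm also expects bridged traffic to be visible to iptables and IP forwarding to be enabled on every Kubernetes node; the standard prerequisite settings, not shown in the original steps, look like this:

```bash
# Load the kernel modules containerd and Kubernetes networking rely on
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# Make bridged traffic visible to iptables and enable IP forwarding
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system
```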
### 2. Load Balancer Configuration (lb1, lb2)

```bash
# Install HAProxy and Keepalived
sudo dnf install -y haproxy keepalived

# Configure HAProxy (/etc/haproxy/haproxy.cfg)
sudo tee /etc/haproxy/haproxy.cfg <<EOF
global
    log /dev/log local0
    maxconn 10000
    user haproxy
    group haproxy

defaults
    mode tcp
    timeout connect 5s
    timeout client 50s
    timeout server 50s

frontend k8s-api
    bind *:6443
    default_backend k8s-masters

frontend kuboard-http
    bind *:80
    default_backend kuboard-http-backend

frontend kuboard-https
    bind *:443
    default_backend kuboard-https-backend

backend k8s-masters
    balance roundrobin
    option tcp-check
    server master1 192.168.1.101:6443 check fall 3 rise 2
    server master2 192.168.1.102:6443 check fall 3 rise 2

backend kuboard-http-backend
    balance roundrobin
    server worker1 192.168.1.201:30080 check
    server worker2 192.168.1.202:30080 check
    server worker3 192.168.1.203:30080 check

backend kuboard-https-backend
    balance roundrobin
    server worker1 192.168.1.201:30443 check
    server worker2 192.168.1.202:30443 check
    server worker3 192.168.1.203:30443 check
EOF

# Start HAProxy
sudo systemctl enable --now haproxy
```
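A quick sanity check on each load balancer before moving on (assumes `ss` from iproute2, present by default on CentOS Stream 8):

```bash
# HAProxy should be active and listening on all three frontends
sudo systemctl is-active haproxy
sudo ss -tlnp | grep -E ':(6443|80|443)\b'
```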
### 3. Keepalived Configuration (lb1 as MASTER)

```bash
# lb1 configuration (/etc/keepalived/keepalived.conf)
sudo tee /etc/keepalived/keepalived.conf <<EOF
vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
}

vrrp_instance VI_1 {
    state MASTER
    interface ens192    # replace with the actual NIC name
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secretpassword
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_haproxy
    }
}
EOF

# lb2 configuration (BACKUP node)
sudo tee /etc/keepalived/keepalived.conf <<EOF
vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens192    # replace with the actual NIC name
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secretpassword
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_haproxy
    }
}
EOF

# Start Keepalived
sudo systemctl enable --now keepalived
```
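To confirm the VIP actually fails over, a minimal test (the NIC name `ens192` is the same placeholder used in the configs above):

```bash
# On lb1: the VIP should be attached to the NIC
ip addr show ens192 | grep 192.168.1.100

# Simulate an HAProxy failure on lb1; chk_haproxy will pull the node out of MASTER
sudo systemctl stop haproxy

# On lb2, a few seconds later, the VIP should have moved here
ip addr show ens192 | grep 192.168.1.100

# Restore lb1 when done
sudo systemctl start haproxy
```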
## III. Kubernetes Cluster Deployment

### 1. Install the Container Runtime on All Nodes

```bash
sudo dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y containerd.io

# Configure containerd to use the systemd cgroup driver
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd && sudo systemctl enable containerd
```
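Verify the runtime is healthy before installing the Kubernetes packages:

```bash
# ctr talks to the containerd socket; client and server versions should both print
sudo ctr version
```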
### 2. Install the Kubernetes Packages on All Nodes

```bash
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
# NOTE: the Google-hosted repo above has since been deprecated; new
# deployments should use the community repo at https://pkgs.k8s.io instead.

sudo dnf install -y kubelet-1.28* kubeadm-1.28* kubectl-1.28* --disableexcludes=kubernetes
sudo systemctl enable kubelet
```
### 3. Initialize the Control Plane (master1)

```bash
sudo kubeadm init \
  --control-plane-endpoint="192.168.1.100:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.1.101

# Set up kubectl access for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
### 4. Join the Second Control-Plane Node (master2)

```bash
# On master1, print a fresh join command
kubeadm token create --print-join-command

# On master2, run it with the --control-plane flag added;
# <certificate-key> comes from the "kubeadm init --upload-certs" output
sudo kubeadm join 192.168.1.100:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <certificate-key>
```
### 5. Join the Worker Nodes

```bash
# On each worker node, run the join command
sudo kubeadm join 192.168.1.100:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```
### 6. Install the Network Plugin (Calico)

```bash
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
```
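Once Calico is running, the whole cluster can be verified in one pass:

```bash
# All seven cluster nodes should reach Ready within a few minutes
kubectl get nodes -o wide

# Calico and control-plane pods should all be Running
kubectl get pods -n kube-system
```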
## IV. Storage Deployment

### 1. Install Longhorn Distributed Storage

```bash
# Add the Helm repository
helm repo add longhorn https://charts.longhorn.io
helm repo update

# Install Longhorn
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set persistence.defaultClass=true \
  --set defaultSettings.defaultDataLocality="best-effort" \
  --set defaultSettings.replicaSoftAntiAffinity=true \
  --set defaultSettings.storageOverProvisioningPercentage=200 \
  --set defaultSettings.storageMinimalAvailablePercentage=15 \
  --set defaultSettings.guaranteedEngineCPU=0.25
```
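Check that the installation converged before creating any volumes:

```bash
# All Longhorn components should reach Running
kubectl -n longhorn-system get pods

# Longhorn should now be registered as the default StorageClass
kubectl get storageclass
```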
### 2. Create a Dedicated StorageClass for Kuboard

```yaml
# kuboard-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kuboard-storage
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"   # 48 hours
  dataLocality: "best-effort"
```
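Apply it and confirm it registers:

```bash
kubectl apply -f kuboard-storageclass.yaml
kubectl get storageclass kuboard-storage
```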
## V. Highly Available Kuboard Deployment

### 1. Create the Kuboard Namespace and PVC

```bash
kubectl create namespace kuboard-system
```

```yaml
# kuboard-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kuboard-data
  namespace: kuboard-system
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: kuboard-storage
  resources:
    requests:
      storage: 20Gi
```
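Apply the claim and wait for it to bind (Longhorn serves ReadWriteMany volumes through its NFS-based share-manager, so all three Kuboard replicas can mount it at once):

```bash
kubectl apply -f kuboard-pvc.yaml
kubectl -n kuboard-system get pvc kuboard-data   # STATUS should become Bound
```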
### 2. Deploy Kuboard in HA Mode

```bash
# Download the original deployment manifest
curl -LO https://addons.kuboard.cn/kuboard/kuboard-v3.yaml

# Patch it for HA: 3 replicas plus the shared PVC created above.
# These sed edits are indentation-sensitive and fragile; review the
# resulting YAML, or make the equivalent changes by hand in an editor.
sed -i 's/replicas: 1/replicas: 3/' kuboard-v3.yaml
sed -i '/containers:/i\      volumes:\n      - name: data\n        persistentVolumeClaim:\n          claimName: kuboard-data' kuboard-v3.yaml
sed -i '/imagePullPolicy:/a\        volumeMounts:\n        - name: data\n          mountPath: /data' kuboard-v3.yaml

# Apply the configuration
kubectl apply -f kuboard-v3.yaml
```
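Confirm that all three replicas come up (the shared RWX volume lets them land on different workers):

```bash
kubectl -n kuboard-system rollout status deployment kuboard-v3
kubectl -n kuboard-system get pods -o wide
```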
### 3. Expose the Service

```yaml
# kuboard-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kuboard-v3
  namespace: kuboard-system
spec:
  type: NodePort
  selector:
    app: kuboard
  ports:
    - name: http
      port: 80
      targetPort: 80
      nodePort: 30080
    - name: https
      port: 443
      targetPort: 443
      nodePort: 30443
```
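Apply the Service and test the full path through the load balancer:

```bash
kubectl apply -f kuboard-service.yaml

# The HTTP frontend should now answer through the VIP
curl -I http://192.168.1.100/
```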
### 4. Create a Long-Lived Access Token

```bash
kubectl -n kuboard-system create serviceaccount kuboard-admin
kubectl create clusterrolebinding kuboard-admin-binding \
  --clusterrole=cluster-admin \
  --serviceaccount=kuboard-system:kuboard-admin

# Create a token valid for one year
kubectl -n kuboard-system create token kuboard-admin --duration=8760h > kuboard-token.txt
```
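A quick smoke test of the token against the API server via the VIP (`kubectl auth whoami` is available from kubectl v1.27 on):

```bash
TOKEN=$(cat kuboard-token.txt)
kubectl --server=https://192.168.1.100:6443 --token="$TOKEN" \
  --insecure-skip-tls-verify=true auth whoami
# Expected identity: system:serviceaccount:kuboard-system:kuboard-admin
```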
## VI. Backup Solution

### 1. Install MinIO Backup Storage

```bash
# Install MinIO on the storage node
# (assumes a MinIO RPM/repo is available; otherwise download the server
# binary and systemd unit from https://dl.min.io)
sudo dnf install -y minio

# Create the data directory
sudo mkdir -p /data/backups
sudo chown minio-user:minio-user /data/backups

# Configure the MinIO service
sudo tee /etc/default/minio <<EOF
MINIO_VOLUMES="/data/backups"
MINIO_OPTS="--address :9000 --console-address :9001"
MINIO_ROOT_USER=admin
MINIO_ROOT_PASSWORD=StrongPassword123!
EOF

# Start MinIO
sudo systemctl enable --now minio
```
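Velero expects its target bucket to exist before installation, a step the flow above leaves implicit; a sketch using the official MinIO client (`mc`):

```bash
# Download the MinIO client
wget https://dl.min.io/client/mc/release/linux-amd64/mc
sudo install -m 755 mc /usr/local/bin/mc

# Register the new MinIO instance and create the Velero bucket
mc alias set backup http://192.168.1.50:9000 admin 'StrongPassword123!'
mc mb backup/kuboard-backups
mc ls backup
```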
### 2. Install the Velero Backup Tool

```bash
# Download the Velero client
wget https://github.com/vmware-tanzu/velero/releases/download/v1.11.1/velero-v1.11.1-linux-amd64.tar.gz
tar -zxvf velero-v1.11.1-linux-amd64.tar.gz
sudo mv velero-v1.11.1-linux-amd64/velero /usr/local/bin/

# Create the backup credentials file
cat <<EOF > credentials-velero
[default]
aws_access_key_id = admin
aws_secret_access_key = StrongPassword123!
EOF

# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.7.0 \
  --bucket kuboard-backups \
  --secret-file ./credentials-velero \
  --use-volume-snapshots=true \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.50:9000 \
  --snapshot-location-config region=minio
```
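Verify that Velero can reach MinIO before relying on it:

```bash
# The Velero server pod should be Running
kubectl -n velero get pods

# The backup location should report Available
velero backup-location get
```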
### 3. Schedule Recurring Backups

```bash
# Daily full backup, kept for 3 days
velero schedule create kuboard-daily \
  --schedule="0 3 * * *" \
  --include-namespaces kuboard-system \
  --ttl 72h

# Weekly snapshot backup, kept for 30 days
velero schedule create kuboard-weekly \
  --schedule="0 4 * * 0" \
  --include-namespaces kuboard-system \
  --ttl 720h \
  --snapshot-volumes
```
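Before trusting the schedules, run one ad-hoc backup end to end:

```bash
velero backup create kuboard-manual-test --include-namespaces kuboard-system
velero backup describe kuboard-manual-test --details   # Phase should reach Completed
```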
### 4. Backup Verification Script

```bash
#!/bin/bash
# check-backup.sh: verify that the latest daily backup completed

# Most recent kuboard-daily backup (names embed timestamps, so they sort)
LATEST_BACKUP=$(velero backup get | grep kuboard-daily | sort -r | head -n1 | awk '{print $1}')
BACKUP_STATUS=$(velero backup describe "$LATEST_BACKUP" --details | grep Phase | awk '{print $2}')

if [ "$BACKUP_STATUS" != "Completed" ]; then
    echo "Backup $LATEST_BACKUP failed! Status: $BACKUP_STATUS"
    exit 1
else
    echo "Backup $LATEST_BACKUP completed successfully"
fi
```

Add it to cron for a daily check:

```bash
# 0 4 * * * /path/to/check-backup.sh | mail -s "Kuboard Backup Report" admin@example.com
```
## VII. Access and Monitoring

### 1. Access Kuboard

Browse to http://192.168.1.100 (the Keepalived VIP; HAProxy forwards port 80 to NodePort 30080 on the workers) and sign in with the token saved in `kuboard-token.txt`.
### 2. Monitoring Configuration

```yaml
# kuboard-monitor.yaml (requires the Prometheus Operator CRDs)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kuboard-monitor
  namespace: kuboard-system
spec:
  selector:
    matchLabels:
      app: kuboard
  endpoints:
    - port: http
      interval: 30s
      path: /metrics
```
### 3. Alerting Rules

```yaml
# kuboard-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kuboard-alerts
  namespace: kuboard-system
spec:
  groups:
    - name: kuboard-rules
      rules:
        - alert: KuboardDown
          expr: up{job="kuboard"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Kuboard pod down in {{ $labels.namespace }}
        - alert: KuboardHighLatency
          expr: histogram_quantile(0.95, sum(rate(kuboard_request_duration_seconds_bucket[5m])) by (le)) > 3
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Kuboard high request latency
```
## VIII. Operations and Maintenance

### 1. Routine Maintenance Commands

| Operation | Command |
| --- | --- |
| Check Kuboard status | `kubectl -n kuboard-system get pods -l app=kuboard` |
| Check backup status | `velero backup get` |
| Check storage usage | `kubectl -n longhorn-system get volumes` |
| Restart Kuboard | `kubectl -n kuboard-system rollout restart deployment kuboard-v3` |
### 2. Disaster Recovery

```bash
# Restore from the most recent backup produced by the daily schedule
# (scheduled backups get timestamped names, so restore by schedule
# rather than by a fixed backup name)
velero restore create --from-schedule kuboard-daily

# List the available snapshot locations
velero snapshot-location get

# Restore only the volumes and their claims
velero restore create --from-schedule kuboard-daily \
  --restore-volumes \
  --include-resources persistentvolumeclaims,persistentvolumes
```
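Watch the restore and confirm Kuboard recovers:

```bash
velero restore get
kubectl -n kuboard-system get pods
```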
### 3. Upgrade Strategy

*(upgrade-flow diagram omitted)*
## IX. Security Hardening

### 1. RBAC Access Control

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kuboard-system
  name: kuboard-viewer
rules:
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]          # Deployments live in the apps API group
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
```
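A Role grants nothing until it is bound to a subject; a minimal sketch binding the viewer role to a hypothetical `ops-team` group (the group name is illustrative):

```bash
kubectl -n kuboard-system create rolebinding kuboard-viewer-binding \
  --role=kuboard-viewer \
  --group=ops-team
```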
### 2. Network Policy

```yaml
# Calico NetworkPolicy (projectcalico.org/v3 resources are applied with
# calicoctl, or with kubectl if the Calico API server add-on is installed)
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: kuboard-access
  namespace: kuboard-system
spec:
  selector: app == 'kuboard'
  ingress:
    - action: Allow
      protocol: TCP
      source:
        namespaceSelector: name == 'ingress-nginx'
      destination:
        ports: [80, 443]
  egress:
    - action: Allow
      protocol: TCP
      destination:
        ports: [80, 443]
    - action: Allow
      protocol: UDP
      destination:
        ports: [53]
```
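Assuming the policy above is saved as `kuboard-access.yaml` (the filename is illustrative):

```bash
calicoctl apply -f kuboard-access.yaml
calicoctl get networkpolicy --namespace=kuboard-system
```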
### 3. Certificate Management

```bash
# Generate a self-signed TLS certificate for Kuboard
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout kuboard.key -out kuboard.crt \
  -subj "/CN=kuboard.example.com" \
  -addext "subjectAltName=DNS:kuboard.example.com,IP:192.168.1.100"

# Create the Kubernetes Secret
kubectl -n kuboard-system create secret tls kuboard-tls \
  --key kuboard.key \
  --cert kuboard.crt
```
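Verify the SAN entries and the stored secret:

```bash
openssl x509 -in kuboard.crt -noout -text | grep -A1 "Subject Alternative Name"
kubectl -n kuboard-system describe secret kuboard-tls
```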
## X. Performance Tuning

### 1. Kuboard Resource Configuration

```yaml
# kuboard-resources.yaml: partial spec, intended as a strategic merge patch, e.g.
#   kubectl -n kuboard-system patch deployment kuboard-v3 --patch-file kuboard-resources.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kuboard-v3
  namespace: kuboard-system
spec:
  template:
    spec:
      containers:
        - name: kuboard
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1"
```
### 2. Database Tuning

```sql
-- Run inside Kuboard's PostgreSQL instance
ALTER SYSTEM SET shared_buffers = '1GB';          -- takes effect only after a restart
ALTER SYSTEM SET work_mem = '32MB';
ALTER SYSTEM SET maintenance_work_mem = '256MB';
ALTER SYSTEM SET effective_cache_size = '3GB';
-- Reload so the non-restart settings apply immediately
SELECT pg_reload_conf();
```
### 3. Cache Configuration

```yaml
# kuboard-cache.yaml: same patch approach as above; assumes a Redis
# instance is reachable at redis.kuboard-system:6379
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kuboard-v3
  namespace: kuboard-system
spec:
  template:
    spec:
      containers:
        - name: kuboard
          env:
            - name: CACHE_TYPE
              value: "redis"
            - name: REDIS_URL
              value: "redis://redis.kuboard-system:6379/0"
```
## Summary

This guide provides an end-to-end solution for deploying highly available Kuboard on CentOS Stream 8. Its key characteristics are redundant load balancers behind a Keepalived VIP, a multi-master Kubernetes control plane, triple-replicated Longhorn storage, and scheduled Velero backups to MinIO. The architecture is sized for medium-scale production use; to keep the availability and data-safety guarantees credible, run a full-chain stress test quarterly and verify the backup-restore procedure monthly.