Prepare the systems
My machines are listed below; each has at least 4 CPU cores and 4 GB of RAM (4C4G).
hostname | IP | Role |
---|---|---|
public | 10.0.0.3 | ingress / apiserver load balancer, NFS storage |
master1 | 10.0.0.11 | k8s master node |
master2 | 10.0.0.12 | k8s master node |
master3 | 10.0.0.13 | k8s master node |
worker1 | 10.0.0.21 | k8s worker node |
worker2 | 10.0.0.22 | k8s worker node |
Set up DNS resolution for every machine, or add hosts entries (optional but recommended):
vim /etc/hosts
10.0.0.3 public kube-apiserver
10.0.0.11 master1
10.0.0.12 master2
10.0.0.13 master3
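If SSH access between the machines is already set up, a small loop can push the same entries to every node. This is only a sketch; the node list comes from the table above and root SSH access is an assumption.
# Append the same hosts entries on every node over SSH (assumes root SSH access)
for host in master1 master2 master3 worker1 worker2; do
  ssh root@"$host" "cat >> /etc/hosts" <<'EOF'
10.0.0.3 public kube-apiserver
10.0.0.11 master1
10.0.0.12 master2
10.0.0.13 master3
EOF
done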
Base environment configuration
The base environment is required on every node, master and worker alike:
- Disable swap (see the sketch after this list for making it permanent)
  sudo swapoff -a   # then edit /etc/fstab (sudo vim /etc/fstab) and comment out the swap entry
- Change the hostname
- Allow iptables to see bridged traffic (module and sysctl settings below)
- Disable the firewall
  sudo systemctl disable --now ufw
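A quick sketch covering the swap and hostname items above; the sed pattern assumes the swap entry in /etc/fstab contains the word "swap", and the hostname is just an example taken from the table.
# Turn swap off now and comment out the fstab entry so it stays off after a reboot
sudo swapoff -a
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab
# Set the hostname to match the table above (run the matching command on each machine)
sudo hostnamectl set-hostname master1
For the bridged-traffic item, load the br_netfilter module on boot and set the sysctls: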
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
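To confirm the settings took effect:
# Load the module now (the modules-load.d entry only applies at boot) and verify
sudo modprobe br_netfilter
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables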
Install a container runtime (Docker or containerd)
Docker
curl -fsSL get.docker.com | bash
sudo mkdir -p /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"],
"registry-mirrors": ["https://1m20yrx2.mirror.aliyuncs.com"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF
sudo systemctl restart docker
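After the restart, confirm that Docker picked up the systemd cgroup driver:
# Should print "Cgroup Driver: systemd"
docker info | grep -i "cgroup driver"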
Containerd
# Prerequisites
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Set the required sysctl parameters; they persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Apply the sysctl parameters without rebooting
sudo sysctl --system
# Install
# Install dependencies
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release
# Trust the Docker repository key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Add the repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install containerd
sudo apt update
sudo apt install -y containerd.io
# Configure
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# Edit /etc/containerd/config.toml
vim /etc/containerd/config.toml
# Set SystemdCgroup = true under the runc options, and point sandbox_image
# (in the [plugins."io.containerd.grpc.v1.cri"] section) at the Aliyun pause image
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
...
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"
# Restart the service
sudo systemctl restart containerd
# Enable containerd to start on boot
sudo systemctl enable --now containerd
# crictl configuration
# Docker used to ship plenty of handy tooling; with containerd we manage containers through the CRI tool crictl, so create its configuration file
vim /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
debug: false
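With the configuration in place, crictl covers the day-to-day container tasks, for example:
# List pods, containers and locally stored images
crictl pods
crictl ps
crictl images
# Pull an image through containerd
crictl pull registry.aliyuncs.com/google_containers/pause:3.6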
Install kubeadm, kubelet and kubectl
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
# List the available versions
sudo apt-cache madison kubeadm
sudo apt install -y kubeadm=1.21.10-00 kubelet=1.21.10-00 kubectl=1.21.10-00
# Pin the versions so they are not updated by apt upgrade
sudo apt-mark hold kubelet kubeadm kubectl
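A quick check that the pinned versions are what ended up installed:
# All three should report 1.21.10
kubeadm version -o short
kubelet --version
kubectl version --client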
Nginx load balancing
These steps run on the public node (10.0.0.3); install nginx first if it is not already present (sudo apt install -y nginx), then edit the main configuration
vim /etc/nginx/nginx.conf
and append the following at the end of the file:
stream {
    include stream.conf;
}
Then edit /etc/nginx/stream.conf:
upstream k8s-apiserver {
    server master1:6443;
    server master2:6443;
    server master3:6443;
}
server {
    listen 6443;
    proxy_connect_timeout 1s;
    proxy_pass k8s-apiserver;
}
upstream ingress-http {
    server 10.0.0.21:30080;   # change this to the ingress NodePort
    server 10.0.0.22:30080;   # change this to the ingress NodePort
}
server {
    listen 80;
    proxy_connect_timeout 1s;
    proxy_pass ingress-http;
}
upstream ingress-https {
    server 10.0.0.21:30443;   # change this to the ingress NodePort
    server 10.0.0.22:30443;   # change this to the ingress NodePort
}
server {
    listen 443;
    proxy_connect_timeout 1s;
    proxy_pass ingress-https;
}
Because nginx acts here as a layer-4 load balancer for the ingress, it has to listen on port 80, which conflicts with nginx's default site, so remove the default configuration file:
rm -f /etc/nginx/sites-enabled/default
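Validate the configuration and reload nginx; the stream listeners should then be up on ports 80, 443 and 6443:
sudo nginx -t
sudo systemctl reload nginx
sudo ss -lntp | grep -E ':(80|443|6443)\b'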
Create the cluster
kubeadm init
Pull the images locally first; the image version must match the kubeadm version installed earlier:
kubeadm config images pull --kubernetes-version 1.21.10 --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers
Initialize the cluster:
sudo kubeadm init \
--kubernetes-version 1.21.10 \
--control-plane-endpoint "kube-apiserver:6443" \
--image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
--upload-certs \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16
Non-HA initialization (adjust --kubernetes-version and --apiserver-advertise-address to your own environment):
sudo kubeadm init \
--kubernetes-version 1.25.3 \
--apiserver-advertise-address 192.168.200.20 \
--image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
--upload-certs \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16
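Once init completes successfully, kubeadm prints instructions for configuring kubectl; in short:
# Let the current user talk to the cluster with kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config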
If initialization fails when using the Aliyun image repository
# Enable containerd to start on boot
systemctl enable --now containerd
# Edit /etc/containerd/config.toml and set
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"
# Restart containerd
systemctl restart containerd.service
Alternatively, generate a kubeadm configuration with kubeadm config print init-defaults > init.yaml and create the cluster with kubeadm init --config=init.yaml.
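A minimal sketch of such a configuration, assuming kubeadm 1.21 (kubeadm.k8s.io/v1beta2 API) and mirroring the flags used above:
# Write a ClusterConfiguration equivalent to the kubeadm init flags above
cat <<EOF > init.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.21.10
controlPlaneEndpoint: "kube-apiserver:6443"
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
EOF
sudo kubeadm init --config=init.yaml --upload-certs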
Install a network plugin
flannel
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
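The --pod-network-cidr of 10.244.0.0/16 used during init matches flannel's default. Once the flannel pods are running, the nodes should become Ready:
# Wait for the flannel pods to come up, then check node status
kubectl get pods -A | grep flannel
kubectl get nodes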
Get the join command to add new nodes
Node
The join command is printed to the terminal after init. It only stays valid for a limited time (2 hours for the uploaded certificates, 24 hours for the bootstrap token by default) and can be regenerated once it expires:
kubeadm token create --print-join-command
master
- Generate the certificates and record the certificate key
  kubeadm init phase upload-certs --upload-certs
- Get the join command
  kubeadm token create --print-join-command
- Or combine the two into a single command:
  echo "$(kubeadm token create --print-join-command) --control-plane --certificate-key $(kubeadm init phase upload-certs --upload-certs | tail -1)"
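The resulting join commands have the following shape (token, hash and certificate key are placeholders, not real values):
# Worker join
kubeadm join kube-apiserver:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
# Control-plane join
kubeadm join kube-apiserver:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <certificate-key>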
Check the nodes and component status
# Check the status of all nodes
kubectl get nodes
# Check the status of the component pods
kubectl get pods -A
Remove a node
Drain the node, then delete it:
kubectl drain worker2 --ignore-daemonsets
kubectl delete node worker2
If it is a master node, the corresponding etcd member must also be removed:
kubectl exec -it -n kube-system etcd-master1 -- /bin/sh
# List the etcd members
etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
# Remove the etcd member by its ID
etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 12637f5ec2bd02b8
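On the machine that was removed, kubeadm reset wipes the local kubeadm/kubelet state; a sketch (the CNI directory below is the usual default and may differ in your setup):
# Run on the removed node itself
sudo kubeadm reset -f
# Clean up leftover CNI configuration
sudo rm -rf /etc/cni/net.d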
Common issues
A rather odd initialization failure
One kubeadm gotcha: you can pre-pull the images with kubeadm config images pull, but the subsequent kubeadm init still fails with an error:
> journalctl -xeu kubelet -f
Jul 22 08:35:49 master1 kubelet[2079]: E0722 08:35:49.169395 2079 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-master1_kube-system(642dcd53ce8660a2287cd7eaabcd5fdc)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-master1_kube-system(642dcd53ce8660a2287cd7eaabcd5fdc)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"k8s.gcr.io/pause:3.6\\\": failed to pull image \\\"k8s.gcr.io/pause:3.6\\\": failed to pull and unpack image \\\"k8s.gcr.io/pause:3.6\\\": failed to resolve reference \\\"k8s.gcr.io/pause:3.6\\\": failed to do request: Head \\\"https://k8s.gcr.io/v2/pause/manifests/3.6\\\": dial tcp 142.250.157.82:443: connect: connection refused\"" pod="kube-system/etcd-master1" podUID=642dcd53ce8660a2287cd7eaabcd5fdc
The images were already pulled locally, yet init still pulls the pause image from gcr.io and fails (with good network connectivity the initialization would complete). The annoying part is that even when you specify the Aliyun image repository, the init process still goes through gcr.io for the sandbox image, because containerd chooses where to pull pause from via its sandbox_image setting rather than via kubeadm's --image-repository flag.
Before init:
root@master1:~# crictl images
IMAGE                                TAG        IMAGE ID        SIZE
k8s.gcr.io/coredns/coredns           v1.8.0     296a6d5035e2d   12.9MB
k8s.gcr.io/etcd                      3.4.13-0   0369cf4303ffd   86.7MB
k8s.gcr.io/kube-apiserver            v1.21.10   704b64a9bcd2f   30.5MB
k8s.gcr.io/kube-controller-manager   v1.21.10   eeb3ff9374071   29.5MB
k8s.gcr.io/kube-proxy                v1.21.10   ab8993ba3211b   35.9MB
k8s.gcr.io/kube-scheduler            v1.21.10   2f776f4731317   14.6MB
k8s.gcr.io/pause                     3.4.1      0f8457a4c2eca   301kB
After init:
root@master1:~# crictl images
IMAGE                                TAG        IMAGE ID        SIZE
k8s.gcr.io/coredns/coredns           v1.8.0     296a6d5035e2d   12.9MB
k8s.gcr.io/etcd                      3.4.13-0   0369cf4303ffd   86.7MB
k8s.gcr.io/kube-apiserver            v1.21.10   704b64a9bcd2f   30.5MB
k8s.gcr.io/kube-controller-manager   v1.21.10   eeb3ff9374071   29.5MB
k8s.gcr.io/kube-proxy                v1.21.10   ab8993ba3211b   35.9MB
k8s.gcr.io/kube-scheduler            v1.21.10   2f776f4731317   14.6MB
k8s.gcr.io/pause                     3.4.1      0f8457a4c2eca   301kB
k8s.gcr.io/pause                     3.6        6270bb605e12e   302kB   # this image was added
Solution
Pull the pause image manually in advance:
# Docker
docker pull registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/pause:3.6
# Containerd
crictl pull registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/pause:3.6
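Since kubelet/containerd will still look for the image under the name k8s.gcr.io/pause:3.6, the pulled image can be retagged so no remote pull is needed; a sketch (ctr's k8s.io namespace is where containerd's CRI images live):
# Docker: retag the mirror image to the name kubeadm expects
docker tag registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/pause:3.6 k8s.gcr.io/pause:3.6
# Containerd: retag inside the k8s.io namespace used by the CRI plugin
sudo ctr -n k8s.io images tag registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/pause:3.6 k8s.gcr.io/pause:3.6
Alternatively, the sandbox_image change in the containerd configuration above avoids the problem entirely.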