1. Environment
1.1 Software and versions
All components are the latest releases as of this writing, and I have already validated this exact combination commercially at my company, so the entire stack can be used in production.
- OS: ubuntu 14.04.4
- kernel: 4.2
- etcd: 3.0.9
- flannel: 0.6.1
- kubernetes: 1.3.5
- docker: 1.12
1.2 Host layout
- master and minion: 192.168.1.104 node01
- minion: 192.168.1.107 node02
- minion: 192.168.1.108 node03
In my test environment I put the name resolution into /etc/hosts; in production, use DNS instead.
root@node01:~# cat /etc/hosts
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.1.104 node01
192.168.1.107 node02
192.168.1.108 node03
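Once passwordless SSH is in place (see the prerequisites below), the same entries can be pushed to the other nodes in one go. A minimal sketch, assuming the nodes share the same hosts layout:
for h in node02 node03; do scp /etc/hosts $h:/etc/hosts; done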
1.3 Prerequisites
- The master must be able to SSH into every node without a password (including the master itself; a non-root account works too). Google the details if needed; a minimal sketch follows this list. Frankly, if you cannot get past this step, there is little point in attempting Kubernetes yet.
- Install the bridge utilities: bridge-utils
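Both prerequisites come down to a few commands. A minimal sketch, assuming root logins are permitted on all nodes:
# on the master: generate a key pair and push it to every node, including the master itself
ssh-keygen -t rsa
for h in node01 node02 node03; do ssh-copy-id root@$h; done
# install the bridge utilities on every node
for h in node01 node02 node03; do ssh root@$h "apt-get install -y bridge-utils"; done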
2. Installing etcd
The etcd bundled with Kubernetes is a single-node install, while production requires a cluster, so this article goes straight to the clustered deployment.
2.1 Preparing the files:
Download and copy to all servers:
root@node01:~# mkdir /opt/bin
root@node01:~# curl -L https://github.com/coreos/etcd/releases/download/v3.0.9/etcd-v3.0.9-linux-amd64.tar.gz -o etcd-v3.0.9-linux-amd64.tar.gz
root@node01:~# tar xzvf etcd-v3.0.9-linux-amd64.tar.gz
root@node01:~# cp etcd-v3.0.9-linux-amd64/etcd etcd-v3.0.9-linux-amd64/etcdctl /opt/bin/
Check the version:
root@node01:~# /opt/bin/etcd -version
etcd Version: 3.0.9
Git SHA: 494c012
Go Version: go1.6.3
Go OS/Arch: linux/amd64
Copy to all the other nodes:
root@node01:~# ssh node02 -C "mkdir /opt/bin"
root@node01:~# ssh node03 -C "mkdir /opt/bin"
root@node01:~# scp etcd-v3.0.9-linux-amd64/etcd etcd-v3.0.9-linux-amd64/etcdctl node02:/opt/bin
etcd 100% 19MB 19.2MB/s 00:01
etcdctl 100% 18MB 17.6MB/s 00:00
root@node01:~# scp etcd-v3.0.9-linux-amd64/etcd etcd-v3.0.9-linux-amd64/etcdctl node03:/opt/bin
etcd 100% 19MB 19.2MB/s 00:00
etcdctl 100% 18MB 17.6MB/s 00:01
Set up the environment variables, and copy the file to all nodes as well:
root@node01:~# cat /etc/profile.d/k8s.sh
export PATH=$PATH:/opt/bin
export KUBERNETES_PROVIDER=ubuntu
root@node01:~# . /etc/profile.d/k8s.sh
root@node01:~# scp /etc/profile.d/k8s.sh node02:/etc/profile.d/k8s.sh
k8s.sh 100% 61 0.1KB/s 00:00
root@node01:~# scp /etc/profile.d/k8s.sh node03:/etc/profile.d/k8s.sh
k8s.sh 100% 61 0.1KB/s 00:00
2.2 Starting the etcd cluster
2.2.1 First start:
On node01, run:
root@node01:~# nohup /opt/bin/etcd --name infra0 --data-dir /var/lib/etcd --initial-advertise-peer-urls http://192.168.1.104:2380 --listen-peer-urls http://192.168.1.104:2380 --listen-client-urls http://192.168.1.104:4001,http://127.0.0.1:4001 --advertise-client-urls http://192.168.1.104:4001 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://192.168.1.104:2380,infra1=http://192.168.1.107:2380,infra2=http://192.168.1.108:2380 --initial-cluster-state new &
On node02, run:
root@node02:~# nohup /opt/bin/etcd --name infra1 --data-dir /var/lib/etcd --initial-advertise-peer-urls http://192.168.1.107:2380 --listen-peer-urls http://192.168.1.107:2380 --listen-client-urls http://192.168.1.107:4001,http://127.0.0.1:4001 --advertise-client-urls http://192.168.1.107:4001 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://192.168.1.104:2380,infra1=http://192.168.1.107:2380,infra2=http://192.168.1.108:2380 --initial-cluster-state new &
On node03, run:
root@node03:~# nohup /opt/bin/etcd --name infra2 --data-dir /var/lib/etcd --initial-advertise-peer-urls http://192.168.1.108:2380 --listen-peer-urls http://192.168.1.108:2380 --listen-client-urls http://192.168.1.108:4001,http://127.0.0.1:4001 --advertise-client-urls http://192.168.1.108:4001 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://192.168.1.104:2380,infra1=http://192.168.1.107:2380,infra2=http://192.168.1.108:2380 --initial-cluster-state new &
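The three commands differ only in the member name and the local IP, so a small wrapper script cuts down on typos. A sketch (start-etcd.sh is a hypothetical helper; run it on each node with the local member name and IP, e.g. ./start-etcd.sh infra1 192.168.1.107):
#!/bin/bash
# start-etcd.sh: launch the local etcd member; the peer list is identical on every node
NODE_NAME=$1
NODE_IP=$2
nohup /opt/bin/etcd --name ${NODE_NAME} \
  --data-dir /var/lib/etcd \
  --initial-advertise-peer-urls http://${NODE_IP}:2380 \
  --listen-peer-urls http://${NODE_IP}:2380 \
  --listen-client-urls http://${NODE_IP}:4001,http://127.0.0.1:4001 \
  --advertise-client-urls http://${NODE_IP}:4001 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://192.168.1.104:2380,infra1=http://192.168.1.107:2380,infra2=http://192.168.1.108:2380 \
  --initial-cluster-state new &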
Take a look at the log output on node01: the cluster has come up normally. Simple, isn't it:
root@node01:~# tail -f nohup.out |grep member
2016-09-21 00:02:04.856809 N | membership: updated the cluster version from 2.3 to 3.0
^C
root@node01:~# cat nohup.out |grep member
2016-09-21 00:01:15.680098 I | etcdserver: member dir = /var/lib/etcd/member
2016-09-21 00:01:15.683067 I | etcdserver: starting member 47c35f4a0d57afa4 in cluster 959b506b320ffe35
2016-09-21 00:01:15.709563 I | membership: added member 47c35f4a0d57afa4 [http://192.168.1.104:2380] to cluster 959b506b320ffe35
2016-09-21 00:01:15.709833 I | membership: added member b8d5a649fb72d51b [http://192.168.1.108:2380] to cluster 959b506b320ffe35
2016-09-21 00:01:15.709972 I | membership: added member bebba08eea6ea060 [http://192.168.1.107:2380] to cluster 959b506b320ffe35
2016-09-21 00:01:44.791273 W | etcdserver: failed to reach the peerURL(http://192.168.1.108:2380) of member b8d5a649fb72d51b (Get http://192.168.1.108:2380/version: dial tcp 192.168.1.108:2380: getsockopt: connection refused)
2016-09-21 00:01:44.791352 W | etcdserver: cannot get the version of member b8d5a649fb72d51b (Get http://192.168.1.108:2380/version: dial tcp 192.168.1.108:2380: getsockopt: connection refused)
2016-09-21 00:01:44.803034 N | membership: set the initial cluster version to 2.3
2016-09-21 00:01:48.797864 W | etcdserver: failed to reach the peerURL(http://192.168.1.108:2380) of member b8d5a649fb72d51b (Get http://192.168.1.108:2380/version: dial tcp 192.168.1.108:2380: getsockopt: connection refused)
2016-09-21 00:01:48.797935 W | etcdserver: cannot get the version of member b8d5a649fb72d51b (Get http://192.168.1.108:2380/version: dial tcp 192.168.1.108:2380: getsockopt: connection refused)
2016-09-21 00:01:52.810961 W | etcdserver: failed to reach the peerURL(http://192.168.1.108:2380) of member b8d5a649fb72d51b (Get http://192.168.1.108:2380/version: dial tcp 192.168.1.108:2380: getsockopt: connection refused)
2016-09-21 00:01:52.811001 W | etcdserver: cannot get the version of member b8d5a649fb72d51b (Get http://192.168.1.108:2380/version: dial tcp 192.168.1.108:2380: getsockopt: connection refused)
2016-09-21 00:01:56.827061 W | etcdserver: failed to reach the peerURL(http://192.168.1.108:2380) of member b8d5a649fb72d51b (Get http://192.168.1.108:2380/version: dial tcp 192.168.1.108:2380: getsockopt: connection refused)
2016-09-21 00:01:56.827111 W | etcdserver: cannot get the version of member b8d5a649fb72d51b (Get http://192.168.1.108:2380/version: dial tcp 192.168.1.108:2380: getsockopt: connection refused)
2016-09-21 00:02:00.829027 W | etcdserver: failed to reach the peerURL(http://192.168.1.108:2380) of member b8d5a649fb72d51b (Get http://192.168.1.108:2380/version: dial tcp 192.168.1.108:2380: getsockopt: connection refused)
2016-09-21 00:02:00.829069 W | etcdserver: cannot get the version of member b8d5a649fb72d51b (Get http://192.168.1.108:2380/version: dial tcp 192.168.1.108:2380: getsockopt: connection refused)
2016-09-21 00:02:04.856809 N | membership: updated the cluster version from 2.3 to 3.0
2.2.2 Subsequent starts:
On the first start, the --initial-cluster-state option is set to new; on every start after the first it must be changed to existing. To bring etcd up at boot I simply put the command into rc.local (each node keeps its own --name and IP addresses); you could also write a proper service script for this.
root@node01:~# cat /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.
nohup /opt/bin/etcd --name infra0 --data-dir /var/lib/etcd --initial-advertise-peer-urls http://192.168.1.104:2380 --listen-peer-urls http://192.168.1.104:2380 --listen-client-urls http://192.168.1.104:4001,http://127.0.0.1:4001 --advertise-client-urls http://192.168.1.104:4001 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://192.168.1.104:2380,infra1=http://192.168.1.107:2380,infra2=http://192.168.1.108:2380 --initial-cluster-state existing &
exit 0
Check the cluster state:
root@node01:~# etcdctl member list
47c35f4a0d57afa4: name=infra0 peerURLs=http://192.168.1.104:2380 clientURLs=http://192.168.1.104:4001 isLeader=true
b8d5a649fb72d51b: name=infra2 peerURLs=http://192.168.1.108:2380 clientURLs=http://192.168.1.108:4001 isLeader=false
bebba08eea6ea060: name=infra1 peerURLs=http://192.168.1.107:2380 clientURLs=http://192.168.1.107:4001 isLeader=false
root@node01:~# etcdctl cluster-health
member 47c35f4a0d57afa4 is healthy: got healthy result from http://192.168.1.104:4001
member b8d5a649fb72d51b is healthy: got healthy result from http://192.168.1.108:4001
member bebba08eea6ea060 is healthy: got healthy result from http://192.168.1.107:4001
cluster is healthy
These two commands show whether the cluster is healthy and which node is the leader.
We can also PUT a key-value pair and read it back to see the cluster in action:
root@node01:~# curl -L -X PUT http://192.168.1.104:4001/v2/keys/test -d value="etcdiseasy"
{"action":"set","node":{"key":"/test","value":"etcdiseasy","modifiedIndex":9,"createdIndex":9}}
root@node01:~# curl -Ls http://192.168.1.104:4001/v2/keys/test
{"action":"get","node":{"key":"/test","value":"etcdiseasy","modifiedIndex":9,"createdIndex":9}}
3. Installing Kubernetes
3.1 Preparing the installation files:
Clone the base source tree:
root@node01:/opt# git clone https://github.com/kubernetes/kubernetes.git
During installation the setup script would download the latest Kubernetes release by itself, but that download is very slow, so we prepare the release tarball manually. The fastest way is to download it with Xunlei (Thunder) and then upload it to the server:
root@node01:/opt# wget https://github.com/kubernetes/kubernetes/releases/download/v1.3.5/kubernetes.tar.gz
Prepare flannel. Here I also take the latest 0.6.1 (the Kubernetes scripts default to 0.5.5):
root@node01:/opt# wget https://github.com/coreos/flannel/releases/download/v0.6.1/flannel-v0.6.1-linux-amd64.tar.gz
The file list after downloading:
root@node01:/opt# ls
bin flannel-v0.6.1-linux-amd64.tar.gz kubernetes kubernetes.tar.gz
3.2 Modifying the installation scripts:
3.2.1 Edit the base configuration script: /opt/kubernetes/cluster/ubuntu/config-default.sh
Good habit: back up before editing:
root@node01:/opt/kubernetes/cluster/ubuntu# cp config-default.sh config-default.sh_bak
# First change: which nodes to install on
export nodes=${nodes:-"root@192.168.1.104 root@192.168.1.107 root@192.168.1.108"}
# Node roles: the default fits my layout; a = master, i = minion, so 'ai' marks a node that is both
roles=${roles:-"ai i i"}
# Service cluster IP range: the virtual IPs used by Kubernetes services; I put them on the same subnet as the hosts
export SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-192.168.1.0/24}
# Docker startup options; you will usually add your private registry address here (an example follows this excerpt):
DOCKER_OPTS=${DOCKER_OPTS:-""}
# DNS settings: set your internal DNS server address as appropriate, or deploy the SkyDNS addon recommended by Kubernetes later
DNS_SERVER_IP=${DNS_SERVER_IP:-"192.168.1.253"}
DNS_DOMAIN=${DNS_DOMAIN:-"toxingwang.com"}
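For example, if you run a private registry over plain HTTP, DOCKER_OPTS might look like this (the registry address is a placeholder for your own):
DOCKER_OPTS=${DOCKER_OPTS:-"--insecure-registry registry.example.com:5000"}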
3.2.2 Edit the automatic download script: /opt/kubernetes/cluster/ubuntu/download-release.sh
Good habit: back up before editing:
root@node01:/opt/kubernetes/cluster/ubuntu# cp download-release.sh download-release.sh_bak
Modify the flannel download section:
# flannel
FLANNEL_VERSION=${FLANNEL_VERSION:-"v0.6.1"}
echo "Prepare flannel ${FLANNEL_VERSION} release ..."
grep -q "^${FLANNEL_VERSION}\$" binaries/.flannel 2>/dev/null || {
  # curl -L https://github.com/coreos/flannel/releases/download/v${FLANNEL_VERSION}/flannel-${FLANNEL_VERSION}-linux-amd64.tar.gz -o flannel.tar.gz
  cp /opt/flannel-v0.6.1-linux-amd64.tar.gz flannel.tar.gz
  tar xf flannel.tar.gz
  cp flanneld binaries/master
  cp flanneld binaries/minion
  echo ${FLANNEL_VERSION} > binaries/.flannel
}
Modify the etcd download section: simply comment the whole section out, since our etcd cluster is already running:
# etcd
#ETCD_VERSION=${ETCD_VERSION:-"2.3.1"}
#ETCD="etcd-v${ETCD_VERSION}-linux-amd64"
#echo "Prepare etcd ${ETCD_VERSION} release ..."
#grep -q "^${ETCD_VERSION}\$" binaries/.etcd 2>/dev/null || {
# curl -L https://github.com/coreos/etcd/releases/download/v${ETCD_VERSION}/${ETCD}.tar.gz -o etcd.tar.gz
# tar xzf etcd.tar.gz
# cp ${ETCD}/etcd ${ETCD}/etcdctl binaries/master
# echo ${ETCD_VERSION} > binaries/.etcd
#}
Modify the Kubernetes download section. Note: there are two places that need this change, with identical content in both:
# k8s
echo "Prepare kubernetes ${KUBE_VERSION} release ..."
grep -q "^${KUBE_VERSION}\$" binaries/.kubernetes 2>/dev/null || {
  # curl -L https://github.com/kubernetes/kubernetes/releases/download/v${KUBE_VERSION}/kubernetes.tar.gz -o kubernetes.tar.gz
  if [ ! -f kubernetes.tar.gz ] ; then
    cp /opt/kubernetes.tar.gz kubernetes.tar.gz
    tar xzf kubernetes.tar.gz
  fi
  pushd kubernetes/server
  tar xzf kubernetes-server-linux-amd64.tar.gz
  popd
  cp kubernetes/server/kubernetes/server/bin/kube-apiserver \
    kubernetes/server/kubernetes/server/bin/kube-controller-manager \
    kubernetes/server/kubernetes/server/bin/kube-scheduler binaries/master
  cp kubernetes/server/kubernetes/server/bin/kubelet \
    kubernetes/server/kubernetes/server/bin/kube-proxy binaries/minion
  cp kubernetes/server/kubernetes/server/bin/kubectl binaries/
  echo ${KUBE_VERSION} > binaries/.kubernetes
}
3.2.3 Edit the etcd addresses and disable the automatic etcd installation: /opt/kubernetes/cluster/ubuntu/util.sh
Change the default etcd connection addresses: open the file in vim and do a bulk substitution:
# in vim command mode, run these two substitutions:
:%s#127.0.0.1:4001#192.168.1.104:4001,http://192.168.1.107:4001,http://192.168.1.108:4001#g
:%s#${1}:4001#192.168.1.104:4001,http://192.168.1.107:4001,http://192.168.1.108:4001#g
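If you prefer a non-interactive edit, the same substitutions can be done with sed. A sketch (back up util.sh first; single quotes keep the shell from expanding ${1}):
cd /opt/kubernetes/cluster/ubuntu
cp util.sh util.sh_bak
sed -i 's#127.0.0.1:4001#192.168.1.104:4001,http://192.168.1.107:4001,http://192.168.1.108:4001#g' util.sh
sed -i 's#${1}:4001#192.168.1.104:4001,http://192.168.1.107:4001,http://192.168.1.108:4001#g' util.sh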
Disable the etcd installation: delete the lines related to starting etcd (most are multi-line commands, so merely commenting them out causes problems). The lines are as follows:
Line 475: delete
Line 495: delete (line 496 before the earlier deletion; take special care to delete the right one)
Line 647: delete (line 647 in the original file; again, don't delete the wrong one)
Line 681: delete
Lines 721-727: delete
Line 801 and the lines below it that mention etcd: delete
Lines 915-921: delete
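These line numbers match the revision of util.sh I cloned; in any other revision they will drift, so first locate the etcd-related code in your own copy:
grep -n etcd /opt/kubernetes/cluster/ubuntu/util.sh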
The only purpose of this part is to keep the installer from setting up etcd automatically; search the file carefully, because recovering from errors mid-install is messy. But don't delete too much either: this step requires some shell scripting background, enough to actually understand the script. If your edits do cause errors later, skip the etcd cluster setup above and test with the default single-node etcd first.
3.3 Running the installation:
Install script: /opt/kubernetes/cluster/kube-up.sh
Teardown script: /opt/kubernetes/cluster/kube-down.sh
The installation may still hit all kinds of unpredictable issues; handle them according to the actual error messages. Sometimes simply running the script a second time is enough.
You can also run bash -x /opt/kubernetes/cluster/kube-up.sh to watch the script execute step by step.
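Putting it together, a typical run from the master looks like this (KUBERNETES_PROVIDER=ubuntu was already exported via /etc/profile.d/k8s.sh):
cd /opt/kubernetes/cluster
./kube-up.sh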
After the installation completes, copy the kubectl admin binary into place:
root@node01:~# cp /opt/kubernetes/cluster/ubuntu/binaries/kubectl /opt/bin/
3.4 A few pitfalls during installation:
- The installation hangs at the following point and exits after a timeout:
{"Network":"172.16.0.0/16", "Backend": {"Type": "vxlan"}}
Fix:
service flanneld restart
service flanneld stop
then run the install script again.
- You get a message like this:
(kubectl failed, will retry 1 times)
The connection to the server 192.168.1.104:8080 was refused - did you specify the right host or port?
Fix: start the Kubernetes master services manually:
service kube-apiserver start
service kube-scheduler start
service kube-controller-manager start
- A node stays stuck in the not-ready state:
('kubectl get nodes' failed, giving up)
Waiting for 3 ready nodes. 2 ready nodes, 2 registered. Retrying.
Fix: check which node's kubelet and kube-proxy services failed to start and start them manually, for example (service names as installed by the ubuntu scripts):
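# on the minion that is stuck
service kubelet start
service kube-proxy start
# then confirm from the master
kubectl get nodes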
4. Verifying the Kubernetes cluster
- Check node status:
root@node01:/opt/bin# kubectl get node
NAME STATUS AGE
192.168.1.104 Ready 8m
192.168.1.107 Ready 10m
192.168.1.108 Ready 10m
- View the RC, pod, and svc information in the default namespace:
root@node01:~# kubectl get all
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 192.168.1.1 <none> 443/TCP 12m
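As a final smoke test you can schedule a real workload; this assumes the nodes can pull the nginx image from Docker Hub (in kubernetes 1.3, kubectl run creates a Deployment):
kubectl run nginx --image=nginx --replicas=2
kubectl get pods -o wide
# clean up when done
kubectl delete deployment nginx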