安装rancher2.6.5 docker 单节点版

准备镜像仓库

必须使用https，而且ssl证书必须有SAN（Subject Alternative Names），不然会报错x509: certificate relies on legacy Common Name field，和go version > 1.15有关，高版本弃用CN(CommonName) 字段

需要创建一个新的有效证书以包含subjectAltName属性，并且应该在使用 openssl 命令创建 SSL 自签名证书时通过指定-addext标志直接添加

openssl req -x509 -sha256 -nodes -days 36500 -newkey rsa:2048 -keyout harbor.key -out harbor.crt -subj "/CN=harbor.xxx.cn" -addext "subjectAltName = DNS:harbor.xxx.cn"

需要到的镜像

rancher/rancher:v2.6.5
rancher/shell:v0.1.16
rancher/rancher-webhook:v0.2.5
rancher/fleet:v0.3.9
rancher/gitjob:v0.1.26
rancher/fleet-agent:v0.3.9
rancher/rke-tools:v0.1.80
rancher/hyperkube:v1.23.6-rancher1
rancher/mirrored-coreos-etcd:v3.5.3
rancher/mirrored-pause:3.6
rancher/mirrored-calico-cni:v3.22.0
rancher/mirrored-calico-pod2daemon-flexvol:v3.22.0
rancher/kube-api-auth:v0.1.8
rancher/mirrored-calico-node:v3.22.0
rancher/mirrored-flannelcni-flannel:v0.17.0
rancher/mirrored-cluster-proportional-autoscaler:1.8.5
rancher/mirrored-metrics-server:v0.6.1
rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1
rancher/mirrored-coredns-coredns:1.9.0
rancher/mirrored-calico-kube-controllers:v3.22.0
rancher/nginx-ingress-controller:nginx-1.2.0-rancher1
rancher/rancher-agent:v2.6.5

# 获取某个版本的镜像
rke config --system-images --all

docker 命令

生成的证书，*key, *crt都放到/root/harbor/cert下面，然后映射到容器的/container/certs目录

docker run -d --name rancher2.6.5 --restart=unless-stopped -e CATTLE_SYSTEM_CATALOG=bundled -e SSL_CERT_DIR="/container/certs" -v /root/harbor/cert:/container/certs -p 3280:80 -p 3443:443 --privileged  --add-host harbor.xx.cn:10.xx.xx.205 rancher/rancher:v2.6.5

主要命令

docker run -d --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  --privileged \
  rancher/rancher:latest

参数详解

加载system-charts,其实默认已经在rancher镜像里面，这个变量告诉 Rancher 使用本地的，而不是尝试从 GitHub 获取它们。

-e CATTLE_SYSTEM_CATALOG=bundled

Custom CA Root Certificates，参考这里面的 docker配置，这里配置Rancher 需要访问的服务需要用的自签名证书，不然会报错x509: certificate signed by unknown authority

-e SSL_CERT_DIR="/container/certs" -v /root/harbor/cert:/container/certs

配置私有仓库

根据这个Private Registry Configuration, 进到容器里面配置

docker exec -it rancher2.6.5 bash

vim /etc/rancher/k3s/registries.yaml

registries.yaml

mirrors:
  "docker.io":
    endpoint:
      - "https://harbor.xxx.cn:4443"
configs:
  "docker.io":
    auth:
      username: admin
      password: Harbor12345
    tls:
      key_file: /container/certs/harbor.key
      cert_file: /container/certs/harbor.crt
      #ca_file: /container/certs/ca.crt
      insecure_skip_verify: true

重启容器

1	`docker restart rancher2.6.5`

重新进入容器，然后配置hosts，不然使用域名解析不了，不是配置coredns

1	`echo "xxx.xxx.xxx.205 harbor.xxx.cn" >> c`

重启后，要等containd启动，检查containd更新的配置

1	`cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml`

测试拉镜像

1	`crictl pull rancher/shell:v0.1.16`

配置system-default-registry

高可用rancher

安装k3s

https://github.com/k3s-io/k3s/releases/tag/v1.23.6%2Bk3s1 里面的**k3s-images.txt **可以看k3s需要的镜像

使用v1.23.6+k3s1创建集群

K3S_TOKEN='ccd0f7ee6cf2f50f8563e434767b6488' INSTALL_K3S_SKIP_DOWNLOAD=true INSTALL_K3S_EXEC='server  --tls-san [IP] --node-external-ip [IP] --docker --cluster-init' INSTALL_K3S_VERSION='v1.23.6+k3s1' ./install.sh

install.sh

#!/bin/sh
set -e
set -o noglob

# Usage:
#   curl ... | ENV_VAR=... sh -
#       or
#   ENV_VAR=... ./install.sh
#
# Example:
#   Installing a server without traefik:
#     curl ... | INSTALL_K3S_EXEC="--disable=traefik" sh -
#   Installing an agent to point at a server:
#     curl ... | K3S_TOKEN=xxx K3S_URL=https://server-url:6443 sh -
#
# Environment variables:
#   - K3S_*
#     Environment variables which begin with K3S_ will be preserved for the
#     systemd service to use. Setting K3S_URL without explicitly setting
#     a systemd exec command will default the command to "agent", and we
#     enforce that K3S_TOKEN or K3S_CLUSTER_SECRET is also set.
#
#   - INSTALL_K3S_SKIP_DOWNLOAD
#     If set to true will not download k3s hash or binary.
#
#   - INSTALL_K3S_FORCE_RESTART
#     If set to true will always restart the K3s service
#
#   - INSTALL_K3S_SYMLINK
#     If set to 'skip' will not create symlinks, 'force' will overwrite,
#     default will symlink if command does not exist in path.
#
#   - INSTALL_K3S_SKIP_ENABLE
#     If set to true will not enable or start k3s service.
#
#   - INSTALL_K3S_SKIP_START
#     If set to true will not start k3s service.
#
#   - INSTALL_K3S_VERSION
#     Version of k3s to download from github. Will attempt to download from the
#     stable channel if not specified.
#
#   - INSTALL_K3S_COMMIT
#     Commit of k3s to download from temporary cloud storage.
#     * (for developer & QA use)
#
#   - INSTALL_K3S_BIN_DIR
#     Directory to install k3s binary, links, and uninstall script to, or use
#     /usr/local/bin as the default
#
#   - INSTALL_K3S_BIN_DIR_READ_ONLY
#     If set to true will not write files to INSTALL_K3S_BIN_DIR, forces
#     setting INSTALL_K3S_SKIP_DOWNLOAD=true
#
#   - INSTALL_K3S_SYSTEMD_DIR
#     Directory to install systemd service and environment files to, or use
#     /etc/systemd/system as the default
#
#   - INSTALL_K3S_EXEC or script arguments
#     Command with flags to use for launching k3s in the systemd service, if
#     the command is not specified will default to "agent" if K3S_URL is set
#     or "server" if not. The final systemd command resolves to a combination
#     of EXEC and script args ($@).
#
#     The following commands result in the same behavior:
#       curl ... | INSTALL_K3S_EXEC="--disable=traefik" sh -s -
#       curl ... | INSTALL_K3S_EXEC="server --disable=traefik" sh -s -
#       curl ... | INSTALL_K3S_EXEC="server" sh -s - --disable=traefik
#       curl ... | sh -s - server --disable=traefik
#       curl ... | sh -s - --disable=traefik
#
#   - INSTALL_K3S_NAME
#     Name of systemd service to create, will default from the k3s exec command
#     if not specified. If specified the name will be prefixed with 'k3s-'.
#
#   - INSTALL_K3S_TYPE
#     Type of systemd service to create, will default from the k3s exec command
#     if not specified.
#
#   - INSTALL_K3S_SELINUX_WARN
#     If set to true will continue if k3s-selinux policy is not found.
#
#   - INSTALL_K3S_SKIP_SELINUX_RPM
#     If set to true will skip automatic installation of the k3s RPM.
#
#   - INSTALL_K3S_CHANNEL_URL
#     Channel URL for fetching k3s download URL.
#     Defaults to 'https://update.k3s.io/v1-release/channels'.
#
#   - INSTALL_K3S_CHANNEL
#     Channel to use for fetching k3s download URL.
#     Defaults to 'stable'.

GITHUB_URL=https://github.com/k3s-io/k3s/releases
STORAGE_URL=https://storage.googleapis.com/k3s-ci-builds
DOWNLOADER=
BIN_URL="${MINIO_BASE_URL}/agent/k3s/${INSTALL_K3S_VERSION}/k3s"

# --- helper functions for logs ---
info()
{
    echo '[INFO] ' "$@"
}
warn()
{
    echo '[WARN] ' "$@" >&2
}
fatal()
{
    echo '[ERROR] ' "$@" >&2
    exit 1
}

# --- fatal if no systemd or openrc ---
verify_system() {
    if [ -x /sbin/openrc-run ]; then
        HAS_OPENRC=true
        return
    fi
    if [ -x /bin/systemctl ] || type systemctl > /dev/null 2>&1; then
        HAS_SYSTEMD=true
        return
    fi
    fatal 'Can not find systemd or openrc to use as a process supervisor for k3s'
}

# --- add quotes to command arguments ---
quote() {
    for arg in "$@"; do
        printf '%s\n' "$arg" | sed "s/'/'\\''/g;1s/^/'/;\$s/\$/'/"
    done
}

# --- add indentation and trailing slash to quoted args ---
quote_indent() {
    printf ' \\n'
    for arg in "$@"; do
        printf '\t%s \\n' "$(quote "$arg")"
    done
}

# --- escape most punctuation characters, except quotes, forward slash, and space ---
escape() {
    printf '%s' "$@" | sed -e 's/\([][!#$%&()*;<=>?\_`{|}]\)/\/g;'
}

# --- escape double quotes ---
escape_dq() {
    printf '%s' "$@" | sed -e 's/"/\"/g'
}

# --- ensures $K3S_URL is empty or begins with https://, exiting fatally otherwise ---
verify_k3s_url() {
    case "${K3S_URL}" in
        "")
            ;;
        https://*)
            ;;
        *)
            fatal "Only https:// URLs are supported for K3S_URL (have ${K3S_URL})"
            ;;
    esac
}

# --- define needed environment variables ---
setup_env() {
    # --- use command args if passed or create default ---
    case "$1" in
        # --- if we only have flags discover if command should be server or agent ---
        (-*|"")
            if [ -z "${K3S_URL}" ]; then
                CMD_K3S=server
            else
                if [ -z "${K3S_TOKEN}" ] && [ -z "${K3S_TOKEN_FILE}" ] && [ -z "${K3S_CLUSTER_SECRET}" ]; then
                    fatal "Defaulted k3s exec command to 'agent' because K3S_URL is defined, but K3S_TOKEN, K3S_TOKEN_FILE or K3S_CLUSTER_SECRET is not defined."
                fi
                CMD_K3S=agent
            fi
        ;;
        # --- command is provided ---
        (*)
            CMD_K3S=$1
            shift
        ;;
    esac

    verify_k3s_url

    CMD_K3S_EXEC="${CMD_K3S}$(quote_indent "$@")"

    # --- use systemd name if defined or create default ---
    if [ -n "${INSTALL_K3S_NAME}" ]; then
        SYSTEM_NAME=k3s-${INSTALL_K3S_NAME}
    else
        if [ "${CMD_K3S}" = server ]; then
            SYSTEM_NAME=k3s
        else
            SYSTEM_NAME=k3s-${CMD_K3S}
        fi
    fi

    # --- check for invalid characters in system name ---
    valid_chars=$(printf '%s' "${SYSTEM_NAME}" | sed -e 's/[][!#$%&()*;<=>?\_`{|}/[:space:]]/^/g;' )
    if [ "${SYSTEM_NAME}" != "${valid_chars}"  ]; then
        invalid_chars=$(printf '%s' "${valid_chars}" | sed -e 's/[^^]/ /g')
        fatal "Invalid characters for system name:
            ${SYSTEM_NAME}
            ${invalid_chars}"
    fi

    # --- use sudo if we are not already root ---
    SUDO=sudo
    if [ $(id -u) -eq 0 ]; then
        SUDO=
    fi

    # --- use systemd type if defined or create default ---
    if [ -n "${INSTALL_K3S_TYPE}" ]; then
        SYSTEMD_TYPE=${INSTALL_K3S_TYPE}
    else
        if [ "${CMD_K3S}" = server ]; then
            SYSTEMD_TYPE=notify
        else
            SYSTEMD_TYPE=exec
        fi
    fi

    # --- use binary install directory if defined or create default ---
    if [ -n "${INSTALL_K3S_BIN_DIR}" ]; then
        BIN_DIR=${INSTALL_K3S_BIN_DIR}
    else
        # --- use /usr/local/bin if root can write to it, otherwise use /opt/bin if it exists
        BIN_DIR=/usr/local/bin
        if ! $SUDO sh -c "touch ${BIN_DIR}/k3s-ro-test && rm -rf ${BIN_DIR}/k3s-ro-test"; then
            if [ -d /opt/bin ]; then
                BIN_DIR=/opt/bin
            fi
        fi
    fi

    # --- use systemd directory if defined or create default ---
    if [ -n "${INSTALL_K3S_SYSTEMD_DIR}" ]; then
        SYSTEMD_DIR="${INSTALL_K3S_SYSTEMD_DIR}"
    else
        SYSTEMD_DIR=/etc/systemd/system
    fi

    # --- set related files from system name ---
    SERVICE_K3S=${SYSTEM_NAME}.service
    UNINSTALL_K3S_SH=${UNINSTALL_K3S_SH:-${BIN_DIR}/${SYSTEM_NAME}-uninstall.sh}
    KILLALL_K3S_SH=${KILLALL_K3S_SH:-${BIN_DIR}/k3s-killall.sh}

    # --- use service or environment location depending on systemd/openrc ---
    if [ "${HAS_SYSTEMD}" = true ]; then
        FILE_K3S_SERVICE=${SYSTEMD_DIR}/${SERVICE_K3S}
        FILE_K3S_ENV=${SYSTEMD_DIR}/${SERVICE_K3S}.env
    elif [ "${HAS_OPENRC}" = true ]; then
        $SUDO mkdir -p /etc/rancher/k3s
        FILE_K3S_SERVICE=/etc/init.d/${SYSTEM_NAME}
        FILE_K3S_ENV=/etc/rancher/k3s/${SYSTEM_NAME}.env
    fi

    # --- get hash of config & exec for currently installed k3s ---
    PRE_INSTALL_HASHES=$(get_installed_hashes)

    # --- if bin directory is read only skip download ---
    if [ "${INSTALL_K3S_BIN_DIR_READ_ONLY}" = true ]; then
        INSTALL_K3S_SKIP_DOWNLOAD=true
    fi

    # --- setup channel values
    INSTALL_K3S_CHANNEL_URL=${INSTALL_K3S_CHANNEL_URL:-'https://update.k3s.io/v1-release/channels'}
    INSTALL_K3S_CHANNEL=${INSTALL_K3S_CHANNEL:-'stable'}
}

# --- check if skip download environment variable set ---
can_skip_download() {
    if [ "${INSTALL_K3S_SKIP_DOWNLOAD}" != true ]; then
        return 1
    fi
}

# --- verify an executable k3s binary is installed ---
verify_k3s_is_executable() {
    if [ ! -x ${BIN_DIR}/k3s ]; then
        fatal "Executable k3s binary not found at ${BIN_DIR}/k3s"
    fi
}

# --- set arch and suffix, fatal if architecture not supported ---
setup_verify_arch() {
    if [ -z "$ARCH" ]; then
        ARCH=$(uname -m)
    fi
    case $ARCH in
        amd64)
            ARCH=amd64
            SUFFIX=
            ;;
        x86_64)
            ARCH=amd64
            SUFFIX=
            ;;
        arm64)
            ARCH=arm64
            SUFFIX=-${ARCH}
            ;;
        s390x)
            ARCH=s390x
            SUFFIX=-${ARCH}
            ;;
        aarch64)
            ARCH=arm64
            SUFFIX=-${ARCH}
            ;;
        arm*)
            ARCH=arm
            SUFFIX=-${ARCH}hf
            ;;
        *)
            fatal "Unsupported architecture $ARCH"
    esac
}

# --- verify existence of network downloader executable ---
verify_downloader() {
    # Return failure if it doesn't exist or is no executable
    [ -x "$(command -v $1)" ] || return 1

    # Set verified executable as our downloader program and return success
    DOWNLOADER=$1
    return 0
}

# --- create temporary directory and cleanup when done ---
setup_tmp() {
    TMP_DIR=$(mktemp -d -t k3s-install.XXXXXXXXXX)
    TMP_HASH=${TMP_DIR}/k3s.hash
    TMP_BIN=${TMP_DIR}/k3s.bin
    cleanup() {
        code=$?
        set +e
        trap - EXIT
        rm -rf ${TMP_DIR}
        exit $code
    }
    trap cleanup INT EXIT
}

# --- use desired k3s version if defined or find version from channel ---
get_release_version() {
    if [ -n "${INSTALL_K3S_COMMIT}" ]; then
        VERSION_K3S="commit ${INSTALL_K3S_COMMIT}"
    elif [ -n "${INSTALL_K3S_VERSION}" ]; then
        VERSION_K3S=${INSTALL_K3S_VERSION}
    else
        info "Finding release for channel ${INSTALL_K3S_CHANNEL}"
        version_url="${INSTALL_K3S_CHANNEL_URL}/${INSTALL_K3S_CHANNEL}"
        case $DOWNLOADER in
            curl)
                VERSION_K3S=$(curl -w '%{url_effective}' -L -s -S ${version_url} -o /dev/null | sed -e 's|.*/||')
                ;;
            wget)
                VERSION_K3S=$(wget -SqO /dev/null ${version_url} 2>&1 | grep -i Location | sed -e 's|.*/||')
                ;;
            *)
                fatal "Incorrect downloader executable '$DOWNLOADER'"
                ;;
        esac
    fi
    info "Using ${VERSION_K3S} as release"
}

# --- download from github url ---
download() {
    [ $# -eq 2 ] || fatal 'download needs exactly 2 arguments'

    case $DOWNLOADER in
        curl)
            curl -o $1 -sfL $2
            ;;
        wget)
            wget -qO $1 $2
            ;;
        *)
            fatal "Incorrect executable '$DOWNLOADER'"
            ;;
    esac

    # Abort if download command failed
    [ $? -eq 0 ] || fatal 'Download failed'
}

# --- download hash from github url ---
download_hash() {
    if [ -n "${INSTALL_K3S_COMMIT}" ]; then
        HASH_URL=${STORAGE_URL}/k3s${SUFFIX}-${INSTALL_K3S_COMMIT}.sha256sum
    else
        HASH_URL=${GITHUB_URL}/download/${VERSION_K3S}/sha256sum-${ARCH}.txt
    fi
    info "Downloading hash ${HASH_URL}"
    download ${TMP_HASH} ${HASH_URL}
    HASH_EXPECTED=$(grep " k3s${SUFFIX}$" ${TMP_HASH})
    HASH_EXPECTED=${HASH_EXPECTED%%[[:blank:]]*}
}

# --- check hash against installed version ---
installed_hash_matches() {
    if [ -x ${BIN_DIR}/k3s ]; then
        HASH_INSTALLED=$(sha256sum ${BIN_DIR}/k3s)
        HASH_INSTALLED=${HASH_INSTALLED%%[[:blank:]]*}
        if [ "${HASH_EXPECTED}" = "${HASH_INSTALLED}" ]; then
            return
        fi
    fi
    return 1
}

# --- download binary from github url ---
download_binary() {
#    if [ -n "${INSTALL_K3S_COMMIT}" ]; then
#        BIN_URL=${STORAGE_URL}/k3s${SUFFIX}-${INSTALL_K3S_COMMIT}
#    else
#        BIN_URL=${GITHUB_URL}/download/${VERSION_K3S}/k3s${SUFFIX}
#    fi
    info "Downloading binary ${BIN_URL}"
    download ${TMP_BIN} ${BIN_URL}
}

# --- verify downloaded binary hash ---
verify_binary() {
    info "Verifying binary download"
    HASH_BIN=$(sha256sum ${TMP_BIN})
    HASH_BIN=${HASH_BIN%%[[:blank:]]*}
    if [ "${HASH_EXPECTED}" != "${HASH_BIN}" ]; then
        fatal "Download sha256 does not match ${HASH_EXPECTED}, got ${HASH_BIN}"
    fi
}

# --- setup permissions and move binary to system directory ---
setup_binary() {
    chmod 755 ${TMP_BIN}
    info "Installing k3s to ${BIN_DIR}/k3s"
    $SUDO chown root:root ${TMP_BIN}
    $SUDO mv -f ${TMP_BIN} ${BIN_DIR}/k3s
}

# --- setup selinux policy ---
setup_selinux() {
    case ${INSTALL_K3S_CHANNEL} in
        *testing)
            rpm_channel=testing
            ;;
        *latest)
            rpm_channel=latest
            ;;
        *)
            rpm_channel=stable
            ;;
    esac

    rpm_site="rpm.rancher.io"
    if [ "${rpm_channel}" = "testing" ]; then
        rpm_site="rpm-testing.rancher.io"
    fi

    [ -r /etc/os-release ] && . /etc/os-release
    if [ "${ID_LIKE%%[ ]*}" = "suse" ]; then
        rpm_target=sle
        rpm_site_infix=microos
        package_installer=zypper
    elif [ "${VERSION_ID%%.*}" = "7" ]; then
        rpm_target=el7
        rpm_site_infix=centos/7
        package_installer=yum
    else
        rpm_target=el8
        rpm_site_infix=centos/8
        package_installer=yum
    fi

    if [ "${package_installer}" = "yum" ] && [ -x /usr/bin/dnf ]; then
        package_installer=dnf
    fi

    policy_hint="please install:
    ${package_installer} install -y container-selinux
    ${package_installer} install -y https://${rpm_site}/k3s/${rpm_channel}/common/${rpm_site_infix}/noarch/k3s-selinux-0.4-1.${rpm_target}.noarch.rpm
"

    if [ "$INSTALL_K3S_SKIP_SELINUX_RPM" = true ] || can_skip_download || [ ! -d /usr/share/selinux ]; then
        info "Skipping installation of SELinux RPM"
    elif  [ "${ID_LIKE:-}" != coreos ] && [ "${VARIANT_ID:-}" != coreos ]; then
        install_selinux_rpm ${rpm_site} ${rpm_channel} ${rpm_target} ${rpm_site_infix}
    fi

    policy_error=fatal
    if [ "$INSTALL_K3S_SELINUX_WARN" = true ] || [ "${ID_LIKE:-}" = coreos ] || [ "${VARIANT_ID:-}" = coreos ]; then
        policy_error=warn
    fi

    if ! $SUDO chcon -u system_u -r object_r -t container_runtime_exec_t ${BIN_DIR}/k3s >/dev/null 2>&1; then
        if $SUDO grep '^\s*SELINUX=enforcing' /etc/selinux/config >/dev/null 2>&1; then
            $policy_error "Failed to apply container_runtime_exec_t to ${BIN_DIR}/k3s, ${policy_hint}"
        fi
    elif [ ! -f /usr/share/selinux/packages/k3s.pp ]; then
        if [ -x /usr/sbin/transactional-update ]; then
            warn "Please reboot your machine to activate the changes and avoid data loss."
        else
            $policy_error "Failed to find the k3s-selinux policy, ${policy_hint}"
        fi
    fi
}

install_selinux_rpm() {
    if [ -r /etc/redhat-release ] || [ -r /etc/centos-release ] || [ -r /etc/oracle-release ] || [ "${ID_LIKE%%[ ]*}" = "suse" ]; then
        repodir=/etc/yum.repos.d
        if [ -d /etc/zypp/repos.d ]; then
            repodir=/etc/zypp/repos.d
        fi
        set +o noglob
        $SUDO rm -f ${repodir}/rancher-k3s-common*.repo
        set -o noglob
        if [ -r /etc/redhat-release ] && [ "${3}" = "el7" ]; then
            $SUDO yum install -y yum-utils
            $SUDO yum-config-manager --enable rhel-7-server-extras-rpms
        fi
        $SUDO tee ${repodir}/rancher-k3s-common.repo >/dev/null << EOF
[rancher-k3s-common-${2}]
name=Rancher K3s Common (${2})
baseurl=https://${1}/k3s/${2}/common/${4}/noarch
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://${1}/public.key
EOF
        case ${3} in
        sle)
            rpm_installer="zypper --gpg-auto-import-keys"
            if [ "${TRANSACTIONAL_UPDATE=false}" != "true" ] && [ -x /usr/sbin/transactional-update ]; then
                rpm_installer="transactional-update --no-selfupdate -d run ${rpm_installer}"
                : "${INSTALL_K3S_SKIP_START:=true}"
            fi
            ;;
        *)
            rpm_installer="yum"
            ;;
        esac
        if [ "${rpm_installer}" = "yum" ] && [ -x /usr/bin/dnf ]; then
            rpm_installer=dnf
        fi
        # shellcheck disable=SC2086
        $SUDO ${rpm_installer} install -y "k3s-selinux"
    fi
    return
}

# --- download and verify k3s ---
download_and_verify() {
    if can_skip_download; then
       info 'Skipping k3s download and verify'
       verify_k3s_is_executable
       return
    fi

    setup_verify_arch
    verify_downloader curl || verify_downloader wget || fatal 'Can not find curl or wget for downloading files'
    setup_tmp
#    get_release_version
#    download_hash

    if installed_hash_matches; then
        info 'Skipping binary downloaded, installed k3s matches hash'
        return
    fi

    download_binary
#    verify_binary
    setup_binary
}

# --- add additional utility links ---
create_symlinks() {
    [ "${INSTALL_K3S_BIN_DIR_READ_ONLY}" = true ] && return
    [ "${INSTALL_K3S_SYMLINK}" = skip ] && return

    for cmd in kubectl crictl ctr; do
        if [ ! -e ${BIN_DIR}/${cmd} ] || [ "${INSTALL_K3S_SYMLINK}" = force ]; then
            which_cmd=$(command -v ${cmd} 2>/dev/null || true)
            if [ -z "${which_cmd}" ] || [ "${INSTALL_K3S_SYMLINK}" = force ]; then
                info "Creating ${BIN_DIR}/${cmd} symlink to k3s"
                $SUDO ln -sf k3s ${BIN_DIR}/${cmd}
            else
                info "Skipping ${BIN_DIR}/${cmd} symlink to k3s, command exists in PATH at ${which_cmd}"
            fi
        else
            info "Skipping ${BIN_DIR}/${cmd} symlink to k3s, already exists"
        fi
    done
}

# --- create killall script ---
create_killall() {
    [ "${INSTALL_K3S_BIN_DIR_READ_ONLY}" = true ] && return
    info "Creating killall script ${KILLALL_K3S_SH}"
    $SUDO tee ${KILLALL_K3S_SH} >/dev/null << \EOF
#!/bin/sh
[ $(id -u) -eq 0 ] || exec sudo $0 $@

for bin in /var/lib/rancher/k3s/data/**/bin/; do
    [ -d $bin ] && export PATH=$PATH:$bin:$bin/aux
done

set -x

for service in /etc/systemd/system/k3s*.service; do
    [ -s $service ] && systemctl stop $(basename $service)
done

for service in /etc/init.d/k3s*; do
    [ -x $service ] && $service stop
done

pschildren() {
    ps -e -o ppid= -o pid= | \
    sed -e 's/^\s*//g; s/\s\s*/\t/g;' | \
    grep -w "^$1" | \
    cut -f2
}

pstree() {
    for pid in $@; do
        echo $pid
        for child in $(pschildren $pid); do
            pstree $child
        done
    done
}

killtree() {
    kill -9 $(
        { set +x; } 2>/dev/null;
        pstree $@;
        set -x;
    ) 2>/dev/null
}

getshims() {
    ps -e -o pid= -o args= | sed -e 's/^ *//; s/\s\s*/\t/;' | grep -w 'k3s/data/[^/]*/bin/containerd-shim' | cut -f1
}

killtree $({ set +x; } 2>/dev/null; getshims; set -x)

do_unmount_and_remove() {
    set +x
    while read -r _ path _; do
        case "$path" in $1*) echo "$path" ;; esac
    done < /proc/self/mounts | sort -r | xargs -r -t -n 1 sh -c 'umount "$0" && rm -rf "$0"'
    set -x
}

do_unmount_and_remove '/run/k3s'
do_unmount_and_remove '/var/lib/rancher/k3s'
do_unmount_and_remove '/var/lib/kubelet/pods'
do_unmount_and_remove '/var/lib/kubelet/plugins'
do_unmount_and_remove '/run/netns/cni-'

# Remove CNI namespaces
ip netns show 2>/dev/null | grep cni- | xargs -r -t -n 1 ip netns delete

# Delete network interface(s) that match 'master cni0'
ip link show 2>/dev/null | grep 'master cni0' | while read ignore iface ignore; do
    iface=${iface%%@*}
    [ -z "$iface" ] || ip link delete $iface
done
ip link delete cni0
ip link delete flannel.1
ip link delete flannel-v6.1
ip link delete kube-ipvs0
rm -rf /var/lib/cni/
iptables-save | grep -v KUBE- | grep -v CNI- | grep -v flannel | iptables-restore
ip6tables-save | grep -v KUBE- | grep -v CNI- | grep -v flannel | ip6tables-restore
EOF
    $SUDO chmod 755 ${KILLALL_K3S_SH}
    $SUDO chown root:root ${KILLALL_K3S_SH}
}

# --- create uninstall script ---
create_uninstall() {
    [ "${INSTALL_K3S_BIN_DIR_READ_ONLY}" = true ] && return
    info "Creating uninstall script ${UNINSTALL_K3S_SH}"
    $SUDO tee ${UNINSTALL_K3S_SH} >/dev/null << EOF
#!/bin/sh
set -x
[ \$(id -u) -eq 0 ] || exec sudo \$0 \$@

${KILLALL_K3S_SH}

if command -v systemctl; then
    systemctl disable ${SYSTEM_NAME}
    systemctl reset-failed ${SYSTEM_NAME}
    systemctl daemon-reload
fi
if command -v rc-update; then
    rc-update delete ${SYSTEM_NAME} default
fi

rm -f ${FILE_K3S_SERVICE}
rm -f ${FILE_K3S_ENV}

remove_uninstall() {
    rm -f ${UNINSTALL_K3S_SH}
}
trap remove_uninstall EXIT

if (ls ${SYSTEMD_DIR}/k3s*.service || ls /etc/init.d/k3s*) >/dev/null 2>&1; then
    set +x; echo 'Additional k3s services installed, skipping uninstall of k3s'; set -x
    exit
fi

for cmd in kubectl crictl ctr; do
    if [ -L ${BIN_DIR}/\$cmd ]; then
        rm -f ${BIN_DIR}/\$cmd
    fi
done

rm -rf /etc/rancher/k3s
rm -rf /run/k3s
rm -rf /run/flannel
rm -rf /var/lib/rancher/k3s
rm -rf /var/lib/kubelet
rm -f ${BIN_DIR}/k3s
rm -f ${KILLALL_K3S_SH}

if type yum >/dev/null 2>&1; then
    yum remove -y k3s-selinux
    rm -f /etc/yum.repos.d/rancher-k3s-common*.repo
elif type zypper >/dev/null 2>&1; then
    uninstall_cmd="zypper remove -y k3s-selinux"
    if [ "\${TRANSACTIONAL_UPDATE=false}" != "true" ] && [ -x /usr/sbin/transactional-update ]; then
        uninstall_cmd="transactional-update --no-selfupdate -d run \$uninstall_cmd"
    fi
    \$uninstall_cmd
    rm -f /etc/zypp/repos.d/rancher-k3s-common*.repo
fi
EOF
    $SUDO chmod 755 ${UNINSTALL_K3S_SH}
    $SUDO chown root:root ${UNINSTALL_K3S_SH}
}

# --- disable current service if loaded --
systemd_disable() {
    $SUDO systemctl disable ${SYSTEM_NAME} >/dev/null 2>&1 || true
    $SUDO rm -f /etc/systemd/system/${SERVICE_K3S} || true
    $SUDO rm -f /etc/systemd/system/${SERVICE_K3S}.env || true
}

# --- capture current env and create file containing k3s_ variables ---
create_env_file() {
    info "env: Creating environment file ${FILE_K3S_ENV}"
    $SUDO touch ${FILE_K3S_ENV}
    $SUDO chmod 0600 ${FILE_K3S_ENV}
    sh -c export | while read x v; do echo $v; done | grep -E '^(K3S|CONTAINERD)_' | $SUDO tee ${FILE_K3S_ENV} >/dev/null
    sh -c export | while read x v; do echo $v; done | grep -Ei '^(NO|HTTP|HTTPS)_PROXY' | $SUDO tee -a ${FILE_K3S_ENV} >/dev/null
}

# --- write systemd service file ---
create_systemd_service_file() {
    info "systemd: Creating service file ${FILE_K3S_SERVICE}"
    $SUDO tee ${FILE_K3S_SERVICE} >/dev/null << EOF
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=${SYSTEMD_TYPE}
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-${FILE_K3S_ENV}
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=${BIN_DIR}/k3s \
    ${CMD_K3S_EXEC}

EOF
}

# --- write openrc service file ---
create_openrc_service_file() {
    LOG_FILE=/var/log/${SYSTEM_NAME}.log

    info "openrc: Creating service file ${FILE_K3S_SERVICE}"
    $SUDO tee ${FILE_K3S_SERVICE} >/dev/null << EOF
#!/sbin/openrc-run

depend() {
    after network-online
    want cgroups
}

start_pre() {
    rm -f /tmp/k3s.*
}

supervisor=supervise-daemon
name=${SYSTEM_NAME}
command="${BIN_DIR}/k3s"
command_args="$(escape_dq "${CMD_K3S_EXEC}")
    >>${LOG_FILE} 2>&1"

output_log=${LOG_FILE}
error_log=${LOG_FILE}

pidfile="/var/run/${SYSTEM_NAME}.pid"
respawn_delay=5
respawn_max=0

set -o allexport
if [ -f /etc/environment ]; then source /etc/environment; fi
if [ -f ${FILE_K3S_ENV} ]; then source ${FILE_K3S_ENV}; fi
set +o allexport
EOF
    $SUDO chmod 0755 ${FILE_K3S_SERVICE}

    $SUDO tee /etc/logrotate.d/${SYSTEM_NAME} >/dev/null << EOF
${LOG_FILE} {
        missingok
        notifempty
        copytruncate
}
EOF
}

# --- write systemd or openrc service file ---
create_service_file() {
    [ "${HAS_SYSTEMD}" = true ] && create_systemd_service_file
    [ "${HAS_OPENRC}" = true ] && create_openrc_service_file
    return 0
}

# --- get hashes of the current k3s bin and service files
get_installed_hashes() {
    $SUDO sha256sum ${BIN_DIR}/k3s ${FILE_K3S_SERVICE} ${FILE_K3S_ENV} 2>&1 || true
}

# --- enable and start systemd service ---
systemd_enable() {
    info "systemd: Enabling ${SYSTEM_NAME} unit"
    $SUDO systemctl enable ${FILE_K3S_SERVICE} >/dev/null
    $SUDO systemctl daemon-reload >/dev/null
}

systemd_start() {
    info "systemd: Starting ${SYSTEM_NAME}"
    $SUDO systemctl restart ${SYSTEM_NAME}
}

# --- enable and start openrc service ---
openrc_enable() {
    info "openrc: Enabling ${SYSTEM_NAME} service for default runlevel"
    $SUDO rc-update add ${SYSTEM_NAME} default >/dev/null
}

openrc_start() {
    info "openrc: Starting ${SYSTEM_NAME}"
    $SUDO ${FILE_K3S_SERVICE} restart
}

# --- startup systemd or openrc service ---
service_enable_and_start() {
    if [ -f "/proc/cgroups" ] && [ "$(grep memory /proc/cgroups | while read -r n n n enabled; do echo $enabled; done)" -eq 0 ];
    then
        info 'Failed to find memory cgroup, you may need to add "cgroup_memory=1 cgroup_enable=memory" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)'
    fi

    [ "${INSTALL_K3S_SKIP_ENABLE}" = true ] && return

    [ "${HAS_SYSTEMD}" = true ] && systemd_enable
    [ "${HAS_OPENRC}" = true ] && openrc_enable

    [ "${INSTALL_K3S_SKIP_START}" = true ] && return

    POST_INSTALL_HASHES=$(get_installed_hashes)
    if [ "${PRE_INSTALL_HASHES}" = "${POST_INSTALL_HASHES}" ] && [ "${INSTALL_K3S_FORCE_RESTART}" != true ]; then
        info 'No change detected so skipping service start'
        return
    fi

    [ "${HAS_SYSTEMD}" = true ] && systemd_start
    [ "${HAS_OPENRC}" = true ] && openrc_start
    return 0
}

# --- re-evaluate args to include env command ---
eval set -- $(escape "${INSTALL_K3S_EXEC}") $(quote "$@")

# --- run the install process --
{
    verify_system
    setup_env "$@"
    download_and_verify
#    setup_selinux
    create_symlinks
    create_killall
    create_uninstall
    systemd_disable
    create_env_file
    create_service_file
    service_enable_and_start
}

创建kubeconfig,不然helm会报错

mkdir -p ~/.kube
cp /etc/rancher/k3s/k3s.yaml ~/.kube/config

配置harbor coredns 持久化

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom  # 必须这个名字
  namespace: kube-system
data:
    xxx.server: | # 必须.server结尾
          example1.org example3.org  {
              hosts {
                   127.0.0.1 example1.org example3.org
                   fallthrough
              }
          }

重启coredns读配置

1	`kubectl rollout restart deployment coredns -n kube-system`

安装helm

https://helm.sh/docs/intro/install/

安装cent-manager

cert-manager镜像，docker load 到每个节点，不然要改cert-manager helm chart

quay.io/jetstack/cert-manager-cainjector:v1.7.1
quay.io/jetstack/cert-manager-controller:v1.7.1
quay.io/jetstack/cert-manager-ctl:v1.7.1
quay.io/jetstack/cert-manager-webhook:v1.7.1

下载参考

# If you have installed the CRDs manually instead of with the `--set installCRDs=true` option added to your Helm install command, you should upgrade your CRD resources before upgrading the Helm chart:

# 下载crd
https://github.com/jetstack/cert-manager/releases/download/v1.7.1/cert-manager.crds.yaml

# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io

# Update your local Helm chart repository cache
helm repo update

helm fetch jetstack/cert-manager --version 1.7.1

安装

kubectl apply -f cert-manager.crds.yaml

helm install cert-manager cert-manager-v1.7.1.tgz \
  --namespace cert-manager \
  --create-namespace \
  --version v1.7.1

验证

kubectl get pods --namespace cert-manager

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5c6866597-zw7kh               1/1     Running   0          2m
cert-manager-cainjector-577f6d9fd7-tr77l   1/1     Running   0          2m
cert-manager-webhook-787858fcdb-nlzsq      1/1     Running   0          2m

安装rancher

下载

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable

helm repo update

helm fetch rancher-stable/rancher --version 2.6.5

安装，选项详解

kubectl create  namespace cattle-system

helm install rancher rancher-2.6.5.tgz \
  --namespace cattle-system \
  --set hostname=205.xxx.cn \
  --set bootstrapPassword=fafafa \
  --set replicas=1 \
  --set useBundledSystemChart=true \
  --set additionalTrustedCAs=true

hostname rancher 域名
bootstrapPassword：登录密码
replicas： rancher副本数
useBundledSystemChart：是否使用system-charts packaged with Rancher server
additionalTrustedCAs：信任第三方证书（harbor）配合使用
1
kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem
ca-additional.pem是harbor的自签名cert证书，要重命名为ca-additional.pem

安装使用自签名SAN证书

必须使用CA签名，不然纳管agent报错

Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get \"https://ums.xxx.cn:31393\": x509: certificate signed by unknown authority

准备openssl.conf 证书生成参考https://www.golinuxcloud.com/openssl-subject-alternative-name/

[req]
distinguished_name = req_distinguished_name
req_extensions = req_ext
prompt = no

[req_distinguished_name]
CN = ums.xxx.cn

[req_ext]
subjectAltName = @alt_names

[alt_names]
IP.1 = 192.168.25.99
DNS.1 = ums.xxx.cn
DNS.2 = *.xxx.cn

使用自签名证书不用certmanager，要保证hostname CN subjectAltName 一致

# 生成ca
openssl req -newkey rsa:2048 -nodes -keyout ca.key -x509 -days 36500 -out ca.crt -subj "/C=xx/ST=x/L=x/O=x/OU=x/CN=ca/emailAddress=x/"

# 给生成的证书CA签名
openssl genrsa -out tls.key 2048
openssl req -new -key tls.key -out tls.csr -config openssl.conf
openssl x509 -req -days 36500 -in tls.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out tls.crt -extensions req_ext -extfile openssl.conf

# 验证SAN
openssl x509 -text -noout -in tls.crt | grep -A 1 "Subject Alternative Name"

kubectl -n cattle-system create secret tls tls-rancher-ingress \
  --cert=tls.crt \
  --key=tls.key
  
# Error from server (AlreadyExists): secrets "tls-rancher-ingress" already exists 报错的话先删除再重新创建
kubectl -n cattle-system delete secret tls-rancher-ingress


mv ca.crt cacerts.pem
kubectl -n cattle-system create secret generic tls-ca \
  --from-file=cacerts.pem=./cacerts.pem

自签名证书安装 Private CA signed certificate , add --set privateCA=true to the command:

helm install rancher rancher-2.6.5.tgz \
  --namespace cattle-system \
  --set hostname=205.xxx.cn \
  --set bootstrapPassword=fafafa \
  --set replicas=1 \
  --set useBundledSystemChart=true \
  --set additionalTrustedCAs=true \
  --set ingress.tls.source=secret \
  --set privateCA=true

验证

kubectl -n cattle-system get pod

NAME                              READY   STATUS    RESTARTS   AGE
rancher-6f7df66cf7-2fnw5          1/1     Running   0          3d23h
rancher-webhook-6994b4677-tpvf8   1/1     Running   0          3d23h

备份还原Rancher Backups (2.1.2)

参考

镜像

rancher/backup-restore-operator:v2.1.2
rancher/kubectl:v1.21.9

安装

根据需要安装就行，可以保存到pv或者s3上

docker单节点有可能会遇到k8s版本太高，rancher backup 版本太低导致
1
no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beata1"
例如

这时候要直接docker cp适合版本的安装文件到容器里面，使用helm_v3安装。

验证

kubectl -n cattle-resources-system get pod

NAME                              READY   STATUS    RESTARTS   AGE
rancher-backup-7f9ff4c6cb-68jd5   1/1     Running   0          5d

刷新后多出选项

关于minio配置，必须用https

minio自身用https

在docker目录/root/.minio对应的挂载卷目录，创建certs目录，对应的密钥和证书重命名成 private.key and public.crt，使用SAN完整证书链，加入docker container ip(可以通过 docker log 查看到)，server ip

docker run -d -p 19000:9000 -p 15000:5000 --name minio -e "MINIO_ROOT_USER=admin" -e "MINIO_ROOT_PASSWORD=12345678"  -v /data/minio/data:/data   -v /data/minio/config:/root/.minio minio/minio server --console-address ":5000" /data

开启成功的话

docker restart minio
  
docker logs -f minio
  
WARNING: Detected Linux kernel version older than 4.0.0 release, there are some known potential performance problems with this kernel version. MinIO recommends a minimum of 4.x.x linux kernel version for best performance
API: https://172.17.0.3:9000  https://127.0.0.1:9000
  
Console: https://172.17.0.3:5000 https://127.0.0.1:5000
  
Documentation: https://docs.min.io
Finished loading IAM sub-system (took 0.0s of 0.0s to load data).

关于账密

创建secret

apiVersion: v1
kind: Secret
metadata:
  name: creds
type: Opaque
data:
  accessKey: <Enter your base64-encoded access key>
  secretKey: <Enter your base64-encoded secret key>

配置填写

endpoint ca 必须填写使用证书base64后的内容，不要直接用证书明文。

迁移rancher

假设集群A迁移到B，A要先备份，尽量AB集群所用Kubernetes 版本相同，因为不同的版本apiVersion不一样

先备份

无论是docker或者集群安装的rancher，备份必须使用rancher backup operator，而不是参考docker backup，否则使用operator恢复备份文件时候会报错(参考Rancher backup panics when it encounters an invalid tarball)

goroutine 403 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1691bc0, 0xc0005a10c0)
        /go/pkg/mod/k8s.io/apimachinery@v0.18.0/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@v0.18.0/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1691bc0, 0xc0005a10c0)
        /usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/rancher/backup-restore-operator/pkg/controllers/restore.getGVR(0xc0000aa000, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/rancher/backup-restore-operator/pkg/controllers/restore/controller.go:677 +0x2d4
github.com/rancher/backup-restore-operator/pkg/controllers/restore.(*handler).loadDataFromFile(0xc0000892c0, 0xc0003a3180, 0xc000248a00, 0x13d, 0x200, 0xc0003b8630, 0xc000291380, 0x0, 0xc0004ac120)
        /go/src/github.com/rancher/backup-restore-operator/pkg/controllers/restore/download.go:109 +0x17d
github.com/rancher/backup-restore-operator/pkg/controllers/restore.(*handler).LoadFromTarGzip(0xc0000892c0, 0xc0004ac120, 0x2f, 0xc0003b8630, 0xc000291380, 0x0, 0x0)

统一参考备份还原Rancher Backups (2.1.2)

minio可能会遇到的问题

检查minio和集群时区使用date,不然minio下载备份数据报错The difference between the request time and the server's time is too large., requeuing

# 使用timedatectl比较时区和rtc
root@d-ecs-38357230:~/af# timedatectl
                      Local time: Tue 2022-06-07 17:17:30 CST
                  Universal time: Tue 2022-06-07 09:17:30 UTC
                        RTC time: Tue 2022-06-07 16:58:33
                       Time zone: Asia/Shanghai (CST, +0800)
       System clock synchronized: no
systemd-timesyncd.service active: yes
                 RTC in local TZ: yes


# 可能会用到的命令 调整时区和
timedatectl set-timezone Asia/Shanghai
timedatectl set-local-rtc 1

# 修改时间 可以使用watch -n 1 date查看minio服务器实时时间
date -s "2022-06-07 17:15:40"

迁移流程

下载rancher-backup相关的包

helm repo add rancher-charts https://charts.rancher.io
helm repo update

helm fetch rancher-charts/rancher-backup-crd --version 2.1.2
helm fetch rancher-charts/rancher-backup --version 2.1.2

在B安装，安装前可以使用这个脚本清理节点

helm install rancher-backup-crd rancher-backup-crd-2.1.2.tgz -n cattle-resources-system --create-namespace

# docker pull 版本对应的镜像包，然后指定
helm install rancher-backup rancher-backup-2.1.2.tgz -n cattle-resources-system  -f values.yaml

values.yaml

image:
  repository: harbor.xxx.cn:4443/rancher/backup-restore-operator
  tag: v2.1.2
global:
  kubectl:
    repository: harbor.xxx.cn:4443/rancher/kubectl
    tag: v1.21.9

验证

kubectl get pod -n cattle-resources-system
NAME                             READY   STATUS    RESTARTS   AGE
rancher-backup-94944dc7b-b87z9   1/1     Running   0          171m

在B创建和A相同的minio secert

apiVersion: v1
data:
  accessKey: YWRtaW4=
  secretKey: MTIzNDU2Nzg=
kind: Secret
metadata: 
  name: s3minio
  namespace: default
type: Opaque

创建restore自定义资源，在恢复自定义资源中，prune必须设置为 false。endpointCA 证书内容得base64编码

apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-migration
spec:
  backupFilename: minio-59ae5e34-b0b5-484a-9c51-d4df16766257-2022-06-06T07-19-59Z.tar.gz
  prune: false
  storageLocation:
    s3:
      bucketName: rancher-backup
      credentialSecretName: s3minio
      credentialSecretNamespace: default
      endpoint: xxx.xxx.xxx.205:9000
      endpointCA: xxxx
      insecureTLSSkipVerify: true

验证

vim restore-apply.yaml
kubectl apply -f restore-apply.yaml
restore.resources.cattle.io/restore-migration created

# 看restore状态
kubectl get restore
NAME                BACKUP-SOURCE   BACKUP-FILE                                                              AGE   STATUS
restore-migration                   minio-59ae5e34-b0b5-484a-9c51-d4df16766257-2022-06-06T07-19-59Z.tar.gz   8s


# 看log
kubectl get pods -n cattle-resources-system
NAME                             READY   STATUS    RESTARTS   AGE
rancher-backup-94944dc7b-b87z9   1/1     Running   0          3h5m

kubectl logs -n cattle-resources-system --tail 100 -f rancher-backup-94944dc7b-b87z9



# 成功后可以开始安装rancher
kubectl get restore
NAME                BACKUP-SOURCE   BACKUP-FILE                                                              AGE     STATUS
restore-migration   S3              minio-59ae5e34-b0b5-484a-9c51-d4df16766257-2022-06-06T07-19-59Z.tar.gz   4m52s   Completed

参考上面安装高可用rancher2.6.5，rancher安装命令hostname和A安装的保持一致

迁移完成后的配置修改

检查server-url是否符合实际情况，和高可用安装时候的hostname一致

针对从docker安装迁移到集群高可用安装的情况，主要是agent的连接从IP变成了域名，所以还需要配置好下游集群的agent

获取kubeconfig，参考

对于导入集群不需要考虑，创建的集群分两种情况

有从UI界面获取过kubeconfig，可以通过切换context

kubectl config get-contexts
kubectl config use-context [context-name]

Example:

CURRENT   NAME                        CLUSTER                     AUTHINFO     NAMESPACE
*         my-cluster                  my-cluster                  user-46tmn
          my-cluster-controlplane-1   my-cluster-controlplane-1   user-46tmn

在本例中，当您使用kubectl第一个上下文时my-cluster，您将通过 Rancher 服务器进行身份验证。

使用第二个上下文，my-cluster-controlplane-1您将使用授权的集群端点进行身份验证，直接与下游 RKE 集群通信。

已经对开连接了，就下载不了，可以从下游集群具有controlplane的节点上生成kubeconfig。

参考https://gist.github.com/superseb/b14ed3b5535f621ad3d2aa6a4cd6443b

docker run --rm --net=host -v $(docker inspect kubelet --format '')/ssl:/etc/kubernetes/ssl:ro --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='' | tail -1) -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap -n kube-system full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' > kubeconfig_admin.yaml
    

需要安装jq，比较麻烦。可以把方法拆分

docker exec -it kube-apiserver bash
export KUBECONFIG=/etc/kubernetes/ssl/kubecfg-kube-node.yaml
    
kubectl get configmap -n kube-system full-cluster-state -o json > full-cluster-state.json
    

拿到json后，在执行

 cat full-cluster-state.json|jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_" > kubeconfig_admin.yaml
    

修改coredns

1	`kubectl edit cm coredns -n kube-system`

修改hosts

data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        hosts {
          xxxx 201.xxxx.cn
          fallthrough
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . "/etc/resolv.conf"
        cache 30
        loop
        reload
        loadbalance
    } # STUBDOMAINS - Rancher specific change
kind: ConfigMap

delete 对应的coredns pod，让hosts起效

1	`kubectl rollout restart deployment coredns -n kube-system`

修改agent配置

1	`kubectl edit deploy cattle-cluster-agent -n cattle-system`

修改CATTLE_SERVER，改成新的域名，同时找到挂载的secret

      containers:
      - env:
       ...
        - name: CATTLE_IS_RKE
          value: "true"
        - name: CATTLE_SERVER
          value: https://xxx.xxx.cn  #修改
        .....
        image: rancher/rancher-agent:v2.6.5
          
        .....
      volumes:
      - name: cattle-credentials
        secret:
          defaultMode: 320
          secretName: cattle-credentials-cdcb52a  #复制

执行

1	`kubectl edit secret -n cattle-system cattle-credentials-cdcb52a`

修改url，填入CATTLE_SERVER值的base64编码

apiVersion: v1
data:
  namespace: xxxx
  token: xxx
  url: aHR0cHM6Ly8yMDEudWlpbi5jbg==  # 修改
kind: Secret
.....
type: Opaque

最后delete cattle-cluster-agent对应的pod

1	`kubectl rollout restart deployment cattle-cluster-agent -n cattle-system`

升级rancher2.5.5-2.6.5

单机版

参考

主要使用--volumes-from实现容器间数据共享

# 停止旧版本容器
docker stop <OLD_RANCHER_CONTAINER_NAME>

# 备份
docker create --volumes-from <OLD_RANCHER_CONTAINER_NAME> --name rancher-data rancher/rancher:<OLD_RANCHER_CONTAINER_TAG>

# 在当前位置生成rancher数据压缩包
docker run --volumes-from rancher-data -v $PWD:/backup busybox tar zcvf /backup/rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz /var/lib/rancher

# 启动新rancher server
docker run -d --privileged --volumes-from rancher-data \
  --restart=unless-stopped  --name rancher2.6.5 \
  -e SSL_CERT_DIR="/container/certs" -v /root/harbor/cert:/container/certs \
  -e CATTLE_SYSTEM_CATALOG=bundled \
  -p 1080:80 -p 1443:443 \
    rancher/rancher:v2.6.5



docker run -d --privileged --volumes-from rancher-data \
  --restart=unless-stopped  --name rancher2.6.5 \
  -e SSL_CERT_DIR="/container/certs" -v /root/harbor/cert:/container/certs \
  -e CATTLE_SYSTEM_CATALOG=bundled \
  --add-host harbor.xxx.cn:10.xx.xx.205 \
  -p 1080:80 -p 1443:443 \
    rancher/rancher:v2.6.5


docker run -d --privileged --volumes-from rancher-data \
  --restart=unless-stopped  --name rancher2.6.5 \
  -e SSL_CERT_DIR="/container/certs" -v /root/harbor/cert:/container/certs \
  -e CATTLE_SYSTEM_CATALOG=bundled \
  --add-host harbor.xxx.cn:10.81.25.149    --add-host gitlab.xxx.cn:10.41.31.1 \
  -p 1080:80 -p 1443:443 \
    rancher/rancher:v2.6.5

# 如果自定义证书
docker run -d --privileged --restart=unless-stopped \
    -p 80:80 -p 443:443 \
    -v /<CERT_DIRECTORY>/<FULL_CHAIN.pem>:/etc/rancher/ssl/cert.pem \
    -v /<CERT_DIRECTORY>/<PRIVATE_KEY.pem>:/etc/rancher/ssl/key.pem \
    -v /<CERT_DIRECTORY>/<CA_CERTS.pem>:/etc/rancher/ssl/cacerts.pem \
    -e CATTLE_SYSTEM_DEFAULT_REGISTRY=<REGISTRY.YOURDOMAIN.COM:PORT> \ # Set a default private registry to be used in Rancher
    -e CATTLE_SYSTEM_CATALOG=bundled \ #Available as of v2.3.0，use the packaged Rancher system charts
    <REGISTRY.YOURDOMAIN.COM:PORT>/rancher/rancher:<RANCHER_VERSION_TAG>
    
    
# 删除旧容器
docker rm -f <OLD_RANCHER_CONTAINER_NAME>

回滚

# 解压数据到rancher-data

docker run  --volumes-from rancher-data \
-v $PWD:/backup busybox sh -c "rm /var/lib/rancher/* -rf \
&& tar zxvf /backup/rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz"

# 运行旧版本rancher
docker run -d --volumes-from rancher-data \
 --restart=unless-stopped \
 -p 80:80 -p 443:443 \
 --privileged \
 rancher/rancher:<PRIOR_RANCHER_VERSION>

高可用

升级rancher，rancher依赖cert-manager的crd，所以最好先找到升级版本的rancher对应的cert-manager，cert-manager也要升级

例如rancher 2.6.5依赖cert-manager 1.7，参考Install/Upgrade Rancher on a Kubernetes Cluster

备份

备份可以参考上面，或者看https://docs.rancher.cn/docs/rancher2.5/backups/back-up-rancher/_index/

# 获取旧版本的values
helm get values rancher -n cattle-system -o yaml > values.yaml

# 卸载 Rancher
helm delete rancher -n cattle-system

升级cert-manager

参考

备份现有资源

kubectl get -o yaml --all-namespaces \
issuer,clusterissuer,certificates,certificaterequests > cert-manager-backup.yaml

# 卸载现有部署
helm uninstall cert-manager -n cert-manager
kubectl delete namespace cert-manager

# 删除旧crd
kubectl delete -f old-version-crd.yaml

# 安装新版cert-manager
kubectl apply -f new-verson-crds.yaml

helm install cert-manager cert-manager-v1.7.1.tgz \
  --namespace cert-manager \
  --create-namespace \
  --version v1.7.1

恢复备份资源

kubectl apply -f cert-manager-backup.yaml


# 验证
kubectl get pods --namespace cert-manager

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5c6866597-zw7kh               1/1     Running   0          2m
cert-manager-cainjector-577f6d9fd7-tr77l   1/1     Running   0          2m
cert-manager-webhook-787858fcdb-nlzsq      1/1     Running   0          2m

升级rancher

# 安装新版rancher
helm install rancher rancher-2.6.5.tgz -n cattle-system -f values.yaml

自定义集群重新以导入方式纳管

根据文档描述，直接UI界面删除自定义集群，会造成k8s的组件也会被删除，所以不能直接界面操作

场景：A rancher的自定义下游集群想迁移到B上（AB可以一样）

B先要创建导入集群，得到部署命令行，拿到对应的yaml

kubectl delete -f xxx.yaml相当于删掉与A的连接，删除后kubectl会失效

kubectl get pods -A
error: You must be logged in to the server (Unauthorized)

重新获取kubeconfig 参考这个里面关于从kube-apiserver容器里面获取kubeconfig
使用新的kubeconfig，执行B的导入纳管命令

重新使用rke管理集群

参考这里获取kubeconfig，这里面有整个集群的信息

使用key full-cluster-state的内容创建cluster.rkestate，用于rke命令操作集群

kubectl get configmap -n kube-system full-cluster-state -o jsonpath='{.data.full-cluster-state}' > cluster.rkestate

根据集群信息编写cluster.yml，完整示例

nodes:
  - address: xxxx
    user: root
    role: ["controlplane", "etcd", "worker"]
    ssh_key_path: /root/.ssh/id_rsa
    port: 22
		hostname_override: "" # 参考节点名，缺了可能相同IP两个节点

kubernetes_version: v1.19.16-rancher1-5  # 参考cluster.rkestate的kubernetesVersion非常重要
cluster_name: xxxx  # 参考

尽量寻找支持该kubernetesVersion的rke版本，只要前面v1.19.16-rancher1一样就可以。

cluster.rkestate和cluster.yml要在同一个目录，处理完这两个文件后，使用rke来操作了。同时保证cluster.rkestate真实反映集群节点现状

1	`rke up -config cluster.yml`

遇到问题

Centos系统不能用root作为SSH用户

Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access the Docker socket (/var/run/docker.sock). Please check if the configured user can execute `docker ps` on the node, and if the SSH server version is at least version 6.7 or higher. If you are using RedHat/CentOS, you can't use the user `root`. Please refer to the documentation for more instructions. Error: ssh: rejected: administratively prohibited (open failed) 

新增用户

# 所有机器 新增rancher用户，添加到docker组(rke安全限制)
useradd rancher -G docker
echo "123456" | passwd --stdin rancher
  
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub rancher@192.168.0.22

根据节点修改cluster.yml的ssh_key_path和user

nodes:
  - address: xxxx
    user: rancher
    role: ["controlplane", "etcd", "worker"]
    ssh_key_path: /home/rancher/.ssh/id_rsa
    port: 22
  
usermod -d /new/directory -m username
  

端口开放准备

rancher 端口准备

k3s

入站规则

Protocol	Port	Source	Description
TCP	80	Load balancer/proxy that does external SSL termination	Rancher UI/API when external SSL termination is used
TCP	443	server nodesagent nodeshosted/registered Kubernetesany source that needs to be able to use the Rancher UI or API	Rancher agent, Rancher UI/API, kubectl
TCP	6443	K3s server nodes	Kubernetes API
UDP	8472	K3s server and agent nodes	Required only for Flannel VXLAN.
TCP	10250	K3s server and agent nodes	kubelet

出站规则

Protocol	Port	Destination	Description
TCP	22	Any node IP from a node created using Node Driver	SSH provisioning of nodes using Node Driver
TCP	443	git.rancher.io	Rancher catalog
TCP	2376	Any node IP from a node created using Node driver	Docker daemon TLS port used by Docker Machine
TCP	6443	Hosted/Imported Kubernetes API	Kubernetes API server

RKE

节点间流量规则

Protocol	Port	Description
TCP	443	Rancher agents
TCP	2379	etcd client requests
TCP	2380	etcd peer communication
TCP	6443	Kubernetes apiserver
TCP	8443	Nginx Ingress’s Validating Webhook
UDP	8472	Canal/Flannel VXLAN overlay networking
TCP	9099	Canal/Flannel livenessProbe/readinessProbe
TCP	10250	Metrics server communication with all nodes
TCP	10254	Ingress controller livenessProbe/readinessProbe

入站规则

Protocol	Port	Source	Description
TCP	22	RKE CLI	SSH provisioning of node by RKE
TCP	80	Load Balancer/Reverse Proxy	HTTP traffic to Rancher UI/API
TCP	443	Load Balancer/Reverse ProxyIPs of all cluster nodes and other API/UI clients	HTTPS traffic to Rancher UI/API
TCP	6443	Kubernetes API clients	HTTPS traffic to Kubernetes API

出站规则

Protocol	Port	Source	Description
TCP	443	`35.160.43.145`,`35.167.242.46`,`52.33.59.17`	Rancher catalog (git.rancher.io)
TCP	22	Any node created using a node driver	SSH provisioning of node by node driver
TCP	2376	Any node created using a node driver	Docker daemon TLS port used by node driver
TCP	6443	Hosted/Imported Kubernetes API	Kubernetes API server
TCP	Provider dependent	Port of the Kubernetes API endpoint in hosted cluster	Kubernetes API

RKE2

入站规则

Protocol	Port	Source	Description
TCP	9345	RKE2 agent nodes	Kubernetes API
TCP	6443	RKE2 agent nodes	Kubernetes API
UDP	8472	RKE2 server and agent nodes	Required only for Flannel VXLAN
TCP	10250	RKE2 server and agent nodes	kubelet
TCP	2379	RKE2 server nodes	etcd client port
TCP	2380	RKE2 server nodes	etcd peer port
TCP	30000-32767	RKE2 server and agent nodes	NodePort port range
TCP	5473	Calico-node pod connecting to typha pod	Required when deploying with Calico
HTTP	8080	Load balancer/proxy that does external SSL termination	Rancher UI/API when external SSL termination is used
HTTPS	8443	hosted/registered Kubernetesany source that needs to be able to use the Rancher UI or API	Rancher agent, Rancher UI/API, kubectl. Not needed if you have LB doing TLS termination.

通常允许所有出站流量。

总结

常用端口

TCP Ports
22, 80, 443, 2376, 2379, 2380, 6443, 9099, 9796, 10250, 10254, 30000-32767
UDP Ports
8472, 30000-32767

网络插件端口，默认使用Canal

WAVE插件 TCP 6783 UDP 6783-6784
Calico插件 TCP 179,5473 UDP 4789
Cilium插件 TCP 8472，4240

端口开放检测脚本

#!/bin/bash

TCP_PORTS="22088 22 80 443 2376 2379 2380 6443 9099 9796 10250 10254"
UDP_PORTS="8472"
REMOTE_HOST=$1
TIMEOUT_SEC=5
LOCAL_IP='hostname -I | cut -d' ' -f1'
function check() {
    res=$1
    PORT=$2
    proto=$3
    if [[ $res -eq 0 ]]
    then
        echo "$proto $PORT OPEN"
    elif [[ $res -eq 1 ]]
    then
        echo "$proto $PORT OPEN BUT NOT LISTEN"
    elif [[ $res -eq 124 ]]
    then
        echo "$proto $PORT NOT OPEN"
    else
        echo "$proto $PORT UNKONWN ERROR"
    fi
}
echo "check $LOCAL_IP -----> $REMOTE_HOST port"
for PORT in $TCP_PORTS
do
    timeout $TIMEOUT_SEC bash -c "</dev/tcp/$REMOTE_HOST/$PORT" &>/dev/null; res=$?
    check $res $PORT "tcp"
done

for PORT in $UDP_PORTS
do
    timeout $TIMEOUT_SEC bash -c "</dev/tcp/$REMOTE_HOST/$PORT" &>/dev/null; res=$?
    check $res $PORT "udp"
done

端口测试

测试命令

# 添加规则 禁止tcp/9099 被访问
iptables -A INPUT -p tcp --dport 9099 -j DROP
# 删除规则
iptables -D INPUT -p tcp --dport 9099 -j DROP

tcp/6443

tcp/2380,2379

kubectl get pods -A
Error from server: etcdserver: request timed out

udp/8472

pod 正常，dns解析访问pod ip都有问题

/ # nslookup default-http-backend
;; connection timed out; no servers could be reached
  
  
/ # wget 10.43.85.213
Connecting to 10.43.85.213 (10.43.85.213:80)
wget: can't connect to remote host (10.43.85.213): Operation timed out

tcp/10254 nginx-ingress-controller 起不来

kubectl get pods -n  ingress-nginx
NAME                                    READY   STATUS             RESTARTS   AGE
default-http-backend-6db58c58cd-bfk2h   1/1     Running            0          4h27m
nginx-ingress-controller-lrx64          1/1     Running            0          4h13m
nginx-ingress-controller-ztww2          0/1     CrashLoopBackOff   6          4h27m

tcp/9099 canal 运行有问题

kubectl get pods -n kube-system
NAME                                       READY   STATUS      RESTARTS   AGE
calico-kube-controllers-5898bd695c-cgl6f   1/1     Running     0          4h35m
canal-2mwgg                                1/2     Running     1          4h35m
canal-ffn6k                                2/2     Running     1          4h21m

tcp/10250 Metrics server communication with all nodes

节点CPU 内存会变成N/A rancher看log会看不了

kubectl logs -f --tail 200 xxx-5768967f5c-wmc2k -n xxx
Error from server: Get "https://xxx:10250/containerLogs/xxx/xxx-5768967f5c-wmc2k/xxx?follow=true&tailLines=200": dial tcp xxxxx:10250: connect: no route to host

rancher error 处理

pod莫名其妙拉不了镜像，宿主机可以拉，推测集群有问题，而集群靠kubelet来管理容器

cp ~/.docker/config.json /var/lib/kubelet/config.json

docker restart kubelet

template system-library-rancher-monitoring 和kubeversion 不匹配

template system-library-rancher-monitoring incompatible with rancher version or cluster’s [xxx] kubernetes version

参考 https://github.com/rancher/rancher/issues/37039#issuecomment-1176320933

迁移相关，看是不是开启了v1版本的monitor，新版rancher用的是v2

# 设置检查集群配置
enable_cluster_alerting: false
enable_cluster_monitoring: false

或者使用这个脚本检查，是否存在迁移，自定义证书要加--insecure参数

得到

The Monitoring V1 operator does not appear to exist in cluster *******. Migration to Monitoring V2 should be possible.

应该没啥问题

检查system-library-rancher-monitoring这个 cr的内容

1	`kubectl edit catalogtemplates system-library-rancher-monitoring`

修改versions的第一项

spec:
  catalogId: system-library
  defaultVersion: 0.3.2
  description: Provides monitoring for Kubernetes which is maintained by Rancher 2.
  displayName: rancher-monitoring
  folderName: rancher-monitoring
  icon: https://coreos.com/sites/default/files/inline-images/Overview-prometheus_0.png
  projectURL: https://github.com/coreos/prometheus-operator
  versions:
  - digest: 08fbaee28d5a0efb79db02d9372629e2
    externalId: catalog://?catalog=system-library&template=rancher-monitoring&version=0.3.2
    kubeVersion: < 1.22.0-0  # 这个地方 改成 '>=1.21.0-0'
    rancherMinVersion: 2.6.1-alpha1
    version: 0.3.2
    versionDir: charts/rancher-monitoring/v0.3.2
    versionName: rancher-monitoring

Template system-library-rancher-monitoring incompatible with rancher version or cluster’s [local] kubernetes version

最后我通过编辑clusters.management.cattle.io CR conditions，里面的报错来修复它，因为即使我卸载了 system-monitor ，错误仍然存在。

Example

kubectl edit  clusters.management.cattle.io/local


conditions:
  - status: "True"
    type: Ready
  - lastUpdateTime: "2022-05-31T09:00:06Z"
    status: "True"
    type: BackingNamespaceCreated
  - lastUpdateTime: "2022-05-31T09:00:09Z"
    status: "True"
    type: DefaultProjectCreated
  - lastUpdateTime: "2022-05-31T09:00:09Z"
    status: "True"
    type: SystemProjectCreated
  - lastUpdateTime: "2022-05-31T09:00:07Z"
    status: "True"
    type: CreatorMadeOwner
  - lastUpdateTime: "2022-05-31T09:00:08Z"

从system-library-rancher-monitoring获取相关的参数

kubectl edit catalogtemplates system-library-rancher-monitoring -n cattle-global-data

  versions:
  - digest: 08fbaee28d5a0efb79db02d9372629e2
    externalId: catalog://?catalog=system-library&template=rancher-monitoring&version=0.3.2
    kubeVersion: =1.23.6+k3s1
    rancherMinVersion: 2.6.1-alpha1
    version: 0.3.2
    versionDir: charts/rancher-monitoring/v0.3.2
    versionName: rancher-monitoring
  - digest: 08fbaee28d5a0efb79db02d9372629e2
    externalId: catalog://?catalog=system-library&template=rancher-monitoring&version=0.3.2
    kubeVersion: '>=1.22.0-0'
    rancherMinVersion: 2.6.1-alpha1
    version: 0.3.2
    versionDir: charts/rancher-monitoring/v0.3.2
    versionName: rancher-monitoring

通过阅读源码，做了一些其他debug，其实是不可能会报错的，所以直接改cluster的cr也不会重新报错。

func (m *Manager) LatestAvailableTemplateVersion(template *v3.CatalogTemplate, clusterName string) (*v32.TemplateVersionSpec, error) {
	versions := template.DeepCopy().Spec.Versions
	if len(versions) == 0 {
		return nil, errors.New("empty catalog template version list")
	}

	sort.Slice(versions, func(i, j int) bool {
		val1, err := semver.ParseTolerant(versions[i].Version)
		if err != nil {
			return false
		}

		val2, err := semver.ParseTolerant(versions[j].Version)
		if err != nil {
			return false
		}

		return val2.LT(val1)
	})

	for _, templateVersion := range versions {
		catalogTemplateVersion := &v3.CatalogTemplateVersion{
			Spec: templateVersion,
		}

		if err := m.ValidateChartCompatibility(catalogTemplateVersion, clusterName, ""); err == nil {
			return &templateVersion, nil
		}
	}

	return nil, errors.Errorf("template %s incompatible with rancher version or cluster's [%s] kubernetes version", template.Name, clusterName)
}

could not find tenant ID context deadline exceeded

创建微软云凭证时候报错，凭证有效能用，POST /meta/aksCheckCredentials时候，报错。

{"error":"could not find tenant ID: Request failed: subscriptions.Client#Get: Failure sending request: StatusCode=0 -- Original Error: context deadline exceeded"}

通过错误提示，找到并阅读源码，goCtx用于控制请求超时时间，报错也和时间有关。

func FindTenantID(ctx context.Context, env azure.Environment, subscriptionID string) (string, error) {
	goCtx, cancel := context.WithTimeout(ctx, findTenantIDTimeout)
	defer cancel()
....
}

函数作用

调用azure sdk访问azure http api，验证cred。

排查方向

网络方面真的超时，通过debug，大概摸清访问的url，参数等，尝试请求发现可以访问，响应也快。
通过debug怀疑到时区问题引发的问题

系统默认是UTC，通过把时区设置成当地时区，并且reboot机器问题解决，只是restart rancher pod不起效

root@xxx:~# timedatectl
                      Local time: Tue 2022-07-19 03:20:00 UTC
                  Universal time: Tue 2022-07-19 03:20:00 UTC
                        RTC time: Tue 2022-07-19 03:20:01
                       Time zone: UTC (UTC, +0000)
       System clock synchronized: yes
systemd-timesyncd.service active: yes
                 RTC in local TZ: no
                 
# 修改时区
sudo timedatectl set-timezone Asia/Shanghai

下游集群重新导入Cluster agent is not connected

agent和rancher完全没有具体报错，agent卡在Connecting to proxy

INFO: Environment: CATTLE_ADDRESS=10.42.1.43 
....
time="2022-09-28T08:22:36Z" level=info msg="Rancher agent version v2.6.5 is starting"
time="2022-09-28T08:22:36Z" level=info msg="Listening on /tmp/log.sock"
time="2022-09-28T08:22:36Z" level=info msg="Connecting to wss://rancher.xxx.cn:31628/v3/connect/register with token starting with 2b2ch2cnsp6wxkzdm5djjbttr55"
time="2022-09-28T08:22:36Z" level=info msg="Connecting to proxy" url="wss://rancher.xxx.cn:31628/v3/connect/register"

解决方法：修改rancher 集群信息AgentDeployed

# rancher 导入集群命令所用的yaml路径例子，其中c-zqpccj就是集群id
https://xxxx/v3/import/xxxx_c-zqpcc.yaml

修改集群信息

1	`kubectl edit clusters.management.cattle.io c-zqpcc`

重置agent部署状态AgentDeployed

  conditions:         
  ...
   - lastUpdateTime: "2022-02-16T07:16:07Z"                           
    status: "True"      # 改成False                                 
    type: AgentDeployed
    
    

清楚原来agent，然后重新运行导入命令就可以了

迁移到2.6.7

需要镜像

rancher/rancher-webhook:v0.2.6
rancher/shell:v0.1.18
rancher/rancher:v2.6.7
rancher/gitjob:v0.1.30
rancher/fleet:v0.3.10
rancher/fleet-agent:v0.3.10
rancher/rancher-agent:v2.6.7
rancher/mirrored-pause:3.6
rancher/rke-tools:v0.1.87
rancher/hyperkube:v1.24.2-rancher1
rancher/mirrored-coreos-etcd:v3.4.16-rancher1
rancher/mirrored-calico-cni:v3.23.1
rancher/mirrored-calico-pod2daemon-flexvol:v3.23.1
rancher/kube-api-auth:v0.1.8
rancher/mirrored-calico-node:v3.23.1
rancher/mirrored-flannelcni-flannel:v0.17.0
rancher/mirrored-cluster-proportional-autoscaler:1.8.5
rancher/mirrored-metrics-server:v0.6.1
rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1
rancher/mirrored-coredns-coredns:1.9.3
rancher/mirrored-calico-kube-controllers:v3.23.1
rancher/nginx-ingress-controller:nginx-1.2.1-rancher1

helm升级

# helm
helm upgrade --reuse-values  rancher rancher-2.6.7.tgz -n cattle-system

docker升级

升级rancher2.5.5-2.6.5参考

升级后

重新纳管集群参考

修改agent配置，那块只需要改secret，不需要改agent deploy

安装rancher2.6.5 docker 单节点版

准备镜像仓库

docker 命令

参数详解

配置私有仓库

配置system-default-registry

高可用rancher

安装k3s

安装helm

安装cent-manager

安装rancher

安装使用自签名SAN证书

备份还原Rancher Backups (2.1.2)

迁移rancher

先备份

minio可能会遇到的问题

迁移流程

迁移完成后的配置修改

升级rancher2.5.5-2.6.5

单机版

高可用

备份

升级cert-manager

升级rancher

自定义集群重新以导入方式纳管

端口开放准备

k3s

RKE

RKE2

总结

端口测试

rancher error 处理

Failed to pull image “xxx”: rpc error: code = Unknown desc = Error response from daemon: pull access denied for xxx, repository does not exist or may require ‘docker login’

template system-library-rancher-monitoring 和kubeversion 不匹配

Template system-library-rancher-monitoring incompatible with rancher version or cluster’s [local] kubernetes version

could not find tenant ID context deadline exceeded

下游集群重新导入Cluster agent is not connected

迁移到2.6.7