# k8s-device-plugin

**Repository Path**: wilds/k8s-device-plugin

## Basic Information

- **Project Name**: k8s-device-plugin
- **License**: Apache-2.0
- **Default Branch**: master
- **Created**: 2023-12-13
- **Last Updated**: 2023-12-22

## README

#### Introduction

k8s-device-plugin is a generic Kubernetes device plugin (it logs itself as `k8s generic device plugin`). The walkthroughs below use it to advertise mock `huawei.com/gpu` devices (with and without CDI) and `huawei.com/npu` devices as extended resources.

#### Installation

```
cd /opt/code/wilds
git clone https://gitee.com/wilds/k8s-device-plugin.git
cd k8s-device-plugin
go mod tidy
go mod vendor
cd cmd/k8s-device-plugin
go build
```

#### Usage

##### device plugin

1. Generate the mock devices (GPU):

```
[root@openfuyao-0004 k8s-device-plugin]# ./k8s-device-plugin mockdev
2023/12/22 10:29:09 k8s generic device plugin.
2023/12/22 10:29:09 mock device resources: [gpu_0 gpu_1 gpu_2]
2023/12/22 10:29:09 Execute 'mkdir /mock/dev ' command successed.
2023/12/22 10:29:09 Execute 'mknod /mock/dev/gpu_0 ' command successed.
2023/12/22 10:29:09 Execute 'mknod /mock/dev/gpu_1 ' command successed.
2023/12/22 10:29:09 Execute 'mknod /mock/dev/gpu_2 ' command successed.
2023/12/22 10:29:09 Execute 'mkdir -p /mock/so ' command successed.
2023/12/22 10:29:09 Execute 'touch /mock/so/gpu_0.so ' command successed.
2023/12/22 10:29:09 Execute 'touch /mock/so/gpu_1.so ' command successed.
2023/12/22 10:29:09 Execute 'touch /mock/so/gpu_2.so ' command successed.
2023/12/22 10:29:09 Execute 'mkdir -p /mock/bin ' command successed.
2023/12/22 10:29:09 Write to file /mock/bin/list_dev.sh ' successed.
```

2. Run the device plugin:

```
[root@openfuyao-0004 k8s-device-plugin]# ./k8s-device-plugin
2023/12/21 19:12:26 k8s generic device plugin.
2023/12/21 19:12:26 huawei.com/gpu
2023/12/21 19:12:26 Discovery devices,resource name: huawei.com/gpu
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_0 name: gpu_0
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_1 name: gpu_1
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_2 name: gpu_2
2023/12/21 19:12:26 Watch the device plugin path.
2023/12/21 19:12:26 Watch os signal.
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_0 name: gpu_0
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_1 name: gpu_1
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_2 name: gpu_2
2023/12/21 19:12:26 Starting device plugin on /var/lib/kubelet/device-plugins/gpu.sock
2023/12/21 19:12:26 enter list and watch,devices: [&Device{ID:gpu_0,Health:Healthy,Topology:nil,} &Device{ID:gpu_1,Health:Healthy,Topology:nil,} &Device{ID:gpu_2,Health:Healthy,Topology:nil,}]
2023/12/21 19:12:26 Registered device plugin with kubelet.
```

3. Run a pod that requests the `huawei.com/gpu` resource:

```
[root@openfuyao-0004 yaml]# kubectl apply -f pod-gpu.yaml
```

4. Check the node information. `huawei.com/gpu: 3` appears under Capacity and Allocatable, and one device is allocated:

```
[root@openfuyao-0004 so]# kubectl describe node fuyao-master-01
Name:               fuyao-master-01
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=fuyao-master-01
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.100.36/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 172.17.135.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
...
Capacity:
  cpu:                4
  ephemeral-storage:  226803232Ki
  huawei.com/gpu:     3
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32411976Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  209021858266
  huawei.com/gpu:     3
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32309576Ki
  pods:               110
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1500m (37%)   1 (25%)
  memory             4012Mi (12%)  4436Mi (14%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
  huawei.com/gpu     1             1
Events:
```

5. Delete the pod:

```
[root@openfuyao-0004 yaml]# kubectl delete -f pod-gpu.yaml
```

6. Clean up the mock resources:

```
[root@openfuyao-0004 k8s-device-plugin]# ./k8s-device-plugin mockdev clean
```

7. Stop the device plugin.

##### cdi

0. Prerequisites:
   - Kubernetes version: 1.29.0
   - containerd version: 1.7.2, with CDI enabled in its configuration file:

```
sudo vim /etc/containerd/config.toml

# in config.toml:
[plugins."io.containerd.grpc.v1.cri"]
  enable_cdi = true
  cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]

# restart containerd
systemctl restart containerd
```

1. Generate the mock devices (GPU):

```
[root@openfuyao-0004 k8s-device-plugin]# ./k8s-device-plugin mockdev
2023/12/22 10:29:09 k8s generic device plugin.
2023/12/22 10:29:09 mock device resources: [gpu_0 gpu_1 gpu_2]
2023/12/22 10:29:09 Execute 'mkdir /mock/dev ' command successed.
2023/12/22 10:29:09 Execute 'mknod /mock/dev/gpu_0 ' command successed.
2023/12/22 10:29:09 Execute 'mknod /mock/dev/gpu_1 ' command successed.
2023/12/22 10:29:09 Execute 'mknod /mock/dev/gpu_2 ' command successed.
2023/12/22 10:29:09 Execute 'mkdir -p /mock/so ' command successed.
2023/12/22 10:29:09 Execute 'touch /mock/so/gpu_0.so ' command successed.
2023/12/22 10:29:09 Execute 'touch /mock/so/gpu_1.so ' command successed.
2023/12/22 10:29:09 Execute 'touch /mock/so/gpu_2.so ' command successed.
2023/12/22 10:29:09 Execute 'mkdir -p /mock/bin ' command successed.
2023/12/22 10:29:09 Write to file /mock/bin/list_dev.sh ' successed.
```

2. Run the device plugin in CDI mode:

```
[root@openfuyao-0004 k8s-device-plugin]# ./k8s-device-plugin --cdi
2023/12/21 19:12:26 k8s generic device plugin.
2023/12/21 19:12:26 huawei.com/gpu
2023/12/21 19:12:26 Discovery devices,resource name: huawei.com/gpu
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_0 name: gpu_0
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_1 name: gpu_1
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_2 name: gpu_2
2023/12/21 19:12:26 Watch the device plugin path.
2023/12/21 19:12:26 Watch os signal.
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_0 name: gpu_0
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_1 name: gpu_1
2023/12/21 19:12:26 find mock device path: /mock/dev/gpu_2 name: gpu_2
2023/12/21 19:12:26 Starting device plugin on /var/lib/kubelet/device-plugins/gpu.sock
2023/12/21 19:12:26 enter list and watch,devices: [&Device{ID:gpu_0,Health:Healthy,Topology:nil,} &Device{ID:gpu_1,Health:Healthy,Topology:nil,} &Device{ID:gpu_2,Health:Healthy,Topology:nil,}]
2023/12/21 19:12:26 Registered device plugin with kubelet.
```

3. Copy `k8s-device-plugin/yaml/cdi_gpu.yaml` into the `/var/run/cdi` directory:

```
[root@openfuyao-0004 yaml]# mkdir /var/run/cdi
[root@openfuyao-0004 yaml]# cp cdi_gpu.yaml /var/run/cdi/
```

4. Run a pod that requests the `huawei.com/gpu` resource:

```
[root@openfuyao-0004 yaml]# kubectl apply -f pod-gpu.yaml
```

Check the device plugin's console output:

```
2023/12/22 10:55:27 Allocate request devices: gpu_1 from cdi true
```

5. Check the node information:

```
[root@openfuyao-0004 so]# kubectl describe node fuyao-master-01
Name:               fuyao-master-01
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=fuyao-master-01
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.100.36/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 172.17.135.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
...
System Info:
  Machine ID:                 1d6c803a58f6428fbee2d85ccf87dbd5
  System UUID:                82ca0648-9b77-4b31-8650-d718e8c07113
  Boot ID:                    ec3cf03d-7c76-46f0-ac87-97b00441a375
  Kernel Version:             4.19.90-2003.4.0.0036.oe1.x86_64
  OS Image:                   openEuler 20.03 (LTS)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.2
  Kubelet Version:            v1.29.0
  Kube-Proxy Version:         v1.29.0
Capacity:
  cpu:                4
  ephemeral-storage:  226803232Ki
  huawei.com/gpu:     3
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32411976Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  209021858266
  huawei.com/gpu:     3
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32309576Ki
  pods:               110
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1500m (37%)   1 (25%)
  memory             4012Mi (12%)  4436Mi (14%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
  huawei.com/gpu     1             1
Events:
```

6. Exec into the pod's container and check the mounted devices:

```
[root@openfuyao-0004 yaml]# kubectl exec -it gpu-device-plugin -- bash
root@gpu-device-plugin:/mock# ls /mock/dev
gpu_1
root@gpu-device-plugin:/mock# ls /mock/so
gpu_1.so
root@gpu-device-plugin:/mock# ls /usr/local/bin
list_dev.sh
root@gpu-device-plugin:/mock# list_dev.sh
gpu_1
```

7. Delete the pod:

```
[root@openfuyao-0004 yaml]# kubectl delete -f pod-gpu.yaml
```

8. Clean up the mock resources:

```
[root@openfuyao-0004 k8s-device-plugin]# ./k8s-device-plugin mockdev clean
```

9. Stop the device plugin.
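In CDI mode the plugin logged `Allocate request devices: gpu_1 from cdi true`, and the container received exactly `/mock/dev/gpu_1`, `/mock/so/gpu_1.so` and `list_dev.sh`, which suggests the runtime applied the container edits from `cdi_gpu.yaml`. The Go sketch below contrasts the two ways a device plugin's `Allocate` response can expose a device: passing device nodes directly, or naming a CDI device (`<vendor>/<class>=<device>`) that the runtime resolves from its spec directories. It uses local stand-in structs whose field names mirror the device plugin API (`Devices`, `CDIDevices`); the CDI kind `huawei.com/gpu`, the `allocate` helper and the path mapping are assumptions, not this repository's code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// DeviceSpec and CDIDevice are local stand-ins for the device plugin API types of
// the same names; only the fields used here are modeled.
type DeviceSpec struct {
	ContainerPath string
	HostPath      string
	Permissions   string
}

type CDIDevice struct {
	Name string // fully qualified CDI name: "<vendor>/<class>=<device>"
}

// ContainerAllocateResponse models a subset of the API type of the same name.
type ContainerAllocateResponse struct {
	Devices    []DeviceSpec
	CDIDevices []CDIDevice
}

const (
	cdiKind    = "huawei.com/gpu" // assumption: the kind declared in cdi_gpu.yaml
	mockDevDir = "/mock/dev"      // created by `k8s-device-plugin mockdev`
)

// allocate (hypothetical) builds the response for one container. In CDI mode it only
// names the devices and leaves device nodes and mounts to the CDI spec on the node;
// otherwise it passes each mock device node through at the same path.
func allocate(ids []string, useCDI bool) ContainerAllocateResponse {
	var resp ContainerAllocateResponse
	for _, id := range ids {
		if useCDI {
			resp.CDIDevices = append(resp.CDIDevices, CDIDevice{Name: cdiKind + "=" + id})
			continue
		}
		path := filepath.Join(mockDevDir, id)
		resp.Devices = append(resp.Devices, DeviceSpec{ContainerPath: path, HostPath: path, Permissions: "rw"})
	}
	return resp
}

func main() {
	ids := []string{"gpu_1"}
	fmt.Printf("cdi=true:  %+v\n", allocate(ids, true))
	fmt.Printf("cdi=false: %+v\n", allocate(ids, false))
}
```

In this model, the CDI response carries only the name `huawei.com/gpu=gpu_1`; the `.so` file and `list_dev.sh` would then come from the CDI spec rather than from the plugin itself.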
##### other

1. Query the node information:

```
[root@openfuyao-0004 k8s-device-plugin]# kubectl get node
NAME              STATUS   ROLES           AGE   VERSION
fuyao-master-01   Ready    control-plane   14d   v1.27.4
[root@openfuyao-0004 k8s-device-plugin]# kubectl describe node fuyao-master-01
Name:               fuyao-master-01
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=fuyao-master-01
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=control-plane
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.100.36/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 172.16.135.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
...
Capacity:
  cpu:                4
  ephemeral-storage:  226803232Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32411976Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  209021858266
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32309576Ki
  pods:               110
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests          Limits
  --------           --------          ------
  cpu                2600m (65%)       4 (100%)
  memory             9039432192 (27%)  10139Mi (32%)
  ephemeral-storage  0 (0%)            0 (0%)
  hugepages-1Gi      0 (0%)            0 (0%)
  hugepages-2Mi      0 (0%)            0 (0%)
Events:
```

2. Start k8s-device-plugin with a custom vendor domain and resource type:

```
[root@openfuyao-0004 k8s-device-plugin]# ./k8s-device-plugin --vendor-domain huawei.com --resource-type npu
2023/12/15 10:21:51 k8s generic device plugin.
2023/12/15 10:21:51 huawei.com/npu
2023/12/15 10:21:51 Discovery devices.
2023/12/15 10:21:51 Watch the device plugin path.
2023/12/15 10:21:51 Watch os signal.
2023/12/15 10:21:51 Starting device plugin on /var/lib/kubelet/device-plugins/npu.sock
2023/12/15 10:21:51 Registered device plugin with kubelet.
```
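The two flags determine both the extended resource name and the Unix socket the plugin serves on: without flags the log shows `huawei.com/gpu` and `gpu.sock`, and with `--vendor-domain huawei.com --resource-type npu` it shows `huawei.com/npu` and `npu.sock`. A minimal Go sketch of that naming rule, inferred from the log lines rather than taken from the repository; the defaults `huawei.com` and `gpu` are assumed from the flag-less run, and `pluginNames` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
	"path/filepath"
)

// devicePluginDir is the kubelet's device plugin directory, as printed in the
// "Starting device plugin on ..." log line.
const devicePluginDir = "/var/lib/kubelet/device-plugins"

// pluginNames derives the extended resource name and the plugin socket path.
// The rule is inferred from the logs (huawei.com/gpu -> gpu.sock,
// huawei.com/npu -> npu.sock), not copied from this repository's source.
func pluginNames(vendorDomain, resourceType string) (resourceName, socketPath string) {
	resourceName = vendorDomain + "/" + resourceType
	socketPath = filepath.Join(devicePluginDir, resourceType+".sock")
	return resourceName, socketPath
}

func main() {
	// Assumed defaults: the run without flags registered huawei.com/gpu.
	vendorDomain := flag.String("vendor-domain", "huawei.com", "vendor domain of the extended resource")
	resourceType := flag.String("resource-type", "gpu", "resource type, e.g. gpu or npu")
	flag.Parse()

	name, sock := pluginNames(*vendorDomain, *resourceType)
	fmt.Println("resource name:", name)
	fmt.Println("socket:", sock)
}
```

Run as `go run main.go --vendor-domain huawei.com --resource-type npu`, it prints `resource name: huawei.com/npu` and `socket: /var/lib/kubelet/device-plugins/npu.sock` (Go's `flag` package accepts the double-dash form). The kubelet requires extended resource names to be domain-qualified, which is why the vendor domain is part of the name.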
3. Check the node information again. `huawei.com/npu: 10` now appears under Capacity and Allocatable:

```
[root@openfuyao-0004 k8s-device-plugin]# kubectl describe node fuyao-master-01
Name:               fuyao-master-01
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=fuyao-master-01
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=control-plane
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.100.36/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 172.16.135.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
...
Capacity:
  cpu:                4
  ephemeral-storage:  226803232Ki
  huawei.com/npu:     10
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32411976Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  209021858266
  huawei.com/npu:     10
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32309576Ki
  pods:               110
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests          Limits
  --------           --------          ------
  cpu                2600m (65%)       4 (100%)
  memory             9039432192 (27%)  10139Mi (32%)
  ephemeral-storage  0 (0%)            0 (0%)
  hugepages-1Gi      0 (0%)            0 (0%)
  hugepages-2Mi      0 (0%)            0 (0%)
  huawei.com/npu     0                 0
Events:
```
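After registration the node advertises `huawei.com/npu: 10`. The startup log printed only `Discovery devices.` with no mock device paths, yet ten devices were registered; the kubelet checkpoint in step 6 lists their IDs as `huawei.com/npu-0` through `huawei.com/npu-9`. The sketch below builds a device list of that shape and derives the two node figures from it the way the kubelet reports them: Capacity counts every reported device, Allocatable counts only the healthy ones. The count of ten and the ID format are read off the output above; `Device` is a local stand-in for the device plugin API type of the same name, and both functions are hypothetical.

```go
package main

import "fmt"

// Health values used by the device plugin API.
const (
	Healthy   = "Healthy"
	Unhealthy = "Unhealthy"
)

// Device is a local stand-in for the device plugin API's Device (ID and Health only).
type Device struct {
	ID     string
	Health string
}

// fakeDevices builds n healthy devices named "<resourceName>-<index>", matching the
// IDs seen in the kubelet checkpoint. The plugin's real discovery logic may differ.
func fakeDevices(resourceName string, n int) []Device {
	devs := make([]Device, 0, n)
	for i := 0; i < n; i++ {
		devs = append(devs, Device{ID: fmt.Sprintf("%s-%d", resourceName, i), Health: Healthy})
	}
	return devs
}

// nodeCounts approximates the kubelet's reporting for one device resource:
// capacity counts every device, allocatable counts only healthy ones.
func nodeCounts(devs []Device) (capacity, allocatable int) {
	for _, d := range devs {
		capacity++
		if d.Health == Healthy {
			allocatable++
		}
	}
	return capacity, allocatable
}

func main() {
	devs := fakeDevices("huawei.com/npu", 10)
	c, a := nodeCounts(devs)
	fmt.Printf("Capacity: %d, Allocatable: %d\n", c, a)
}
```

It prints `Capacity: 10, Allocatable: 10`; marking one device `Unhealthy` would leave Capacity at 10 and drop Allocatable to 9.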
4. Create a pod that requests NPU resources:

```
cd /opt/paul/code/wilds/k8s-device-plugin
kubectl apply -f yaml/pod-npu.yaml
```

Check the pod status:

```
[root@openfuyao-0004 yaml]# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
npu-device-plugin   1/1     Running   0          106s
[root@openfuyao-0004 device-plugins]# kubectl get pod npu-device-plugin -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 5f426cd6a132b3263f97ceef18428e5fb205663257f58825239c84fe794944dc
    cni.projectcalico.org/podIP: 172.16.135.30/32
    cni.projectcalico.org/podIPs: 172.16.135.30/32
  creationTimestamp: "2023-12-15T02:27:41Z"
  name: npu-device-plugin
  namespace: default
  resourceVersion: "1669099"
  uid: 6ae9699e-d8e0-479e-904b-3eee62a697b0
spec:
...
```

5. Check the node information again. The pod's 8 `huawei.com/npu` devices now appear under Allocated resources:

```
[root@openfuyao-0004 ~]# kubectl describe node fuyao-master-01
Name:               fuyao-master-01
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=fuyao-master-01
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=control-plane
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.100.36/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 172.16.135.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
...
Capacity:
  cpu:                4
  ephemeral-storage:  226803232Ki
  huawei.com/npu:     10
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32411976Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  209021858266
  huawei.com/npu:     10
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32309576Ki
  pods:               110
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests          Limits
  --------           --------          ------
  cpu                2600m (65%)       4 (100%)
  memory             9039432192 (27%)  10139Mi (32%)
  ephemeral-storage  0 (0%)            0 (0%)
  hugepages-1Gi      0 (0%)            0 (0%)
  hugepages-2Mi      0 (0%)            0 (0%)
  huawei.com/npu     8                 8
Events:
```
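Extended resources cannot be overcommitted, so the 8 units appear as both request and limit. When the pod was admitted, the kubelet chose eight of the ten registered IDs and passed them to the plugin's `Allocate` call. Decoding the `AllocResp` stored in the kubelet checkpoint (shown in the next step) shows that, in this run, the plugin answered with nothing but one environment variable, `VISIBLE_DEVICES/huawei.com/npu`, whose value lists the granted IDs. The sketch below reproduces that response shape with a local stand-in for `ContainerAllocateResponse`; `allocateEnvs` is a hypothetical name, not the plugin's actual function.

```go
package main

import (
	"fmt"
	"strings"
)

// ContainerAllocateResponse is a local stand-in for the device plugin API type;
// only the Envs field is modeled.
type ContainerAllocateResponse struct {
	Envs map[string]string
}

// allocateEnvs returns the environment this plugin appears to inject: a single
// variable named "VISIBLE_DEVICES/<resource name>" holding the comma-separated
// device IDs. The variable name comes from the decoded checkpoint.
func allocateEnvs(resourceName string, deviceIDs []string) ContainerAllocateResponse {
	return ContainerAllocateResponse{
		Envs: map[string]string{
			"VISIBLE_DEVICES/" + resourceName: strings.Join(deviceIDs, ","),
		},
	}
}

func main() {
	ids := []string{"huawei.com/npu-2", "huawei.com/npu-0", "huawei.com/npu-8"}
	resp := allocateEnvs("huawei.com/npu", ids)
	for k, v := range resp.Envs {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```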
6. Inspect the contents of `/var/lib/kubelet/device-plugins`:

```
[root@openfuyao-0004 ~]# cd /var/lib/kubelet/device-plugins
[root@openfuyao-0004 device-plugins]# ls
kubelet_internal_checkpoint  kubelet.sock  npu.sock
[root@openfuyao-0004 device-plugins]# cat kubelet_internal_checkpoint | jq .
{
  "Data": {
    "PodDeviceEntries": [
      {
        "PodUID": "6ae9699e-d8e0-479e-904b-3eee62a697b0",
        "ContainerName": "npu-device-plugin",
        "ResourceName": "huawei.com/npu",
        "DeviceIDs": {
          "-1": [
            "huawei.com/npu-1",
            "huawei.com/npu-2",
            "huawei.com/npu-0",
            "huawei.com/npu-8",
            "huawei.com/npu-5",
            "huawei.com/npu-7",
            "huawei.com/npu-9",
            "huawei.com/npu-6"
          ]
        },
        "AllocResp": "CqoBCh5WSVNJQkxFX0RFVklDRVMvaHVhd2VpLmNvbS9ucHUShwFodWF3ZWkuY29tL25wdS0yLGh1YXdlaS5jb20vbnB1LTAsaHVhd2VpLmNvbS9ucHUtOCxodWF3ZWkuY29tL25wdS01LGh1YXdlaS5jb20vbnB1LTcsaHVhd2VpLmNvbS9ucHUtOSxodWF3ZWkuY29tL25wdS02LGh1YXdlaS5jb20vbnB1LTE="
      }
    ],
    "RegisteredDevices": {
      "huawei.com/npu": [
        "huawei.com/npu-1",
        "huawei.com/npu-2",
        "huawei.com/npu-7",
        "huawei.com/npu-8",
        "huawei.com/npu-9",
        "huawei.com/npu-0",
        "huawei.com/npu-4",
        "huawei.com/npu-5",
        "huawei.com/npu-6",
        "huawei.com/npu-3"
      ]
    }
  },
  "Checksum": 702431409
}
```

7. Delete the pod:

```
cd /opt/paul/code/wilds/k8s-device-plugin
kubectl delete -f yaml/pod-npu.yaml
```
8. Check the node. The `huawei.com/npu` allocation is back to 0:

```
[root@openfuyao-0004 yaml]# kubectl describe node fuyao-master-01
Name:               fuyao-master-01
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=fuyao-master-01
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=control-plane
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.100.36/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 172.16.135.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 30 Nov 2023 16:09:07 +0800
Taints:
Unschedulable:      false
Lease:
  HolderIdentity:  fuyao-master-01
  AcquireTime:
  RenewTime:       Fri, 15 Dec 2023 10:43:15 +0800
Addresses:
  InternalIP:  192.168.100.36
  Hostname:    fuyao-master-01
Capacity:
  cpu:                4
  ephemeral-storage:  226803232Ki
  huawei.com/npu:     10
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32411976Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  209021858266
  huawei.com/npu:     10
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32309576Ki
  pods:               110
System Info:
  Machine ID:                 1d6c803a58f6428fbee2d85ccf87dbd5
  System UUID:                82ca0648-9b77-4b31-8650-d718e8c07113
  Boot ID:                    ec3cf03d-7c76-46f0-ac87-97b00441a375
  Kernel Version:             4.19.90-2003.4.0.0036.oe1.x86_64
  OS Image:                   openEuler 20.03 (LTS)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.2
  Kubelet Version:            v1.27.4
  Kube-Proxy Version:         v1.27.4
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests          Limits
  --------           --------          ------
  cpu                2600m (65%)       4 (100%)
  memory             9039432192 (27%)  10139Mi (32%)
  ephemeral-storage  0 (0%)            0 (0%)
  hugepages-1Gi      0 (0%)            0 (0%)
  hugepages-2Mi      0 (0%)            0 (0%)
  huawei.com/npu     0                 0
Events:
```

9. Stop the k8s-device-plugin process, where `${pid}` is its process ID:

```
kill ${pid}
```
10. Check the node information. Allocatable for `huawei.com/npu` drops to 0, while Capacity still shows 10:

```
[root@openfuyao-0004 yaml]# kubectl describe node fuyao-master-01
Name:               fuyao-master-01
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=fuyao-master-01
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=control-plane
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.100.36/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 172.16.135.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 30 Nov 2023 16:09:07 +0800
Taints:
Unschedulable:      false
Lease:
  HolderIdentity:  fuyao-master-01
  AcquireTime:
  RenewTime:       Fri, 15 Dec 2023 10:47:31 +0800
Addresses:
  InternalIP:  192.168.100.36
  Hostname:    fuyao-master-01
Capacity:
  cpu:                4
  ephemeral-storage:  226803232Ki
  huawei.com/npu:     10
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32411976Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  209021858266
  huawei.com/npu:     0
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32309576Ki
  pods:               110
System Info:
  Machine ID:                 1d6c803a58f6428fbee2d85ccf87dbd5
  System UUID:                82ca0648-9b77-4b31-8650-d718e8c07113
  Boot ID:                    ec3cf03d-7c76-46f0-ac87-97b00441a375
  Kernel Version:             4.19.90-2003.4.0.0036.oe1.x86_64
  OS Image:                   openEuler 20.03 (LTS)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.2
  Kubelet Version:            v1.27.4
  Kube-Proxy Version:         v1.27.4
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests          Limits
  --------           --------          ------
  cpu                2600m (65%)       4 (100%)
  memory             9039432192 (27%)  10139Mi (32%)
  ephemeral-storage  0 (0%)            0 (0%)
  hugepages-1Gi      0 (0%)            0 (0%)
  hugepages-2Mi      0 (0%)            0 (0%)
  huawei.com/npu     0                 0
Events:
```
11. Create the pod requesting NPU resources again:

```
cd /opt/paul/code/wilds/k8s-device-plugin
kubectl apply -f yaml/pod-npu.yaml
```

The pod stays in the `Pending` state:

```
[root@openfuyao-0004 yaml]# kubectl get pod --watch
NAME                READY   STATUS    RESTARTS   AGE
npu-device-plugin   0/1     Pending   0          12s
^C
[root@openfuyao-0004 yaml]# kubectl describe pod npu-device-plugin
Name:             npu-device-plugin
Namespace:        default
Priority:         0
Service Account:  default
Node:
Labels:
Annotations:
Status:           Pending
IP:
IPs:
Containers:
  npu-device-plugin:
    Image:      nginx
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      huawei.com/npu:  8
    Requests:
      huawei.com/npu:  8
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s98lq (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-s98lq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  2m20s  default-scheduler  0/1 nodes are available: 1 Insufficient huawei.com/npu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
```
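The scheduler rejects the pod because the node's allocatable `huawei.com/npu` is now 0. The check behind `Insufficient huawei.com/npu` is plain arithmetic: a pod fits only if its request plus what pods on the node already request does not exceed allocatable. A minimal sketch of that rule (the scheduler implements it in its `NodeResourcesFit` plugin; `fits` here is a hypothetical helper), applied to the idle node from step 4, the stopped plugin from this step, and one hypothetical case where a first pod already holds 8 devices:

```go
package main

import "fmt"

// fits reports whether a pod requesting `request` units of an extended resource fits
// on a node with `allocatable` units, given `requested` units already claimed by pods
// bound to that node: request + requested <= allocatable.
func fits(request, allocatable, requested int64) bool {
	return request+requested <= allocatable
}

func main() {
	const resource = "huawei.com/npu"
	const request = 8 // the pod asks for 8 devices
	cases := []struct {
		name                   string
		allocatable, requested int64
	}{
		{"plugin running, node idle", 10, 0},
		{"plugin running, another pod holds 8", 10, 8},
		{"plugin stopped", 0, 0},
	}
	for _, c := range cases {
		if fits(request, c.allocatable, c.requested) {
			fmt.Printf("%s: fits\n", c.name)
		} else {
			fmt.Printf("%s: Insufficient %s\n", c.name, resource)
		}
	}
}
```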