Dell ObjectScale 1.3 Administration Guide

Place a failed node into permanent maintenance mode (ObjectScale on OpenShift)

Use kubectl to manually place a failed node (that is powered off, dead, or otherwise inaccessible) into permanent maintenance mode. Use this process for ObjectScale instances on a Red Hat OpenShift cluster.

About this task

NOTE: The removal process for a failed node is largely manual and does not involve the ObjectScale Operator in the same way that standard permanent maintenance mode (PMM) does. However, during the node removal process, the ObjectScale Operator starts recovery procedures for certain non-SS stateful pods, such as bookie, influxdb, and zookeeper. Recovery of the non-SS stateful pods occurs automatically and does not affect the procedure workflow.

Place the failed node to be removed into permanent maintenance mode:

Steps

  1. Mark the failed node as unschedulable so that it is no longer available to run pods.
    kubectl cordon <NODE_NAME>
  2. Delete the pods from the node.
    kubectl drain <NODE_NAME> --force --delete-local-data --ignore-daemonsets
  3. Collect the UUID for the node to be removed from the cluster:
    kubectl get csibmnodes
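    For example, you can filter the listing by the failed node's hostname to find its UUID (a minimal sketch; the exact columns shown depend on your CSI Bare-Metal driver version):
    kubectl get csibmnodes | grep <NODE_NAME>
    Record the UUID that is reported for the node; it is used as <NODE_UUID> in the cleanup steps that follow.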
  4. Remove the node from the OpenShift cluster.
    kubectl delete node <NODE_NAME>
    NOTE: After you delete the node, it is no longer listed in the kubectl get nodes output.
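    For example, the following check should return no output after the node has been removed (a minimal verification sketch):
    kubectl get nodes | grep <NODE_NAME>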
  5. Manually delete the PVCs bound to the failed node:
    1. Get the names of the PVCs:
      kubectl get pvc
    2. Get the details for each of the PVCs:
      kubectl describe pvc <PVC_NAME>
    3. Get the node for each of the PVCs. The volume.kubernetes.io/selected-node annotation identifies the node that a PVC is bound to:
      for i in `kubectl get pvc --no-headers -o jsonpath="{.items[*].metadata.name}"`; do echo "=== $i"; kubectl get pvc $i -o json | grep selected-node | grep -v "{}"; done
      The following example output from kubectl describe pvc shows the selected-node annotation for a PVC bound to node worker3.ocp4.cmo.com:
      user1@hw-and-os-96:~> kubectl describe pvc data-0-ecs-cluster-ss-0
      Name:          data-0-ecs-cluster-ss-0
      Namespace:     default
      StorageClass:  csi-baremetal-sc-hdd
      Status:        Bound
      Volume:        pvc-8f28acda-f7e5-4efc-9b4b-d8e50e21e72f
      Labels:        app=ecs-cluster-ss
                     app.kubernetes.io/component=ss
                     app.kubernetes.io/name=ecs-cluster
                     app.kubernetes.io/namespace=default
                     component=ss
                     objectscale.dellemc.com/logging-inject=true
                     objectscale.dellemc.com/logging-release-name=ecs-cluster
                     operator=objectscale-operator
                     release=ecs-cluster
      Annotations:   pv.attach.kubernetes.io/ignore-if-inaccessible: yes
                     pv.kubernetes.io/bind-completed: yes
                     pv.kubernetes.io/bound-by-controller: yes
                     volume.beta.kubernetes.io/storage-provisioner: csi-baremetal
                     volume.kubernetes.io/selected-node: worker3.ocp4.cmo.com
                     volumehealth.storage.kubernetes.io/health: accessible
                     volumerelease.csi-baremetal/support: yes
      Finalizers:    [kubernetes.io/pvc-protection]
      Capacity:      11176Gi
      Access Modes:  RWO
      VolumeMode:    Filesystem
      Mounted By:    ecs-cluster-ss-0
      Events:        <none>
    4. Once all PVC names for the node have been gathered, delete the PVCs:
      kubectl delete pvc <PVC_NAME_1> <PVC_NAME_2> <PVC_NAME_N>
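      If the cluster has many PVCs, the following loop lists only the PVCs whose selected-node annotation points at the failed node (an illustrative sketch based on the commands above; review the list before deleting anything):
      for i in `kubectl get pvc --no-headers -o jsonpath="{.items[*].metadata.name}"`; do kubectl get pvc $i -o json | grep selected-node | grep -q "<NODE_NAME>" && echo $i; done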
  6. Patch and delete the volumes of the failed node:
    1. Patch the volumes:
      kubectl get volume | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl patch volume --type merge -p '{"metadata":{"finalizers":null}}'
    2. Remove the volumes:
      kubectl get volume | grep <NODE_UUID>| awk '{print $1}' | xargs kubectl delete volume
  7. Patch and delete the LVGs of the failed node:
    1. Patch the LVGs:
      kubectl get lvg | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl patch lvg --type merge -p '{"metadata":{"finalizers":null}}'
    2. Remove the LVGs:
      kubectl get lvg | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl delete lvg
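      To confirm that the volume and LVG resources for the node are gone, the following checks should return no output (a minimal verification sketch):
      kubectl get volume | grep <NODE_UUID>
      kubectl get lvg | grep <NODE_UUID>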
  8. Clean up all the CSI resources for the failed node:
    1. Patch the csibmnode resources:
      kubectl get csibmnode | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl patch csibmnode --type merge -p '{"metadata":{"finalizers":null}}'
    2. Remove the csibmnode resources:
      kubectl get csibmnode | grep <NODE_UUID>| awk '{print $1}' | xargs kubectl delete csibmnode
    3. Delete the drive CRs:
      kubectl get drive | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl delete drive
    4. Delete the available capacity:
      kubectl get ac | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl delete ac
  9. Remove the pending pods in all namespaces that are associated with ObjectScale and object stores:
    1. Identify the pods to be deleted:
      kubectl get pods | grep Pending
    2. Delete each returned pod that is associated with the removed node:
      kubectl delete pods <PODS>
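      Because ObjectScale and object store pods can run in several namespaces, you can also list the Pending pods cluster-wide (a minimal sketch; add -n <NAMESPACE> to the delete command for pods outside the current namespace):
      kubectl get pods -A --field-selector=status.phase=Pending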
  10. Verify that all the resources for the failed node have been successfully removed:
    1. Check for Bare-Metal nodes:
      kubectl get csibmnode | grep <NODE_UUID>
    2. Check for available capacity:
      kubectl get ac | grep <NODE_UUID>
    3. Check for drive CRs:
      kubectl get drive | grep <NODE_UUID>
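      As a convenience, the three checks can be combined into one loop; each resource type should return no output if the cleanup succeeded (an illustrative sketch of the checks above):
      for res in csibmnode ac drive; do echo "=== $res"; kubectl get $res | grep <NODE_UUID>; done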
  11. Optional: Monitor the automatic recovery of non-SS pods:
    To ensure data protection, certain non-SS pods, such as the bookie, influxdb, and zookeeper pods, require recovery after they are relocated. The ObjectScale Operator initiates recovery for these pods automatically after they are removed from the PMM node and started on another available node in the cluster.
    kubectl get serviceprocedures -A -o custom-columns=Name:metadata.name,Node:spec.nodeInfo.name,Type:spec.type,Time:metadata.managedFields[0].time,Reason:status.reason,Message:status.message
