Dell EMC ObjectScale 1.2.x Administration Guide

Replace a failed node within the ObjectScale Software Bundle

Use this Node Replacement service procedure to replace a completely failed node.

Prerequisites

Ensure that the new node has the same OS version and networking configuration as the other nodes within the ObjectScale Software Bundle.
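
One way to compare OS versions is to read /etc/os-release on the new host and on an existing node. The sketch below assumes both hosts provide the standard os-release file. The helper name os_version is hypothetical, and the helper does not compare networking configuration.

```shell
# Hypothetical helper: print "<ID> <VERSION_ID>" from an os-release file.
# Usage: os_version [/etc/os-release]
os_version() {
  local file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak into the caller
  ( . "$file" && printf '%s %s\n' "$ID" "$VERSION_ID" )
}
```

Run os_version on the new host and on an existing node and compare the results. For nodes that are already in the cluster, kubectl get nodes -o wide also reports the OS-IMAGE and KERNEL-VERSION of each node.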

About this task

When a node fails, all pods on that node enter the Terminating state. Stateless pods are rescheduled to another available node after five minutes, while stateful pods remain in the Terminating state.
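
To see which pods on the failed node are stuck, filter the pod list by node. This is a minimal sketch with hypothetical function names. It assumes the default kubectl get pods -A column layout, in which STATUS is the fourth column and shows Terminating for pods that are being deleted.

```shell
# Hypothetical helpers: list pods stuck in Terminating on one node.
# filter_terminating reads "kubectl get pods -A --no-headers" output, where
# the columns are NAMESPACE NAME READY STATUS RESTARTS AGE.
filter_terminating() {
  awk '$4 == "Terminating" { print $1 "/" $2 }'
}

# Usage: terminating_pods <NODE_NAME>
terminating_pods() {
  kubectl get pods -A --no-headers --field-selector "spec.nodeName=$1" |
    filter_terminating
}
```

The spec.nodeName field selector restricts the listing to pods scheduled on that node, so the node name does not have to be matched with grep.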

Steps

  1. Obtain a Keycloak access token. The CMO Platform Manager APIs of the ObjectScale Software Bundle require a Keycloak token to authenticate requests for cluster management tasks.

    The ObjectScale Software Bundle runs the CMO Platform Manager on Kubernetes within the cluster. It accepts requests for cluster management tasks, such as service procedures.

    1. Collect the Keycloak account information from the keycloak-pm-auth-info secret in the cmo namespace:
      export KEYCLOAK_USER=$(kubectl get secret keycloak-pm-auth-info -n cmo -o json | jq -r '.data["keycloak-username"]' | base64 --decode)
      export KEYCLOAK_PASSWORD=$(kubectl get secret keycloak-pm-auth-info -n cmo -o json | jq -r '.data["keycloak-password"]' | base64 --decode)
      export KEYCLOAK_REALM=$(kubectl get secret keycloak-pm-auth-info -n cmo -o json | jq -r '.data["keycloak-realm"]' | base64 --decode)
      export KEYCLOAK_CLIENT=$(kubectl get secret keycloak-pm-auth-info -n cmo -o json | jq -r '.data["keycloak-client"]' | base64 --decode)
      export KEYCLOAK_CLIENT_SECRET=$(kubectl get secret keycloak-pm-auth-info -n cmo -o json | jq -r '.data["keycloak-credentials-secret"]' | base64 --decode)
    2. Set an environment variable for the access token:
      export TOKEN=$(curl -L -X POST "https://keycloak-http.atlantic/auth/realms/$KEYCLOAK_REALM/protocol/openid-connect/token" -H 'Content-Type: application/x-www-form-urlencoded' --data-urlencode "client_id=$KEYCLOAK_CLIENT" --data-urlencode 'grant_type=password' --data-urlencode "client_secret=$KEYCLOAK_CLIENT_SECRET" --data-urlencode 'scope=openid' --data-urlencode "username=$KEYCLOAK_USER" --data-urlencode "password=$KEYCLOAK_PASSWORD" | jq -r '.access_token')
  2. Collect the IP address of the CMO Platform Manager.
    kubectl get services -n cmo platform-manager -o jsonpath='{.spec.clusterIP}'
  3. Create a scaledown.json file with the details of the node that you are removing from the ObjectScale Software Bundle.
    Save this JSON payload on the host from which you will run the scale-down API call.
    {
      "hosts":  [{
        "hostname": "<NODE_HOSTNAME>"
      }],
      "remove_os_packages": "true"
    }
    NOTE: If remove_os_packages is set to "true", the OS packages are removed from the node. The node cannot be added back to the cluster until those OS packages are reinstalled.
    For example:
    {
      "hosts": [{
        "hostname": "hostname6"
      }],
      "remove_os_packages": "true"
    }
  4. Scale down the node using the CMO Platform Manager scale down API.
    NOTE: If the node is unreachable (the logs show "Unreachable=1"), the scale-down operation reports failure even though the node is removed successfully.
    curl --header "Content-Type: application/json" --header "Authorization: Bearer $TOKEN" --request DELETE --data @scaledown.json https://<CMO_PLATFORM_MANAGER_IP>/v3/clusters/nodes -v -k | json_pp
    For example:
    ......
    {
       "created_at" : "2023-04-15T11:35:35Z",
       "completed_tasks" : 0,
       "total_tasks" : 273,
       "recap" : {
          "hosts" : {}
       },
       "id" : "ac2324c5-0112-45f3-83e9-4f018d24ca57",
       "link" : {
          "href" : "https://0.0.0.0:8080/v1/status/ac2324c5-0112-45f3-83e9-4f018d24ca57",
          "rel" : "self"
       },
       "logs" : "",
       "state" : "created",
       "updated_at" : "2023-04-15T11:35:36Z",
       "playbook_id" : "remove-node"
    }
  5. Collect the "id" value from the returned output. You will use this value in the next step.
    In the previous example, the "id" value is ac2324c5-0112-45f3-83e9-4f018d24ca57.
  6. After calling the scale-down API, check the status of the operation with the following API:
    NOTE: The CMO Platform Manager token may expire. If it does, refresh it by running:
    export TOKEN=$(curl -L -X POST "https://keycloak-http.atlantic/auth/realms/$KEYCLOAK_REALM/protocol/openid-connect/token" -H 'Content-Type: application/x-www-form-urlencoded' --data-urlencode "client_id=$KEYCLOAK_CLIENT" --data-urlencode 'grant_type=password' --data-urlencode "client_secret=$KEYCLOAK_CLIENT_SECRET" --data-urlencode 'scope=openid' --data-urlencode "username=$KEYCLOAK_USER" --data-urlencode "password=$KEYCLOAK_PASSWORD" | jq -r '.access_token')
    curl --header "Content-Type: application/json" --header "Authorization: Bearer $TOKEN" --request GET https://<CMO_PLATFORM_MANAGER_IP>/v1/status/<ID> -k | jq
    When the operation is finished, the operation "state" is marked as "complete".
    NOTE: In certain situations, the status may show as Failed even though the failed node was removed successfully. Check the node status.
  7. Confirm that the node has been removed from the node list.
    kubectl get node
  8. Create a scaleup.json file that describes the replacement node, using the role of the removed node.
    Save this JSON payload on the host from which you will run the scale-up API call.
    {
      "credentials": [{
        "name": "<HOST_CREDS>",
        "type": "password",
        "password": "<PASSWORD>"
      }],
      "hosts": [{
        "hostname": "<NODE_HOSTNAME>",
        "managementhost": "<HOST_IP>",
        "kuberneteshost": "<HOST_IP>",
        "hostCredentials": "<HOST_CREDS>",
        "topology": {
          "role": "<REMOVED_NODE_ROLE>"
        }
      }]
    }
    For example:
    {
      "credentials": [{
        "name": "mykey1",
        "type": "password",
        "password": "ChangeMe"
      }],
      "hosts": [{
        "hostname": "hostname6",
        "managementhost": "10.236.227.214",
        "kuberneteshost": "10.236.227.214",
        "hostCredentials": "mykey1",
        "topology": {
          "role": "worker"
        }
      }]
    }
  9. Confirm that daemonset pods are scheduled on the new node.
    kubectl get pod -A -o wide | grep <NODE_NAME>
  10. Call the CMO Platform Manager API to initiate the scaling-up operation.

    Run this command from the directory where the scaleup.json file exists.

    curl --header "Content-Type: application/json" --header "Authorization: Bearer $TOKEN"  --request POST --data @scaleup.json https://<CMO_PLATFORM_MANAGER_IP>/v3/clusters/nodes -v -k | json_pp
    For example:
    curl --header "Content-Type: application/json" --header "Authorization: Bearer $TOKEN"  --request POST --data @scaleup.json https://10.43.78.77:7070/v3/clusters/nodes -v -k | json_pp
    ......
    {
       "created_at" : "2023-04-15T11:35:35Z",
       "completed_tasks" : 0,
       "total_tasks" : 273,
       "recap" : {
          "hosts" : {}
       },
       "id" : "286bdb32-ff07-4e46-947e-e4c9e9b98338",
       "link" : {
          "href" : "https://0.0.0.0:8080/v1/status/286bdb32-ff07-4e46-947e-e4c9e9b98338",
          "rel" : "self"
       },
       "logs" : "",
       "state" : "created",
       "updated_at" : "2023-04-15T11:35:36Z",
       "playbook_id" : "scale"
    }
  11. Collect the "id" value from the returned output. You will use this value in the next step.
    In the previous example, the "id" value is 286bdb32-ff07-4e46-947e-e4c9e9b98338.
  12. After calling the scale-up API, check the status of the operation:
    NOTE: The CMO Platform Manager token may expire. If it does, refresh it by running:
    export TOKEN=$(curl -L -X POST "https://keycloak-http.atlantic/auth/realms/$KEYCLOAK_REALM/protocol/openid-connect/token" -H 'Content-Type: application/x-www-form-urlencoded' --data-urlencode "client_id=$KEYCLOAK_CLIENT" --data-urlencode 'grant_type=password' --data-urlencode "client_secret=$KEYCLOAK_CLIENT_SECRET" --data-urlencode 'scope=openid' --data-urlencode "username=$KEYCLOAK_USER" --data-urlencode "password=$KEYCLOAK_PASSWORD" | jq -r '.access_token')
    curl --header "Content-Type: application/json" --header "Authorization: Bearer $TOKEN" --request GET https://<CMO_PLATFORM_MANAGER_IP>/v1/status/<ID> -k | jq 

    When the operation is finished, the operation "state" is marked as "complete".

  13. Confirm that the new node appears in the node list.
    kubectl get node
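
Steps 1, 6, and 12 repeat the same token request. This is a minimal sketch that wraps it in shell functions, assuming kubectl, curl, jq, and base64 are available on the host. The function names keycloak_token_url, pm_secret, and refresh_token are hypothetical. The endpoint, namespace, and secret keys are the ones used in step 1.

```shell
# Hypothetical helpers around the Keycloak commands in steps 1, 6, and 12.

# Build the token endpoint URL for a realm.
keycloak_token_url() {
  printf 'https://keycloak-http.atlantic/auth/realms/%s/protocol/openid-connect/token\n' "$1"
}

# Decode one field of the keycloak-pm-auth-info secret, for example:
#   export KEYCLOAK_USER=$(pm_secret keycloak-username)
pm_secret() {
  kubectl get secret keycloak-pm-auth-info -n cmo -o json |
    jq -r --arg k "$1" '.data[$k]' | base64 --decode
}

# Request a new access token and export it as TOKEN.
# Returns non-zero if no token came back.
refresh_token() {
  TOKEN=$(curl -s -L -X POST "$(keycloak_token_url "$KEYCLOAK_REALM")" \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode "client_id=$KEYCLOAK_CLIENT" \
    --data-urlencode 'grant_type=password' \
    --data-urlencode "client_secret=$KEYCLOAK_CLIENT_SECRET" \
    --data-urlencode 'scope=openid' \
    --data-urlencode "username=$KEYCLOAK_USER" \
    --data-urlencode "password=$KEYCLOAK_PASSWORD" | jq -r '.access_token')
  export TOKEN
  # jq prints "null" when the response has no access_token field
  [ -n "$TOKEN" ] && [ "$TOKEN" != "null" ]
}
```

Export the five KEYCLOAK_* variables with pm_secret (keys keycloak-username, keycloak-password, keycloak-realm, keycloak-client, and keycloak-credentials-secret), then run refresh_token whenever the token expires. Quoting each --data-urlencode argument keeps passwords that contain spaces or shell metacharacters intact.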

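The payloads for steps 3 and 8 can also be generated from shell arguments instead of being edited by hand. This sketch writes scaledown.json and scaleup.json into the current directory with the same fields as the templates. The function names and argument order are hypothetical, and values are inserted without JSON escaping.

```shell
# Hypothetical helpers that write the payloads from steps 3 and 8.
# Values must not contain double quotes or backslashes (no JSON escaping).

# Usage: write_scaledown_json <NODE_HOSTNAME> [true|false]
write_scaledown_json() {
  cat > scaledown.json <<EOF
{
  "hosts": [{
    "hostname": "$1"
  }],
  "remove_os_packages": "${2:-true}"
}
EOF
}

# Usage: write_scaleup_json <NODE_HOSTNAME> <HOST_IP> <ROLE> <CRED_NAME> <PASSWORD>
write_scaleup_json() {
  cat > scaleup.json <<EOF
{
  "credentials": [{
    "name": "$4",
    "type": "password",
    "password": "$5"
  }],
  "hosts": [{
    "hostname": "$1",
    "managementhost": "$2",
    "kuberneteshost": "$2",
    "hostCredentials": "$4",
    "topology": {
      "role": "$3"
    }
  }]
}
EOF
}
```

For example, write_scaleup_json hostname6 10.236.227.214 worker mykey1 ChangeMe reproduces the step 8 example. The credential name is passed once and used for both "name" and "hostCredentials", so the two cannot drift apart. scaleup.json contains the host password in plain text, so delete it after the scale-up completes.
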
NOTE: Although the status of this operation may appear as failed, the failed node may still have been removed successfully. Check the node status.
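
The status call in steps 6 and 12 has to be repeated until the operation finishes, so it can be wrapped in a polling loop. This is a sketch under assumptions: the success state is "complete" as documented, the failure state is assumed to be spelled "failed" or "Failed", and PM_ADDR is a hypothetical variable holding the Platform Manager address from step 2. As the notes above point out, a failed status does not necessarily mean that the node was not removed, so check kubectl get node in either case.

```shell
# Hypothetical helper for steps 6 and 12: run a state command repeatedly
# until it prints a final state.
#   returns 0 on "complete", 1 on "failed"/"Failed" (assumed spelling),
#   2 if POLL_TRIES attempts pass without a final state.
wait_for_op() {
  local interval="${POLL_INTERVAL:-30}" tries="${POLL_TRIES:-120}" state i=1
  while [ "$i" -le "$tries" ]; do
    state=$("$@")
    echo "attempt $i: state=$state" >&2
    case "$state" in
      complete) return 0 ;;
      failed | Failed) return 1 ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  echo "no final state after $tries attempts" >&2
  return 2
}

# Print the "state" field of operation <ID>. PM_ADDR is a hypothetical
# variable holding the Platform Manager address (and port, if needed).
op_state() {
  curl -s -k --header "Authorization: Bearer $TOKEN" \
    "https://$PM_ADDR/v1/status/$1" | jq -r '.state'
}
```

For example, after export PM_ADDR=10.43.78.77:7070, run wait_for_op op_state 286bdb32-ff07-4e46-947e-e4c9e9b98338. With the defaults, the loop checks every 30 seconds for up to an hour, so the token may expire during a long operation; refresh it and rerun the command if the state comes back empty.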

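Steps 7 and 13 check the node list by eye. For scripted runs, the two hypothetical functions below test the kubectl get node output for a given node name. They assume that NAME and STATUS are the first two columns, as in the default output.

```shell
# Hypothetical helpers for steps 7 and 13. Each reads the default
# "kubectl get node" output (NAME STATUS ROLES AGE VERSION) on stdin.

# Succeeds if the node is listed. Step 7 expects this to fail:
#   kubectl get node | node_listed hostname6 || echo "node removed"
node_listed() {
  awk -v n="$1" 'NR > 1 && $1 == n { found = 1 } END { exit !found }'
}

# Succeeds if the node is listed with STATUS exactly "Ready". Step 13:
#   kubectl get node | node_ready hostname6 && echo "node is Ready"
node_ready() {
  awk -v n="$1" 'NR > 1 && $1 == n && $2 == "Ready" { ok = 1 } END { exit !ok }'
}
```

A cordoned node reports Ready,SchedulingDisabled and therefore fails node_ready. For step 9, kubectl get pod -A -o wide --field-selector spec.nodeName=<NODE_NAME> lists only the pods scheduled on the new node, which avoids grep matching the node name in other columns.
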
  1. Delete the PVCs, volumes, and LVGs of stateful pods on the removed node.
    Retrieve all PVCs that were bound to the removed node.
    NOTE: The node name is listed in the volume.kubernetes.io/selected-node annotation in the describe output of each PVC.
    Obtain the PVC names and details with the following commands.
    1. Get PVC names:
      kubectl get pvc
    2. Get the details for each listed PVC:
      kubectl describe pvc <PVC_NAME>
    3. Get the node for each listed PVC:

      PVCs are namespace-scoped resources. Repeat this step for all namespaces used by ObjectScale.

      for pvc in $(kubectl get pvc -n <NAMESPACE> -o jsonpath='{.items[*].metadata.name}'); do echo "=== $pvc"; kubectl get pvc "$pvc" -n <NAMESPACE> -o json | grep selected-node | grep -v "{}"; done
    4. Patch and remove volumes:
      kubectl get volume | grep <NODE_ID> | awk '{print $1}' | xargs kubectl patch volume --type merge -p '{"metadata":{"finalizers":null}}'
      kubectl get volume | grep <NODE_ID> | awk '{print $1}' | xargs kubectl delete volume
    5. Patch and remove LVGs:
      kubectl get lvg | grep <NODE_ID> | awk '{print $1}' | xargs kubectl patch lvg --type merge -p '{"metadata":{"finalizers":null}}'
      kubectl get lvg | grep <NODE_ID> | awk '{print $1}' | xargs kubectl delete lvg
  2. Clean up CSI resources.
    These additional cleanup steps are required when removing a failed node, and may also be required if the PMM procedure fails.
    1. Delete CSI Bare-Metal Node:
      kubectl get csibmnode | grep <NODE_ID> | awk '{print $1}' | xargs kubectl delete csibmnode
    2. Delete Drive CRs:
      kubectl get drive | grep <NODE_ID> | awk '{print $1}' | xargs kubectl delete drive
    3. Delete Available Capacity:
      kubectl get ac | grep <NODE_ID> | awk '{print $1}' | xargs kubectl delete ac
  3. Verify that the resources were removed. Each of the following commands should return no output.
    1. Check for CSI Bare-Metal Node:
      kubectl get csibmnodes | grep <NODE_ID>
    2. Check for available capacity:
      kubectl get ac | grep <NODE_ID>
    3. Check for drive CRs:
      kubectl get drive | grep <NODE_ID>
  4. Delete pending stateful pods.
    kubectl get pods -o wide -A | grep Pending
    kubectl delete pod <POD_NAME> -n <NAMESPACE>
    NOTE: After a failed node is removed, some pods may remain in the Pending state. These are likely StatefulSet pods that were previously running on the removed node, such as SS, influxdb, bookie, and atlas pods. After you delete them, they are re-created, along with their associated volumes, on another available node.
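
Cleanup steps 1 through 3 repeat the same pattern for each resource kind. The sketch below wraps them in hypothetical shell functions. It assumes the kubectl resource names used above and that <NODE_ID> appears in the kubectl get output line of each resource, as the grep commands in those steps assume. pvcs_on_node covers all namespaces at once by reading the volume.kubernetes.io/selected-node annotation through kubectl JSONPath output.

```shell
# Hypothetical helpers for cleanup steps 1-3.

# Step 1.3 for all namespaces at once: print "namespace/pvc" for each PVC
# whose volume.kubernetes.io/selected-node annotation equals $1.
pvcs_on_node() {
  kubectl get pvc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.metadata.annotations.volume\.kubernetes\.io/selected-node}{"\n"}{end}' |
    awk -F '\t' -v n="$1" '$3 == n { print $1 "/" $2 }'
}

# Steps 1.4, 1.5, and 2: delete every resource of kind $1 whose line
# mentions node ID $2. Pass "patch" as $3 to clear finalizers first.
purge_kind() {
  local kind="$1" id="$2" names name
  names=$(kubectl get "$kind" --no-headers | grep -- "$id" | awk '{ print $1 }')
  [ -n "$names" ] || return 0
  if [ "${3:-}" = patch ]; then
    for name in $names; do
      kubectl patch "$kind" "$name" --type merge -p '{"metadata":{"finalizers":null}}'
    done
  fi
  for name in $names; do
    kubectl delete "$kind" "$name"
  done
}

# Step 3: report CSI resources that still mention node ID $1.
# Returns 1 if any remain.
verify_removed() {
  local kind left=0
  for kind in csibmnode ac drive; do
    if kubectl get "$kind" --no-headers | grep -q -- "$1"; then
      echo "still present: $kind" >&2
      left=1
    fi
  done
  return "$left"
}
```

For example, run purge_kind volume <NODE_ID> patch and purge_kind lvg <NODE_ID> patch, then purge_kind csibmnode <NODE_ID>, purge_kind drive <NODE_ID>, and purge_kind ac <NODE_ID>, and finish with verify_removed <NODE_ID>. Review the matching rows before deleting, because a substring of the node ID can match unrelated resources.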

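The command in cleanup step 4 lists Pending pods across the whole cluster, which can include pods unrelated to the removed node. A narrower sketch, assuming the affected pods are owned directly by a StatefulSet, uses a field selector on status.phase and reads the owner kind of each pod. Pods owned by other controllers are not listed, so compare the result with the output of the step 4 command. The function names are hypothetical.

```shell
# Hypothetical helper for cleanup step 4: list Pending pods owned by a
# StatefulSet as "namespace name", one per line.
pending_statefulset_pods() {
  kubectl get pods -A --field-selector status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.metadata.ownerReferences[0].kind}{"\n"}{end}' |
    awk '$3 == "StatefulSet" { print $1, $2 }'
}

# Delete each listed pod; its StatefulSet controller re-creates it.
delete_pending_statefulset_pods() {
  pending_statefulset_pods | while read -r ns name; do
    kubectl delete pod "$name" -n "$ns"
  done
}
```

Run pending_statefulset_pods first and review the list before calling delete_pending_statefulset_pods.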