Replace a failed node within the ObjectScale Software Bundle
Use this Node Replacement service procedure to replace a failed node.
Prerequisites
Ensure that the new node has the same operating system version and networking configuration as the other nodes within the ObjectScale Software Bundle. Ensure that the system has spare FTT (failures to tolerate) quota: if the system is FTT=1, ensure that no other nodes are down; if the system is FTT=2, ensure that at most one other node is down. Ensure that there are no other ongoing service procedures or recoveries.
NOTE: If this FTT requirement is not met, do not proceed with these steps; call Dell Support.
About this task
When a node fails, all pods on that node enter the Terminating state. Stateless pods are rescheduled to another available node after five minutes; stateful pods remain in the Terminating state.
Steps
The ObjectScale Software Bundle contains a CMO Platform Manager running on Kubernetes within the cluster that is used to request cluster management tasks, such as service procedures. The CMO Platform Manager APIs require a Keycloak token to authenticate requests for cluster management tasks.
Collect the Keycloak account information from the secret:
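The secret name and namespace vary by deployment; as a minimal sketch, assuming a hypothetical secret named keycloak-admin-credentials in an objectscale namespace, the account fields can be decoded as follows:

```shell
# Hypothetical secret name and namespace -- substitute the values from your deployment.
SECRET_NAME="keycloak-admin-credentials"
NAMESPACE="objectscale"

# Secret data is base64-encoded, so decode each field.
kubectl get secret "$SECRET_NAME" -n "$NAMESPACE" -o jsonpath='{.data.username}' | base64 -d; echo
kubectl get secret "$SECRET_NAME" -n "$NAMESPACE" -o jsonpath='{.data.password}' | base64 -d; echo
```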
NOTE: If the remove_os_packages parameter is set to true, the OS packages are removed from the node. This precludes the user from adding the node back to the cluster without reinstalling those OS packages.
Scale down the node using the CMO Platform Manager scale down API.
NOTE: If the node is unreachable (the logs read "Unreachable=1"), the scale down operation reports failure even though the scale down completes successfully.
When the operation is finished, the operation "state" is marked as "complete".
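The operation state can be polled with the Keycloak token. The endpoint hostname and API path below are assumptions for illustration, not the documented CMO Platform Manager API; substitute the values from your deployment.

```shell
# Hypothetical endpoint and operation ID -- substitute real values from your deployment.
CMO_ENDPOINT="https://cmo-platform-manager.local"
TOKEN="replace-with-keycloak-token"
OPERATION_ID="replace-with-operation-id"

# Poll the operation and inspect its "state" field.
curl -sk -H "Authorization: Bearer $TOKEN" \
  "$CMO_ENDPOINT/api/v1/operations/$OPERATION_ID" | grep -o '"state": *"[a-z]*"'
```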
Confirm that the new node appears in the node list.
kubectl get node
NOTE: Although the status of this operation may appear as failed, the failed node may still have been removed successfully. Check the node status.
Delete the PVC, volumes, and LVGs of stateful pods on the removed node.
Retrieve all PVCs bound to the node to be removed.
NOTE: The node name is listed in the volume.kubernetes.io/selected-node annotation in the describe output of each PVC.
The PVC names and their details can be obtained with the following commands.
Get PVC names:
kubectl get pvc
Get the details for each listed PVC:
kubectl describe pvc <PVC_NAME>
Get the node for each listed PVC:
PVCs are namespace-scoped resources. Repeat this step for all namespaces used by ObjectScale.
for i in `kubectl get pvc --no-headers -o jsonpath="{.items[*].metadata.name}"`; do echo "=== $i"; kubectl get pvc $i -o json | grep selected-node | grep -v "{}"; done
Delete the available capacity (AC) custom resources for the removed node:
kubectl get ac | grep <NODE_ID> | awk '{print $1}' | xargs kubectl delete ac
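The volume and LVG custom resources for the removed node can be cleaned up with pipelines analogous to the AC deletion. The volume and lvg resource names below are assumed CSI Bare-Metal short names; confirm them with kubectl api-resources before deleting.

```shell
NODE_ID="replace-with-node-id"

# Delete LVG custom resources that reference the removed node.
kubectl get lvg | grep "$NODE_ID" | awk '{print $1}' | xargs -r kubectl delete lvg

# Delete volume custom resources that reference the removed node
# (add -n <NAMESPACE> if volumes are namespace-scoped in your version).
kubectl get volume | grep "$NODE_ID" | awk '{print $1}' | xargs -r kubectl delete volume

# Delete the PVCs identified in the previous step (PVCs are namespace-scoped).
kubectl delete pvc "replace-with-pvc-name" -n "replace-with-namespace"
```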
Verify resource removal.
Check for CSI Bare-Metal Node:
kubectl get csibmnodes | grep <NODE_ID>
Check for available capacity:
kubectl get ac | grep <NODE_ID>
Check for drive CRs:
kubectl get drive | grep <NODE_ID>
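The three checks above can be rolled into one loop; each resource type should show no entries for the removed node once cleanup is complete. This loop is a convenience sketch, not part of the documented procedure:

```shell
NODE_ID="replace-with-node-id"

# Each resource type should show no entries for the removed node.
for resource in csibmnodes ac drive; do
  echo "=== $resource"
  kubectl get "$resource" | grep "$NODE_ID" || echo "none found"
done
```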
Delete pending stateful pods.
kubectl get pods -o wide -A | grep Pending
NOTE: After the removal of a failed node, some pods may be left in the Pending state. These are likely StatefulSet pods that were previously running on the removed node, including SS, influxdb, bookie, and atlas pods. Once deleted, they and their associated volumes are re-created on another available node.
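The Pending pods can be deleted in one pass by extracting the namespace and pod name from the listing above; the column positions are an assumption based on default kubectl get pods -A output:

```shell
# Column 1 is the namespace and column 2 is the pod name; delete each Pending pod.
kubectl get pods -o wide -A | grep Pending | \
  awk '{print $2 " -n " $1}' | xargs -r -n 3 kubectl delete pod
```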