Prepare a failed node for node hardware or software maintenance (ObjectScale Software Bundle)
Follow this procedure to prepare a failed node for maintenance.
About this task
Stateful data on the node to be repaired is not deleted, and stateful ObjectScale pods are not rescheduled to other nodes in the cluster.
Steps
The ObjectScale Software Bundle contains a CMO Platform Manager, running on Kubernetes within the cluster, that is used to request cluster management tasks such as service procedures.
The CMO Platform Manager APIs require a keycloak token to authenticate requests for cluster management tasks.
Collect the keycloak account information from the secret:
Scale down the node using the CMO Platform Manager scale down API.
NOTE: If the node is unreachable (the logs read "Unreachable=1"), the scale down operation reports a failure even though the scale down completes successfully.
When the operation is finished, the operation "state" is marked as "complete".
NOTE: In certain situations, the status may show as Failed even though the failed node was removed successfully. Check the node status.
Confirm that the node has been removed from the node list.
kubectl get node
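When the node list is long, the check can be scripted instead of read by eye. The following is a minimal sketch, not part of the documented procedure. The helper name node_absent and the node name in the usage comment are hypothetical. It relies on `kubectl get nodes -o name` printing one `node/<name>` entry per line.

```shell
# Succeeds (exit status 0) when the given node name is NOT in the list.
# $1    = node name to look for (hypothetical example: node-3)
# stdin = output of `kubectl get nodes -o name`, one "node/<name>" per line
node_absent() {
  # -F: fixed string, -x: match the whole line, -q: no output
  ! grep -Fxq "node/$1"
}

# Usage (requires kubectl access to the cluster):
#   kubectl get nodes -o name | node_absent node-3 && echo "node-3 removed"
```

Matching the whole line avoids a false hit when one node name is a prefix of another.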
Verify that the statefulset pods have moved to the Pending state after node removal:
kubectl get pods -o wide | grep -v Running
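The grep -v Running filter prints every line that does not contain "Running", so pods in other states such as Completed appear as well. A narrower filter is sketched below, under the assumption that STATUS is the third column, which holds for `kubectl get pods` output in a single namespace but not with --all-namespaces. The helper name pending_pods is hypothetical.

```shell
# Keep the header row and every row whose STATUS column is exactly "Pending".
# stdin = output of `kubectl get pods -o wide` for a single namespace
pending_pods() {
  awk 'NR == 1 || $3 == "Pending"'
}

# Usage (requires kubectl access to the cluster):
#   kubectl get pods -o wide | pending_pods
```

An alternative that filters on the server side is `kubectl get pods -o wide --field-selector=status.phase=Pending`.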
Fix the node while it is offline, and then go to the next step.
On the node, create the scaleup.json file with the necessary details for the node.
NOTE: When a node is added to a cluster, the /etc/hosts file for the added node may not be updated correctly, which causes issues when the cluster is upgraded. To avoid failures during the upgrade process, perform the following steps after adding a node:
Retrieve the helmrepo service IP address.
kubectl -n cmo get svc helmrepo
For example:
kubectl -n cmo get svc helmrepo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
helmrepo ClusterIP 172.43.174.187 <none> 30036/TCP 12d
Add an entry for the service to the /etc/hosts file of the added node, in the following format:
<CLUSTER_IP> helmrepo
For example:
172.43.174.187 helmrepo
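The two steps in this note can also be combined into a small script. This is a sketch under stated assumptions rather than a documented tool: the function name add_helmrepo_entry is hypothetical, and the append is skipped when the file already contains a helmrepo entry so that the script can be rerun safely. It does not update an existing entry whose IP address has changed.

```shell
# Append "<ip> helmrepo" to a hosts file unless a helmrepo entry already exists.
# $1 = helmrepo service cluster IP
# $2 = hosts file path (optional, defaults to /etc/hosts)
add_helmrepo_entry() {
  ip="$1"
  hosts_file="${2:-/etc/hosts}"
  # Skip the append when some line already names the host "helmrepo".
  if grep -Eq '[[:space:]]helmrepo([[:space:]]|$)' "$hosts_file"; then
    return 0
  fi
  printf '%s helmrepo\n' "$ip" >> "$hosts_file"
}

# Usage on the added node (requires kubectl access and write access to /etc/hosts):
#   ip=$(kubectl -n cmo get svc helmrepo -o jsonpath='{.spec.clusterIP}')
#   add_helmrepo_entry "$ip"
```

Passing a different file path as the second argument lets you try the function on a copy before touching /etc/hosts.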
Place the scaleup.json payload on the node where the scale up will be performed.