Prepare a healthy node for node hardware or software maintenance (ObjectScale Software Bundle)
Follow this node preparation procedure to prepare a healthy node for maintenance, such as fixing a system disk, resolving issues with node hardware or software, or upgrading the node operating system.
Steps
The ObjectScale Software Bundle CMO Platform Manager APIs require a keycloak token to authenticate the requests for cluster management tasks.
The ObjectScale Software Bundle contains a CMO Platform Manager running on Kubernetes within the cluster that is used to request cluster management tasks, like service procedures.
Collect the keycloak account information from the secret:
Verify that the PHASE of the cluster now displays Maintenance.
kubectl -n <OBJECTSCALE_NAMESPACE> get ecs-cluster
NAME          PHASE         READY COMPONENTS   S3 ENDPOINT         MGMT API
ecs-cluster   Maintenance   22/23              10.236.228.53:443   10.236.228.52:4443
The ObjectScale Portal UI shows the object store status as Maintenance.
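The phase check above can also be scripted. The following is a minimal sketch, not part of the product tooling: the function names cluster_phase and cluster_in_maintenance are hypothetical, and the sketch assumes that PHASE is the second whitespace-separated column of the output, as in the example.

```shell
# Print the PHASE column of the first data row of `kubectl get ecs-cluster` output.
# Reads the table on stdin. Assumes PHASE is the second column, as in the example output.
cluster_phase() {
  awk 'NR > 1 && NF { print $2; exit }'
}

# Succeed only when the reported phase is Maintenance.
cluster_in_maintenance() {
  [ "$(cluster_phase)" = "Maintenance" ]
}

# Usage (same namespace placeholder as the command above):
#   kubectl -n <OBJECTSCALE_NAMESPACE> get ecs-cluster | cluster_in_maintenance && echo "cluster is in Maintenance"
```

Reading the table from stdin keeps the check independent of how the output is obtained, so it can be exercised against saved output before it is used on a live cluster.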
Once the taint has been applied to a node, the ObjectScale Operator creates the ObjectScale TMM service procedure. Retrieve the list of service procedures and locate the TMM service procedure with tmm- prefixed to the service procedure name:
kubectl -n <OBJECTSCALE_NAMESPACE> get serviceprocedures
NOTE: To obtain details about a service procedure, including its status, use:
NOTE: Do not delete the service procedure while it is running.
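When the list of service procedures is long, the tmm- prefix can be used as a filter. The following is a minimal sketch: the function name filter_tmm is hypothetical, and it assumes the names are listed one per line, for example from the standard kubectl -o name output, which prints each resource as kind/name.

```shell
# Keep only service procedure names that start with the tmm- prefix.
# Input: one resource name per line, optionally in kubectl's "kind/name" form.
filter_tmm() {
  sed 's|^.*/||' | grep '^tmm-'
}

# Usage (same namespace placeholder as the command above):
#   kubectl -n <OBJECTSCALE_NAMESPACE> get serviceprocedures -o name | filter_tmm
```

The filter prints nothing and returns a non-zero status when no TMM service procedure exists yet, which can be used to detect that the ObjectScale Operator has not created it.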
Monitor the status of the service procedure with the following command:
while true; do kubectl -n <OBJECTSCALE_NAMESPACE> get serviceprocedures -o 'custom-columns=Name:metadata.name,Node:spec.nodeInfo.name,Type:spec.type,Time:metadata.managedFields[0].time,Reason:status.reason,Message:status.message'; echo; sleep 5; done
The service procedure transitions through various phases as it progresses. The Reason value for the TMM service procedure should progress through NotStarted, In Progress, PostCheck, and Waiting, and finally to Success. A Reason of Success or Waiting indicates that the service procedure has completed without error and the node is now in TMM.
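Instead of watching the output indefinitely, the wait for Success or Waiting can be bounded. The following is a minimal sketch under stated assumptions: the function names tmm_reached and wait_for_tmm are hypothetical, <TMM_PROCEDURE_NAME> is a placeholder, and the Reason is assumed to be readable from the status.reason field that the monitoring command displays in its Reason column.

```shell
# Return 0 when a Reason value means the node is in TMM (Success or Waiting, per the text above).
tmm_reached() {
  case "$1" in
    Success|Waiting) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll a command that prints the current Reason until tmm_reached succeeds.
# Arguments: maximum attempts, seconds between attempts, then the command to run.
# Returns 1 if the Reason never reaches Success or Waiting within the attempts.
wait_for_tmm() {
  max="$1"; delay="$2"; shift 2
  i=0
  while [ "$i" -lt "$max" ]; do
    reason="$("$@" 2>/dev/null)"
    echo "attempt $((i + 1)): Reason=${reason:-<none>}"
    if tmm_reached "$reason"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (placeholders must be replaced; jsonpath is a standard kubectl output format):
#   wait_for_tmm 120 5 kubectl -n <OBJECTSCALE_NAMESPACE> get serviceprocedures <TMM_PROCEDURE_NAME> -o jsonpath='{.status.reason}'
```

The sketch treats any other Reason as "not yet done" and relies on the attempt limit to stop, because the values that indicate failure are not listed here.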
Next, place the node into maintenance mode in the CMO Platform of the ObjectScale Software Bundle.
Scale down the node using the CMO Platform Manager scale down API.
NOTE: If the node is unreachable (the logs read "Unreachable=1"), the scale down operation may report a failure even though the scale down completes successfully.
When the operation is finished, the operation "state" is marked as "complete".
NOTE: In certain situations, the status may show as Failed even though the failed node was removed successfully. Check the node status to confirm.
Confirm that the node has been removed from the node list.
kubectl get node
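The check that the node no longer appears in the list can be scripted as follows. This is a minimal sketch: the function name node_removed is hypothetical, <NODE_NAME> is a placeholder, and the input is assumed to come from the standard kubectl -o name output, which lists nodes as node/<name>.

```shell
# Succeed when the given node name does NOT appear in `kubectl get node -o name` output.
# Reads the list (lines of the form node/<name>) on stdin and matches whole lines only.
node_removed() {
  ! grep -qxF "node/$1"
}

# Usage:
#   kubectl get node -o name | node_removed <NODE_NAME> && echo "node removed"
```

Whole-line matching avoids a false result when one node name is a prefix of another. Note that empty input also counts as "removed", so confirm that the kubectl command itself succeeded before relying on the result.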
Perform any necessary maintenance on the node.
On the node, create the scaleup.json file with the necessary details for the node.
NOTE: When a node is added to a cluster, the /etc/hosts file for the added node may not be updated correctly, which causes issues when the cluster is upgraded. To avoid failures during the upgrade process, perform the following steps after adding a node:
Retrieve the helmrepo service IP address.
kubectl -n cmo get svc helmrepo
For example:
kubectl -n cmo get svc helmrepo
NAME       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
helmrepo   ClusterIP   172.43.174.187   <none>        30036/TCP   12d
Add an entry for the service to the /etc/hosts file of the added node, using the following format:
<CLUSTER_IP> helmrepo
For example:
172.43.174.187 helmrepo
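The two steps above can be combined so that the entry is derived from the service output and is not added twice. The following is a minimal sketch: the function names helmrepo_hosts_line and add_hosts_line are hypothetical, and the sketch assumes that CLUSTER-IP is the third column of the output, as in the example.

```shell
# Build the /etc/hosts line from `kubectl -n cmo get svc helmrepo` table output on stdin.
# Assumes CLUSTER-IP is the third column, as in the example output.
helmrepo_hosts_line() {
  awk '$1 == "helmrepo" { print $3 " helmrepo"; exit }'
}

# Append a line to a hosts file only if no entry for that host name exists yet.
# Arguments: hosts file path, full line in the form "<ip> <name>".
add_hosts_line() {
  file="$1"; line="$2"
  name="${line##* }"   # last word of the line is the host name
  if grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$file"; then
    echo "entry for ${name} already present in ${file}"
  else
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (run on the added node; writing to /etc/hosts requires root):
#   line="$(kubectl -n cmo get svc helmrepo | helmrepo_hosts_line)"
#   add_hosts_line /etc/hosts "$line"
```

Checking for an existing entry first makes the step safe to repeat. If an entry exists with a different IP address, the sketch leaves it untouched, so review the file manually in that case.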
Place this JSON payload on the node where the scale up is to be performed.