Why is Avamar automatically deleting checkpoints, and why is it failing to validate new ones?
I've deployed an AVE VM (version 19.4.0-116) in our environment and created and validated a checkpoint on it. It's connected to an updated Data Domain that it uses as storage.
The server is set to create and validate a checkpoint automatically once a day, but after a few days it starts failing to validate new checkpoints. The errors it has returned are:
hfscheck of cp.# failed on error: MSG_ERR_DDR_ERROR
failed hfscheck maintenance with error MSG_ERR_KILLED
failed checkpoint maintenance with error MSG_ERR_INPROGRESS
If someone could point me toward fixing these errors, or at least understanding why they keep coming up, that would be great.
The other issue is that it's deleting the validated checkpoints that existed before these broken ones, so I can't even roll back. Why is it deleting them, and how do I make it keep them longer?
Niko Virta
April 28th, 2022 07:00
To understand your problem, you need to find what is causing the error. This is recorded in the ddrmaint logs. Bear in mind that you want to find the first error that caused this; it can be several log versions before the currently running log.
Here is an example of how to grep for it:
grep -i Error /usr/local/avamar/var/ddrmaintlogs/ddrmaint.log|grep -v -i "gc"
The cause can be anything from a bad backup onward, so this step is essential: you need to understand what is causing the problem. Usually it's a corrupted backup that you need to remove.
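Since the first error may sit in an older log rather than the current one, the same grep can be run over every log in the directory, oldest file first. This is only a sketch: the `ddrmaint.log*` glob assumes rotated logs sit next to ddrmaint.log and share its prefix, which is not confirmed here, and the function name is made up for illustration. Check with `ls` what the rotated files are actually called (and whether they are compressed) before relying on it.

```shell
#!/bin/sh
# Sketch: print non-GC error lines from every ddrmaint log, oldest file first.
# ASSUMPTION: rotated logs match "ddrmaint.log*" in the same directory and are
# plain text. ddr_errors_oldest_first is a hypothetical helper name.

ddr_errors_oldest_first() {
    log_dir="${1:-/usr/local/avamar/var/ddrmaintlogs}"
    # ls -tr sorts by modification time, oldest first.
    ls -tr "$log_dir"/ddrmaint.log* 2>/dev/null | while IFS= read -r f; do
        # Same filter as the one-liner above: keep "error", drop "gc" lines.
        if matches=$(grep -i "error" "$f" | grep -v -i "gc"); then
            # Print a header only for files that actually contain matches.
            printf '== %s\n%s\n' "$f" "$matches"
        fi
    done
}

# Usage: the first lines printed are the earliest candidates for the root cause.
# ddr_errors_oldest_first | head -n 20
```

Ordering by modification time is a design choice; if the log lines carry timestamps, reading those is more reliable than trusting file times.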
As for the other issue, I would not touch the default (and recommended) checkpoint retention scheme, which is cpmostrecent="2", cphfschecked="1". If your system is working as it should, there is no need to tune these in a normal environment.
You can, and should if you don't know what to do, open an SR with Dell if DDR_ERROR appears in Avamar.
RusscsCarter
April 28th, 2022 08:00
Thank you, I'll look into these logs to better understand the error.
Though I don't understand the point of having checkpoints if, by the time my Avamar stops working, it has already deleted the checkpoints that could have helped. Please explain.
Ilavarasan IC
April 28th, 2022 08:00
Log in to the AVE via the command line and run the commands "status.dpn" and "dpnctl status"; they should give you an idea of what your system is currently doing.
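If you want to keep the output of those two commands, for example to attach to an SR, something like the following could work. It is a rough sketch: it assumes both commands are on the PATH of the logged-in user, and the function name and default output file name are arbitrary choices, not anything Avamar defines.

```shell
#!/bin/sh
# Sketch: run the two status commands and save their output to one file.
# ASSUMPTION: status.dpn and dpnctl are on PATH for the current user.
# capture_status and the default file name are hypothetical.

capture_status() {
    out_file="${1:-avamar_status_$(date +%Y%m%d_%H%M%S).txt}"
    for cmd in "status.dpn" "dpnctl status"; do
        echo "===== $cmd =====" >> "$out_file"
        # ${cmd%% *} is the command name without its arguments.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # $cmd is unquoted on purpose so "dpnctl status" splits into
            # a command and its argument.
            $cmd >> "$out_file" 2>&1 || echo "(exited with status $?)" >> "$out_file"
        else
            echo "(${cmd%% *} not found on PATH)" >> "$out_file"
        fi
    done
    # Print the file name so the caller knows where the output went.
    echo "$out_file"
}

# Usage:
# capture_status /tmp/avamar_status.txt
```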
Think of validated checkpoints as the last known good configuration: if there is an issue, you can go back to the last good configuration.
In your case, the issue might not be the checkpoint; it might be something else.
Niko Virta
April 28th, 2022 09:00
Checkpoints are system-wide backups of Avamar for disaster recovery. They are validated after they are created, and because you currently have a DDR_ERROR in the system, the validation has not succeeded.
Because they are system-wide, you would not use them in a DDR_ERROR situation, as backups are usually still flowing from other clients or, for example, to other Data Domains. When you roll back to yesterday's checkpoint, you lose everything that has happened since that checkpoint. That is why a rollback is not used unless there is no other way to recover the system.
A good example of when you would use a rollback is an unsuccessful upgrade, when there is no other way to get the system up and running. A checkpoint is always taken before the upgrade. Hopefully you never have to do a rollback.