April 25th, 2012 22:00

Why is instance still running w/o control file?

I chatted with my DBA friends, who told me they can remove the control file under Linux while the instance keeps running and is even able to complete a checkpoint.

I am quite confused by this. As I understood it, the instance would crash immediately if the control file was removed.

Any insight?

Eddy

46 Posts

May 2nd, 2012 03:00

If you remove a file under Linux / Unix, but a process has the file handle open, then the file still exists and will not be deleted until the last process accessing that file ends. It just is no longer listed in the directory.

So you might keep running but Oracle might complain after you stop and restart the database.
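
The behaviour described above can be reproduced without a database. The following is a minimal sketch in Python, assuming a Linux or Unix system with a regular local filesystem; the function name and the payload are made up for illustration and have nothing to do with Oracle itself.

```python
import os
import tempfile


def demo_unlink_keeps_data(payload: bytes = b"control file contents\n") -> dict:
    """Delete a file while holding it open, then read it through the open descriptor."""
    fd, path = tempfile.mkstemp()          # fd stays open for the whole demo
    try:
        os.write(fd, payload)
        os.unlink(path)                    # removes the directory entry only
        visible = os.path.exists(path)     # the name is gone from the directory
        links = os.fstat(fd).st_nlink      # link count as seen through the descriptor
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))   # the data is still reachable via fd
    finally:
        os.close(fd)                       # last handle closed: the space can be freed
    return {"visible": visible, "links": links, "data": data}


if __name__ == "__main__":
    print(demo_unlink_keeps_data())
```

On a local filesystem the link count normally drops to zero after the unlink, while the read through the descriptor still returns the original bytes.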

225 Posts

May 3rd, 2012 18:00

Bart,

Thanks for the information. Could I understand it this way? Some Oracle instance processes (which ones?) still keep the file handle open, and Linux just unlinks the file. If so, is there any workaround to get it back?

Eddy

46 Posts

May 4th, 2012 04:00

In Unix, as long as there is at least one process that keeps the handle open, the file will not really disappear.

Then the only way to get the file back is if that process would "link" the file again to a directory entry on the filesystem (giving it a path and filename again).

Or you would need to hack the OS somehow. All in all not a very solid approach. Maybe with the tool "lsof" you might see the Oracle process holding the file open.

As to which process keeps the handle open, I'm not sure, probably a bunch of Oracle processes. If you're not into kernel hacking, you must assume the file is destroyed if it is no longer visible in the filesystem. Note that this is true for all files (at least on a regular FS), not just the control file. If you use ASM then I'm not sure if it works the same way.
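
To see which processes still hold a removed file open, one option is to look at the symlinks under /proc, which is similar in spirit to what "lsof" reports. A rough sketch in Python, assuming Linux with /proc mounted and enough privileges to read other processes' fd directories; the function name and its name_fragment parameter are hypothetical.

```python
import os


def find_deleted_open_files(name_fragment: str = "") -> list:
    """Return (pid, fd, target) for open descriptors whose file has been unlinked."""
    hits = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:                      # process exited or permission denied
            continue
        for fd in fds:
            try:
                target = os.readlink(f"{fd_dir}/{fd}")
            except OSError:                  # descriptor closed in the meantime
                continue
            # Linux appends " (deleted)" to the link target of an unlinked file
            if target.endswith(" (deleted)") and name_fragment in target:
                hits.append((int(pid), int(fd), target))
    return hits


if __name__ == "__main__":
    for pid, fd, target in find_deleted_open_files():
        print(pid, fd, target)
```

Passing part of the file name as name_fragment narrows the list to the file of interest. Without root, only your own processes will be visible.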

Regards

Bart

225 Posts

May 7th, 2012 03:00

Thanks for the information. If that is the case, I think the following approach might work (not tested yet):

Use lsof to find which process has the file handle

Cd /proc//fd

dd to temp

thoughts?

Eddy

46 Posts

May 8th, 2012 00:00

Hi Eddy,

Good chance it might work like that indeed. Only testing will provide the proof...

The issue with all that is that after something has gone wrong you must be aware of what the exact problem is. If you make the slightest mistake as an admin it might not work.

But for learning such situations a test like you describe is indeed very helpful.

BTW if you are going to try, please let us know your findings :-)

My experience is (with real-world situations):

- Something goes wrong and weird errors start to appear (either on database or server level)

- Admins try a few things to fix it (they might try shutdown abort/restart)

- Then they call 3rd level support (who are now too late to identify and fix the problem)...

And if you exactly know what's wrong, then you might wonder why it went wrong in the first place...

An example: I was working as a Unix engineer in an investment bank. One day at 5am I got an emergency call (and I wasn't even on standby duty) that one of the mission critical DB servers was behaving very strangely: it did not allow new connections, but some batches were still running. I jumped in the car (not sure if I was still wearing pyjamas ;-) and an hour or so later I was physically there.

Tried logging in to Unix, did not work. Not with my own account, not with the emergency root password. Tried telnet, ftp, ssh, etc etc, nothing worked. Finally I found that Tivoli still had management access so I wrote a Tivoli script (under root) that opened an X session at my terminal. Finally I had root access. Started to look around on the system and everything looked fine. All processes were there, no full filesystems, etc etc. But something was looking strange. At first I could not see what it was... Then I found that the "root" user was listed (with tools like ls) as "Root" (with capital "R")... I opened /etc/passwd and the very first character in that file is the "r" for "root".

Turns out that one of the monkeys administrators had opened the passwd file with VI, then accidentally pushed an F-key (like F2 or something like that) which translates into ~F2, ~ in VI means swap capital, and then the ape admin routinely closed the file with "ESC :wq!" ... "root" no longer existed on the system (and Unix does not know anyone named "Root") and now a whole lot of strange things started happening... The only solution now was to fix the passwd file and reboot (consider btw what would have happened if they had hard powered off and just rebooted without fixing the passwd file)...

You cannot anticipate stuff like this. And while troubleshooting you need to try a few things which can make things worse. You might have a procedure like you provided to recover the control file if somebody accidentally deletes it, but the next low cost fool junior admin that comes along will find creative ways to mess up things in ways that you cannot think of in your wildest dreams.

So, continuous data protection, anyone? :-)

225 Posts

May 8th, 2012 03:00

Bart,

Thanks for sharing your experience, and for the reminder. The method I am working on is just a workaround for an abnormal situation, and very experimental. I hope it never has to be applied.

For learning, I would still like to prove it at some level.

As I currently do not have an Oracle lab environment at hand, the following is the test procedure I plan to run:

Vi a text file

Remove it

Use lsof to find which process has the file handle

Cd /proc//fd

dd to temp

If it works, the concept is proven, right?
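
The test procedure above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: Linux with /proc, and the script itself plays the role of the process holding the handle, because an editor such as vi may read the file into memory and close it, in which case no process would show the text file as open. The names victim.txt and restored.txt and both function names are hypothetical.

```python
import os
import shutil
import tempfile


def recover_deleted_file(pid: int, fd: int, dest: str) -> None:
    """Copy the contents of an open-but-deleted file out through /proc (Linux only)."""
    # Opening /proc/<pid>/fd/<fd> reaches the inode even though its name is gone
    with open(f"/proc/{pid}/fd/{fd}", "rb") as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)


def run_test(payload: bytes = b"line one\nline two\n") -> bytes:
    workdir = tempfile.mkdtemp()
    victim = os.path.join(workdir, "victim.txt")        # hypothetical file name
    restored = os.path.join(workdir, "restored.txt")    # hypothetical file name
    with open(victim, "wb") as f:
        f.write(payload)
    holder = open(victim, "rb")          # stands in for the process holding the handle
    try:
        os.remove(victim)                # "rm" while the handle is still open
        assert not os.path.exists(victim)
        recover_deleted_file(os.getpid(), holder.fileno(), restored)
    finally:
        holder.close()
    with open(restored, "rb") as f:
        data = f.read()
    shutil.rmtree(workdir)
    return data


if __name__ == "__main__":
    print(run_test())
```

If run_test returns the original payload, the idea holds for a quiet text file. For a file that a running process is still writing to, a copy taken this way may not be internally consistent, so it proves the concept rather than a recovery procedure.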

Eddy
