Before I joined Blue Gecko, I did independent remote DBA work under the name ORA-600 Consulting. Stemming from my hair-raising experiences in the trenches at Amazon in the late ’90s and early 2000s, I decided to specialize in emergency DBA work for companies in the midst of crises (I know, great idea for someone who wanted to get away from the Amazon craziness, right?).
One day in 2009, a company in Florida called my cell phone at 2 AM. They described their problem as follows:
We have a 32-bit Intel server running Red Hat Enterprise Linux 4 and Oracle Database Enterprise Edition 18.104.22.168. There are four databases ranging in size from 20G to 100G. The storage is EXT3 filesystems on partitions of an Apple Xserve RAID5 array.
We had a power outage yesterday, and the database server powered down and booted back up. Prior to yesterday, it had not rebooted for about one year. We had been running trouble-free for that whole year. Upon reboot, Oracle started automatically, but all of the databases appeared as they did about one year ago. It is like the database hasn’t been saving the changes we have been making for the past year. None of the inserts, updates or deletes made in the past year are present in the databases. We are absolutely flummoxed. Please help!
I logged into the server and it was just as they described. Even the alert log and messages files ended suddenly about one year prior, and picked up again on the day of the most recent reboot. There was no trace of the intervening 12 months of work. The customer was ready to resort to their backups, but wanted to understand the problem before they proceeded. In addition, restoring from backups would mean losing the last 24 hours of transactions: the archivelogs had not gone to tape for that long, and they were missing just like everything else from before the most recent reboot.
They weren’t the only ones who were flummoxed. I just sat there thinking, “Where do I start?” After some poking around, though, I solved the problem. Any guesses what went wrong here? I’ll post the solution in about a week. No fair posting the solution if I’ve told you this story before!