9 Dec

Oracle tablespace hot backup mode revisited

Here’s a revised version of an old popular article I wrote over ten years ago.  I wrote this when I was at Amazon.com, long before I came to work at remote DBA provider Blue Gecko. Enjoy!

Oracle’s pre-RMAN hot backup mode is the subject of one of the most pervasive and persistent misconceptions about Oracle.

During an Oracle tablespace hot backup, you (or your script) puts a tablespace into backup mode, then copies the datafiles to disk or tape, then takes the tablespace out of backup mode. These steps are widely understood by most DBAs.

However, there is a popular misconception that datafiles are “quiesced,” “frozen,” “offlined” or “locked” during backup mode. So many people think it is true, that it appears in some books on Oracle and on numerous websites. Some have even reported that they learned this from DBA class instructors.

The myth has a couple permutations. One is that while the datafiles are allegedly not writable, changes are stored somewhere in the SGA, the redologs, the rollback segments or some combination thereof, then written back into the datafile when the tablespace is taken out of backup mode. There is a passage in the SAMS title Oracle Unleashed describing this supposed mechanism.

When you place a tablespace in backup mode, the Oracle instance notes that a backup is being performed and internally compensates for it. As you know, it is impossible to make an authentic copy of a database file that is being written to. On receipt of the command to begin the backup, however, Oracle ceases to make direct changes to the database file. It uses a complex combination of rollback segments, buffers, redo logs, and archive logs to store the data until the end backup command is received and the database files are brought back in sync. Simplifying a hot backup in this way is tantamount to classifying the USS Nimitz as a boat. The complexity of the actions taken by the Oracle RDBMS under a hot backup could consume an entire chapter and is beyond the scope of this book. What you should understand is the trade-off for taking a hot backup is increased use of rollback segments, redo logs, archive logs, and internal buffer areas within the SGA.

Oracle Unleashed, Copyright © SAMS/Macmillan, Inc. 1997, chapter 14)

No No No!  Stop making stuff up! Oracle’s tablespace hot backup does not work this way at all. It is actually a simple, elegant and failure-resistant mechanism. It absolutely does not stop writing to the datafiles. It actually allows continued operation of the database almost exactly as during normal operation. Contrary to the characterization as “complex” in SAMS Oracle Unleashed, it can be  summarized in a few steps:

  • DBWn checkpoints the tablespace (writes out all dirty blocks as of a given SCN)
  • CKPT stops updating the Checkpoint SCN field in the datafile headers and begins updating the Hot Backup Checkpoint SCN field instead
  • LGWR begins logging full images of changed blocks the first time a block is changed after being written by DBWn

Those three actions are all that is required to guarantee consistency once the file is restored and recovery is applied. By freezing the Checkpoint SCN, any subsequent recovery on that backup copy of the file will know that it must commence at that SCN. Having an old SCN in the file header tells recovery that the file is an old one, and that it should look for the archivelog containing that SCN, and apply recovery starting there.

Note that during hot backup mode, checkpoints to datafiles are not suppressed. Only the main Checkpoint SCN flag is frozen, but CKPT continues to update a Hot Backup Checkpoint SCN in the file header.

There is a confusing side effect of having the Checkpoint SCN frozen at an SCN earlier than the true checkpointed SCN of the database. In the event of a system crash or a shutdown abort during hot backup of a tablespace, the automatic crash recovery routine during startup will look at the file headers, think that the files for that tablespace are out of date, and will suggest that you need to apply old archived redologs in order to bring them back into sync with the rest of the database. Fortunately, no media recovery is necessary. With the database started up in mount mode:

SQL> alter database end backup;

This action will bring the Checkpoint SCN in the file headers in sync with the Hot Backup Checkpoint SCN (which is a true representation of the last SCN to which the datafile is checkpointed). Once you do this, normal crash recovery can proceed during ‘alter database open;’.

By initially checkpointing the datafiles that comprise the tablespace and logging full block images to redo, Oracle guarantees that any blocks changed in the datafile while in hot backup mode will also be present in the archivelogs in case they are ever used for a recovery. Most of the Oracle user community knows that  Oracle generates a greater volume of redo during hot backup mode. This is the result of Oracle logging of full images of changed blocks in these tablespaces. Normally, Oracle writes a change vector to the redologs for every change, but it does not write the whole image of the database block. Full block image logging during backup eliminates the possibility that the backup will contain unresolvable split blocks. To understand this reasoning, you must first understand what a split block is.

Typically, Oracle database blocks are a multiple of O/S blocks. For example, most Unix filesystems have a default block size of 512 bytes, while Oracle’s default block size is 8k. This means that the filesystem stores data in 512 byte chunks, while Oracle performs reads and writes in 8k chunks or multiples thereof. While backing up a datafile, your backup script makes a copy of the datafile from the filesystem, using O/S utilities such as copy, dd, cpio, or OCOPY. As it is making this copy, your process is reading in O/S-block-sized increments. If DBWn happens to be writing a DB block into the datafile at the same moment that your script is reading that block’s constituent O/S blocks, your copy of the DB block could contain some O/S blocks from before the database performed the write, and some from after. This would be a split block. By logging the full block image of the changed block to the redologs, Oracle guarantees that in the event of a recovery, any split blocks that might be in the backup copy of the datafile will be resolved by overlaying them with the full legitimate image of the block from the archivelogs. Upon completion of a recovery, any blocks that got copied in a split state into the backup will have been resolved by overlaying them with the block images from the archivelogs.

All of these mechanisms exist for the benefit of the backup copy of the files and any future recovery. They have very little effect on the current datafiles and the database being backed up. Throughout the backup, server processes read datafiles DBWn writes them, just as when a backup is not taking place. The only difference in the open database files is the frozen Checkpoint SCN, and the active Hot Backup Checkopint SCN. To demonstrate the principle, we can formulate a simple proof:

Create a table and insert a row:

SQL> create table fruit (kind varchar2(32)) tablespace users;
Table created.

SQL> insert into fruit values ('orange');
1 row created.

SQL> commit;
Commit complete.

Force a checkpoint, to flush dirty blocks to the datafiles.

SQL> alter system checkpoint;
System altered.

Get  the file name and block number where the row resides:

SQL> select dbms_rowid.rowid_relative_fno(rowid) file_num,
            dbms_rowid.rowid_block_number(rowid) block_num,
            kind
     from fruit;
FILE_NUM BLOCK_NUM KIND
-------- --------- ------
       4       183 orange

SQL> select name from v$datafile where file# = 4;
NAME
-----------------------------
/u01/oradata/uw01/users01.dbf

Use the dd utility to skip to block 183 and extract the DB block containing the row:

unixhost% dd bs=8k skip=183 count=1 if=/u01/oradata/uw01/users01.dbf | strings
1+0 records in
16+0 records out
orange

Now we put the tablespace into hot backup mode:

SQL> alter tablespace users begin backup;
Tablespace altered.

Update the row, commit, and force a checkpoint on the database.

SQL> update fruit set kind = 'plum';
1 row updated

SQL> commit;
Commit complete.

SQL> alter system checkpoint;
System altered.

Extract the same block. It shows that the DB block has been written to disk during backup mode:

unixhost% dd bs=8k skip=183 count=1 if=/u01/oradata/uw01/users01.dbf | strings
1+0 records in
16+0 records out
plum
orange

Don’t forget to take the tablespace out of backup mode!

SQL> alter tablespace administrator end backup;
Tablespace altered.

It is quite clear from this demonstration that datafiles receive writes even during hot backup mode!

Related posts from the blog:

  1. Exploring Oracle 11g Tablespace Encryption
    Tablespace encryption encrypts data at the datafile level to keep people from being able to peek at the...
  2. OurSQL Episode 32: Backup tools you already have [Backup series #2]
    Show notes: Previous backup series podcast: Backup glossary [Backup series #1] Feedback: In the backup glossary we asked...
  3. Defining MySQL Backup Strategies
    Backups may seem like a complicated subject, but they can be simplified into one multi variable question.  What...
  4. OurSQL Episode 41: Zmanda [backup series #6]
    Previously in the backup series: Backup Series #1: Backup Glossary Backup Series #2: Backup Tools You Already Have...
  5. InnoDB tablespace fragmentation (find it and fix it!)
    If you frequently delete rows (or update rows with variable-length data types), you can end up with a...

10 Responses to “Oracle tablespace hot backup mode revisited”

  1. Rahul 05. Jan, 2011 at 8:13 pm #

    Sir execellent article.

  2. Sudarshan 19. Feb, 2011 at 12:57 am #

    Very nice article, clearing all misconceptions regarding hot backups. I guess at the end its “alter tablespace users end backup;”. Also it was not clear why the second dd command shows both “plum” and “orange”.

    Anyway, thanks for the article, it was something I was looking for.

  3. Sameer 23. Mar, 2011 at 6:49 am #

    very nice article indeed

  4. Daniel Morgan 06. May, 2011 at 8:30 am #

    Oracle has a substantive bug in the hot backup code for all currently supported versions. Going into hot backup mode substantially increases the burn-rate of SCNs. Coming out of hot backup mode does not clear the extra activity. For systems with average transaction volumes exceeding 3000 commits/sec this can be fatal.

    A patch has been written and is currently being tested.

  5. Paul 11. Jan, 2012 at 2:03 am #

    Daniel,

    Could you give the bug number of a MyOracleSupport document number where I can find additional information about the nbug you mentioned?

  6. Gus 12. Jan, 2012 at 12:17 pm #

    Has anyone experienced a long delay putting several tablespaces in begin backup mode and then receiving errors when issuing the end backup mode command?

  7. Daniel 07. Feb, 2012 at 4:27 am #

    Nice article, but other then testing yourself, do you have any Oracle doc references?
    Can you give us some links?
    Thanks

  8. Saurabh Sharma 16. Feb, 2012 at 9:04 pm #

    excellent explanation..

  9. Santosh Kumar 02. Apr, 2012 at 2:46 am #

    @Paul

    Perhaps Daniel is referring the backup bug which was pointed out by Infoworld sometimes back.It was kept secret purposefully, but AFAIK the bug is fixed and explanation is published on infoworld site.
    I doubt that a MOS exist for the explanation.

    http://www.infoworld.com/d/security/fundamental-oracle-flaw-revealed-184163-0?page=0,1

    Regards,
    S.K.

  10. Bolla 16. Jul, 2012 at 3:00 am #

    it is very nice article

Leave a Reply