I am in the process of replacing hardware for an existing 3-node RAC cluster. This system is also a primary to a 2-node RAC standby database. To replace the hardware, my plan is to temporarily extend the cluster into a 6-node configuration, 3 old servers and 3 new servers. Once I have the instances running on the new hardware and have my applications connecting to the new instances, I will take down the old instances and retire the old servers, getting back to a 3-node configuration.
After extending the cluster to all six nodes, this past weekend I started up the new instances on the new nodes. To make my life easier, I just leveraged the DBCA for this work. After firing up the DBCA, I chose to work on a RAC database, and then chose Instance Management and then Add New Instance. Walking through the wizard I let the DBCA take care of all the details for me. Sounds simple.
This morning, I got my usual archive lag report. It looks similar to the following:
INSTANCE_NAME APPLY_LAG CURR_TIME
---------------- -------------------- -------------------
orcs1 +01 21:40:47 2012-12-03 08:00:01
I send this to my Inbox twice a day. A quick glance tells me whether my standby is receiving and applying transactions from the primary. I have set all of my standby databases to a four-hour apply delay, and my primary has ARCHIVE_LAG_TARGET set to one hour. This means the apply lag should be at least 4 hours but no more than 5 hours. As we can see above, this standby has greatly exceeded the 5-hour maximum, with an apply lag of 1 day 21 hours! I immediately knew something was wrong. And it did not take a rocket scientist to figure out that adding the new instances to the primary probably contributed to the problem.
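For context, the delay and lag target mentioned above are typically configured with something like the following. This is only a sketch; the destination number and service name are placeholders, not copied from my system:

```sql
-- On the primary: force a log switch at least once an hour
alter system set archive_lag_target = 3600 scope=both sid='*';

-- On the primary: ship redo to the standby but delay apply by 240 minutes
-- (DELAY only takes effect when the standby is not using real-time apply)
alter system set log_archive_dest_2 =
  'SERVICE=orcs_stby ASYNC DELAY=240
   VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=orcs_stby'
  scope=both sid='*';
```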
Like I said in the beginning of this post, I have a 2-node RAC standby system. One instance is the “apply instance” and the other instance sits there relatively idle. In my apply instance alert log, I saw the following error messages:
Sat Dec 01 14:25:40 2012
Recovery created file /u01/app/oracle/oradata/orcl/data04/undotbs04.dbf
Successfully added datafile 342 to media recovery
Datafile #342: '/u01/app/oracle/oradata/orcl/data04/undotbs04.dbf'
No OMF destination specified, unable to create logs
Errors with log /u01/app/oracle/admin/orcs/arch/3_89914_677462342.dbf
MRP0: Background Media Recovery terminated with error 1264
Errors in file /u01/app/oracle/diag/rdbms/orcs/orcs2/trace/orcs2_pr00_29759.trc:
ORA-01264: Unable to create logfile file name
Recovery interrupted!
Sat Dec 01 14:25:51 2012
Recovered data files to a consistent state at change 192271576009
Sat Dec 01 14:25:51 2012
MRP0: Background Media Recovery process shutdown (orcs2)
Since I have my standby database set to STANDBY_FILE_MANAGEMENT=AUTO, the first part of the messages makes sense. When you add a new instance to a RAC database, you have to provide an undo tablespace dedicated to that instance, and you also have to provide online redo log groups dedicated to that instance’s thread. The DBCA specifically asked me questions pertaining to the undo and redo file structures. In the alert log contents above, we can see that the standby successfully added datafile 342, which is my undo tablespace. But the standby was unable to add the online redo logs. If you want the standby to add the online redo logs automatically, you need to specify OMF parameters, which I am reluctant to do. Since the online redo log file addition resulted in an error, the standby stopped media recovery. The standby is still receiving archived logs; it just is not applying them.
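For reference, the work the DBCA did on the primary for each new instance amounts to roughly the following. The tablespace name, file names, group numbers, and SID are illustrative, not taken from my system; with STANDBY_FILE_MANAGEMENT=AUTO, the datafile creation replicates to the standby, but the redo log creation does not, which is exactly where the error came from:

```sql
-- Undo tablespace dedicated to the new instance (replicates to the standby)
create undo tablespace UNDOTBS4
  datafile '/u01/app/oracle/oradata/orcl/data04/undotbs04.dbf' size 1g;

-- Online redo log groups for the new instance's thread (does NOT replicate)
alter database add logfile thread 4
  group 40 ('/u01/app/oracle/oradata/orcl/redo01/redo40.log') size 512m;
alter database add logfile thread 4
  group 41 ('/u01/app/oracle/oradata/orcl/redo01/redo41.log') size 512m;

-- Enable the thread and point the new instance at its undo tablespace
alter database enable public thread 4;
alter system set undo_tablespace='UNDOTBS4' sid='orcs4';
```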
I did not find much on Metalink or through Google searches about how to solve this issue, but here are the steps that I took to get media recovery back up and running. On the standby database (I ran these from the apply instance, but they should work from any instance in the RAC standby database):
1. alter database recover managed standby database cancel;
alter database recover managed standby database cancel
*
ERROR at line 1:
ORA-16136: Managed Standby Recovery not active
This error should not be a shock, because we know managed recovery already aborted. I included the step for completeness: if you ever have to add redo logs to a standby that is still applying transactions, cancelling recovery first is required.
2. alter system set standby_file_management='MANUAL' scope=memory;
System altered.
3. alter database add logfile thread 4 group 40 '/u01/app/oracle/oradata/orcl/redo01/redo40.log' size 536871424;
Database altered.
The command above is exactly what was run on the primary; the redo log must be added on the standby exactly as it was done on the primary. Repeat this for each redo log group added on the primary. Since I added three instances to my primary RAC database, I have three new threads’ worth of redo log groups to add here.
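In my case that meant a set of commands along these lines. The group numbers and file names below are hypothetical; copy the real ones from V$LOG and V$LOGFILE on the primary:

```sql
-- One ADD LOGFILE per group the DBCA created on the primary,
-- matching thread, group number, path, and size exactly
alter database add logfile thread 4 group 41
  '/u01/app/oracle/oradata/orcl/redo01/redo41.log' size 536871424;
alter database add logfile thread 5 group 50
  '/u01/app/oracle/oradata/orcl/redo01/redo50.log' size 536871424;
alter database add logfile thread 5 group 51
  '/u01/app/oracle/oradata/orcl/redo01/redo51.log' size 536871424;
alter database add logfile thread 6 group 60
  '/u01/app/oracle/oradata/orcl/redo01/redo60.log' size 536871424;
alter database add logfile thread 6 group 61
  '/u01/app/oracle/oradata/orcl/redo01/redo61.log' size 536871424;
```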
4. alter system set standby_file_management='AUTO' scope=memory;
System altered.
5. alter database recover managed standby database disconnect from session;
Database altered.
Start managed recovery. All should be good now and we can verify in the apply instance’s alert log:
alter database recover managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (orcs2)
Mon Dec 03 13:32:38 2012
MRP0 started with pid=47, OS id=13232
MRP0: Background Managed Standby Recovery process started (orcs2)
started logmerger process
Mon Dec 03 13:32:44 2012
Managed Standby Recovery not using Real Time Apply
Mon Dec 03 13:32:49 2012
Parallel Media Recovery started with 4 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Mon Dec 03 13:32:49 2012
Completed: alter database recover managed standby database disconnect from session
Mon Dec 03 13:32:50 2012
Media Recovery Log /u01/app/oracle/admin/orcs/arch/1_87840_677462342.dbf
Media Recovery Log /u01/app/oracle/admin/orcs/arch/2_88542_677462342.dbf
Media Recovery Log /u01/app/oracle/admin/orcs/arch/3_89914_677462342.dbf
Media Recovery Log /u01/app/oracle/admin/orcs/arch/4_1_677462342.dbf
We can also verify the apply delay is getting shorter. In the standby, issue the following:
select i.instance_name, d.value as apply_lag,
       to_char(sysdate, 'YYYY-MM-DD HH24:MI:SS') as curr_time
  from v$instance i, v$dataguard_stats d
 where d.name = 'apply lag';
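You can also confirm that the recovery process itself is back in business. A quick check on the standby, where MRP0 should show a status such as APPLYING_LOG (or WAIT_FOR_LOG once it catches up):

```sql
-- Verify the Managed Recovery Process is running on the standby
select process, status, thread#, sequence#
  from v$managed_standby
 where process like 'MRP%';
```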
For background information on how to manage online redo logs for your physical standby database, see Metalink note 740675.1 Online Redo Logs in a Standby.