Just recently, I was trying to apply the latest and greatest Patch Set Update (PSU) to a 2-node Oracle RAC system. Everything went smoothly on the first node, but I ran into problems when applying the PSU to the second node. The problem wasn’t with OPatch or the PSU; rather, I could not even bring down Grid Infrastructure (GI) successfully. And to make matters worse, it would not come back up either.
I tracked my issue down to the Grid Inter Process Communication Daemon (gipcd). When issuing ‘crsctl stop crs’, I received a message stating that gipcd could not be successfully terminated. When starting GI, the startup got as far as trying to start gipcd and then quit. I found many helpful articles on My Oracle Support (MOS) and through Google searches. Many of those documents seemed to be right on track with my issue, but I could not successfully get GI back up and running. Rebooting the node did not help either. The remainder of this article can help even if your issue is not with gipcd; gipcd was just the sticking point for me.
So at this juncture, I had a decision to make. I could file a Service Request (SR) on MOS, or I could “rebuild” that node in the cluster. I knew that if I filed an SR, I’d be lucky to have the node operational any time in the next week. I did not want to wait that long, and if this were a production system, I could not have waited that long. So I decided to rebuild the node. This blog post will detail the steps I took. At a high level, this is what is involved:
- Remove the node from the cluster
- Cleanup any GI and RDBMS remnants on that node.
- Add the node back to the cluster.
- Add the instance and service for the new node.
- Start up the instance.
In case it matters, this system is Oracle 12.1.0.2 (both GI and RDBMS) running on Oracle Linux 7. In my example, host01 is the “good” node and host02 is the “bad” node. The database name is “orcl”. Where possible, my command will have the prompt indicating the node I am running that command from.
First, I’ll remove the bad node from the cluster.
I start by removing the RDBMS software from the good node’s inventory.
[oracle@host01]$ ./runInstaller -updateNodeList ORACLE_HOME=$RDBMS_HOME "CLUSTER_NODES={host01}" LOCAL_NODE=host01
Then I remove the GI software from the inventory.
[oracle@host01]$ ./runInstaller -updateNodeList ORACLE_HOME=$GRID_HOME "CLUSTER_NODES={host01}" CRS=TRUE -silent
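For reference, the runInstaller used in both commands above is the copy inside each home’s oui/bin directory, so I change into that directory first. A minimal sketch, assuming the standard home layout:
[oracle@host01]$ cd $RDBMS_HOME/oui/bin   # before the RDBMS inventory update
[oracle@host01]$ cd $GRID_HOME/oui/bin    # before the GI inventory update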
Now I’ll remove that node from the cluster registry.
[root@host01]# crsctl delete node -n host02
CRS-4661: Node host02 successfully deleted.
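As a quick sanity check, olsnodes run from the good node should now list only host01 (output omitted):
[root@host01]# olsnodes -n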
Remove the VIP.
[root@host01]# srvctl config vip -node host02
VIP exists: network number 1, hosting node host02
VIP Name: host02-vip
VIP IPv4 Address: 192.168.1.101
VIP IPv6 Address:
VIP is enabled.
VIP is individually enabled on nodes:
VIP is individually disabled on nodes:
[root@host01]# srvctl stop vip -vip host02-vip -force
[root@host01]# srvctl remove vip -vip host02-vip
Please confirm that you intend to remove the VIPs host02-vip (y/[n]) y
Then remove the instance.
[root@host01]# srvctl remove instance -db orcl -instance orcl2
Remove instance from the database orcl? (y/[n]) y
At this point, the bad node is no longer part of the cluster, from the good node’s perspective.
Next, I’ll move to the bad node and remove the software and clean up some config files.
[oracle@host02]$ rm -rf /u01/app/oracle/product/12.1.0.2/
[root@host02 ~]# rm -rf /u01/grid/crs12.1.0.2/*
[root@host02 ~]# rm /var/tmp/.oracle/*
[oracle@host02]$ rm -rf /tmp/*
[root@host02]# rm /etc/oracle/ocr*
[root@host02]# rm /etc/oracle/olr*
[root@host02]# rm -rf /pkg/oracle/app/oraInventory
[root@host02]# rm -rf /etc/oracle/scls_scr
I took the easy way out and just used ‘rm’ to remove the RDBMS and Grid home software. Things are all cleaned up now. The good node thinks it’s part of a single-node cluster, and the bad node doesn’t even know about the cluster at all. Next, I’ll add that node back to the cluster. I’ll use the addnode utility on host01.
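Before running addnode.sh, it does not hurt to let the Cluster Verification Utility confirm that host02 is reachable and properly configured. A sketch of the check, run from the good node (I ignore prerequisite failures in the addnode command below anyway):
[oracle@host01]$ cluvfy stage -pre nodeadd -n host02 -verbose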
[oracle@host01]$ cd $GRID_HOME/addnode
[oracle@host01]$ ./addnode.sh -ignoreSysPrereqs -ignorePrereq -silent "CLUSTER_NEW_NODES={host02}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={host02-vip}"
This will clone the GI home from host01 to host02. At the end, I am prompted to run root.sh on host02. Running this script will connect GI to the OCR and Voting disks and bring up the clusterware stack. However, I do need to run one more cleanup routine as root on host02 before I can proceed.
[root@host02]# cd $GRID_HOME/crs/install
[root@host02]# ./rootcrs.sh -verbose -deconfig -force
It is possible that I could have run the above earlier when cleaning up the node, but this is where I chose to execute it. Now I run the root.sh script as requested.
[root@host02]# cd $GRID_HOME
[root@host02]# ./root.sh
At this point, host02 is now part of the cluster and GI is up and running. I verify with “crs_stat -t” and “olsnodes -n”. I also check the VIP.
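For reference, those checks look something like this; crsctl stat res -t is the 12c replacement for the now-deprecated crs_stat -t, and crsctl check crs is a quick overall health check (output omitted):
[root@host02]# crsctl check crs
[root@host02]# crsctl stat res -t
[root@host02]# olsnodes -n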
[root@host02]# srvctl status vip -vip host02-vip
VIP host02-vip is enabled
VIP host02-vip is running on node: host02
Now back on host01, it’s time to clone the RDBMS software.
[oracle@host01]$ cd $RDBMS_HOME/addnode
[oracle@host01]$ ./addnode.sh "CLUSTER_NEW_NODES={host02}"
This will start the OUI. Walk through the wizard to complete the clone process.
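If you prefer to avoid the GUI, the same silent flags used for the GI clone should work for the RDBMS home as well, though I have not tested that path here since I went through the wizard:
[oracle@host01]$ ./addnode.sh -silent -ignoreSysPrereqs -ignorePrereq "CLUSTER_NEW_NODES={host02}"
Either way, be prepared to run any root script OUI asks for on host02 before continuing.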
Now I’ll add the instance back on that node.
[oracle@host01]$ srvctl add instance -db orcl -instance orcl2 -node host02
If everything has gone well, the instance will start right up.
[oracle@host01]$ srvctl start instance -db orcl -instance orcl2
[oracle@host01]$ srvctl status database -d orcl
Instance orcl1 is running on node host01
Instance orcl2 is running on node host02
SQL> select inst_id,status from gv$instance;
   INST_ID STATUS
---------- ------------
         1 OPEN
         2 OPEN
Awesome! All that remains is to reconfigure and start any necessary services. I have one.
srvctl modify service -db orcl -service hr_svc -modifyconfig -preferred "orcl1,orcl2"
srvctl start service -db orcl -service hr_svc -node host02
srvctl status service -db orcl
That’s it. I now have everything operational.
Hopefully this blog post has shown how easy it is to take a “bad” node out of the cluster and add it back in. This entire process took me about 2 hours to complete. Much faster than any resolution I’ve ever obtained from MOS.
I never did get to the root cause of my original issue. Taking the node out of the cluster and adding it back in got me back up and running. This process would not have worked if the root cause of my problem had been hardware or OS related.
And the best part for me in all of this? Because host01 already had the PSU applied to both the GI and RDBMS homes, cloning those homes to host02 means I did not have to run OPatch on host02; that host received the PSU as part of the clone. All I needed to do to complete the patching was run datapatch against the database.
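For completeness, a minimal sketch of that final datapatch run, executed from the RDBMS home’s OPatch directory while the database is up:
[oracle@host01]$ cd $RDBMS_HOME/OPatch
[oracle@host01]$ ./datapatch -verbose
A quick ‘opatch lspatches’ against both homes on host02 is also a nice way to confirm that the cloned homes really do carry the PSU.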