
How To: Unjoin NetApp Nodes from a Cluster

Let me paint you a word picture:

You've upgraded to a shiny new AFF - it's all racked, stacked, cabled, and ready to rock. You've moved your volumes onto the new storage, your workloads are performing beautifully (of course), and it's time to put your old NetApp gear out to pasture.

We're going to learn how to unjoin nodes from an existing cluster. But wait! There are several prerequisites that must be met before the actual cluster unjoin can be done.


  • Ensure that you have either moved volumes to your new aggregates or offlined and deleted any unused volumes.
  • Offline and delete the aggregates on the old nodes.
  • Re-home data LIFs to the new nodes, or disable and delete them if they are not in use.
  • Disable and delete the intercluster LIFs on the old nodes (and remove them from any Cluster Peering relationships).
  • Remove the old nodes' ports from any Broadcast Domains or Failover Groups they may be a member of. (Hedged example commands for these first five steps follow this list.)
  • Move epsilon to one of the new nodes (let's assume nodes 3 and 4 are the new nodes in this scenario).
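
The first five prerequisites boil down to a handful of commands. The ones below are a hedged sketch only: the volume, aggregate, SVM, LIF, broadcast domain, failover group, and port names (vol1, aggr1_node1, svm1, ic_node1, fg_data, e0c, and so on) are hypothetical placeholders, and cluster peering changes are environment-specific, so adapt everything to your own setup.

labnetapp01::> volume move start -vserver svm1 -volume vol1 -destination-aggregate aggr1_node3

labnetapp01::> storage aggregate offline -aggregate aggr1_node1
labnetapp01::> storage aggregate delete -aggregate aggr1_node1

labnetapp01::> network interface modify -vserver svm1 -lif svm1_data1 -home-node node3 -home-port e0c
labnetapp01::> network interface revert -vserver svm1 -lif svm1_data1

labnetapp01::> network interface modify -vserver labnetapp01 -lif ic_node1 -status-admin down
labnetapp01::> network interface delete -vserver labnetapp01 -lif ic_node1

labnetapp01::> network port broadcast-domain remove-ports -ipspace Default -broadcast-domain Default -ports node1:e0c,node1:e0d
labnetapp01::> network interface failover-groups remove-targets -vserver svm1 -failover-group fg_data -targets node1:e0c,node1:e0d

That leaves the last prerequisite: epsilon. Switch to the advanced privilege level and check which node currently holds it.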

labnetapp01::> set -priv advanced
labnetapp01::*> cluster show -epsilon *
Node                 Health  Eligibility   Epsilon
-------------------- ------- ------------  ------------
node1               true    true          true
node2               true    true          false
node3               true    true          false
node4               true    true          false
4 entries were displayed.

We can see that node1 currently holds epsilon, so let's remove it from node1 and assign it to one of the new nodes.

labnetapp01::*> cluster modify -node node1 -epsilon false 

labnetapp01::*> cluster modify -node node3 -epsilon true

We can verify that epsilon was moved by running the "cluster show -epsilon *" command again:

labnetapp01::*> cluster show -epsilon *
Node                 Health  Eligibility   Epsilon
-------------------- ------- ------------  ------------
node1               true    true          false
node2               true    true          false
node3               true    true          true
node4               true    true          false
4 entries were displayed.

  • Move the cluster replication ring master to one of the new nodes by disabling cluster eligibility on the old nodes. First, check the current state:
labnetapp01::*> cluster ring show
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
node1     mgmt     1        1        1068     node1     master
node1     vldb     1        1        98       node1     master
node1     vifmgr   1        1        350      node1     master
node1     bcomd    1        1        56       node1     master
node1     crs      1        1        88       node1     master
node2     mgmt     1        1        1068     node1     secondary
node2     vldb     1        1        98       node1     secondary
node2     vifmgr   1        1        350      node1     secondary
node2     bcomd    1        1        56       node1     secondary
node2     crs      1        1        88       node1     secondary
node3     mgmt     1        1        1068     node1     secondary
node3     vldb     1        1        98       node1     secondary
node3     vifmgr   1        1        350      node1     secondary
node3     bcomd    1        1        56       node1     secondary
node3     crs      1        1        88       node1     secondary
node4     mgmt     1        1        1068     node1     secondary
node4     vldb     1        1        98       node1     secondary
node4     vifmgr   1        1        350      node1     secondary
node4     bcomd    1        1        56       node1     secondary
node4     crs      1        1        88       node1     secondary
20 entries were displayed.

In order to force the cluster ring master to move to a different node, we need to set eligibility to false for node1 and node2.

labnetapp01::*> system node modify -node node1 -eligibility false

Then we'll do the same thing for node2.

labnetapp01::*> system node modify -node node2 -eligibility false

You'll probably get a bunch of email alerts at this point; don't panic. After these commands have run, you can see the result by running the "cluster ring show" command again.

labnetapp01::*> cluster ring show
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
node1     mgmt     1        1        1068     node3     offline
node1     vldb     1        1        98       node3     offline
node1     vifmgr   1        1        350      node3     offline
node1     bcomd    1        1        56       node3     offline
node1     crs      1        1        88       node3     offline
node2     mgmt     1        1        1068     node3     offline
node2     vldb     1        1        98       node3     offline
node2     vifmgr   1        1        350      node3     offline
node2     bcomd    1        1        56       node3     offline
node2     crs      1        1        88       node3     offline
node3     mgmt     1        1        1068     node3     master
node3     vldb     1        1        98       node3     master
node3     vifmgr   1        1        350      node3     master
node3     bcomd    1        1        56       node3     master
node3     crs      1        1        88       node3     master
node4     mgmt     1        1        1068     node3     secondary
node4     vldb     1        1        98       node3     secondary
node4     vifmgr   1        1        350      node3     secondary
node4     bcomd    1        1        56       node3     secondary
node4     crs      1        1        88       node3     secondary
20 entries were displayed.

  • Now we'll need to disable storage failover (SFO) for the two old nodes.
labnetapp01::*> storage failover modify -node node1 -enabled false
labnetapp01::*> storage failover modify -node node2 -enabled false

  • Verify that storage failover is disabled by running "storage failover show". You'll see a value of "false" under the "Takeover Possible" column for the old nodes (example output just after this list).
  • Now we're finally ready to do the actual cluster unjoin! After waiting for some time, you'll see a success message if nothing went catastrophically wrong. 
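
Here's roughly what that verification looks like in this lab (the state description wording can vary a bit between ONTAP versions):

labnetapp01::*> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node1          node2          false    Takeover is not possible: Storage
                                        failover is disabled
node2          node1          false    Takeover is not possible: Storage
                                        failover is disabled
node3          node4          true     Connected to node4
node4          node3          true     Connected to node3
4 entries were displayed.

With takeover confirmed as not possible on the old nodes, on to the unjoin itself: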
labnetapp01::*> cluster unjoin -node node1

Warning: This command will unjoin node "node1" from the cluster. You must unjoin the failover partner as well. After the node is successfully unjoined, erase its configuration and init
         (4)" option from the boot menu.
Do you want to continue? {y|n}: y
[Job 47561] Cleaning cluster database
[Job 47561] Job succeeded: Cluster unjoin succeeded

labnetapp01::*> cluster unjoin -node node2

Warning: This command will unjoin node "node2" from the cluster. You must unjoin the failover partner as well. After the node is successfully unjoined, erase its configuration and init
         (4)" option from the boot menu.
Do you want to continue? {y|n}: y
[Job 47561] Cleaning cluster database
[Job 47561] Job succeeded: Cluster unjoin succeeded
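
As both warnings point out, each unjoined node still needs its configuration erased: on the node's own console, interrupt the reboot to reach the boot menu and choose option (4), "Clean configuration and initialize all disks". On an ONTAP 9-era node the menu looks roughly like this (the exact entries vary by release):

Please make a selection:

(1) Normal Boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Clean configuration and initialize all disks.
(5) Maintenance mode boot.
(6) Update flash from backup config.
(7) Install new software first.
(8) Reboot node.
(9) Configure Advanced Drive Partitioning.

Selection (1-9)? 4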

That's it for the cluster unjoin. After this process, you'll be ready to power down and physically remove the old storage. Thanks for reading!

Disclaimer: This is a lab - I am not responsible for breaking your stuff if you run this in production and something gets borked.
