
ONTAP Configuration Compliance Auditing with PowerShell and Pester




I have been looking for a way to validate NetApp cluster configuration settings programmatically (once a setting is applied, I want to confirm that it was applied correctly) and to detect configuration drift (if a setting differs from its expected value, I want to know about it). The solution also had to scale out to dozens of clusters, so it needed to be something that I could run both automatically and on an ad-hoc basis.

NetApp PowerShell Toolkit

The core of the solution is the NetApp PowerShell Toolkit, without which this would likely not be possible. It contains 2300+ cmdlets for provisioning and managing NetApp storage components. It can be downloaded from the ToolChest on the NetApp MySupport site (with a valid login). You'll also find exhaustive documentation there for each of the cmdlets, along with syntax examples and sample code. It is a fantastic and easy way to automate common storage tasks - we use it in our environment for everything from creating new LIFs, volumes, and igroups to entire cluster deployments. I won't cover its installation in great detail as it's pretty straightforward - once installed, it can be loaded into PowerShell using the "Import-Module DataONTAP" command. While the PS Toolkit is great for provisioning, my current use case is to test my cluster configuration against expected values (NetApp best practices or organization-specific configuration settings) and alert me to discrepancies. For that, we use Pester.

Pester

Pester is a unit testing framework for PowerShell that executes PowerShell commands and evaluates their output (it is even bundled with the version of PowerShell that ships with Windows 10, albeit as an older version). Pester can test functions, cmdlets, modules, or entire scripts and can be run ad-hoc or be integrated into a deployment/provisioning process. As with the PS Toolkit, I won't cover its installation in great detail, but it can be installed from the PS Gallery by running "Install-Module -Name Pester" and then imported into your PowerShell session using "Import-Module Pester". The examples in this post use the "Should -Match" / "Should -Be" parameter syntax, which requires Pester 4 or later. Let's take a look at a simple Pester test to get some context:

Describe -Name 'NetApp Configuration Tests' {
 Context "Basic Storage Tests" {
  It -name "Check Cluster ONTAP version" {
    $version = Get-NcSystemVersion  
    $version | Should -Match 'NetApp Release 9.3P8'
  } # It
 } # Context
} # Describe

Pester tests are broken up into logical groups called blocks. The Describe block is typically used to group a larger number of tests, such as all tests run against a particular type of system (ONTAP clusters, for my use case). One level below Describe is the Context block, which is used to group similar types of tests, such as a group of cluster networking or high availability tests. Below Context blocks are It blocks, each of which is designed to contain a single assertion.

Once you have the test built and assuming that you have already connected to a NetApp controller with the Connect-NcController <controller> command (or validated that you have a current active connection by typing "$currentnccontroller" without the quotes), you can run a Pester test using the "Invoke-Pester /path/to/test" command.

PS C:\> Invoke-Pester .\cluster-version.ps1
Executing all tests in '.\cluster-version.ps1'
Executing script .\cluster-version.ps1
  Describing NetApp Configuration Tests
    Context Basic Storage Tests
      [+] Check Cluster ONTAP version 310ms
Tests completed in 441ms
Tests Passed: 1, Failed: 0, Skipped: 0, Pending: 0, Inconclusive: 0

You can see that the test passed because the value returned by the cmdlet matched the value we specified in the It block. This is about the most basic Pester test there is, so I use it as a sanity check that I can run commands against a cluster and get valid results back. So what does a failed test look like? We can find out by connecting to a different controller that isn't running our expected version of ONTAP:

PS C:\> Invoke-Pester .\cluster-version.ps1
Executing all tests in '.\cluster-version.ps1'
Executing script .\cluster-version.ps1
  Describing NetApp Configuration Tests
    Context Basic Storage Tests
      [-] Check Cluster ONTAP version 1.96s
        Expected regular expression 'NetApp Release 9.3P8' to match NetApp Release 9.1P12, but it did not match.
        5:     $version | Should -Match 'NetApp Release 9.3P8'
        at scriptblock, C:\cluster-version.ps1: line 5
Tests completed in 421ms
Tests Passed: 0, Failed: 1, Skipped: 0, Pending: 0, Inconclusive: 0

As you can see, the test now fails because the cmdlet returned an unexpected value. We can apply this logic to any of the cmdlets in the PowerShell Toolkit. Another thing that comes in handy is storing the expected configuration values in a JSON configuration file that the script references, instead of hard-coding them. This makes it very easy to update a configuration file when values change rather than having to update the script itself (and the configuration files can live in source control). For example, here is a slightly more advanced example with multiple It blocks grouped into a single Context and referencing a simple JSON file for configuration settings:

JSON configuration file (sample data):

{
    "About": {
      "Product": "NetApp Cluster Config (Basic)",
      "ConfigVersion": "1.0"
    },
    "Cluster": {
      "Autosupport": {
        "IsEnabled": "True",
        "Transport": "https",
        "MailHost": "smtp.domain.com",
        "SupportAddress": "autosupport@netapp.com",
        "To": "storage_team@domain.com"
      }
    }
}
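
The tests that follow read their expected values from a variable named $basic_config, so the JSON has to be loaded into that variable before the It blocks run. A minimal sketch of that step, assuming the JSON is stored in a file; the file name basic-config.json and the temporary location are placeholders, and a trimmed copy of the sample data is written out first only so the snippet runs on its own:

```powershell
# Hypothetical location for the config file (placeholder, not from the post).
$configPath = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'basic-config.json'

# Write a trimmed copy of the sample data so this sketch is self-contained.
@'
{
  "Cluster": {
    "Autosupport": {
      "IsEnabled": "True",
      "Transport": "https"
    }
  }
}
'@ | Set-Content -Path $configPath -Encoding UTF8

# -Raw reads the whole file as one string so ConvertFrom-Json parses a single document.
$basic_config = Get-Content -Path $configPath -Raw | ConvertFrom-Json

$basic_config.Cluster.Autosupport.Transport   # https
$basic_config.Cluster.AutoSupport.IsEnabled   # True (still a string, not a boolean)
```

Property lookups on the converted object are case-insensitive, which is why both "Autosupport" and "AutoSupport" resolve to the same section.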

(Get-NcNode).ForEach{
  Context "Testing AutoSupport Settings on $($_)" {
    (Get-NcAutoSupportConfig -Node $_.Node).ForEach{
      It "AutoSupport is enabled" -test {
        $_.IsEnabled | Should -Be $basic_config.Cluster.Autosupport.IsEnabled
      } # It
      It "Autosupport transport method is HTTPS" -test {
        $_.Transport | Should -Be $basic_config.Cluster.Autosupport.Transport
      } # It
      It "Mail host value is well-defined" -test {
        $_.MailHosts | Should -Be $basic_config.Cluster.Autosupport.MailHost
      } # It
      It "Support address is well-defined" -test {
        $_.SupportAddress | Should -Be $basic_config.Cluster.Autosupport.SupportAddress
      } # It
      It "AutoSupport notification destination email is well-defined" -test {
        $_.To | Should -Be $basic_config.Cluster.Autosupport.To
      } # It
    } # Get-NcAutoSupportConfig
  } # Context
} # Get-NcNode

The Pester output of this test then looks like this, assuming the cluster values match the values in your JSON file:

 Context Testing AutoSupport Settings on lab-clst-01-n1
   [+] AutoSupport is enabled 1ms
   [+] Autosupport transport method is HTTPS 1ms
   [+] Mail host value is well-defined 1ms
   [+] Support address is well-defined 1ms
   [+] AutoSupport notification destination email is well-defined 1ms
 Context Testing AutoSupport Settings on lab-clst-01-n2
   [+] AutoSupport is enabled 2ms
   [+] Autosupport transport method is HTTPS 1ms
   [+] Mail host value is well-defined 2ms
   [+] Support address is well-defined 2ms
   [+] AutoSupport notification destination email is well-defined 1ms
 Context Testing AutoSupport Settings on lab-clst-01-n3
   [+] AutoSupport is enabled 1ms
   [+] Autosupport transport method is HTTPS 1ms
   [+] Mail host value is well-defined 1ms
   [+] Support address is well-defined 1ms
   [+] AutoSupport notification destination email is well-defined 3ms
 Context Testing AutoSupport Settings on lab-clst-01-n4
   [+] AutoSupport is enabled 1ms
   [+] Autosupport transport method is HTTPS 1ms
   [+] Mail host value is well-defined 1ms
   [+] Support address is well-defined 1ms
   [+] AutoSupport notification destination email is well-defined 1ms

Below is a list of ONTAP components I am testing with Pester today:

Basic cluster level items:
- AutoSupport settings
- NTP servers
- DNS servers
- Auto-giveback status
- Cluster HA health

Networking/SnapMirror:
- Cluster Peer health
- Unhealthy SnapMirror relationships
- Down ports
- Interfaces not at their home ports
- Service Processor status

Disk/volume:
- Broken disks
- Unassigned disks
- Offline volumes
- Check for manually created snapshots older than 180 days
- Check for backup storage snapshots older than 180 days

External engine fpolicy configuration:
- event
- policy
- scope
- service account
- external engine
- status

Native fpolicy configuration:
- event
- policy
- scope

Vscan configuration:
- Vscanner role
- Vscan service account
- Vscan scanner-pool
- Vscan on-access policy
- Vscan enabled
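
The snapshot age checks in the list above come down to filtering objects by a timestamp. A minimal sketch of that filter, assuming the objects returned by Get-NcSnapshot expose a DateTime property named Created (worth confirming with Get-Member against your Toolkit version). Select-OlderThan is a hypothetical helper name, not part of the Toolkit or of the published tests, and the demo uses stand-in objects so it runs without a cluster:

```powershell
# Hypothetical helper: passes through only objects whose timestamp property
# is older than the given number of days.
function Select-OlderThan {
    param(
        [Parameter(ValueFromPipeline)] $InputObject,
        [int] $Days = 180,
        [string] $Property = 'Created',   # assumed property name
        [datetime] $Now = (Get-Date)
    )
    process {
        if ($InputObject.$Property -lt $Now.AddDays(-$Days)) { $InputObject }
    }
}

# Stand-in objects in place of real snapshot data.
$fake = @(
    [pscustomobject]@{ Name = 'old'; Created = (Get-Date).AddDays(-200) }
    [pscustomobject]@{ Name = 'new'; Created = (Get-Date).AddDays(-10) }
)
($fake | Select-OlderThan -Days 180).Name   # old

# Inside a test file this could become (not run here, needs a cluster connection):
# It "has no snapshots older than 180 days" {
#     Get-NcSnapshot | Select-OlderThan -Days 180 | Should -BeNullOrEmpty
# }
```

Keeping the age logic in a small function means it can be exercised with fake data like this before it is pointed at a real cluster.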

Future/wishlist items:
- Values that differ based on the cluster (DNS/NIS servers, Fpolicy collector server IPs, Vscan server IPs, etc.)
- SAN health items (offline LUNs, number of initiators per igroup, etc.)
- More complete JSON configuration file usage
- Automated remediation (far future, if ever)

My goal is to add more tests as time goes on and use cases pop up. Also, I will eventually be running the tests in an automated fashion with some sort of reporting to be more proactive about detecting configuration drift. Hopefully this is useful for more than just me as a customer - I could see it being useful as a partner for auditing the health of customer environments without ever having to manually log into the cluster.
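
One way the automated, multi-cluster run with reporting might look is sketched below. It assumes Pester 4.x (the -Script, -OutputFile, -OutputFormat, and -PassThru parameters of Invoke-Pester) and the DataONTAP module; the function name Invoke-ClusterAudit, the cluster names, and the paths are all placeholders rather than anything from the published tests:

```powershell
# Hypothetical wrapper: runs one Pester test file against each cluster in turn
# and writes an NUnit XML report per cluster.
function Invoke-ClusterAudit {
    param(
        [Parameter(Mandatory)] [string[]] $Cluster,
        [Parameter(Mandatory)] [pscredential] $Credential,
        [string] $TestPath   = '.\cluster-version.ps1',   # placeholder test file
        [string] $ReportPath = '.\reports'                # placeholder output folder
    )

    Import-Module DataONTAP, Pester
    New-Item -ItemType Directory -Path $ReportPath -Force | Out-Null

    foreach ($name in $Cluster) {
        # Sets the current controller that the Get-Nc* cmdlets in the tests use.
        Connect-NcController -Name $name -Credential $Credential -ErrorAction Stop | Out-Null

        # -PassThru returns a result object; -OutputFile writes a report that
        # a scheduler or CI server can collect.
        $result = Invoke-Pester -Script $TestPath -PassThru `
            -OutputFile (Join-Path $ReportPath "$name.xml") -OutputFormat NUnitXml

        # One summary row per cluster.
        [pscustomobject]@{
            Cluster = $name
            Passed  = $result.PassedCount
            Failed  = $result.FailedCount
        }
    }
}

# Example call (hypothetical cluster names):
# Invoke-ClusterAudit -Cluster 'lab-clst-01', 'lab-clst-02' -Credential (Get-Credential)
```

Returning a summary object per cluster keeps the ad-hoc case readable at the console, while the XML files cover the scheduled case.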

If you'd like to get more information about Pester (and believe me, there is a LOT that I haven't covered, this is barely scratching the surface) you can review the documentation at https://github.com/pester/Pester.

I've also published my Pester code (and associated sample JSON configuration file) to GitHub at https://github.com/zer0trust/Pester-ONTAP if you'd like to use it in your environment or even if you just want to point and laugh at my rudimentary PowerShell abilities. I have approximately one week of experience writing Pester tests, so if you see a way that it could be improved, I'd love the feedback and will happily update the tests to reflect that.

Disclaimer: These tests rely only on read-only PowerShell "Get-" cmdlets, but I'm not responsible for anything wacky that happens if/when you run this against production clusters.

Thanks for reading!
