Friday, August 5, 2016

Quick Troubleshooting for Agent-Free Dell Devices in SCOM

First, if you know anything about SCOM, it is probably that almost nothing is quick.  But here are the steps to take when you need to remove and re-add an Agent-Free device, or when you just can't get a device discovered as Agent-Free to begin with.

Agent-Free monitoring in SCOM is available for 12G and later servers.

The rest of this post assumes that you have already been through my post on setting up Agent-Free and SNMP Monitoring (http://bradsjumpbag.blogspot.com/2015/12/dell-server-management-pack-with-agent.html) for initial setup.  If you are still having problems and cannot get the iDracs discovered through the WSman template, here are some troubleshooting steps.

1. If an iDrac is already discovered and in SCOM, it cannot be discovered again.  The same is true if the machine is already an Agent-Managed device or was discovered as an SNMP device.  You must remove it from the console first.  Do a quick search for the machine in the Search box at the top left of the SCOM console.  If it shows up in the results, then it is certainly in the console and will need to be removed.



Find it and delete it.  The results may tell you where it is.  Otherwise, look for it in some of these common places:
On the Administration workspace, look under Agent Managed, Agentless Managed, Pending Management, Network Devices, Network Devices Pending Management.
On the Authoring workspace, go to WS-Management and SMASH Device Discovery, open the properties of the first template and make sure the iDrac is not listed on the Devices tab.  Repeat this for each template.
If you find the iDrac on any of these views, please delete it.

2. If the machine is a Windows server, log in to it and look in Control Panel\Programs and Features.  Make sure that the Microsoft Monitoring Agent is not installed.  If it is, please uninstall it.  Leaving it installed will cause the machine to keep trying to re-insert itself into the SCOM DB.

3. Make sure the machine is removed from the WS-Management and SMASH Device Discovery templates.  Once you remove it here, SCOM should start the cleanup process.  There is a SMASH device script from Microsoft that is supposed to run on its own.  However, you can also kick off this script yourself to make sure that it starts.

Go to the Monitoring workspace, scroll all the way to the bottom and expand WS-Management and SMASH Monitoring.  Click on WS-Management and SMASH Operational Events.  Click on any event to highlight it.  Then you will be able to find and click on the "Clean up SMASH devices from deleted templates" script in the Tasks pane on the far right.



4. After you have deleted the target device, uninstalled any Microsoft Monitoring Agent, and run the SMASH clean up script, now it is time to wait a little while.  I usually suggest giving it an hour.

5. Next, let's check the SQL DB.  Open SQL Server Management Studio and connect to the Operations Manager DB.  Select New Query from the toolbar.  If you are looking for several machines, you can use this query:

SELECT * FROM dbo.[BasemanagedEntity] where FullName Like '%Windows.Computer%'

If you are looking for a specific machine, use this query and replace FQDN with your machine name:

SELECT * FROM dbo.[BasemanagedEntity] where FullName Like '%Windows.Computer%' and Name Like 'FQDN'

***NOTE*** If you copy and paste the query instead of typing it, you might need to replace the apostrophes.  Just delete them and then replace from your keyboard.
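The apostrophe problem usually comes from the web page or a word processor turning straight quotes into curly ones.  If you keep the queries in a text file and would rather fix them in bulk, here is a minimal Python sketch.  The function name normalize_quotes is my own and is not part of SCOM or SQL Server; it only swaps the common curly quote characters for plain ones.

```python
# Map common typographic ("smart") quotes to the plain ASCII quotes SQL expects.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def normalize_quotes(sql: str) -> str:
    """Return the query text with curly quotes replaced by straight quotes."""
    return sql.translate(str.maketrans(SMART_QUOTES))


if __name__ == "__main__":
    pasted = "SELECT * FROM dbo.[BasemanagedEntity] where Name Like \u2018FQDN\u2019"
    print(normalize_quotes(pasted))
```

Paste the printed result into the query window instead of the original text.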

In the results tab, scroll over looking for the IsDeleted column.  If the machine(s) you are concerned with have a 0 in that column, we need to change them with an update query.



Use this Update query and replace FQDN with the specific machine that needs to be changed:

UPDATE dbo.[BasemanagedEntity] SET IsDeleted = 1 where FullName Like '%Windows.Computer%' and Name Like 'FQDN'
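One thing to watch with Name Like 'FQDN': in T-SQL, Like treats the underscore, the percent sign, and the opening square bracket as wildcards, so a machine name containing an underscore could match more rows than you intend.  The sketch below is one way to build the SELECT and UPDATE text with the same escaped name, so you can check exactly which rows match before changing them.  The helper names are mine, not anything from SCOM, and the script only produces query text for you to paste; it does not connect to the database.

```python
def escape_like_literal(value: str) -> str:
    """Escape T-SQL LIKE wildcards and single quotes so the value matches literally."""
    out = []
    for ch in value:
        if ch in "[%_":
            out.append("[" + ch + "]")  # bracket form makes the wildcard literal
        elif ch == "'":
            out.append("''")  # double a single quote inside a SQL string
        else:
            out.append(ch)
    return "".join(out)


def _where(fqdn: str) -> str:
    # Same predicate for both statements, so the UPDATE touches only what the SELECT showed.
    return ("where FullName Like '%Windows.Computer%' "
            "and Name Like '" + escape_like_literal(fqdn) + "'")


def build_select(fqdn: str) -> str:
    return "SELECT * FROM dbo.[BasemanagedEntity] " + _where(fqdn)


def build_update(fqdn: str) -> str:
    return "UPDATE dbo.[BasemanagedEntity] SET IsDeleted = 1 " + _where(fqdn)


if __name__ == "__main__":
    name = "web_01.example.com"  # hypothetical machine name
    print(build_select(name))
    print(build_update(name))
```

Run the SELECT first and confirm that only the rows you expect come back, then run the UPDATE.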

Give the system a little more time.  I have not timed this so I don't know how long to suggest.

6. Now the target device should be gone from the DB and the Operations Manager console.  We should be able to discover it now.  If you already have a WS-Management and SMASH Device Discovery template, you can add the target iDrac IP back into the device list.  Or you can refer back to my post on setting up Agent-Free and SNMP Monitoring (http://bradsjumpbag.blogspot.com/2015/12/dell-server-management-pack-with-agent.html).  Step 12 in section 1.3 is where we start creating the template.

Now give it a little more time for the discovery, inventory and correlation scripts to run.  Once the iDrac shows up in the Servers and Rack Workstations (Agent-Free) view, you can discover it as an SNMP device.  Those steps are in the other post as well.

I hope this helps you discover or re-discover your Dell Agent-Free devices.

Thursday, August 4, 2016

Dell MP Suite 6.2 CMC Class not Found errors

Dell Management Pack Suite 6.2 has simplified discovery and monitoring for CMCs.  Chassis Detailed discovery and Slot discovery now do not require the RACADM utility.  The RACADM utility is only required to monitor the health of the Chassis controller, IO Module, IO Module Group, Power Supply, and Power Supply Group.

This simplification has included adding WSman status plus inventory polling for CMCs to the basic script that runs against all VRTX, M1000e, and FX2 chassis.

There are a couple of classes that are available on VRTX but not on M1000e or FX2.  If you happen to have Log Level overridden to 1 (enabled), you will see 2 errors every day in the log for each M1000e and FX2 CMC.  By default, these logs are at C:\Windows\Temp\DellDeviceHelper_Logs when enabled.

The logs will be named with this format: DellDeviceHelper_X_X_X_X_GUID.log.  The X_X_X_X will be the IP of your CMC.  The logs are short and you can look through them for references to

"Profile Name= cimv2/DCIM_ChassisPCIDeviceView" or "Profile Name= cimv2/DCIM_ControllerView"

Then on the "Session Error" line you will see "Class not found".

These errors are innocuous and can be ignored for M1000e and FX2 chassis.  The only way to get rid of them is to disable the debug logging.  This is considered working by design.
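If you have a lot of chassis and want to confirm that the only errors in a CMC log are these two, a small script can do the looking for you.  This is a rough Python sketch under a few assumptions: the function names are mine, the check is a plain text search for the strings quoted above, and it does not try to parse the log format itself.

```python
from pathlib import Path

# Default log folder from this post; change it if your logs live elsewhere.
LOG_DIR = r"C:\Windows\Temp\DellDeviceHelper_Logs"

# The two classes that exist on VRTX but not on M1000e or FX2.
VRTX_ONLY_CLASSES = ("DCIM_ChassisPCIDeviceView", "DCIM_ControllerView")


def log_glob_for_ip(ip: str) -> str:
    """Build a file pattern such as DellDeviceHelper_10_0_0_5_*.log for a CMC IP."""
    return "DellDeviceHelper_" + ip.replace(".", "_") + "_*.log"


def expected_missing_classes(log_text: str) -> list:
    """Return the VRTX-only classes named in a log that also reports 'Class not found'."""
    if "Class not found" not in log_text:
        return []
    return [name for name in VRTX_ONLY_CLASSES if "cimv2/" + name in log_text]


def scan_logs(ip: str, log_dir: str = LOG_DIR) -> dict:
    """Map each log file name for this CMC to the VRTX-only classes found in it."""
    results = {}
    for path in Path(log_dir).glob(log_glob_for_ip(ip)):
        text = path.read_text(errors="replace")
        results[path.name] = expected_missing_classes(text)
    return results
```

Calling scan_logs with the IP of an M1000e or FX2 CMC on the management server should list one or both class names for each log that contains the error.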

To turn off debug logging, go to the Authoring workspace, expand Management Pack Objects and select Overrides.  Make sure the Find button is depressed in the toolbar and then in the Look for box type "log level" without the quotes, and click the Find Now button.  Then expand Management Pack Object Type: Object Discovery and any Targets under that to locate the Log Level overrides.

I have highlighted in the picture below where I have 2 log levels set to 1.

You can either right-click on each one and select Delete.

Or, you can double-click on the target to open the properties, uncheck Log Level, and click OK.

Remove every Log Level override that is set to 1, because they are not necessary unless you are debugging.