Monday, June 22, 2015

Health Service Failure Runbook

Every time I apply a SCOM 2012 R2 update I go to the best source on the web, Kevin Holman’s Step by Step series.

But here is my own experience: the issues I encountered and, of course, the solution.

After applying all the server-side SCOM updates as per Kevin’s document, I was ready to update the managed agents waiting in the Pending Management pane.

Half of them updated with no issues, but the other half (200+ agents) generated the following error:

The Agent Management Operation Agent Install failed for remote computer xxxx
Install account: xxx\ScomAction
Error Code: 8007041D
Error Description: The service did not respond to the start or control request in a timely fashion.
Microsoft Installer Error Description:
For more information, see Windows Installer log file "C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\AgentLogs\AgentInstall.LOG
C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\AgentLogs\AgentPatch.LOG
C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\AgentLogs\MOMAgentMgmt.log" on the Management Server.

The key words here are in the error description: “The service did not respond to the start or control request in a timely fashion.”

When I checked the services on the affected agents, the Microsoft Monitoring Agent (HealthService) was indeed stopped.

That, of course, generated a “Health Service Heartbeat Failure” alert, and if the agent was a Cluster or Domain Controller, a plethora of other Critical alerts came with it.

The funny part is that SCOM had successfully updated the agent but failed to restart the Health Service. That presented a challenge, since nothing can be done from within the SCOM console to resuscitate the now greyed-out agent.

A quick fix is to start the Microsoft Monitoring Agent service remotely, but that is impractical across a 400+ agent population.

The Solution:

I created a SCORCH 2012 R2 Runbook to start the Microsoft Monitoring Agent service.

Under the hood:

The Runbook listens for the ‘Health Service Heartbeat Failure’ alert.
It pings the server to make sure it has not been shut down or rebooted.
If the ping fails, an Information alert is created, mainly so it won’t interfere with the ‘Failed to Connect to Computer’ Critical alert that is generated immediately after.
If the ping succeeds, the information is passed to the next activity, which starts the Microsoft Monitoring Agent service.
As the last step, the Runbook closes the ‘Health Service Heartbeat Failure’ alert and writes ‘Closed by SCORCH’ in Custom Field 1 as a success stamp.
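
For readers who want to see the decision flow outside of Orchestrator, here is a minimal Python sketch of the same logic. It is not the Runbook itself, just a plain restatement of the steps above. The function names and the returned action strings are hypothetical, and the 'leave-alert-open' branch for a failed service start is my assumption, since the steps above do not describe that case. The service is started with the standard Windows `sc.exe` tool, so that part only works from a Windows machine with rights on the target server.

```python
import platform
import subprocess

SERVICE_NAME = "HealthService"  # service name of the Microsoft Monitoring Agent


def host_responds(host):
    """Send a single ping and report whether the host answered."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def start_health_service(host):
    """Ask the remote Service Control Manager to start the agent service."""
    try:
        result = subprocess.run(
            ["sc.exe", "\\\\" + host, "start", SERVICE_NAME],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def handle_heartbeat_failure(host, ping=host_responds, start=start_health_service):
    """Return the action to take for one 'Health Service Heartbeat Failure' alert."""
    if not ping(host):
        # Server is down or rebooting: only raise an Information alert.
        return "raise-information-alert"
    if start(host):
        # Service started: close the alert and stamp Custom Field 1.
        return "close-alert"
    # Assumption: this case is not covered in the post, so the alert stays open.
    return "leave-alert-open"
```

The `ping` and `start` parameters are injectable so the flow can be tried with stand-in functions before pointing it at real servers, for example `handle_heartbeat_failure("srv01", ping=lambda h: True, start=lambda h: True)`.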

All software and information is provided “AS IS” with no warranties. Use at your own risk! Please test it in a Lab environment first!

Tuesday, June 9, 2015

AD User Attribute Changes Audit Report

The following SCOM 2012 R2 ACS report details the attribute changes made to any Active Directory user.

The Challenge:

The SCOM Audit reports already include several reports that provide similar information. One example is located in Reporting>>Audit Reports>>DAC_-_Object_Attribute_Changes

However, there are thousands of AD attributes, including hundreds of AD User-related attributes, which makes the above-mentioned report very convoluted. The sometimes cryptic attribute names, values and operation descriptions add to the complexity and make the report hard to read, especially for non-technical people, who are, most of the time, the recipients of many SCOM reports.

Sample out-of-the-box report

The Solution:

The attached report focuses on AD User attributes displayed via Outlook which are a representation of LDAP fields and are by far the most commonly modified.

Most Common User Attributes

In my report I have replaced the AD User attribute names with user-friendly names.

AD User Attribute Name
Friendly Name
Display Name
First Name
Last Name
Email Alias
Phone Number
Postal Code
Zip/Postal Code
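
To illustrate the idea of the replacement, here is a minimal Python sketch of such a lookup. The friendly names come from the list above. The LDAP names on the left are my assumption: they are the standard Active Directory names for these fields, but the report may key on a different set. The function name is hypothetical.

```python
# Assumed LDAP display names mapped to the friendly names used in the report.
FRIENDLY_NAMES = {
    "displayName": "Display Name",
    "givenName": "First Name",
    "sn": "Last Name",
    "mailNickname": "Email Alias",
    "telephoneNumber": "Phone Number",
    "postalCode": "Zip/Postal Code",
}

# LDAP attribute names are case-insensitive, so look them up in lower case.
_LOOKUP = {name.lower(): friendly for name, friendly in FRIENDLY_NAMES.items()}


def friendly_name(ldap_name):
    """Return the friendly name, or the original name if it is not mapped."""
    return _LOOKUP.get(ldap_name.lower(), ldap_name)
```

Falling back to the original attribute name keeps unmapped attributes visible instead of hiding them.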

Sample Report

Mundo SCOM AD User Attribute Changes Report

The report takes two parameters, each entered between two % signs: ‘User Name Contains’ (Affected User) and/or ‘Attribute Name Contains’ (Changed Attribute). To get all possible results, just enter %% in both.
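
The % signs behave like the wildcard in a SQL LIKE comparison. I am assuming here that the report passes both parameters to a LIKE clause, which is what the %% convention suggests. The following Python sketch only emulates that matching so you can see why %smith% finds any user name containing "smith" and why %% returns everything. The function names are hypothetical, and whether matching is case-insensitive depends on the database collation, so that part is also an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _ wildcards) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters, including none
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    # Assumption: case-insensitive matching, as with a typical CI collation.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def sql_like(value, pattern):
    """Return True if value matches the LIKE pattern as a whole."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For example, `sql_like("CONTOSO\\jsmith", "%smith%")` is true, while `sql_like("CONTOSO\\jdoe", "%smith%")` is not.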

Preparing the AD environment:

Setting up AD Object Auditing, Audit Policies or Advanced Audit Policies is outside the scope of this post.

However, here is a quick description of what is needed in order to produce the report:
On your Domain Controllers, enable the ‘Directory Service Changes’ Audit Policy Subcategory, which is part of the Directory Service Audit Policy Category. Make sure to enable both.

AD object attribute changes are captured in Event ID 5136 (‘A directory service object was modified’), which is part of the above Subcategory.

Enable auditing for all users via GPO, or manually for a small number of users.

For a single user, open the ‘Advanced’ security settings, go to Auditing and add ‘Write all properties’.
