In my last installment, I discussed a few different areas where data center monitoring automation can not only make life in the data center more convenient but also become a force multiplier. I ran out of space, however, before I ran out of ideas (the story of my life). The one thing I didn't cover was the automation you can implement in response to an alert.
As a data center professional, you probably have a solid understanding of monitoring and alerting already, but to truly appreciate how automation can relieve an enormous burden, it may be helpful to review a few examples.
What follows are some clippings from my garden of automation: alert responses that have had a huge impact on the environments where they were implemented.
Example 1: Disk Full
Disk-full alerting is a simple concept with a deceptively large number of moving parts, so I want to break it down into specifics. First, get the alert right. As my fellow SolarWinds Head Geek Thomas LaRock and I discussed in a recent episode of SolarWinds Lab, simplistic disk alerts help nobody. If you have a 2TB disk, alerting when it's 90 percent used means the alert fires while 204.8GB of disk space still remains.
A good solution to this problem is to check for both percent used and remaining space. A better solution is to include logic in the alert that tests for the total space of the drive, so that drives with less than 1TB of space have one set of criteria and drives with more than 1TB have another. These tests should all be in the same alert, if possible, because who wants to manage hundreds of alert rules? Either way, you want to ensure you are monitoring disk space in a way that is reasonable for the volumes in question, and create only the alerts you need.
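The size-aware test described above can be sketched in a few lines of Python. This is a minimal illustration, not a setting from any particular monitoring product: the function name and the specific thresholds (90 percent used, 20GB and 100GB remaining) are assumptions chosen for the example.

```python
GB = 1024 ** 3
TB = 1024 ** 4  # binary terabyte, matching the 2TB = 2048GB arithmetic above


def disk_alert_needed(total_bytes, free_bytes):
    """Return True when a volume should raise a disk-full alert.

    Both conditions must hold: the volume is mostly full AND the space
    remaining is small in absolute terms. Thresholds are illustrative.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    pct_used = 100.0 * (total_bytes - free_bytes) / total_bytes
    if total_bytes < TB:
        max_pct_used, min_free = 90.0, 20 * GB    # assumed criteria for small volumes
    else:
        max_pct_used, min_free = 90.0, 100 * GB   # assumed criteria for large volumes
    return pct_used >= max_pct_used and free_bytes <= min_free
```

With these assumed thresholds, the 2TB disk at 90 percent used stays quiet because roughly 204.8GB is still free, while the same disk with 80GB left raises the alert.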
Next, clear unnecessary files out of various directories. For the purpose of this article, I'll just say that all systems have a temporary directory and that you can delete all files out of that folder with impunity. The challenge in doing so usually comes down to a problem of impersonation. Many monitoring solutions run on the server as the system account. As a result, performing certain actions requires the script to impersonate a privileged user account. There are a variety of ways to do so, which is why I'll leave the problem here for you to solve in a way that best fits your individual environment.
Once the impersonation issue is resolved, there's another challenge specific to the disk-full alert: knowing that the correct directories for the specific server are being targeted. The best approach is to use a common shared folder that maps to all servers and place a script file there. That script can be set up to first detect the proper directories and then clear them out with all the necessary safeguards and checks in place to avoid accidental damage.
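As a rough sketch of what such a cleanup script might look like, the Python function below refuses to touch any directory that is not on an approved list, skips subdirectories and symlinks, and defaults to a dry run. The function name, the allow-list parameter, and the seven-day age limit are assumptions added for safety in this example; the article itself only says the temp folder can be emptied.

```python
import os
import time


def clear_old_files(directory, allowed_roots, max_age_days=7, dry_run=True):
    """Remove plain files older than max_age_days directly under `directory`.

    Returns the paths that were removed (or would be, in a dry run).
    """
    directory = os.path.realpath(directory)
    roots = [os.path.realpath(r) for r in allowed_roots]
    # Safeguard: only operate inside directories that were explicitly approved.
    if not any(directory == r or directory.startswith(r + os.sep) for r in roots):
        raise ValueError(f"{directory} is not an approved cleanup target")

    cutoff = time.time() - max_age_days * 86400
    removed = []
    for entry in os.scandir(directory):
        # Skip subdirectories and symlinks; only regular files are candidates.
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            if not dry_run:
                os.remove(entry.path)
            removed.append(entry.path)
    return removed
```

Running it first with `dry_run=True` and logging the returned list gives you a record of what the alert action would have deleted before you let it delete anything.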
Example 2: Restart an IIS Application Pool
Sadly, restarting application pools is often the easiest and best fix for website-related issues. I'm not saying that running appcmd stop... and then appcmd start... from the server command line is a quick kludge that ignores the bigger issues. I'm saying that often, resetting the application pool is the fix.
If your web team finds itself in this situation, waking a human being to do the honors is absolutely your most expensive option. But automatically restarting the application pool becomes slightly more challenging because one server could be running multiple websites, which in turn have multiple application pools. Or you could have one big application pool controlling multiple websites. It all depends on how the server and websites were configured and you have no way of knowing.
If your monitoring solution can monitor the application pool, it will provide the name for you. Most mature monitoring solutions do so already. Once you have the name, you can do the following:
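One way to wire the pool name into the appcmd stop and start calls mentioned earlier is sketched below. It assumes appcmd.exe lives in its default location and that the script runs on the web server itself with sufficient rights; the function names and the injectable `runner` parameter are illustrative choices, not part of any monitoring product.

```python
import subprocess

# Default appcmd location on a standard IIS install (assumed; adjust if yours differs).
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"


def apppool_commands(pool_name):
    """Build the appcmd stop and start command lines for one application pool."""
    if not pool_name or '"' in pool_name:
        raise ValueError("unexpected application pool name")
    target = f"/apppool.name:{pool_name}"
    return [
        [APPCMD, "stop", "apppool", target],
        [APPCMD, "start", "apppool", target],
    ]


def restart_apppool(pool_name, runner=subprocess.run):
    """Stop, then start, the named pool."""
    stop_cmd, start_cmd = apppool_commands(pool_name)
    # The stop step may report an error if the pool is already stopped,
    # so only the start step is required to succeed.
    runner(stop_cmd, check=False)
    runner(start_cmd, check=True)
```

The alert action would call `restart_apppool()` with whatever pool name the monitoring system supplies in the alert.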
Example 3: Restart IIS
Running a close second behind restarting application pools is resetting IIS. Doing so is, of course, the nuclear option of website fixes since you are bouncing all websites and all connections. Even though it's drastic, it's a necessary step in some cases.
As with restarting application pools, getting a human involved in this incredibly simple action is a waste of everyone's time and the company's money. It's far better to automatically restart and then recheck the website a minute or two later. If all is well, the server logs can be investigated in the morning as part of a postmortem. If the website is still down, it's time to send in the troops.
You can restart the IIS web server in a number of ways:
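One common way is the iisreset utility, which accepts an optional computer name followed by /restart. The sketch below pairs that command with the recheck described above. The 90-second wait, the function names, and the injectable `runner`, `checker` and `sleep` parameters are assumptions made for the example.

```python
import subprocess
import time
import urllib.request


def iis_restart_command(server=None):
    """Build an iisreset command line; `server` targets a remote machine."""
    cmd = ["iisreset"]
    if server:
        cmd.append(server)
    cmd.append("/restart")
    return cmd


def site_is_up(url, timeout=10):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except OSError:  # URLError, HTTPError and socket timeouts are OSError subclasses
        return False


def restart_and_recheck(url, server=None, wait_seconds=90,
                        runner=subprocess.run, checker=site_is_up,
                        sleep=time.sleep):
    """Restart IIS, wait, then report whether the site came back."""
    runner(iis_restart_command(server), check=True)
    sleep(wait_seconds)
    return checker(url)
```

If `restart_and_recheck()` returns False, that is the point at which the alert should escalate to a person.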
Example 4: Restart a Server
If restarting the IIS service is the nuclear option, restarting the entire server is akin to nuclear Armageddon. Yet we all know there are times when restarting the server is the best option, given a certain set of conditions that you can monitor.
Assuming your monitoring solution doesn't support a built-in capability for this function, some options include the following:
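Two widely available options are the Windows shutdown command with /r and /m, and the Linux shutdown -r command run over SSH. The sketch below only builds the command lines; the function name, the one-minute default delay, and the assumption that the monitoring account has remote-shutdown rights (or passwordless sudo over key-based SSH) are all illustrative.

```python
def reboot_command(host, os_family, delay_minutes=1):
    """Build a remote reboot command line for a Windows or Linux host."""
    if delay_minutes < 0:
        raise ValueError("delay_minutes must not be negative")
    if os_family == "windows":
        # /r = restart, /t = delay in seconds, /m = target machine
        return ["shutdown", "/r", "/t", str(delay_minutes * 60), "/m", rf"\\{host}"]
    if os_family == "linux":
        # Assumes key-based SSH and sudo rights for the monitoring account.
        return ["ssh", host, "sudo", "shutdown", "-r", f"+{delay_minutes}"]
    raise ValueError(f"unsupported os_family: {os_family}")
```

Keeping a short delay rather than rebooting instantly leaves a window in which a human (or another automation) can cancel the restart.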
Example 5: Restart a Service
Occasionally, services stop. They are sometimes even services that you, as a data center professional who needs to monitor your infrastructure, care about, such as SNMP.
So, you are cutting dozens of service-down alerts. Have you thought about restarting them? In some cases, a restart doesn't really help much. But in far more situations it does. Computers are funny things. After all, "Screws fall out all the time. The world is an imperfect place." (From The Breakfast Club.)
Sometimes, they just need a gentle nudge. If this is the case, you can do the following:
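As a hedged sketch, the function below builds a start command for a stopped service: sc.exe against a remote Windows host, or systemctl over SSH for a Linux host. The function name is hypothetical, and the Linux branch assumes a systemd-based distribution with key-based SSH and sudo rights.

```python
def service_recovery_command(host, service, os_family):
    """Build the command that brings a stopped service back up."""
    if not service:
        raise ValueError("service name is required")
    if os_family == "windows":
        # The alert fired because the service is down, so a start is enough.
        return ["sc.exe", rf"\\{host}", "start", service]
    if os_family == "linux":
        # Assumes systemd; restart covers both stopped and hung services.
        return ["ssh", host, "sudo", "systemctl", "restart", service]
    raise ValueError(f"unsupported os_family: {os_family}")
```

It is worth having the alert recheck the service afterward and escalate if it is still down, since a restart does not help in every case.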
Example 6: Back Up a Network-Device Configuration
Everything I've gone over so far covers direct remediation-type actions. But in some cases, automation can be defensive and informational. Backing up network-device configurations is a good example, in that it doesn't fix anything, but instead gathers additional information to help you fix the issue faster.
It's important to note that between 40 and 80 percent of all corporate-network downtime is the result of unauthorized or uncontrolled changes to network devices. These changes aren't always malicious. Often, the change simply went unreviewed by another set of eyes or an otherwise simple error slipped past the team.
So, having the ability to spontaneously pull a device configuration based on an event trigger is super helpful. To do so, you can use the following approach:
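One possible approach, sketched here under several assumptions, is to have the alert action open an SSH session to the device, run its "show configuration" command, and save the output to a timestamped file. The sketch assumes key-based SSH access and an IOS-style `show running-config` command; the function names, file-naming scheme and `runner` parameter are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def backup_path(device, backup_dir, now=None):
    """Return a timestamped file path for one device's configuration."""
    now = now or datetime.now(timezone.utc)
    return Path(backup_dir) / f"{device}_{now:%Y%m%dT%H%M%SZ}.cfg"


def pull_config(device, user, backup_dir,
                show_cmd="show running-config",  # assumed; varies by vendor
                runner=subprocess.run):
    """Fetch the running configuration over SSH and write it to disk."""
    result = runner(
        ["ssh", f"{user}@{device}", show_cmd],
        capture_output=True, text=True, check=True, timeout=60,
    )
    path = backup_path(device, backup_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(result.stdout)
    return path
```

Because each pull lands in its own timestamped file, comparing the newest file with the previous one shows what changed around the time of the event.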
There are two general cases when you may want to execute this automatic action. The first is when your monitoring solution receives a config change trap. Although the details of SNMP traps are beyond the scope of this article, you can configure your network devices to send spontaneous alerts on the basis of certain events. One of these events is a configuration change. The second is when the behavior of a device changes drastically, such as when ping success drops below 75 percent or ping latency increases. In either case, often the device is in the process of becoming unavailable. But in some situations, it's wobbly, and there's a chance to grab the configuration before it drops completely.
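Those two trigger cases reduce to a small decision function. In this sketch the 75 percent ping-success floor comes from the text, while the function name and the "latency has doubled against its baseline" rule are assumptions standing in for whatever counts as a drastic change in your environment.

```python
def should_pull_config(got_config_trap, ping_success_pct,
                       latency_ms, baseline_latency_ms,
                       latency_factor=2.0):  # assumed definition of "drastic"
    """Decide whether an event warrants an immediate configuration pull."""
    if got_config_trap:
        return True                      # case 1: the device reported a change
    if ping_success_pct < 75.0:
        return True                      # case 2a: the device is getting wobbly
    # case 2b: latency has risen sharply against its usual value
    return latency_ms > baseline_latency_ms * latency_factor
```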
In both of those situations, having the latest configuration provides valuable forensic information that can help troubleshoot the issue. It also gives you a chance to restore the absolute last-known-good configuration, if necessary. And if it leads you to think, "Well, if I have the last-known-good configuration, why can't I just push that one back?" then you, my friend, have caught the automation bug! Run with it.
Example 7: Reset a User Session
Somewhere in the murky past, the first computer went online and became Node 1 in the vast network we now call the Internet. The next thing that probably happened, mere seconds later, was that the first user forgot to log off their session and left it hanging.
For any system that supports remote connections, whether in the form of telnet/ssh, drive mappings or RDP sessions, having the ability to monitor and manage remote-connection user sessions can make running weekly, if not daily, restarts unnecessary. Or at least much smoother.
For Linux, use the who command to discover current sessions, or get greater granularity by remotely running netstat -tnpa | grep 'ESTABLISHED.*sshd'. Once you have the process ID, you can kill it. For Windows, you can get the active sessions on a system using the query session command.
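To turn the netstat output into something an alert action can use, a short parser helps. The sketch below assumes the usual Linux netstat -tnpa column layout (protocol, two queue counters, local address, remote address, state, PID/program) and that netstat was run with enough privilege for the PID column to be populated; the function name is hypothetical.

```python
import re

# proto, recv-q, send-q, local address, remote address, state, pid/program
_SSHD_LINE = re.compile(
    r"^\S+\s+\d+\s+\d+\s+\S+\s+(\S+)\s+ESTABLISHED\s+(\d+)/sshd"
)


def established_ssh_sessions(netstat_output):
    """Extract the PID and remote endpoint of each established sshd session."""
    sessions = []
    for line in netstat_output.splitlines():
        match = _SSHD_LINE.match(line)
        if match:
            remote, pid = match.groups()
            sessions.append({"pid": int(pid), "remote": remote})
    return sessions
```

Each returned PID is a candidate for kill, ideally after checking it against a list of sessions that must never be dropped, such as the one the automation itself is using.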
Example 8: Clear DNS Cache
At times, a server and/or application will misbehave because it can't contact an external system. This misbehavior is either because the DNS cache (the list of known systems and their IP addresses) is corrupt, or because the remote system has moved. In either case, a really easy fix is to clear the DNS cache and let the server attempt to contact the system at its new location.
In Windows, use the command ipconfig /flushdns. In Linux, the command varies from one distribution to another, so it's possible that sudo /etc/init.d/nscd restart will do the trick, or /etc/init.d/dns-clean, or perhaps another command. Research may be necessary for this one.
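Because the right command depends on the platform, a small selector can keep the alert action generic. This sketch uses only the commands named above and picks the first Linux candidate whose script exists on the host; the function name and the `exists` parameter are illustrative, and the candidate list will likely need extending for your distributions.

```python
import os
import platform

# Linux candidates taken from the text; extend for your own distributions.
LINUX_CANDIDATES = [
    ["sudo", "/etc/init.d/nscd", "restart"],
    ["/etc/init.d/dns-clean"],
]


def dns_flush_command(system=None, exists=os.path.exists):
    """Return the DNS-cache flush command for this platform, or None."""
    system = system or platform.system()
    if system == "Windows":
        return ["ipconfig", "/flushdns"]
    if system == "Linux":
        for cmd in LINUX_CANDIDATES:
            # The init script is the first element that is an absolute path.
            script = next(part for part in cmd if part.startswith("/"))
            if exists(script):
                return cmd
    return None  # unknown platform or no known flush mechanism found
```

A None result is the cue for the research mentioned above rather than a reason to guess at a command.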
Hopefully at least a few of the things I've shared here and in this series on automation as a whole have inspired you to give automation a try in your data center. If so, or if you're already well on your way to automating all the things, I'd love to hear about your experiences and perspective in the comments section.
Leading article image courtesy of Leonardo Rizzi under a Creative Commons license.
Leon Adato, SolarWinds Head Geek and long-time IT systems management and monitoring expert, discusses all things data center in this ongoing series.
Automation's Impact on Data Center Monitoring Alerts was last modified: February 13th, 2017 by Leon Adato. Originally published by The Data Center Journal.