Welcome to the next installment in our blog series highlighting the companies in SoftLayer’s new Technology Partners Marketplace. These Partners have built their businesses on the SoftLayer Platform, and we’re excited for them to tell their stories. New Partners will be added to the Marketplace each month, so stay tuned for many more to come.
- Paul Ford, SoftLayer VP of Community Development
Scroll down to read the guest blog from Jason Abate of Panopta, a SoftLayer Tech Marketplace Partner specializing in monitoring your servers and managing outages with tools and resources designed to help minimize the impact of outages to your online business. To learn more about Panopta, visit http://www.panopta.com/.
Server Monitoring Best Practices
Prior to starting Panopta, I was responsible for the technology and operations side of a major international hosting company and worked with a number of large online businesses. During this time, I saw my share of major disasters and near catastrophes and had a chance to study what works and what doesn’t when Murphy’s Law inevitably hits.
Monitoring is a key component of any serious online infrastructure, and there is a wide range of options when it comes to monitoring tools, from commercial and open-source software that you install and manage locally to monitoring services like Panopta. The best solution depends on a number of criteria, but there are five major factors to consider when making this decision.
1. Get the Most Accurate View of Your Infrastructure
Accuracy is a double-edged sword when it comes to monitoring: it can hurt you in two different ways. Check too infrequently and you’ll miss outages entirely, making you think that things are rosy when your customers or visitors are actually encountering problems. There are tools that check only every 30 minutes or even less often, but these are useless for real production sites. You should make sure that you can perform a complete check of your systems every 60 seconds so that small problems aren’t overlooked.
I’ve seen many people set up this high-resolution monitoring only to be hit with a barrage of alerts for frequent short-lived problems that had never been detected before. It may hurt to find this, but at least with information about the problem you can fix it once and for all.
The flip side to accuracy is that your monitoring system needs to verify outages to ensure they are real in order to avoid sending out false alerts. There’s no faster way to train an operations team to ignore the monitoring system than with false alerts. You want your team to jump at alerts when they come in.
High-frequency checks that are confirmed from multiple physical locations will ensure you get the most accurate view of your infrastructure possible.
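To make the idea of confirming an outage from multiple locations concrete, here is a minimal Python sketch. It is an illustration only, not a description of how Panopta or any other product works: the location names and the quorum of two failing locations are assumptions chosen for the example.

```python
from typing import Mapping


def confirmed_outage(results: Mapping[str, bool], quorum: int = 2) -> bool:
    """Return True only when at least `quorum` locations report a failure.

    `results` maps a (hypothetical) probe location name to True when the
    check passed from that location and False when it failed.
    """
    failures = sum(1 for ok in results.values() if not ok)
    return failures >= quorum


# One failing location is treated as a local network blip, not an outage.
single_failure = {"dallas": False, "seattle": True, "amsterdam": True}
# Two failing locations meet the assumed quorum, so an alert is justified.
widespread_failure = {"dallas": False, "seattle": False, "amsterdam": True}

print(confirmed_outage(single_failure))      # False
print(confirmed_outage(widespread_failure))  # True
```

Requiring agreement between locations trades a slightly slower alert for far fewer false alarms, which is usually the right trade if you want your team to trust the pager.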
2. Monitor Every Component of Your Infrastructure
There are lots of components that make up a modern website or application, and any of them could break at any time. You need to make sure that you’re watching all of these pieces, whether they’re inside your firewall or outside. Lots of monitoring providers focus purely on remotely accessible network services, which are important but only one half of the picture. You also want an inside view of how your server’s resources are being consumed, and how internal-only network devices (such as backend database servers) are performing.
Completeness also means that it’s economically feasible to watch everything. If the pricing structure of your monitoring tool is set up in a way that makes it cost-prohibitive to watch everything, then the value of your monitoring setup is greatly diminished. The last thing you want when troubleshooting a complex problem is to find that you don’t have data about one crucial server because you weren’t monitoring it.
Make sure your monitoring system is able to handle all of your server and network components and gives you a complete view of your infrastructure.
3. Notify the Right People at the Right Time
You know the feeling: when the pager beeps or the phone rings about an outage, your heart beats a little faster. Of course, it’s usually in the middle of the night and you’re sleeping, right?! As painful as it may be, you want your monitoring system to get you up when things are really hitting the fan – it’s still better than hearing from angry customers (and bosses!) the next morning.
However, not all outages are created equal, and you may not want to be woken up when one of your clustered webservers briefly goes down and then corrects itself a few minutes later. The key to a successful monitoring solution is to have plenty of flexibility in your notification setup, including being able to set up different notification types based on the criticality of the service.
You also want to be able to escalate a problem, bringing in additional resources for long-running problems. This way outages don’t go unnoticed for hours while the on-call admin who perpetually sleeps through pages gets more shut-eye.
Make sure that when it comes to notification, your monitoring system is able to work with your team’s preferred setup, not the other way around.
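The two ideas in this section, notifying differently by criticality and escalating long-running outages, can be sketched in a few lines of Python. Everything here is hypothetical: the policy names, contacts, channels, and time thresholds are made up for illustration and would be replaced by whatever your team actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    after_minutes: int  # how long the outage must have lasted
    contact: str        # who gets notified at this step
    channel: str        # how they are notified


# Hypothetical escalation policies keyed by service criticality.
POLICIES: Dict[str, List[Step]] = {
    "critical": [
        Step(0, "on-call admin", "sms"),
        Step(15, "backup admin", "phone"),
        Step(30, "operations manager", "phone"),
    ],
    # A clustered webserver that recovers within 10 minutes wakes no one.
    "low": [
        Step(10, "on-call admin", "email"),
    ],
}


def due_notifications(criticality: str, outage_minutes: int) -> List[Step]:
    """Return every escalation step that should have fired by now."""
    return [
        step for step in POLICIES[criticality]
        if step.after_minutes <= outage_minutes
    ]


for step in due_notifications("critical", 20):
    print(f"{step.channel} -> {step.contact}")
```

With these assumed policies, a critical outage that has lasted 20 minutes has already reached the on-call admin and the backup admin, while a low-criticality outage of 5 minutes has notified nobody yet.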
4. Don’t Just Detect Problems, Streamline Fixing Them
Sending out alerts about a problem is important, but it’s just the first step in getting things back to normal. Ideally, after being alerted, an admin can jump in and solve whatever the problem is, and life goes on. All too often, though, things don’t go this smoothly.
You’ve probably run into situations where an on-call admin is up most of the night with a problem. That’s great, but when the rest of the team comes in the next morning they have no idea what was done. What if the problem comes up again? Are there important updates that need to be deployed to other servers?
Or maybe you have a big problem that attracts interest from your call center and support staff (your monitoring system did alert you before they walked up, right?), or managers from other departments interrupt to get updates on the problem so they can head off a possible PR disaster.
These requests are important to the operation of your business, but they pull administrators away from actually solving the problem, which just makes things worse. There should be a better way to handle these situations. Given its central role in your infrastructure management, your monitoring system is in a great position to help streamline the problem-solving process.
Make sure your monitoring system gives you tools to keep everyone on the same page by letting everyone easily communicate and log what was ultimately done to resolve the problem.
5. Demonstrate How Your Infrastructure is Performing
Your role as an administrator is to keep your infrastructure up and running. It’s unfortunately a tough spot to be in – do your job really well and no one notices. But mess up, and it’s clearly visible to everyone.
Solid reporting capabilities from your monitoring system give you a tool to help balance this situation. Be sure to get summary reports that can demonstrate how well things are running, or that help you make the case for changes and then follow up to show progress. Availability reports also let you see a “big picture” view of how your infrastructure is performing that often gets lost in the chaos of day-to-day operations.
Detailed reporting gives you the data you need to accurately assess and promote the health of your infrastructure.
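As a small worked example of the kind of number an availability report boils down to, the Python sketch below computes availability for a reporting period from a list of outage durations. The 30-day period and the two outage lengths are invented for illustration.

```python
from typing import Sequence


def availability_percent(period_minutes: float,
                         outage_minutes: Sequence[float]) -> float:
    """Percentage of the reporting period during which the service was up."""
    downtime = sum(outage_minutes)
    return 100.0 * (period_minutes - downtime) / period_minutes


# A 30-day month has 30 * 24 * 60 = 43,200 minutes.
month = 30 * 24 * 60
# Two hypothetical outages of 20 and 23 minutes: 43 minutes of downtime.
outages = [20, 23]

print(f"{availability_percent(month, outages):.2f}%")  # 99.90%
```

Seen this way, a "three nines" month leaves room for only about 43 minutes of total downtime, which is a useful figure to have on hand when making the case for infrastructure changes.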
The Panopta Difference
There are quite a few options available for monitoring your servers, each of which comes with trade-offs. We’ve designed Panopta to focus on these five criteria, and having built on top of SoftLayer’s infrastructure from the very beginning, we are excited to be a part of the SoftLayer Technology Marketplace.
I would encourage you to try out Panopta and other solutions and see which is the best fit for the specific requirements of your infrastructure and your team – you’ll appreciate what a good night’s sleep feels like when you don’t have to worry about whether your infrastructure is up and running.
-Jason Abate, Panopta