In this article, we will discuss setting up network, server and application monitoring. This topic is typically high on the list for systems administrators. The one thing system admins like is “control”, and control means having a full overview of the network and infrastructure and knowing what is occurring on the systems and network at any given time. Network and server monitoring gives them this. It also allows them to be proactive rather than reactive: they can stay on top of equipment that is performing below its baseline, and the monitoring prompts server and network maintenance tasks where they are needed. The three core systems that should be monitored are the network, the servers and the applications.
The first step in network monitoring is mapping out your network in a diagram of some sort and documenting baselines. If you already have a diagram, it needs to be kept up to date and easily accessible, for example on a wall in clear view of the team. From this diagram, you can easily identify the core components of your network that will need monitoring. Once you know what you are going to monitor, the next step is to document baselines. The term baseline refers to what is normal for that particular system. A baseline can include, for example, the average CPU utilisation of a server or the average bandwidth used on a WAN link during a typical day.
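As a minimal sketch of what documenting a baseline can look like in practice, the Python below summarises a set of historical readings and flags a new reading that falls well outside the norm. The sample values, the function names and the three-standard-deviation tolerance are illustrative assumptions, not a recommendation from any particular monitoring product.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarise historical readings as a mean and standard deviation."""
    return {"mean": mean(samples), "stdev": stdev(samples)}

def is_abnormal(value, baseline, tolerance=3.0):
    """Flag a reading more than `tolerance` standard deviations from the mean."""
    return abs(value - baseline["mean"]) > tolerance * baseline["stdev"]

# Hypothetical CPU utilisation readings (percent) collected during normal operation.
cpu_history = [22.0, 25.0, 24.0, 23.0, 26.0, 24.0]
cpu_baseline = build_baseline(cpu_history)

print(cpu_baseline)
print(is_abnormal(25.0, cpu_baseline))  # a typical reading
print(is_abnormal(90.0, cpu_baseline))  # a reading far above normal
```

The same two functions could be applied to WAN bandwidth samples or any other metric you decide to baseline.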
So how do your devices report this information to you? Your network devices can be monitored using a protocol called SNMP (Simple Network Management Protocol). SNMP is an industry standard that has been around for a very long time, and it is currently at version 3. You will find that the majority of networking equipment manufacturers offer an option to use SNMP. SNMP allows your monitoring software of choice to collect information from an SNMP-enabled device. With SNMP versions 1 and 2c, the monitoring software and the device must be configured with the same community string; version 3 replaces the community string with per-user authentication and optional encryption. Once the credentials match, the software can poll the device and the monitored data is made available to you.
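To make the polling step concrete, here is a small sketch that builds and runs an SNMP v2c GET using the `snmpget` command from the Net-SNMP tools. It assumes Net-SNMP is installed on the monitoring host; the device address `192.0.2.10` and the community string `public` are placeholders, and the helper names are hypothetical.

```python
import subprocess

# sysUpTime.0 from the standard MIB-2 system group.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"

def build_snmpget_command(host, oid, community="public"):
    """Return the argument list for an SNMP v2c GET with Net-SNMP's snmpget."""
    return ["snmpget", "-v", "2c", "-c", community, host, oid]

def snmp_get(host, oid, community="public", timeout=5):
    """Run snmpget and return its raw text output (requires Net-SNMP)."""
    result = subprocess.run(
        build_snmpget_command(host, oid, community),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.strip()

# Placeholder device address; only the command is printed here, nothing is polled.
command = build_snmpget_command("192.0.2.10", SYS_UPTIME_OID)
print(" ".join(command))
```

Calling `snmp_get("192.0.2.10", SYS_UPTIME_OID)` against a real device would only succeed if the community string passed in matches the one configured on that device.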
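NetFlow is another monitoring protocol, commonly used on Cisco devices, and it complements SNMP by capturing a large amount of information that SNMP does not, such as top talkers by IP address, top applications or protocols in use, and resource-hungry users on your network link. NetFlow does this by exporting a record for each traffic flow (source and destination addresses, ports, protocol and byte counts), so it shows who is using the link and for what, rather than just how busy the link is. Standard flow records describe traffic at layers 3 and 4; monitoring tools usually infer the application from the port and protocol, so treat application-level reports as an approximation unless your equipment supports deeper application recognition. If you set NetFlow up on your networking equipment, and especially on your firewall, you will have a much more detailed understanding of what is actually happening on your network.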
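A top-talkers report is essentially an aggregation over flow records. The sketch below assumes the flows have already been collected and decoded into dictionaries; the field names (`src`, `dst`, `dst_port`, `bytes`) and the sample records are hypothetical and do not mirror any specific collector's format.

```python
from collections import Counter

def top_talkers(flows, limit=3):
    """Total the bytes sent per source address and return the heaviest senders."""
    bytes_by_source = Counter()
    for flow in flows:
        bytes_by_source[flow["src"]] += flow["bytes"]
    return bytes_by_source.most_common(limit)

# Hypothetical decoded flow records.
flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.9", "dst_port": 443, "bytes": 120_000},
    {"src": "10.0.0.7", "dst": "203.0.113.20", "dst_port": 80, "bytes": 30_000},
    {"src": "10.0.0.5", "dst": "198.51.100.4", "dst_port": 443, "bytes": 80_000},
    {"src": "10.0.0.9", "dst": "203.0.113.9", "dst_port": 25, "bytes": 5_000},
]

for address, total_bytes in top_talkers(flows, limit=2):
    print(address, total_bytes)
```

Grouping by `dst_port` instead of `src` would give a rough top-protocols view from the same records.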
When it comes to monitoring security on your network, if you have an IPS- or IDS-enabled appliance it is good practice to set up monitoring for any detections or patterns it reports. Most systems will allow you to alert on these patterns so you can see when abnormal or potentially threatening activity is occurring on your network.
Configuration management applies to all three core systems within your infrastructure. You are responsible not only for monitoring the health of your systems but also for monitoring any changes made to them. If someone makes a change to a piece of networking equipment after hours, it is good practice to capture the change and alert the team so that everyone is aware of it. If you have a change management system, using it is highly encouraged to ensure that these types of changes are captured and monitored. A change management system typically requires anyone making a change to complete a form, and once the change is approved all staff in your team are notified. These notifications are valuable because they keep the whole team aware of what has changed.
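One simple way to capture an after-hours change is to compare the current device configuration against the last saved snapshot. The sketch below uses Python's standard `difflib` module for the comparison; how the snapshots are retrieved from the device is out of scope, and the configuration lines shown are invented for illustration.

```python
import difflib

def config_changes(previous, current):
    """Return the lines removed from or added to a configuration snapshot."""
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    return [line for line in diff if line.startswith(("- ", "+ "))]

# Hypothetical snapshots of a switch configuration.
previous = "hostname core-sw1\nsnmp-server community public RO\n"
current = "hostname core-sw1\nsnmp-server community s3cret RO\nntp server 192.0.2.1\n"

changes = config_changes(previous, current)
if changes:
    print("Configuration changed:")
    for line in changes:
        print(" ", line)
```

An empty list means the configuration matches the snapshot; anything else is a candidate for an alert to the team.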
There should be a little more emphasis placed on monitoring your servers. If a piece of networking equipment fails, recovery is usually fairly quick, but if a server dies unexpectedly, recovering it can take longer and cause downtime for the organisation. By monitoring your servers you may catch issues before they cause a crash and a recovery situation. SNMP can be used to monitor servers as well as networking equipment. To use SNMP on your Windows servers you must enable it manually through the Add Roles and Features wizard. WMI (Windows Management Instrumentation) is enabled by default on Windows servers and can also be used to monitor almost everything about the server, as well as other Microsoft products such as SQL Server and Exchange. Things to look for in server monitoring include page file size, CPU utilisation, hard drive utilisation and free space, network traffic and, of course, network connectivity and uptime via a simple ping monitor. It is also a good habit to monitor your servers’ event logs. Occasionally log into your servers, look for recurring errors in the system, application and security event logs, and resolve those issues where possible.
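Most of the server checks listed above come down to comparing a reading against warning and critical thresholds. The following sketch does this for hard drive utilisation using Python's standard `shutil.disk_usage`; the 80% and 90% thresholds and the function names are assumptions chosen for the example.

```python
import shutil

def percent_used(total_bytes, used_bytes):
    """Return utilisation as a percentage, rounded to one decimal place."""
    return round(used_bytes / total_bytes * 100, 1)

def classify(value, warning, critical):
    """Map a reading onto an alert level using two thresholds."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"

# Check the drive that holds the root path; thresholds are example values.
usage = shutil.disk_usage("/")
disk_percent = percent_used(usage.total, usage.used)
level = classify(disk_percent, warning=80, critical=90)
print(f"disk utilisation {disk_percent}% -> {level}")
```

The `classify` function is deliberately generic, so the same thresholds pattern can be reused for CPU utilisation or page file size readings gathered through SNMP or WMI.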
Along with monitoring your network and servers, you will also need to monitor the core applications you use on your network. Organisations are moving away from hosting applications on-premises, but if you do host Exchange or SQL Server, for example, it is very important that you set up monitoring for these as well. For on-premises Exchange servers, the components to focus on are the Exchange queues, storage and database sizes, user mailbox sizes and OWA/ActiveSync connectivity. For SQL Server you can set up monitoring for your transaction logs and performance. If transaction logs are growing at a rapid rate, you should be alerted. You can also monitor the active sessions connected to the SQL Server and the queries run by users, so that you are alerted when a query has been running for a long time and could be taking up valuable resources. If you are hosting your own website, this is something else you should consider monitoring. For this, simply set up a couple of monitors to ensure that IIS is running and that your app pools and the performance of your server remain healthy.
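Alerting on rapidly growing transaction logs means tracking a rate of change rather than a single value. The sketch below assumes log sizes have already been sampled at known times; the sample figures, the 200 MB per hour limit and the function name are hypothetical.

```python
def growth_per_hour(samples):
    """Average growth in MB per hour between the first and last sample.

    `samples` is a time-ordered list of (hours_since_start, size_mb) pairs.
    """
    (first_time, first_size), (last_time, last_size) = samples[0], samples[-1]
    return (last_size - first_size) / (last_time - first_time)

# Hypothetical transaction log sizes sampled once an hour.
log_samples = [(0, 1000), (1, 1100), (2, 1500)]
GROWTH_LIMIT_MB_PER_HOUR = 200  # example threshold

rate = growth_per_hour(log_samples)
if rate > GROWTH_LIMIT_MB_PER_HOUR:
    print(f"ALERT: transaction log growing at {rate} MB/hour")
```

A sensible limit depends on the database's normal workload, which is where the documented baselines come back in.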
If you are running VMware, it is worth monitoring the CPU load and memory usage of your ESXi hosts to make sure they are not spiking. It is also important to make sure your VMs are resourced correctly and not over-provisioned in any way.
Last of all, you should monitor your backups and the program that runs them. You should be alerted to both successful and failed backups, and you should also monitor the storage you are backing up to, to make sure it is not filling up.
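Being alerted to successful backups as well as failures makes it possible to notice when the success messages stop arriving. A minimal sketch of that idea follows; the 26-hour allowance (a daily backup plus some slack) and the timestamps are assumptions made for the example.

```python
from datetime import datetime, timedelta

def backup_is_stale(last_success, now, max_age=timedelta(hours=26)):
    """Return True when the last successful backup is older than `max_age`."""
    return now - last_success > max_age

# Hypothetical timestamps: the last success was two days before the check.
last_success = datetime(2024, 1, 1, 23, 0)
check_time = datetime(2024, 1, 3, 23, 0)

if backup_is_stale(last_success, check_time):
    print("ALERT: no successful backup within the expected window")
```

A check like this catches the case where the backup program itself has stopped running and so never reports a failure.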
Once you have all of your network and server monitoring sensors set up, it is important to decide who is going to be notified of these alerts and, even more importantly, who will action the alerts for particular systems. It is great to have alerts being sent out, but if they are not being actioned they are pointless. Network and server monitoring not only gives the team control over the workings and health of the systems, it also allows the team to be more proactive, which helps keep the systems consistently running at peak performance and supports employee productivity.
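Deciding who actions which alerts can be written down as a simple routing table. The sketch below is one possible shape for it; the system names, the email addresses and the fallback recipient are all placeholders.

```python
# Hypothetical ownership table: which recipients action alerts for each system.
ALERT_OWNERS = {
    "network": ["network-team@example.com"],
    "server": ["server-team@example.com"],
    "backup": ["server-team@example.com", "it-manager@example.com"],
}
FALLBACK_OWNER = ["it-helpdesk@example.com"]

def route_alert(system, owners=ALERT_OWNERS, fallback=FALLBACK_OWNER):
    """Return the recipients responsible for a system, or the fallback list."""
    return owners.get(system, fallback)

print(route_alert("backup"))
print(route_alert("application"))  # no owner defined, so the fallback is used
```

The fallback entry matters: an alert for a system that nobody owns should still reach someone rather than being silently dropped.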