SSD Grow

Proven VPS Troubleshooting Techniques for Downtime Reduction

You visit your website or service and nothing loads. Maybe you have received angry emails from visitors or customers, or your monitoring system has sent you a dire warning. Either way, your server is down. Regardless of how you learn about the downtime, your priority is to get back online as soon as possible.

Think of downtime as a fire: it can happen to anyone, no matter how careful you are, and sometimes it happens for completely unexpected reasons. Just as with the fire drills from your middle school days, having a plan outlined ahead of time minimizes panic and gets the job done fast. You can develop the same kind of plan for VPS hosting troubleshooting. The better prepared you are in advance, the faster you’ll get back online. Here are just over a half-dozen ways to start diagnosing the problem.

The uptime command is a quick way to check how long your server has been running and how heavily it is loaded, including whether the CPU may be overloaded.

$ uptime
14:35:45 up 1 day, 18:41, 1 user, load average: 0.04, 0.03, 0.05

The output shows the current system time, how long the system has been up, the number of logged-in users, and the system load. The load is given as three numbers: the load averages over the previous 1, 5, and 15 minutes. A load of 1 roughly corresponds to one fully busy CPU core, so if the load averages stay higher than the number of cores on your VPS (1 on a single-core plan), some process may be overloading the CPU.
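A load average is easiest to judge relative to the number of CPU cores. The following is a minimal sketch of that comparison, not a definitive check; the function name is_overloaded is made up for illustration, and the comparison is done in awk because shell arithmetic only handles integers.

```shell
# is_overloaded LOAD CORES
# Hypothetical helper: succeeds (exit status 0) when LOAD is greater
# than CORES, and fails otherwise. awk does the comparison because
# shell arithmetic cannot handle decimal values such as 0.04.
is_overloaded() {
  awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load + 0 > cores + 0) }'
}
```

On a typical Linux system you could feed it live values with something like is_overloaded "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)" && echo "load exceeds core count", assuming /proc/loadavg and the nproc utility are available.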

Get the full history

To view a chronological list of recently executed commands, use the history command. Reviewing past actions often gives insight into why something is not working now. For example, could recent updates via yum or apt have caused the downtime? Does anything unusual stand out?
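On a busy server the history can be long, so it may help to filter it for package-manager activity. This is a rough sketch under a few assumptions: the function name pkg_changes is hypothetical, and the list of tools and subcommands in the pattern is only a starting point that you would adjust for your distribution.

```shell
# pkg_changes
# Hypothetical helper: reads command lines on standard input and prints
# only those that appear to install, update, or remove packages.
# The tool and subcommand lists are assumptions; extend them as needed.
pkg_changes() {
  grep -E '(^|[[:space:]])(apt|apt-get|yum|dnf)[[:space:]]+(install|update|upgrade|dist-upgrade|remove|purge)'
}
```

In an interactive shell you would run history | pkg_changes. Because history is a shell builtin that is usually empty inside scripts, pkg_changes < ~/.bash_history is an alternative if your shell is bash and uses its default history file.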

Getting a handle on who

The `w` command provides information about who is currently logged into the system and what they are doing. This command is particularly useful if you have multiple users accessing your VPS via SSH, as it allows you to keep track of their activities. Additionally, it can alert you if an unknown user is accessing your system and potentially violating your privacy.

$ w
14:34:59 up 1 day, 18:41, 1 user, load average: 0.08, 0.03, 0.05
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
joel pts/0 123.456.78.9 14:34 3.00s 0.01s 0.00s w

Here’s how to read the output:
  • USER – The user’s name.
  • TTY – The terminal the user is logged in on.
  • FROM – The IP or hostname from which the user accessed your VPS.
  • LOGIN@ – The time at which the user logged in.
  • IDLE – Their idle time.
  • JCPU – This refers to the time used by all the processes related to this terminal instance.
  • PCPU – This refers to the time used by only the current process that’s displayed in the WHAT field.
  • WHAT – The user’s current process from the command line.

You can also run last | head to see a list of the most recent logins.
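If you know which accounts should be logged in, spotting an unknown user can be partly automated. The sketch below assumes input in the row format shown above, with the user name in the first column and no header lines; the function name unexpected_logins and the user names in the usage note are illustrative only.

```shell
# unexpected_logins ALLOWED_USER...
# Hypothetical helper: reads login rows on standard input (user name in
# the first column, no header lines) and prints every row whose user is
# not one of the allowed names passed as arguments.
unexpected_logins() {
  allowed=" $* "
  while read -r user rest; do
    # Skip blank lines.
    [ -n "$user" ] || continue
    case "$allowed" in
      *" $user "*) ;;                               # known user, ignore
      *) printf '%s %s\n' "$user" "$rest" ;;        # unknown user, report
    esac
  done
}
```

For example, w -h | unexpected_logins joel deploy would print any session that does not belong to joel or deploy. The -h flag asks w to omit its header lines.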

Running ps

It helps to know which processes, if any, are running on your VPS during downtime, as this can aid in diagnosing issues. To see all currently running processes, use the command “ps auxf”. It displays CPU and memory usage for each process, making it easier to identify one that may be causing problems.
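With many processes running, the heaviest ones are easy to miss. Here is a small sketch that keeps only the biggest memory consumers; it assumes the usual ps aux column layout, where %MEM is the fourth column, and the name top_by_mem is made up for this example.

```shell
# top_by_mem [N]
# Hypothetical helper: reads `ps aux` output on standard input, drops
# the header line, and prints the N rows with the highest %MEM value
# (fourth column). N defaults to 5.
top_by_mem() {
  n="${1:-5}"
  tail -n +2 | sort -k4,4 -nr | head -n "$n"
}
```

Run it as ps aux | top_by_mem 10. Changing -k4,4 to -k3,3 would rank by %CPU instead. On systems that use the procps version of ps, ps aux --sort=-%mem | head should give a similar result without a helper.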

“I’ll never use all that disk space”

Famous last words, eh? People often underestimate their disk space needs, and running out of space can be disastrous.

To see the available space on every disk and partition, use the df -h command. If you are running out of space, there could be several reasons. For instance, you may have uploaded too many GIFs to your WordPress blog or built up a massive apt cache. Either way, this command helps you spot which filesystem is full so you can start troubleshooting.

To take it a step further, you can also check for available inodes by using the df -hi variation.
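To flag nearly full filesystems automatically, the df output can be filtered by its percentage column. The sketch below assumes the six-column layout that df -P prints, with the percentage in the fifth column and the mount point in the sixth; the name full_filesystems and the default threshold of 90 are arbitrary choices for illustration.

```shell
# full_filesystems [THRESHOLD]
# Hypothetical helper: reads `df -P` output on standard input and prints
# the mount point and usage of every filesystem at or above THRESHOLD
# percent (default 90). Mount points containing spaces are not handled.
full_filesystems() {
  awk -v limit="${1:-90}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)                 # strip the percent sign
      if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

For example, df -P | full_filesystems 80 lists anything at 80% or more. Since df -i also reports its percentage in the fifth column, the same filter should work for inode usage, though that is worth confirming on your own system.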

Killing is never good

If you are running out of RAM, you might start seeing messages in /var/log/messages about “killing” processes (on Debian and Ubuntu, the equivalent log is usually /var/log/syslog). Use grep to search for these messages, ignoring case, like so:

$ sudo grep -i kill /var/log/messages

If there’s any output, that’s one sign that your VPS is trying its very best to free up RAM by killing any processes it can.
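When there are many matches, a summary of which processes were killed most often can point to the culprit. This sketch assumes the kernel log lines contain the wording "Killed process <pid> (<name>)"; the exact format can differ between kernel versions, so treat the pattern as an assumption. The function name oom_victims is hypothetical.

```shell
# oom_victims
# Hypothetical helper: reads log lines on standard input, extracts the
# process name from lines containing "Killed process <pid> (<name>)",
# and prints a count per name, most frequently killed first.
oom_victims() {
  sed -n 's/.*Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
    sort | uniq -c | sort -rn |
    awk '{ $1 = $1; print }'    # normalize the padding added by uniq -c
}
```

You would run it as sudo cat /var/log/messages | oom_victims, substituting whichever log file your distribution uses.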

Looking down from on high

If you’re looking for a single command that gives you much of the same information as the commands above, check out top and its variant htop. You’ll need to install the latter via your OS’s package manager, but top comes with almost all Linux installations.

$ top
top - 16:08:53 up 1 day, 20:15, 2 users, load average: 0.00, 0.01, 0.05
Tasks: 25 total, 1 running, 24 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 8388608 total, 8220476 free, 44288 used, 123844 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 3024374 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 1 -19 42960 3316 2296 S 0.0 0.0 0:02.26 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd/210ddf
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper
56 root 1 -19 223060 73908 73616 S 0.0 0.9 0:17.83 systemd-journal

htop offers another level of detail and some little graphs that might be easier to understand than raw numbers. Either is useful for seeing tons of information about your VPS at a glance.
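top can also run non-interactively, which is handy for scripts and for saving a snapshot to review later. As a rough illustration, the helper below pulls the idle CPU percentage out of the summary area. It assumes the %Cpu(s) line format shown in the sample output above, which varies between top versions and locales; the name cpu_idle is made up for this example.

```shell
# cpu_idle
# Hypothetical helper: reads top output on standard input and prints the
# idle CPU percentage from the first line that starts with "%Cpu".
# Assumes the summary format "..., 100.0 id, ..." shown in the article.
cpu_idle() {
  awk '/^%Cpu/ {
    if (match($0, /[0-9.]+ id/)) {
      # Drop the trailing " id" (3 characters) from the matched text.
      print substr($0, RSTART, RLENGTH - 3)
      exit
    }
  }'
}
```

Try it with top -b -n 1 | cpu_idle, where -b selects batch mode and -n 1 limits top to a single iteration.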

Putting it all together for quicker VPS troubleshooting

Each of these commands is helpful in its own way, but only you can decide which are most relevant and in what order to run them, based on your experience and specific needs.
There are many other similar commands, so consider these a starting point. Once you’re comfortable with them, you can move on to other tools such as dmesg, ss, and sar.
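One way to turn these commands into a repeatable plan is a small script that runs them in a fixed order. The following is only a sketch: the function names section and triage, the section titles, and the order of the checks are all arbitrary choices that you would adapt to your own routine.

```shell
# section TITLE COMMAND [ARGS...]
# Hypothetical helper: prints a header, then runs COMMAND if it is
# installed, or notes that it was skipped if it is not.
section() {
  title="$1"
  shift
  printf '\n== %s ==\n' "$title"
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    printf '(skipped: %s not found)\n' "$1"
  fi
}

# triage
# Runs the checks covered in this article in one pass.
triage() {
  section "Load"       uptime
  section "Logins"     w
  section "Processes"  ps auxf
  section "Disk space" df -h
  section "Inodes"     df -hi
}
```

Calling triage prints each section in turn. Redirecting the output to a file, as in triage > triage.txt, keeps a record you can compare against the next incident.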

Conclusion

These tried-and-tested VPS troubleshooting techniques offer effective strategies for minimizing downtime and ensuring optimal system performance and reliability.
