What “monitoring” means (and why it matters)
You’re watching key resources in real time (or from logs/history) so you can answer: CPU busy? memory full? disk or network slow? which process is the culprit?
Core areas: CPU, Memory (RAM/Swap), Disk & Filesystem, Network, Processes/Services, Logs.
0) Fast “what’s wrong?” checklist
uptime # load average & how long the system’s been up
top # live view of CPU/mem/processes (q to quit)
free -h # RAM & swap usage
df -h # disk space per filesystem
du -sh * # which folders are big (in current dir)
ss -tuna # open TCP/UDP sockets
journalctl -p err -n 50 # last 50 error-level log lines (systemd)
Tip: rerun any command every 2s with watch -n2 "<command>".
1) CPU monitoring
· top / htop: per-process CPU%, memory, load; press 1 in top to see per-CPU cores.
· ps: quick sorted snapshots.
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head
· mpstat (per-CPU), pidstat (per-process over time) – from sysstat package:
sudo apt install sysstat
mpstat 2 5 # every 2s, 5 samples
pidstat -p <PID> 1 # track one process each second
· Load average (from uptime or top): rough queue length of runnable tasks; on Linux it also counts tasks blocked in uninterruptible (usually disk) I/O. A rule of thumb: if load ≫ number of CPU cores for a sustained time → CPU-bound or lots of I/O wait.
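A minimal sketch of the load rule of thumb above, assuming Linux (/proc/loadavg) and the nproc tool. The function name load_status and the default factor of 2 are illustrative choices, not a standard threshold:

```shell
# load_status LOAD CORES [FACTOR]
# Prints "high" when LOAD exceeds CORES * FACTOR, otherwise "ok".
# FACTOR defaults to 2 (an assumption; tune it for your systems).
load_status() {
    awk -v load="$1" -v cores="$2" -v factor="${3:-2}" \
        'BEGIN { if (load > cores * factor) print "high"; else print "ok" }'
}

# Example: check the 1-minute load average against the core count.
if [ -r /proc/loadavg ]; then
    load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
fi
```

A single "high" reading proves little; the rule is about sustained load, so rerun it over a few minutes before drawing conclusions.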
2) Memory (RAM & swap)
· free -h: human-readable RAM and swap.
· In top: check RES (actual RAM used) vs VIRT (address space).
· vmstat 2 (from procps): quick view of si/so (swap in/out), wa (I/O wait).
· If memory is tight, look for big processes:
ps aux --sort=-%mem | head
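For scripts, the memory picture can also be read straight from /proc/meminfo. A small sketch, assuming a Linux kernel that exposes the MemAvailable field; the function name mem_available_pct is made up for this example:

```shell
# mem_available_pct [FILE]
# Prints available memory as a whole-number percentage of total RAM.
# FILE defaults to /proc/meminfo; passing another file helps with testing.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Example: print the current percentage on a Linux host.
if [ -r /proc/meminfo ]; then
    mem_available_pct
fi
```

A low percentage together with non-zero si/so in vmstat is a stronger sign of RAM pressure than either number alone.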
3) Disk & filesystem
· Space:
df -h # per mountpoint free/used
du -sh * | sort -h # folder sizes, largest last
· I/O performance:
o iostat -xz 1 (sysstat): device utilization, queue, await latency.
o iotop (needs root): per-process disk I/O in real time.
sudo apt install iotop && sudo iotop
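The df check above can be turned into a quick filter for nearly full filesystems. A sketch under stated assumptions: it parses the POSIX-format output of df -P, the function name full_filesystems and the 90% default are illustrative, and mount points containing spaces are not handled:

```shell
# full_filesystems [LIMIT]
# Reads `df -P` output on stdin and prints "MOUNTPOINT USE%" for every
# filesystem whose usage is at or above LIMIT percent (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)              # "95%" -> "95"
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}

# Example: list filesystems that are at least 90% full.
df -P | full_filesystems 90
```

No output means nothing crossed the limit.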
4) Network
· Interfaces & counters:
ip -s link # RX/TX stats; check for drops/errors
ss -tuna | head # which ports/connections are open
ping -c 3 8.8.8.8 # basic reachability
traceroute example.com # path (sudo apt install traceroute)
· Live bandwidth (pick one):
sudo apt install iftop nload bmon
sudo iftop -i eth0 # per-connection bandwidth (replace eth0 with your interface name from ip link)
nload # simple in/out meters
bmon # interface graphs
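The drops/errors check can also be scripted from the counters in /proc/net/dev, which show the same kind of per-interface totals as ip -s link. A rough sketch, assuming the usual Linux layout of that file (two header lines, then one line per interface); the function name iface_problems is hypothetical:

```shell
# iface_problems [FILE]
# Prints interfaces with non-zero error or drop counters.
# FILE defaults to /proc/net/dev.
iface_problems() {
    awk 'NR > 2 {
        sub(/:/, " ")    # "eth0:123" or "eth0: 123" -> "eth0 123"
        # Fields after the name: RX bytes packets errs drop fifo frame
        # compressed multicast, then TX bytes packets errs drop ...
        if ($4 + $5 + $12 + $13 > 0)
            print $1, "rx_errs=" $4, "rx_drop=" $5, "tx_errs=" $12, "tx_drop=" $13
    }' "${1:-/proc/net/dev}"
}

# Example: check the live counters on a Linux host.
if [ -r /proc/net/dev ]; then
    iface_problems
fi
```

The counters are totals since boot, so compare two readings a few seconds apart to see whether they are still climbing.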
5) Processes, services, and logs
· Which process is heavy? → top, ps, pidstat.
· Service status (systemd):
systemctl status nginx
systemctl --failed
· Logs:
journalctl -u nginx --since "1 hour ago" # one service
journalctl -p warning -b # warnings+ since last boot
dmesg -w # kernel messages stream
· Open files / ports (when something is “in use”):
sudo lsof -p <PID> | head
sudo lsof -i :5432
6) Historical monitoring (see past, not just live)
· Enable sysstat collection to use sar for history (the /etc/default/sysstat step applies to Debian/Ubuntu):
sudo apt install sysstat
sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
sudo systemctl enable --now sysstat
sar -u 1 5 # CPU samples now
sar -r # historical memory (from periodically collected stats)
· For longer term and visuals later: glances, btop, netdata, prometheus+grafana (names to know).
7) Handy one-liners
# Top 10 memory hogs (header line + 10 processes)
ps -eo pid,comm,%mem,%cpu --sort=-%mem | head -n 11
# Show per-process disk I/O; a high iodelay column points at I/O wait culprits (needs pidstat/sysstat)
pidstat -d 1 5
# Follow a log live (search as you go)
journalctl -f | grep -i "error"
# See CPU usage per core for 10 seconds
mpstat -P ALL 1 10
8) Mini-labs (30–40 min total)
Lab A: CPU & load
yes > /dev/null & # start a busy loop
top # see CPU% & load; press 1
pkill yes
Lab B: Memory pressure
free -h
python3 -c "a='x'*200*1024*1024; import time; time.sleep(20)" &
free -h # see used & cached
kill %1
Lab C: Disk usage & I/O
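dd if=/dev/zero of=bigfile bs=1M count=500 oflag=direct status=progress # terminal 1: write 500 MiB, bypassing the page cache
iostat -xz 1 5 # terminal 2, while dd is still writing: watch %util and await
rm bigfile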
Lab D: Network glimpse
ping -c 5 google.com
ss -tuna | head
Lab E: Logs & services
sudo systemctl status cron
journalctl -u cron --since "10 min ago"
9) Safety & exam tips
· Don’t kill random PIDs on shared systems; prefer kill <PID> (polite) over kill -9.
· Load average ≠ CPU% but correlates; sustained load way above core count is a flag.
· For disks: high util% / long await → I/O bottleneck.
· For memory: heavy swap or oom-killer messages in dmesg → RAM pressure.
· Know the big tools by name: top/htop, ps, free, df/du, iostat/iotop, ss/iftop, journalctl, dmesg, sar.
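The "polite first" advice above can be sketched as a small helper that sends the default SIGTERM, waits, and only then falls back to kill -9. The function name polite_kill and the 5-second default are assumptions for illustration:

```shell
# polite_kill PID [TIMEOUT]
# Sends SIGTERM, waits up to TIMEOUT seconds (default 5) for the process
# to exit, and uses SIGKILL only if it is still there afterwards.
polite_kill() {
    pid=$1
    timeout=${2:-5}
    kill "$pid" 2>/dev/null || return 1          # no such process, or not permitted
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # gone: SIGTERM was enough
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    kill -9 "$pid" 2>/dev/null                   # last resort
}

# Example: stop a throwaway background job politely.
sleep 300 &
polite_kill "$!" 3
```

Giving the process a few seconds lets it flush files and release locks, which kill -9 never allows.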