Friday 27 September 2013

health-check revisited

Earlier this month I wrote about health-check, a tool to sanity check application resource usage.  Since then I've re-worked and simplified the system call tracing and added some new features.

Originally, health-check was designed to attach itself to running processes. To make the tool even easier to use it can now start applications and follow any subsequent new processes spawned off from fork() or clone(). For example:
sudo health-check -u joeuser -f thunderbird
..will start thunderbird (running as user "joeuser") and follow any new processes it creates.

Sometimes applications will force flushing of data or metadata to disk by excessive use of fsync(), fdatasync() and sync().  Health-check will now keep track of these system calls and provide feedback on their use.

Health-check already has some memory checking techniques - however, it now has been extended to check explicitly for heap changes by examining the brk() system call and also keeping track of mmap() and munmap() mappings.  This allows better tracking of potential memory leaks.

Source code can be found at: git://kernel.ubuntu.com/cking/health-check

Packages found in my White PPA in ppa:colin-king/white so to install on Ubuntu systems use:
 sudo add-apt-repository ppa:colin-king/white  
 sudo apt-get update  
 sudo apt-get install health-check  

Thursday 5 September 2013

health-check: a tool to diagnose resource usage

Recently I have been focused on ways to reduce power consumption on Ubuntu phones. To identify resource hungry issues one can use tools such as top, strace and gdb, or inspect per process properties in /proc/$pid, however this can be time consuming and not practical with many processes in a system. Instead, I decided it would be profitable to write a tool to inspect and monitor a running program and report on areas where it appeared to be sub-optimal.  And so health-check was born.

One provides health-check with a list of one or more processes and it will monitor all the associated threads (and child processes) and then report back on the resources used. Heath-check will report on:
  • CPU utilisation
  • Wakeup events
  • Context Switches
  • File I/O operations (Open/Read/Write/Close using fnotify)
  • System calls (using ptrace)
  • Analysis of polling system calls
  • Memory utilisation (including memory growth)
  • Network connections
  • Wakelock activity
CPU utilisation, wakeup events and context switches are useful just to see how much CPU related activity is occurring.   For example,  a program that has multiple threads that frequently ping-pongs between them will have a high context switch rate, which may indicate that it is busy passing data or messages between threads.  A process may be frequently polling on short timer delays may show up as generating a high level of wakeup events.

Some applications may be sub-optimally writing out data frequently, causing dirty pages and meta data that needs to be written back to the file system.  Health-check will capture file I/O activity and report on the names of the files being opened, read, written and closed.

To help identify excessive or heavy system call usage, health-check uses ptrace to trap and monitor all the system calls that the program makes.  For example, it has been observed that some applications excessively call poll() and nanosleep() with poorly chosen timeouts causing excessive CPU utilisation.  For system calls such as these where they can wait until an event or a timeout occur, health-check has some deeper monitoring.  It inspects the given timeout delay and checks to see if the call timed out, for example,  health-check can identify CPU sucking repeated polling where zero timeouts are being used or excessive nanosleeps with zero or negative delays.

The ptrace ability of heath-check also allow it to monitor per-process wake lock writes. Abuse of wakelocks can keep the a kernel from suspending into deep sleep so it is useful to keep track of wakelock activity on some processes. This is not enabled by default as it is an expensive operation to monitor this via ptrace and also some kernels may not have wakelocks, so one has to use the -W option to enable this.

Health-check also inspects /proc/$pid/smaps and will determine if memory utilisation has grown or shrunk.  Unusually high heap growth over time may indicate that an application has a memory leak.

Finally, health-check will inspect /proc/$pid/fd and from this determine any open sockets and then try and resolve the host names of the IP addresses.  For example, it is entirely possible for an application to be making spurious or unwanted connections to various machines, so it is helpful to check up on this kind of activity.

Health-check is still very early alpha quality, so beware of possible bugs.   However, it has been helpful in identifying some misbehaving applications, so it is already proving to be rather useful.

Source code can be found at: git://kernel.ubuntu.com/cking/health-check

Packages found in my White PPA in ppa:colin-king/white so to install on Ubuntu systems use:

 sudo add-apt-repository ppa:colin-king/white  
 sudo apt-get update  
 sudo apt-get install health-check  
..and go and track down some resource sucking apps..