The Relativity of Performance Improvements

 

Today, after receiving a 1.7MB daily security log message containing thousands of failed ssh login attempts from bots around the world, I decided I had had enough. I enabled IPFW on a FreeBSD system I maintain and added a script to find and block the offending IP addresses. In the process I improved the script's performance. The results of the improvement were unintuitive.

This is the original version of the script I found on a FreeBSD wiki.

#!/bin/sh
if ipfw show | awk '{print $1}' | grep -q 20000 ; then
        ipfw delete 20000
fi
for ips in `cat /var/log/auth.log | grep sshd | grep "Illegal" |
awk '{print $10}' | uniq -d` ; do
        ipfw -q add 20000 deny tcp from $ips to any
done
cat /var/log/auth.log | grep sshd | grep "Failed" | rev  |
cut -d\  -f 4 | rev | sort | uniq -c | \
( while read num ips; do
    if [ $num -gt 5 ]; then
         if ! ipfw show | grep -q $ips ; then
                ipfw -q add 20000 deny tcp from $ips to any
        fi
    fi
  done
)
As I read the script in order to understand it, I saw many ways I could simplify it. The following is the revised code. In revising it I
  • used one search pass for catching failed attempts for both legal and illegal users,
  • integrated the counting of multiple attempts into awk using an associative array,
  • replaced the superfluous cat command with input redirection, and
  • removed the check for duplicate entries, since the IPFW rule is always deleted at the beginning of the script.
#!/bin/sh
if ipfw show | awk '{print $1}' | grep -q 20000 ; then
        ipfw delete 20000
fi

awk '/sshd.*authentication error/ {try[$(NF)]++}
END {for (h in try) if (try[h] > 5) print h}' /var/log/auth.log |
while read ip
do
        ipfw -q add 20000 deny tcp from $ip to any
done
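
To see how the associative array does the counting, here is a minimal sketch that runs the same awk logic on a made-up log instead of /var/log/auth.log. The helper name count_offenders, its threshold parameter, the sample log lines, and the documentation-range addresses are all assumptions for illustration; real sshd messages may be formatted differently on your system.

```shell
#!/bin/sh
# Hypothetical helper: print every host that appears on more than "limit"
# matching lines. $1 is the log file; $2 is the threshold (default 5).
count_offenders() {
    awk -v limit="${2:-5}" '
        /sshd.*authentication error/ { try[$NF]++ }   # tally by last field
        END { for (h in try) if (try[h] > limit) print h }
    ' "$1"
}

# Build a small sample log (invented lines, not real auth.log output):
# six failures from one address, two from another.
sample=$(mktemp)
for i in 1 2 3 4 5 6; do
    echo "Jan  7 10:00:0$i host sshd[100]: error: PAM: authentication error for root from 192.0.2.10"
done > "$sample"
for i in 1 2; do
    echo "Jan  7 10:01:0$i host sshd[101]: error: PAM: authentication error for admin from 192.0.2.20"
done >> "$sample"

count_offenders "$sample"    # prints 192.0.2.10
rm -f "$sample"
```

Only the address with more than five matching lines is printed, which is what the while loop would then feed to ipfw. No sort is needed, because the array accumulates the counts regardless of the order of the log lines.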

I made the changes because the original code offended my sense of parsimony, not because I believed that the code was fundamentally inefficient. Having made them, I decided to measure their impact. On my system the new version of the command runs more than twice as fast as the old one (20ms against 50ms). However, the reduction in the overall system load is negligible. I calculated that if I run the command every 3 minutes, it will take up 0.01% of the system resources; the old one would consume 0.02%. No wonder we're seeing software bloat everywhere. For most cases tuning a non-optimal design is simply not worth the effort. Then, when the need for performance truly arises, the knowledge and experience of how to improve it are probably missing.
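
The load estimate is simple arithmetic: run time divided by the interval between runs. The following is a minimal sketch of that calculation, assuming that the whole measured run time counts as load on a single processor; cpu_share is a hypothetical helper name, not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper: percentage of one processor used by a job that runs
# for $1 milliseconds once every $2 seconds.
cpu_share() {
    awk -v ms="$1" -v s="$2" 'BEGIN { printf "%.3f%%\n", ms / (s * 1000) * 100 }'
}

cpu_share 20 180    # new script, every 3 minutes: prints 0.011%
cpu_share 50 180    # old script, every 3 minutes: prints 0.028%
```

Both results are of the same order of magnitude as the figures quoted above, a few hundredths of a percent at most, which is why the speedup makes no visible difference to the overall load.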



Last modified: Monday, January 7, 2008 10:58 am

Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.