On Tuesday, March 17th, 2020, my free massive open online course (MOOC) on the use of Unix command line tools for data, software, and production engineering goes live on the edX platform. More than one thousand participants from around the world have already registered for it; you should still be able to enroll through this link. In response to the course's announcement, seasoned researchers from around the world have commented that this is an indispensable course and that it is very hard to beat the ROI of acquiring this skillset, both for academia and industry. In an age of shiny IDEs and cool GUI tools, what are the reasons for the enduring utility and popularity of the Unix command line tools? Here's my take.
Continue reading "Seven reasons to add Unix command line expertise to your tool chest"
Applied Code Reading: Debugging FreeBSD Regex
When the code we're trying to
read is inscrutable,
inserting print statements and running various test cases can be
two invaluable tools.
Earlier today I fixed
a tricky problem in the FreeBSD regular expression library.
The code, originally written by Henry Spencer in the early 1990s,
is by far the most complex I've ever encountered.
It implements sophisticated algorithms with minimal commenting.
Also, to avoid code repetition and increase efficiency,
the 1200-line main part of the regular expression execution engine is
included in the compiled C code
three times after modifying various macros to adjust the code's behavior:
the first time the code targets small expressions and operates
with bit masks on long integers,
the second time the code handles larger expressions
by storing its data in arrays,
and the third time the code is also adjusted to handle multibyte characters.
Here is how I used test data and print statements to locate and fix the problem.
Continue reading "Applied Code Reading: Debugging FreeBSD Regex"
Monitor Process Progress on Unix
I often run file-processing commands that take many hours to
finish, and I therefore need a way to monitor their progress.
The Perkin-Elmer/Concurrent OS32 system I worked on for a couple
of years back in 1993 (don't ask)
had a facility that displayed for any executing
command the percentage of work that was completed.
When I first saw this facility working on the programs I maintained,
I couldn't believe my eyes, because I was sure that those rusty
Cobol programs didn't contain any functionality to monitor their progress.
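The excerpt stops before saying how such a display can work without help from the program itself. One plausible explanation, offered here as an assumption rather than as the post's method, is that the percentage comes from comparing a file descriptor's current offset with the size of the file it reads. The minimal sketch below applies that idea to a descriptor owned by the same process; `progress_percent` and `demo` are hypothetical names chosen for illustration.

```python
import os
import tempfile


def progress_percent(fd):
    """Return how far the descriptor's offset is through its file, in percent."""
    size = os.fstat(fd).st_size
    if size == 0:
        return 100.0  # an empty file has nothing left to process
    offset = os.lseek(fd, 0, os.SEEK_CUR)  # query the offset without moving it
    return 100.0 * offset / size


def demo():
    """Consume a quarter of a 1000-byte scratch file and report the progress."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 1000)
        os.lseek(fd, 0, os.SEEK_SET)  # rewind, as a freshly started reader would
        os.read(fd, 250)              # pretend a quarter of the input was processed
        return progress_percent(fd)
    finally:
        os.close(fd)
        os.unlink(path)
```

An external monitor cannot call `lseek` on another process's descriptor, so it would have to obtain the offset some other way; on Linux, for instance, `/proc/<pid>/fdinfo/<fd>` exposes a `pos:` field.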
Continue reading "Monitor Process Progress on Unix"
Open and Closed Source Kernels Go Head to Head
Earlier today I presented at the
30th International Conference on Software Engineering a
research paper comparing the
code quality of Linux, Windows (its
research kernel distribution),
For the comparison I parsed multiple configurations of these systems (more than ten million lines), and stored the results in four databases, where I could run SQL queries on them. This amounted to 8GB of data, 160 million records.
(I've made the databases and the SQL queries available.)
The areas I examined were file organization, code structure, code style, preprocessing, and data organization.
To my surprise there was no clear winner or loser, but there were interesting differences in specific areas.
Continue reading "Open and Closed Source Kernels Go Head to Head"
The Power of an Integrated Platform
FreeBSD, unlike Linux, is not a kernel, but a complete operating system.
This allows a much smoother integration of its components,
which is a real boon when you try to locate and fix a problem.
The source code for all the parts is ordered in a single
directory tree for you to examine and experiment with.
Continue reading "The Power of an Integrated Platform"
The Relativity of Performance Improvements
Today, after receiving a 1.7MB daily security log message containing
thousands of ssh failed login attempts from bots around the
world, I decided I'd had enough.
I enabled IPFW on a FreeBSD system I maintain, and added a script
to find and block the offending IP addresses.
In the process I improved the script's performance.
The results of the improvement were unintuitive.
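As a rough illustration of what a script that finds the offending addresses might do, here is a minimal sketch. It is not the script from the post. The log line format (an OpenSSH-style "Failed password ... from ADDRESS" message), the threshold, the sample data, and the function names are all assumptions made for the example, and the generated `ipfw add deny ip from ADDRESS to any` commands are one simple way to express a block rule.

```python
import re
from collections import Counter

# Assumed OpenSSH-style message; real logs vary with version and syslog setup.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)

# Made-up sample lines using documentation-only addresses.
SAMPLE_LOG = [
    "Mar  1 03:12:01 host sshd[411]: Failed password for root from 203.0.113.7 port 4022 ssh2",
    "Mar  1 03:12:04 host sshd[411]: Failed password for invalid user admin from 203.0.113.7 port 4023 ssh2",
    "Mar  1 03:15:40 host sshd[502]: Failed password for alice from 198.51.100.9 port 5100 ssh2",
    "Mar  1 03:16:02 host sshd[502]: Accepted password for alice from 198.51.100.9 port 5101 ssh2",
]


def offending_addresses(log_lines, threshold=3):
    """Return the sorted addresses with at least `threshold` failed logins."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def ipfw_rules(addresses):
    """Format one deny rule per address; the commands are returned, not executed."""
    return [f"ipfw add deny ip from {ip} to any" for ip in addresses]
```

Calling `ipfw_rules(offending_addresses(SAMPLE_LOG, threshold=2))` yields a single rule for 203.0.113.7, because the other address failed only once.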
Continue reading "The Relativity of Performance Improvements"
International BSD Conference in Turkey
I'm on my way back from the
International BSD Conference in Turkey,
which a group of enthusiastic members of our community organized on
Friday and Saturday.
Continue reading "International BSD Conference in Turkey"
The Memory Savings of Shared Libraries
A recent thread in the
FreeBSD ports mailing list
discusses the benefits and drawbacks of static builds.
How can we measure the memory savings of shared libraries?
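One simple way to approach the question, sketched here under stated assumptions and not necessarily the way the post does it, is to count how many processes map each library and treat every mapping beyond the first as memory that a static build would have duplicated. The process names, library sizes, and the function name below are hypothetical, and the estimate ignores per-process private data pages.

```python
from collections import Counter


def shared_library_savings(process_libs, lib_sizes):
    """Estimate the bytes saved by sharing each library among its users.

    process_libs maps a process name to the libraries it has mapped;
    lib_sizes maps a library name to its shareable size in bytes.
    """
    users = Counter()
    for libs in process_libs.values():
        users.update(set(libs))  # count each library once per process
    # The first user pays for the library; every further user gets it for free.
    return sum(lib_sizes[lib] * (n - 1) for lib, n in users.items())


# Hypothetical example data, for illustration only.
EXAMPLE_PROCESSES = {
    "httpd-1": ["libc", "libssl"],
    "httpd-2": ["libc", "libssl"],
    "sshd": ["libc"],
}
EXAMPLE_SIZES = {"libc": 1_000_000, "libssl": 300_000}
```

With the example data, libc is mapped by three processes and libssl by two, so the estimate is 2 × 1,000,000 + 1 × 300,000 = 2,300,000 bytes.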
Continue reading "The Memory Savings of Shared Libraries"
The Tools we Use
It is impossible to sharpen a pencil with a blunt ax. It is equally vain to try to do it with ten blunt axes instead.
— Edsger W. Dijkstra
Continue reading "The Tools we Use"
A Humbling Upgrade
Yesterday I upgraded one of the servers I maintain from
FreeBSD 4.11, which had reached its
end of life, to the latest
production release 6.2.
It was a humbling experience.
Continue reading "A Humbling Upgrade"
Cross Compiling
Cross compiling software on a host platform to run on a different
target used to be an exotic stunt to be performed by
the brave and desperate.
One had first to configure and build the compiler, assembler, archiver,
and linker for the different architecture, then cross-build the other
architecture's libraries, and finally the software.
This week, while preparing a new release of the
CScout refactoring browser
I realized that what was once a feat is nowadays a routine operation.
Continue reading "Cross Compiling"
NASSCOM Quality Summit 2006
Last week I attended NASSCOM's 2006 Quality Summit in Bangalore, India.
There I gave a tutorial on tooling with open source software, and
delivered a talk on Global Software Development in the FreeBSD Project.
It was an edifying trip.
Continue reading "NASSCOM Quality Summit 2006"
Hardware and Software Debugging
5.0 server I tried to run as part of a
MediaWiki installation under
crashed during initialization, and a Tomy Walkabout
digital baby monitor started emitting a low beeping sound.
I solved both cases through educated guesses.
Continue reading "Hardware and Software Debugging"
Surprising Findings on Software Reuse
Kevin DeSouza and his colleagues in a recent
article in the
Communications of the ACM published some surprising
findings regarding software reuse:
reuse happens more among novices than among experts,
within projects rather than across them, and in
transient teams rather than permanent ones.
The statement regarding the higher propensity of rookies to reuse
compared to older professionals rang particularly true to my ears.
Continue reading "Surprising Findings on Software Reuse"
Public Bookmarking
You're searching the internet to answer a question you have,
and after some painstaking detective work you locate the answer.
Where do you store the answer for future reference?
Continue reading "Public Bookmarking"
A Tree of Mentors
In the FreeBSD project, new
committers are assigned a mentor who oversees their work until they
are judged to be confident enough to work on their own.
Like many things in the open-source landscape, having a mentor is a loan,
which we should pay back by mentoring somebody else.
Continue reading "A Tree of Mentors"
A Pipe Namespace in the Portal Filesystem
The portal filesystem allows a daemon running as a userland program
to pass descriptors to processes that open files belonging to its namespace.
It has been part of the *BSD operating systems since 4.4 BSD.
I recently added a pipe namespace to its FreeBSD implementation.
This allows us to
perform scatter gather operations without using temporary files,
create non-linear pipelines, and
implement file views using symbolic links.
Continue reading "A Pipe Namespace in the Portal Filesystem"
Maintainability of the FreeBSD System
Last November Ioannis Samoladas and his colleagues published an article
in the Communications of the ACM that compared the maintainability
of open-source versus closed-source projects.
I applied the maintainability index they used to the FreeBSD source
repository, following the code's maintainability over time and comparing
the maintainability of different modules.
Here are the results.
Continue reading "Maintainability of the FreeBSD System"
Measuring the Effect of Shared Objects
For the Code Quality
book I am writing I wanted to measure the memory savings of shared objects.
On a lightly loaded web server these amounted to 80MB,
on a more heavily loaded shell access machine these amounted
Continue reading "Measuring the Effect of Shared Objects"
System administration stories: The Revolt
Can a small embedded system the size of a paperback
lead a group of machines into revolt?
Continue reading "System administration stories: The Revolt"
Detective Work and Dropped TCP Connections
I had problems with TCP connections (mostly long-lasting ssh sessions)
getting dropped on my ADSL line.
In the end, I found that the problem had two different roots.
The detective work behind establishing them is, I believe, interesting.
It also shows how accessible source code, and the will to use it,
can be a tremendous boost when tackling difficult system administration problems.
Continue reading "Detective Work and Dropped TCP Connections"
The hypot() Mystery
I was writing a section for the
followup volume, and wanted to demonstrate the pitfalls of
using homebrewed mathematical functions instead of the library ones.
As an example, I chose to compare the C library
hypot function against the homebrewed
sqrt(x * x + y * y).
I created a plot of "unit in last place" (ulp) error values between
the two functions, which demonstrated how the error increased for larger
values of y.
Continue reading "The hypot() Mystery"
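The comparison described above can be sketched in a few lines. This is a minimal illustration in Python rather than the C used in the post, and it assumes Python 3.9 or later for `math.ulp`; `naive_hypot` and `ulp_error` are hypothetical names. It measures how far the homebrewed formula lands from the library's `hypot`, in units in the last place of the library result, and does not try to reproduce the post's plot.

```python
import math


def naive_hypot(x, y):
    """The homebrewed formula: squares can overflow or underflow on the way."""
    return math.sqrt(x * x + y * y)


def ulp_error(x, y):
    """Distance of the naive result from math.hypot, in ulps of the latter."""
    reference = math.hypot(x, y)
    return abs(naive_hypot(x, y) - reference) / math.ulp(reference)


if __name__ == "__main__":
    # Print the error for a fixed x and increasingly large y values.
    for exponent in range(0, 16, 3):
        y = 10.0 ** exponent
        print(f"y = {y:g}: {ulp_error(1.0, y)} ulp")
```

Beyond rounding differences, the homebrewed version has a better-known weakness: for very large arguments such as 1e200 the intermediate squares overflow to infinity, and for very small ones such as 1e-200 they underflow to zero, while the library function still returns a finite, nonzero result.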
Optimizing ppp and Code Quality
While debugging a problem of my ppp connection I noticed that
ppp was apparently doing a protocol lookup (with a file open,
read, close sequence) for every packet it read.
This is an excerpt from the strace log, one of my
favourite debugging tools.
Continue reading "Optimizing ppp and Code Quality"
Binary File Similarity Checking
How can one determine whether two binary files
(for example, executable images) are somehow similar?
I started writing a program to perform this task.
Such a program could be useful for determining
whether a vendor had included GNU
General Public License (GPL)
code in a proprietary product, violating the GPL.
After writing about 20 lines, I realized that I needed a more accurate
definition of similarity than the vague
"the two files contain a number of identical subsequences"
I had in mind.
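To make the vague definition concrete, one possibility is to score two files by the fixed-length byte subsequences they share. The sketch below is an assumption-laden illustration, not the program from the post: the subsequence length of 8 bytes, the choice of the Jaccard ratio (shared subsequences divided by all distinct subsequences), and the function names are all mine.

```python
def ngrams(data, n):
    """Return the set of all n-byte subsequences occurring in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a, b, n=8):
    """Score two byte strings from 0.0 (nothing shared) to 1.0 (same n-gram set)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        # Both inputs are shorter than n bytes: fall back to exact comparison.
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Two files could be compared with `similarity(open(p1, "rb").read(), open(p2, "rb").read())`, where `p1` and `p2` are placeholder paths. A set-based score like this ignores where and how often a subsequence occurs, which is one of the questions a more precise definition would have to settle.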
Continue reading "Binary File Similarity Checking"
Software Complexity: Open Source vs Microsoft
In a readable and interesting paper titled
CyberInsecurity: the cost of a monopoly
seven notable security experts argue that Microsoft's near monopoly
in the desktop operating system and office productivity markets is creating
a dangerous monoculture that exacerbates the effect of security vulnerabilities.
Continue reading "Software Complexity: Open Source vs Microsoft"
FreeBSD Committer
I became a FreeBSD committer.
I've been using BSD Unix systems
since 1986, starting with 4.3 BSD on a pair of VAX 780 machines. In
1992, as a bored PhD student, I reimplemented sed(1) and contributed it to
the unencumbered BSD version that was then being put together; it is now
part of the *BSD family. I crossed paths with BSD software again when
the prize of the 2000 Usenix technical conference "win a pet Shark"
contest, Digital's Network Appliance Reference Design (DNARD), came with
a NetBSD boot image. I used that code for drawing about 500 examples
for my book
Code Reading: The Open Source Perspective (Addison-Wesley
2003), detailing how to read software code others have written.
Since 2001 I've been using
FreeBSD to control my home's security, communications, and entertainment
systems as described in a
SANE conference paper
in Personal and Ubiquitous Computing
(as an academic I have to live by the "publish or perish" motto).
Continue reading "FreeBSD Committer"