How to avoid redoing manual corrections
Say you have an automated process to create a report, which you then have to polish by hand, because there are adjustments that require human judgment. After three hours of polishing, you realize that the report is full of errors due to a bug in the initial reporting process. Is there a way to salvage the three hours of work you put into it?
Continue reading "How to avoid redoing manual corrections"
Verifying the Substitution Cipher Folklore
A substitution cipher has each letter substituted with another.
Cryptography folklore has it that simple substitution ciphers
are easy to break by looking at the letter frequencies of the encrypted text.
I tested the folklore and the results were not quite what I was expecting.
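As a rough illustration of the idea being tested, the following minimal sketch encrypts a text with a random substitution key and counts letter frequencies. The function names (`make_key`, `encrypt`, `letter_frequencies`) and the sample sentence are assumptions made for this example; they are not taken from the full post.

```python
import random
import string
from collections import Counter


def make_key(seed):
    """Return a dict mapping each lowercase letter to a shuffled substitute."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))


def encrypt(text, key):
    """Substitute letters through key; leave all other characters untouched."""
    return "".join(key.get(c, c) for c in text.lower())


def letter_frequencies(text):
    """Return (letter, count) pairs, most frequent first."""
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    return counts.most_common()


if __name__ == "__main__":
    key = make_key(seed=42)  # hypothetical key, chosen only for the demo
    plain = "the quick brown fox jumps over the lazy dog and then rests"
    cipher = encrypt(plain, key)
    print(cipher)
    print(letter_frequencies(plain)[:3])
    print(letter_frequencies(cipher)[:3])
```

The substitution renames the letters but leaves the shape of the frequency distribution unchanged, and that is the property a frequency-based attack relies on.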
Continue reading "Verifying the Substitution Cipher Folklore"
The Birth of Standard Error
Earlier today Stephen Johnson, in a mailing list run by the
The Unix Heritage Society,
described the birth of the standard error concept:
the idea that a program's error output is sent on a channel
different from that of its normal output.
Over the past forty years, all major operating systems and language libraries
have embraced this concept.
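To make the concept concrete, here is a minimal sketch of a program that keeps its results and its complaints on separate channels. The `report` function and its input values are hypothetical, invented for this illustration.

```python
import sys


def report(lines, out=sys.stdout, err=sys.stderr):
    """Write doubled numeric values to out and complaints about the rest to err."""
    for line in lines:
        try:
            value = float(line)
        except ValueError:
            # Diagnostics travel on the error channel, not with the results.
            print(f"skipping bad input: {line!r}", file=err)
        else:
            print(value * 2, file=out)


if __name__ == "__main__":
    report(["1", "oops", "2.5"])
```

When the standard output of such a program is redirected to a file or a pipe, the complaint still appears on the terminal instead of polluting the data.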
Continue reading "The Birth of Standard Error"
How to Calculate an Operation's Memory Consumption
How can you determine how much memory is consumed by a specific
operation of a Unix program?
Valgrind's Massif subsystem could help you in this regard,
but it can be difficult to isolate a specific operation from
Here is another, simpler way.
Continue reading "How to Calculate an Operation's Memory Consumption"
Pretend Invitations
Choosing between people you want to invite to a function and people you
have to invite is sometimes difficult.
Say Alice wants to invite Tom, Dick, and Harry to a party, but she'd actually
prefer if Dick didn't show up.
Here's how Alice can send invitations by email from an email-capable
Unix system to achieve the desired result,
while covering her scheming with plausible deniability.
Continue reading "Pretend Invitations"
Apps are the New Users
Some facilities provided by mature multi-user operating systems appear arcane today. Administrators of computers running Mac OS X or Linux can see users logged in from remote terminals, they can specify limits on the disk space one can use, and they can run accounting statistics to see how much CPU time or disk I/O a user has consumed over a month. These operating systems also offer facilities to group users together, to specify various protection levels for each user's files, and to prescribe which commands a user can run.
Continue reading "Apps are the New Users"
Code Verification Scripts
Which of my classes contain instance variables?
Which classes call the method
but don't call the method
These and similar questions often come up when you want to verify
that your code is free from certain errors.
For example, instance variables can be a problem in servlet classes.
Or you may have found a bug related to the
and you want to look for other places where this occurs.
Your IDE is unlikely to answer such questions,
and this is where a few lines in the Unix shell can save
you hours of frustration.
Continue reading "Code Verification Scripts"
Batch Files as Shell Scripts Revisited
Four years ago I wrote
about a method that could be used to have the Unix Bourne shell interpret
Windows batch files.
I'm using this trick a lot, because programming using the Windows/DOS
batch files facilities is decidedly painful, whereas the Bourne
shell remains a classy programming environment.
There are still many cases where the style of Unix shell programming
outshines and outperforms even modern scripting languages.
Continue reading "Batch Files as Shell Scripts Revisited"
Useful Polyglot Code
Four years ago I blogged about an
incantation that would allow the Windows command interpreter (cmd) to execute
Unix shell scripts written inside plain batch files.
Time for an update.
Continue reading "Useful Polyglot Code"
Tags for Bibliography References
I love writing my papers in LaTeX.
Its declarative style allows me to concentrate on the content,
rather than the form.
I even format the text according to the content,
keeping each phrase or logical unit on a separate line.
Many publishers supply style files that format the article according
to the journal's specifications.
Even better, over the years I've created
an extensive collection
I can therefore use BibTeX to cite works with a simple command,
without having to re-enter their details.
This also allows me to use style files
to format references according to the publisher's specification.
Yet, there is still the problem of navigating from a citation to
the work's details.
Here is how I solve it.
Continue reading "Tags for Bibliography References"
Applied Code Reading: Debugging FreeBSD Regex
When the code we're trying to
read is inscrutable,
inserting print statements and running various test cases can be
two invaluable tools.
Earlier today I fixed
a tricky problem in the FreeBSD regular expression library.
The code, originally written by Henry Spencer in the early 1990s,
is by far the most complex I've ever encountered.
It implements sophisticated algorithms with minimal commenting.
Also, to avoid code repetition and increase efficiency,
the 1200-line-long main part of the regular expression execution engine is
included in the compiled C code
three times after modifying various macros to adjust the code's behavior:
the first time the code targets small expressions and operates
with bit masks on long integers,
the second time the code handles larger expressions
by storing its data in arrays,
and the third time the code is also adjusted to handle multibyte characters.
Here is how I used test data and print statements to locate and fix the problem.
Continue reading "Applied Code Reading: Debugging FreeBSD Regex"
How to Create a Self-Referential Tweet
Yesterday Mark Reid posed a challenge:
create a self-referential tweet (one that links to itself).
It was clarified that the
tweet should contain in its text its own identifier
(the number after the "/status/" part of its URL).
I decided to take up the challenge
("in order to learn a bit about the Twitter API" was my excuse),
and a few hours later I won the game by posting the first
self-referential tweet.
Here is how I did it.
Continue reading "How to Create a Self-Referential Tweet"
Fixing the Orientation of JPEG Photographs
I used to fix the orientation of my photographs through an application
that would transpose the compressed JPEG blocks.
This had the advantage of avoiding the image degradation of a
decompression and a subsequent compression.
Continue reading "Fixing the Orientation of JPEG Photographs"
Parallelizing Jobs with xargs
With multi-core processors sitting idle most of the time
and workloads always increasing,
it's important to have easy ways to make the CPUs earn their money's worth.
I learned today how the Unix xargs command can help in this regard.
Continue reading "Parallelizing Jobs with xargs"
A Well-Tempered Pipeline
I am studying the use of open source software in industry.
One way to obtain empirical data is to look at the operating systems and
browsers used by the Fortune 1000 companies by examining browser logs.
I obtained a list of the Fortune 1000 domains and wrote a pipeline
to summarize results by going through this site's access logs.
Continue reading "A Well-Tempered Pipeline"
Monitor Process Progress on Unix
I often run file-processing commands that take many hours to
finish, and I therefore need a way to monitor their progress.
The Perkin-Elmer/Concurrent OS32 system I worked on for a couple
of years back in 1993 (don't ask)
had a facility that displayed for any executing
command the percentage of work that was completed.
When I first saw this facility working on the programs I maintained,
I couldn't believe my eyes, because I was sure that those rusty
Cobol programs didn't contain any functionality to monitor their progress.
Continue reading "Monitor Process Progress on Unix"
Unzipping Files in Order
Over the past couple of years I've enjoyed listening to the
audio edition of the
The material is superb
(although I occasionally get the feeling of listening to the
Voice of America),
the articles are read in a clear voice,
the data's encoding is plain MP3,
unencumbered by digital rights (restrictions) management silliness,
and the audio format is convenient to listen to on the metro or while jogging.
Unfortunately, the articles in the audio edition's zip file are
haphazardly ordered, which, until today, marred the enjoyment of my listening.
Continue reading "Unzipping Files in Order"
A Child's Crontab
When the time to go to sleep is approaching,
all children seem to be configured with the same crontab.
Continue reading "A Child's Crontab"
Assigning Responsibility
Over the past few days I worked over a large code body correcting various
accumulated errors and style digressions.
When I finished I wanted to see who wrote the original lines.
(It turned out I was not entirely innocent.)
Continue reading "Assigning Responsibility"
The Treacherous Power of Extended Regular Expressions
I wanted to filter out lines containing the word "line" or a double quote
from a 1GB file.
This can be easily specified as an extended regular expression,
but it turns out that I got more than I bargained for.
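For readers unfamiliar with alternation, the filter described above can be sketched as follows. This is only an assumed rendering of the task using Python's `re` module; the exact expression and tool used in the full post are not shown here, and `keep` is a name invented for this example.

```python
import re

# Alternation: match either the text "line" or a double quote character.
PATTERN = re.compile(r'line|"')


def keep(lines):
    """Return only the lines that match neither alternative."""
    return [text for text in lines if not PATTERN.search(text)]


if __name__ == "__main__":
    sample = ['a line here', 'say "hi"', 'plain text', 'outlined']
    print(keep(sample))
```

Note that this pattern matches "line" anywhere, so "outlined" is dropped as well; restricting the match to the whole word would need word-boundary anchors such as `\bline\b`.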
Continue reading "The Treacherous Power of Extended Regular Expressions"
Breaking into a Virtual Machine
Say you're running your business on a rented
virtual private server.
How secure is your setup?
I wouldn't expect it to be more secure than the system your server runs
on, and a simple experiment confirmed it.
Continue reading "Breaking into a Virtual Machine"
Make vs Ant: Observability
I've long felt uncomfortable with ant
as a build management tool.
I thought that my uneasiness stemmed from the verbose XML used for
describing tasks, and the lack of default dependency resolution.
Today, an email from a UMLGraph user
struggling with a complex ant task
made me realize another problem:
lack of observability.
Continue reading "Make vs Ant: Observability"
Cracking Software Reuse
[Newton] said, "If I have seen further than others, it is because I've stood on the shoulders of giants." These days we stand on each other's feet!
— Richard Hamming
Sometimes we encounter ideas that inspire us for life. For me, this was a Unix command pipeline I came across in the '80s:
Continue reading "Cracking Software Reuse"
Batch Files as Shell Scripts
Although the Unix Bourne shell offers a superb environment for combining
existing commands into sophisticated programs, using a Unix shell
as an interactive command environment under Windows can be painful.
Continue reading "Batch Files as Shell Scripts"
Efficiency Will Always Matter
Many claim that today's fast CPUs and large memory capacities make
time-proven technologies that efficiently harness a computer's power irrelevant.
I beg to differ, and my experience in the last three days demonstrated
that technologies that originated in the 70s still have their place today.
Continue reading "Efficiency Will Always Matter"
A Clash of Two Cultures
I dug up the following gem from the Usenix
HotOS X Conference
Panel titled "Do we work within existing frameworks or start from scratch?",
summarized by Prashanth Bungale.
Continue reading "A Clash of Two Cultures"
Working with Unix Tools
A successful [software] tool is one that was used to do something undreamed of by its author.
— Stephen C. Johnson
Line-oriented textual data streams are the lowest useful common denominator for a lot of data that passes through our hands. Such streams can be used to represent program source code, web server log data, version control history, file lists, symbol tables, archive contents, error messages, profiling data, and so on. For many routine, everyday tasks, we might be tempted to process the data using a Swiss army knife scripting language, like Perl, Python, or Ruby. However, to do that we often need to write a small, self-contained program and save it into a file. By that point we've lost interest in the task, and end up doing the work manually, if at all. Often, a more effective approach is to combine programs of the Unix toolchest into a short and sweet pipeline that we can run from our shell's command prompt. With the modern shell command-line editing facilities we can build our command bit by bit, until it molds into exactly the form that suits us. Nowadays, the original Unix tools are available on many different systems, like GNU/Linux, Mac OS X, and Microsoft Windows, so there's no reason why you shouldn't add this approach to your arsenal.
Continue reading "Working with Unix Tools"
Tool Writing: A Forgotten Art?
Merely adding features does not make it easier for users to do things—it just makes the manual thicker. The right solution in the right place is always more effective than haphazard hacking.
— Brian W. Kernighan and Rob Pike
In 1994 Chidamber and Kemerer defined a set of six simple metrics for object-oriented programs. Although the number of object-oriented metrics swelled to above 300 in the years that followed, I had a case where I preferred to use the original classic metric set for clarity, consistency, and simplicity. Surprisingly, none of the six open-source tools I found and tried to use fitted the bill. Most tools calculated only a subset of the six metrics, some required tweaking to make them compile, others had very specific dependencies on other projects (for example Eclipse), while others were horrendously inefficient. Although none of the tools I surveyed managed to calculate correctly the six classic Chidamber and Kemerer metrics in a straightforward way, most of them included numerous bells and whistles, such as graphical interfaces, XML output, and bindings to tools like ant and Eclipse.
Continue reading "Tool Writing: A Forgotten Art?"
A Pipe Namespace in the Portal Filesystem
The portal filesystem allows a daemon running as a userland program
to pass descriptors to processes that open files belonging to its
namespace.
It has been part of the *BSD operating systems since 4.4 BSD.
I recently added a pipe namespace to its FreeBSD implementation.
This allows us to
perform scatter-gather operations without using temporary files,
create non-linear pipelines, and
implement file views using symbolic links.
Continue reading "A Pipe Namespace in the Portal Filesystem"
XML Versus Text Files
package dependency analyzer can output its results
either as XML or as plain text.
Instead of using the XML output,
I found myself processing the text output using awk.
Am I becoming tied to old-world thinking,
or are text files easier to process?
Continue reading "XML Versus Text Files"
System administration stories: The Revolt
Can a small embedded system the size of a paperback
lead a group of machines into revolt?
Continue reading "System administration stories: The Revolt"
A Unix-based Logic Analyzer
A circuit I was designing was behaving in unexpected ways:
the output of a wireless serial receiver based on Infineon's TDA5200
was refusing to drive an LS TTL load.
To debug the problem I needed an oscilloscope or a logic analyzer,
but I had none.
I searched the web and located
software to convert the PC's parallel port to a logic analyzer.
I downloaded the 900K program, but that was not the end.
Unfortunately the design of Windows 2000 does not allow direct access
to the I/O ports, so I also downloaded
a parallel port device driver and a program to give the appropriate privileges to other
programs.
Finally, I also downloaded from a third site the Borland runtime libraries
required by the logic analyzer.
Needless to say, the combination refused to work.
Continue reading "A Unix-based Logic Analyzer"