Seven reasons to add Unix command line expertise to your tool chest

On Tuesday March 17th 2020 my free online massive open online course (MOOC) on the use of Unix command line tools for data, software, and production engineering goes live on the edX platform. Already more than one thousand participants from around the world have registered for it; you should still be able to enroll through this link. In response to the course’s announcement seasoned researchers from around the world have commented that this is an indispensable course and that it is very hard to beat the ROI of acquiring this skillset, both for academia and industry. In an age of shiny IDEs and cool GUI tools, what are the reasons for the enduring utility and popularity of the Unix command line tools? Here’s my take.

Continue reading "Seven reasons to add Unix command line expertise to your tool chest"

Last modified: Monday, March 16, 2020 0:34 am

How to Perform Set Operations on Terabyte Files

The Unix sort command can efficiently handle files of arbitrary size (think of terabytes). It does this by loading into main memory all the data that can fit into it (say 16GB), sorting that data efficiently using an O(N log N) algorithm, and then merge-sorting the chunks with a linear complexity O(N) cost. If the number of sorted chunks is higher than the number of file descriptors that the merge operation can simultaneously keep open (typically more than 1000), then sort will recursively merge-sort intermediate merged files. Once you have at hand sorted files with unique elements, you can efficiently perform set operations with them through linear complexity O(N) operations. Here is how to do it.

Continue reading "How to Perform Set Operations on Terabyte Files"

Last modified: Tuesday, April 3, 2018 8:44 pm

Monitor Process Progress on Unix

I often run file-processing commands that take many hours to finish, and I therefore need a way to monitor their progress. The Perkin-Elmer/Concurrent OS32 system I worked-on for a couple of years back in 1993 (don't ask) had a facility that displayed for any executing command the percentage of work that was completed. When I first saw this facility working on the programs I maintained, I couldn't believe my eyes, because I was sure that those rusty Cobol programs didn't contain any functionality to monitor their progress.

Continue reading "Monitor Process Progress on Unix"

Last modified: Monday, October 27, 2008 1:34 pm

Open and Closed Source Kernels Go Head to Head

Earlier today I presented at the 30th International Conference on Software Engineering a research paper comparing the code quality of Linux, Windows (its research kernel distribution), OpenSolaris, and FreeBSD. For the comparison I parsed multiple configurations of these systems (more than ten million lines), and stored the results in four databases, where I could run SQL queries on them. This amounted to 8GB of data, 160 million records. (I�ve made the databases and the SQL queries available online.) The areas I examined were file organization, code structure, code style, preprocessing, and data organization. To my surprise there was no clear winner or looser, but there were interesting differences in specific areas.

Continue reading "Open and Closed Source Kernels Go Head to Head"

Last modified: Friday, May 16, 2008 1:44 am

The Treacherous Power of Extended Regular Expressions

I wanted to filter out lines containing the word "line" or a double quote from a 1GB file. This can be easily specified as an extended regular expression, but it turns out that I got more than I bargained for.

Continue reading "The Treacherous Power of Extended Regular Expressions"

Last modified: Tuesday, August 28, 2007 10:37 am

What Can System Administrators Learn from Programmers?

Although we often hear about program bugs and techniques to get rid of them, we seldom see a similar focus in the field of system administration. This is unfortunate, because increasingly the reliability of an IT system depends as much on the software comprising the system as on the support infrastructure hosting it.

Continue reading "What Can System Administrators Learn from Programmers?"

Last modified: Sunday, July 23, 2006 0:10 am

Code Reading Example: the Linux Kernel Load Calculation

A colleague's Linux machine was exhibiting a very high load value, for no obvious reason. I wanted to make him point the kernel debugger on the routine calculating the load. It has been more than 7 years since the last time I worked on a Linux kernel, so I had to find my way around from first principles. This is an annotated and slightly edited version of what I did.

Continue reading "Code Reading Example: the Linux Kernel Load Calculation"

Last modified: Thursday, November 25, 2004 9:40 am

Navigation

blog contents
dds blog
dds home

Become a Unix command line wizard

edX MOOC on Unix Tools: Data, Software, and Production Engineering

Debug like a master

Compute with style

Book cover of The Elements of Computing Style

Syndication

This blog is also available as an RSS feed:

Recent posts

Is it legal to use copyrighted works to train LLMs? (2025-06-26)
I’m removing the BSD advertising clause (2025-05-20)
The perils of GenAI student submissions (2025-04-11)
Unix make vs Apache Airflow (2024-10-15)
How (and how not) to present related work (2024-08-05)
An exception handling revelation (2024-02-05)
Extending the life of TomTom wearables (2023-09-01)
How AGI can conquer the world and what to do about it (2023-04-13)
Twitter’s overrated dissemination capacity (2023-04-02)
The hypocritical call to pause giant AI (2023-03-30)

Category Tags

AI (6)
AWS (4)
Android (2)
Apple (11)
C (21)
C++ (17)
Computers (58)
Databases (6)
Debugging (10)
Discussion (6)
Electronics (15)
Environment (1)
FreeBSD (26)
Funny (14)
GSIS (5)
Git (2)
Google (6)
Government (3)
Hacks (26)
Hardware (27)
History (13)
Information systems (1)
Internet (12)
Java (26)
JavaScript (1)
Linux (7)
Management (27)
Microsoft (11)
One Laptop Per Child (3)
Open source (59)
Opinion (30)
Parenting (11)
Perl (13)
Photos (13)
Politics (5)
Programming (110)
Python (3)
R (1)
Raspberry Pi (6)
Risks (7)
Scala (1)
Science (35)
Security (26)
Sights (19)
Smartphones (3)
Software (22)
Software engineering (93)
Standards (7)
System administration (46)
Teaching (10)
Technology (33)
Testing (3)
Tips (43)
Tools of the Trade (52)
Travel (9)
UML (6)
Unix (53)
Web (31)
Windows (17)
Writing (47)
XML (10)
vim (5)

Posts Tagged Linux