On Tuesday March 17th, 2020 my free massive open online course (MOOC) on the use of Unix command-line tools for data, software, and production engineering goes live on the edX platform. More than one thousand participants from around the world have already registered for it; you should still be able to enroll through this link. In response to the course's announcement, seasoned researchers from around the world have commented that this is an indispensable course and that the ROI of acquiring this skill set is very hard to beat, both in academia and in industry. In an age of shiny IDEs and cool GUI tools, what are the reasons for the enduring utility and popularity of the Unix command-line tools? Here's my take.
By combining a couple of Unix command-line tools into a pipeline you can quickly obtain answers to questions that crop up in software development, production engineering (or site reliability engineering or IT operations, depending on where you work), and business analytics. In most cases such answers aren't directly available from existing applications, and other methods for obtaining them would require you to write a dedicated program or script. Consider the following example.
As a software developer, you may want to find cases where the number of calls to acquire_resource() doesn't match the number of calls to the corresponding resource-releasing function.
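A question like this can often be answered with a one-line pipeline. The sketch below, which uses a throwaway sample file and hypothetical function names (acquire_resource, release_resource), counts the two kinds of calls with grep:

```shell
# Create a small sample source file (names and contents are hypothetical).
cat > sample.c <<'EOF'
r1 = acquire_resource();
r2 = acquire_resource();
release_resource(r1);
EOF

# Count the lines containing each call and report the totals.
acquired=$(grep -c 'acquire_resource(' sample.c)
released=$(grep -c 'release_resource(' sample.c)
echo "$acquired acquisitions, $released releases"   # 2 acquisitions, 1 releases
```

In a real project you would run grep over all the source files and compare the two totals, flagging files where they differ.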
Most Unix tools work as filters, which means they rarely need to hold more than a few lines of text in memory. Consequently, they can handle arbitrarily large (think petabytes) amounts of data without a problem. Unix tools that do require in-memory processing, such as sort, have been carefully and painstakingly engineered to handle vast data sets. Specifically, the Unix sort command sorts batches of data in memory, writes them to temporary files, and then merge-sorts the intermediate files. In addition, sort applies the merge-sorting technique recursively when it runs out of file descriptors for accessing the temporary files. (With 10GB intermediate files and 1024 file descriptors, this happens when sorting more than 10TB of data.)
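As a small illustration, GNU sort lets you cap its in-memory buffer and choose where the temporary files go; with a tiny buffer it falls back to external merge sorting even for modest inputs. (The -S and -T flags below are GNU coreutils options; the file name is made up.)

```shell
# Write a few unsorted numbers to a file.
printf '%s\n' 3 1 2 > nums.txt

# Sort numerically with an artificially small memory buffer,
# placing any temporary files in the current directory.
sorted=$(sort -n -S 1M -T . nums.txt | paste -sd, -)
echo "$sorted"    # 1,2,3
```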
You can use your Unix skills on systems ranging from a $5 Raspberry Pi Zero to the world's largest supercomputers. In between you'll find cloud platforms, embedded devices, such as routers, set-top boxes, PBXs, network equipment, and TVs, and even your Android phone.
Non-trivial Unix commands consist of a few programs connected together through pipes, in a form where the output of one command feeds the next; for example, fetch a file from the web, uncompress it, and find the records satisfying some condition. On modern systems the corresponding processes are automatically allocated to multiple processor cores, exploiting your processor's full power. In other cases you can process data chunks in parallel just by passing the -P argument to the xargs program or by invoking GNU parallel. In most other languages or applications this natural and effortless parallelization is either impossible or can only be achieved through blood and tears.
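For instance, xargs -P (a GNU/BSD extension to POSIX xargs) runs up to the given number of command invocations concurrently. The sketch below, operating on throwaway files, compresses them four at a time:

```shell
# Create a few sample files (names are illustrative).
touch a.log b.log c.log

# Compress each file, running up to four gzip processes in parallel.
printf '%s\n' a.log b.log c.log | xargs -n 1 -P 4 gzip

# gzip replaces each original with its compressed counterpart.
ls a.log.gz b.log.gz c.log.gz
```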
Unix tools aren't tied to a specific platform, application, or language ecosystem. In fact, many specialized applications also offer a command-line interface. For example, if you want to convert an SVG diagram into PNG or PDF you can readily do that by invoking the amazing Inkscape design tool with suitable command-line options. (I used this a couple of years ago to automatically send beautiful customized certificates to hundreds of IEEE Software reviewers.) As another example, if you want to analyze JVM bytecode you can run the javap command.
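As a sketch of such a conversion (assuming Inkscape 1.x, whose command-line interface uses --export-type and --export-filename, and a made-up diagram.svg), the job is a one-liner; the snippet below generates a minimal input file and skips gracefully when Inkscape isn't installed:

```shell
# A minimal sample SVG file to convert (contents are illustrative).
cat > diagram.svg <<'EOF'
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
  <rect width="10" height="10" fill="blue"/>
</svg>
EOF

if command -v inkscape >/dev/null 2>&1; then
  # Inkscape 1.x command-line syntax.
  inkscape --export-type=png --export-filename=diagram.png diagram.svg \
    && status=converted
else
  status=skipped
fi
echo "$status"
```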
Nowadays, the Unix tools are available as free or open source software through efforts such as GNU/Linux, FreeBSD, NetBSD, and OpenBSD. They are also one click away on Apple's macOS systems and can be easily installed on Windows 10 through Microsoft's Windows Subsystem for Linux or through Cygwin. In addition, many hardware vendors offer and support their own bespoke version of Unix.
The power, simplicity, and elegance of the Unix tools have allowed them to grow and mature over the past half century. New tools and applications are developed every day. Their effective use will continue to be an essential skill for advanced software developers, system engineers, data analysts, and researchers. Given their wide availability and applicability, I've seen that the payoff associated with learning and using them increases continuously and will last a lifetime.
Last modified: Monday, March 16, 2020 0:34 am
Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.