Posts Tagged Linux

 

Seven reasons to add Unix command line expertise to your tool chest

On Tuesday, March 17, 2020, my free massive open online course (MOOC) on the use of Unix command line tools for data, software, and production engineering goes live on the edX platform. More than a thousand participants from around the world have already registered for it; you should still be able to enroll through this link. In response to the course's announcement, seasoned researchers from around the world commented that it is an indispensable course and that the return on investment of acquiring this skill set is very hard to beat, both in academia and in industry. In an age of shiny IDEs and cool GUI tools, what are the reasons for the enduring utility and popularity of the Unix command line tools? Here's my take.

Continue reading "Seven reasons to add Unix command line expertise to your tool chest"

How to Perform Set Operations on Terabyte Files

The Unix sort command can efficiently handle files of arbitrary size (think of terabytes). It does this by loading into main memory as much of the data as fits (say 16GB), sorting that chunk with an efficient O(N log N) algorithm, writing it to a temporary file, and then merging the sorted chunks at a linear O(N) cost. If the number of sorted chunks exceeds the number of file descriptors that the merge operation can keep open at the same time (typically more than 1000), sort recursively merges intermediate merged files. Once you have sorted files with unique elements, you can efficiently perform set operations on them through linear O(N) operations. Here is how to do it.
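To make the two phases concrete, here is a minimal Python sketch of the idea, assuming small in-memory lists stand in for the temporary files that sort would write. It is not the post's own recipe and not how GNU sort is implemented. `external_sort` sorts fixed-size chunks and then merges them with `heapq.merge`. `unique` plays the role of `uniq`. `merge_walk` makes a single pass over two sorted, duplicate-free streams, which is the same idea `comm` relies on: `comm -12` prints the intersection and `comm -23` prints the lines found only in the first file. All function names here are hypothetical, and the recursive merging of intermediate files is not modeled.

```python
import heapq
import itertools
from typing import Iterable, Iterator, List, Tuple


def external_sort(lines: Iterable[str], chunk_size: int) -> Iterator[str]:
    """Sort a stream too large for memory: sort fixed-size chunks, then merge them.

    Simplification: the sorted chunks stay in memory here; sort(1) would write
    each one to a temporary file and merge those files instead.
    """
    it = iter(lines)
    chunks: List[List[str]] = []
    while True:
        chunk = sorted(itertools.islice(it, chunk_size))  # O(n log n) per chunk
        if not chunk:
            break
        chunks.append(chunk)
    return heapq.merge(*chunks)  # one k-way merge pass over all chunks


def unique(sorted_items: Iterable[str]) -> Iterator[str]:
    """Drop adjacent duplicates, as uniq(1) does on sorted input."""
    return (key for key, _ in itertools.groupby(sorted_items))


def merge_walk(a: Iterable[str], b: Iterable[str]) -> Iterator[Tuple[str, bool, bool]]:
    """Walk two sorted, duplicate-free streams once, tagging each item by origin."""
    end = object()  # sentinel marking an exhausted stream
    ia, ib = iter(a), iter(b)
    x, y = next(ia, end), next(ib, end)
    while x is not end or y is not end:
        if y is end or (x is not end and x < y):
            yield x, True, False  # only in a
            x = next(ia, end)
        elif x is end or y < x:
            yield y, False, True  # only in b
            y = next(ib, end)
        else:
            yield x, True, True  # in both
            x, y = next(ia, end), next(ib, end)


def intersection(a, b):
    return [v for v, in_a, in_b in merge_walk(a, b) if in_a and in_b]


def union(a, b):
    return [v for v, _, _ in merge_walk(a, b)]


def difference(a, b):
    return [v for v, in_a, in_b in merge_walk(a, b) if in_a and not in_b]


a = list(unique(external_sort(["pear", "apple", "fig", "apple", "kiwi"], chunk_size=2)))
b = list(unique(external_sort(["kiwi", "banana", "pear"], chunk_size=2)))
print(intersection(a, b))  # ['kiwi', 'pear']
print(difference(a, b))    # ['apple', 'fig']
print(union(a, b))         # ['apple', 'banana', 'fig', 'kiwi', 'pear']
```

Each set operation reads both inputs exactly once, so its cost grows linearly with their combined size. Sorting is the only step that needs more work than that.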

Continue reading "How to Perform Set Operations on Terabyte Files"

Monitor Process Progress on Unix

I often run file-processing commands that take many hours to finish, so I need a way to monitor their progress. The Perkin-Elmer/Concurrent OS/32 system I worked on for a couple of years back in 1993 (don't ask) had a facility that displayed, for any executing command, the percentage of work that had been completed. When I first saw this facility working on the programs I maintained, I couldn't believe my eyes, because I was sure that those rusty COBOL programs didn't contain any functionality to monitor their progress.
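One way to get a similar percentage on a modern Linux system is to compare a process's current offset in the file it is reading against that file's size. The following is a minimal Python sketch of that idea and not necessarily the approach the full post takes. It assumes Linux, because it reads the `pos:` field of `/proc/PID/fdinfo/FD`. It also assumes that the monitored command reads a regular file sequentially and that you are allowed to inspect the process. The helper names `file_offset` and `percent_done` are hypothetical.

```python
import os


def file_offset(pid: int, fd: int) -> int:
    """Current offset of descriptor fd in process pid, read from /proc/PID/fdinfo/FD (Linux only)."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as info:
        for line in info:
            if line.startswith("pos:"):
                return int(line.split()[1])
    raise ValueError(f"no pos field for pid {pid}, fd {fd}")


def percent_done(pid: int, fd: int) -> float:
    """Percentage of the file behind fd that the process has consumed so far."""
    path = os.readlink(f"/proc/{pid}/fd/{fd}")  # file the descriptor refers to
    size = os.path.getsize(path)
    return 100.0 * file_offset(pid, fd) / size if size else 100.0
```

For a long-running command with a hypothetical PID 12345, you would first list `/proc/12345/fd` to find the descriptor of its input file, and then call `percent_done(12345, fd)` periodically. The figure is meaningless for pipes and for programs that seek around in their input.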

Continue reading "Monitor Process Progress on Unix"

Open and Closed Source Kernels Go Head to Head

Earlier today I presented at the 30th International Conference on Software Engineering a research paper comparing the code quality of Linux, Windows (through its Research Kernel distribution), OpenSolaris, and FreeBSD. For the comparison I parsed multiple configurations of these systems (more than ten million lines) and stored the results in four databases, on which I could run SQL queries. This amounted to 8GB of data and 160 million records. (I've made the databases and the SQL queries available online.) The areas I examined were file organization, code structure, code style, preprocessing, and data organization. To my surprise, there was no clear winner or loser, but there were interesting differences in specific areas.
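Storing parsed facts in a database turns each quality question into a query. The following is a minimal sketch of that workflow using Python's built-in `sqlite3` module. The table `functions(system, name, nlines)` and the rows below are hypothetical toy values for illustration only. They do not reproduce the schema or the data of the published databases, and they are not results from the study.

```python
import sqlite3
from typing import Iterable, List, Tuple


def mean_function_length(rows: Iterable[Tuple[str, str, int]]) -> List[Tuple[str, float, int]]:
    """Load (system, function, lines) facts into SQL and aggregate them per system."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE functions (system TEXT, name TEXT, nlines INTEGER)")
        conn.executemany("INSERT INTO functions VALUES (?, ?, ?)", rows)
        return conn.execute(
            "SELECT system, AVG(nlines), COUNT(*) FROM functions "
            "GROUP BY system ORDER BY system"
        ).fetchall()
    finally:
        conn.close()


toy_rows = [("kernel_a", "f", 10), ("kernel_a", "g", 30), ("kernel_b", "h", 5)]
print(mean_function_length(toy_rows))  # [('kernel_a', 20.0, 2), ('kernel_b', 5.0, 1)]
```

With real data, each metric in such a comparison becomes a `GROUP BY` query along these lines, which makes it easy to rerun the same measurement across all four systems.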

Continue reading "Open and Closed Source Kernels Go Head to Head"

The Treacherous Power of Extended Regular Expressions

I wanted to filter out lines containing the word "line" or a double quote from a 1GB file. This is easy to specify as an extended regular expression, but it turned out that I got more than I bargained for.
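The filter itself is a single alternation. Here is a minimal Python sketch of the same specification, roughly what `grep -v -E 'line|"'` expresses. It is not the post's own command and it does not reproduce the surprise the post describes. It illustrates one general pitfall: an unanchored `line` also matches words such as "online", and the variant uses Python's `\b` word-boundary escape to match only the whole word. The names `SUBSTRING`, `WHOLE_WORD`, and `filter_out` are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Alternation: a line matches if it contains "line" anywhere, or a double quote.
SUBSTRING = re.compile(r'line|"')
# Stricter variant: "line" only as a whole word (\b is Python's word-boundary escape).
WHOLE_WORD = re.compile(r'\bline\b|"')


def filter_out(lines: Iterable[str], pattern: "re.Pattern[str]" = SUBSTRING) -> Iterator[str]:
    """Yield only the lines that do not match pattern, like grep -v."""
    return (line for line in lines if not pattern.search(line))


sample = ['plain text\n', 'say "hi"\n', 'one line\n', 'online help\n']
print(list(filter_out(sample)))              # ['plain text\n']
print(list(filter_out(sample, WHOLE_WORD)))  # ['plain text\n', 'online help\n']
```

Because `filter_out` is a generator, it processes the input one line at a time, so a 1GB file never has to fit in memory.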

Continue reading "The Treacherous Power of Extended Regular Expressions"

What Can System Administrators Learn from Programmers?

Although we often hear about program bugs and techniques to get rid of them, we seldom see a similar focus in the field of system administration. This is unfortunate, because increasingly the reliability of an IT system depends as much on the software comprising the system as on the support infrastructure hosting it.

Continue reading "What Can System Administrators Learn from Programmers?"

Code Reading Example: the Linux Kernel Load Calculation

A colleague's Linux machine was exhibiting a very high load value for no obvious reason. I wanted to have him point the kernel debugger at the routine that calculates the load. More than seven years had passed since I last worked on the Linux kernel, so I had to find my way around from first principles. This is an annotated and slightly edited version of what I did.
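For orientation, the Linux load average is an exponentially damped moving average of the number of runnable plus uninterruptible tasks, sampled about every five seconds and kept in fixed-point arithmetic. The following is a simplified Python model of that update in the style of the kernel's classic `CALC_LOAD` macro. It uses `FSHIFT = 11`, so 2048 represents 1.0, and the decay constants `EXP_1`, `EXP_5`, and `EXP_15`. It is not kernel code. Details such as rounding differ between kernel versions, and the helper names `decay_factor`, `calc_load`, and `as_float` are this sketch's own.

```python
import math

FSHIFT = 11                # bits of fractional precision
FIXED_1 = 1 << FSHIFT      # 1.0 in fixed point (2048)
SAMPLE_SECONDS = 5         # the kernel samples roughly every 5 seconds


def decay_factor(minutes: int) -> int:
    """Fixed-point e^(-5s / averaging period); yields EXP_1, EXP_5, EXP_15."""
    return round(FIXED_1 / math.exp(SAMPLE_SECONDS / (60 * minutes)))


EXP_1, EXP_5, EXP_15 = decay_factor(1), decay_factor(5), decay_factor(15)


def calc_load(load: int, exp: int, active_tasks: int) -> int:
    """One damped update: load = load*exp + active*(1 - exp), all in fixed point."""
    active = active_tasks * FIXED_1
    return (load * exp + active * (FIXED_1 - exp)) >> FSHIFT


def as_float(load: int) -> float:
    return load / FIXED_1


# Two tasks kept constantly busy for 10 minutes (120 samples).
load1 = load15 = 0
for _ in range(120):
    load1 = calc_load(load1, EXP_1, 2)
    load15 = calc_load(load15, EXP_15, 2)
print(EXP_1, EXP_5, EXP_15)  # 1884 2014 2037
print(f"1-min: {as_float(load1):.2f}, 15-min: {as_float(load15):.2f}")  # 1-min near 2, 15-min lagging
```

Because tasks in uninterruptible sleep are counted too, processes blocked on I/O can drive the load up even when the CPU is idle. This is one reason a high load can have no obvious cause.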

Continue reading "Code Reading Example: the Linux Kernel Load Calculation"


Last update: Monday, February 5, 2024 5:49 pm


Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.