Java Stream Methods and Unix Pipeline Commands: A Dictionary
While preparing my class notes for functional programming in Java, I was struck by the neat correspondence between many Java Stream methods and Unix commands. I decided to organize the most common of these in dictionary form, mapping between the two. I'd very much welcome comments regarding common patterns that I've missed.
Continue reading "Java Stream Methods and Unix Pipeline Commands: A Dictionary"
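As a small sketch of the kind of correspondence the dictionary covers (the data and the particular pipeline here are illustrative, not from the article), a Java Stream chain lines up operation-for-operation with a Unix pipeline:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamPipeline {
    // Rough Unix equivalent of the chain below:
    //   grep '^a' | tr a-z A-Z | sort -u | head -2
    public static List<String> pipeline() {
        return Stream.of("apple", "ant", "bee", "apple", "ape")
                .filter(s -> s.startsWith("a"))  // grep '^a'
                .map(String::toUpperCase)        // tr a-z A-Z
                .sorted()                        // sort
                .distinct()                      // uniq (sort -u combines both)
                .limit(2)                        // head -2
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(pipeline()); // [ANT, APE]
    }
}
```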
Debugging had to be discovered!
I start my Communications of the ACM article titled Modern debugging techniques: The art of finding a needle in a haystack (accessible from this page without a paywall) with the following remarkable quote. "As soon as we started programming, [...] we found to our surprise that it wasn't as easy to get programs right as we had thought it would be. [...] Debugging had to be discovered. I can remember the exact instant [...] when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs." A Google search for this phrase returns close to 3000 results, but most of them are cryptically attributed as "Maurice Wilkes, discovers debugging, 1949". For a scholarly article I knew I had to do better than that.
Continue reading "Debugging had to be discovered!"
How I slashed a SQL query runtime from 380 hours to 12 with two Unix commands
I was trying to run a simple join query on MariaDB (MySQL) and its performance was horrendous. Here's how I cut down the query's run time from over 380 hours to under 12 hours by executing part of it with two simple Unix commands.
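The full post has the details; as a hedged sketch of the general idea (the file names, layout, and key column below are hypothetical), a large relational join can be offloaded to sort(1) and join(1), which merge two pre-sorted text exports in a single linear pass:

```shell
# Tiny stand-in data: two tab-separated "tables" keyed on column 1.
printf 'k1\ta1\nk3\ta3\nk2\ta2\n' >table_a.tsv
printf 'k2\tb2\nk1\tb1\nk4\tb4\n' >table_b.tsv

# sort(1) handles arbitrarily large files via external merge sort;
# join(1) then merges the two sorted streams in one linear pass.
sort -k1,1 table_a.tsv >table_a.sorted
sort -k1,1 table_b.tsv >table_b.sorted
join -t "$(printf '\t')" table_a.sorted table_b.sorted
```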
Continue reading "How I slashed a SQL query runtime from 380 hours to 12 with two Unix commands"
How to Perform Set Operations on Terabyte Files
The Unix sort command can efficiently handle files of arbitrary size (think terabytes). It does this by loading into main memory as much data as will fit (say 16GB), sorting each chunk efficiently with an O(N log N) algorithm, and then merge-sorting the sorted chunks at a linear O(N) cost per pass. If the number of sorted chunks exceeds the number of file descriptors that the merge operation can simultaneously keep open (typically more than 1000), sort will recursively merge-sort intermediate merged files. Once you have sorted files with unique elements at hand, you can efficiently perform set operations on them through linear-complexity O(N) passes. Here is how to do it.
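As a small illustration of the recipe above (the file names and contents are made up for the example), sort -u produces the sorted, deduplicated inputs, and sort -m and comm then perform the set operations in a single linear pass:

```shell
# Two input "sets", one element per line.
printf '%s\n' apple banana cherry >a.txt
printf '%s\n' banana cherry date  >b.txt

# Sorted files with unique elements -- the precondition for comm(1).
sort -u a.txt >a.sorted
sort -u b.txt >b.sorted

# Union: merge the two sorted files, dropping duplicates.
sort -m -u a.sorted b.sorted        # apple banana cherry date

# Intersection: suppress lines unique to either file.
comm -12 a.sorted b.sorted          # banana cherry

# Difference (a - b): suppress lines in b and lines common to both.
comm -23 a.sorted b.sorted          # apple
```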
Continue reading "How to Perform Set Operations on Terabyte Files"
The Shoemaker's Children Go Barefoot
Earlier today I submitted the camera-ready version of a technical briefing on mining Git repositories, which Georgios Gousios and I will be presenting at the 2018 International Conference on Software Engineering. I was struck by the complexity and inefficiency of the administrative process.
Continue reading "The Shoemaker's Children Go Barefoot"