An Embarrassing Failure

My colleague Georgios Gousios and I are studying the impact of software engineering research in practice. As part of our research, we identified award-winning and highly-cited papers, and asked their authors to complete an online survey. Each survey was personalized with the author's name and the paper's title and publication venue. After completing a trial and a pilot run, I decided to contact the large number of remaining authors. This is when things started going horribly wrong.

Who are the Publishers of Computer Science Research?

To answer this question, I downloaded the DBLP database and used the DOI publisher prefix of each publication to determine its publisher. I grouped the 3.4 million entries by publisher and joined the numeric prefixes with the publisher names available in the list of Crossref members. Based on these data, here is a pie chart of the major publishers of computer science research papers.
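
The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration, not the script actually used for the analysis: the function names, the sample DOIs, and the small prefix-to-name table are assumptions made for the example, whereas the real analysis used the full DBLP data and the Crossref member list.

```python
from collections import Counter

# Resolver forms that may precede the bare DOI (an assumption about the input).
RESOLVER_LEADS = ("https://doi.org/", "http://dx.doi.org/", "doi:")


def doi_prefix(doi):
    """Return the publisher prefix of a DOI (the text before the first
    slash), or None if the string does not look like a DOI."""
    doi = doi.strip()
    for lead in RESOLVER_LEADS:
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, slash, _suffix = doi.partition("/")
    if not slash or not prefix.startswith("10."):
        return None
    return prefix


def count_by_publisher(dois, prefix_names):
    """Group DOIs by prefix and join each prefix with its publisher name."""
    counts = Counter()
    for doi in dois:
        prefix = doi_prefix(doi)
        if prefix is None:
            continue  # Entry without a usable DOI
        counts[prefix_names.get(prefix, "unknown (" + prefix + ")")] += 1
    return counts


# Hypothetical sample data; the suffixes are invented for illustration.
sample_dois = [
    "10.1145/example.1",
    "10.1145/example.2",
    "https://doi.org/10.1109/example.3",
    "10.1007/example-4",
    "no DOI recorded",
]
# Tiny stand-in for the Crossref member list.
sample_names = {"10.1145": "ACM", "10.1109": "IEEE"}

publisher_counts = count_by_publisher(sample_dois, sample_names)
for name, total in publisher_counts.most_common():
    print(name, total)
```

Prefixes missing from the name table are kept under an "unknown" label rather than dropped, so the totals still add up to the number of entries that carry a DOI.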

The Origins of Malloc

The 1973 Fourth Edition Unix kernel source code contains two routines, malloc and mfree, that manage the dynamic allocation and release of main memory blocks for in-memory processes and of contiguous disk swap area blocks for swapped-out processes. Their implementation and history can teach us many things regarding modern computing.

Of BOOL and stdbool

The C99 standard has added to the C programming language a Boolean type, _Bool, and the bool alias for it. How well does this type interoperate with the Windows SDK BOOL type? The answer is, not at all well, and here's the complete story.

Debugging in Practice: dgsh Issue 85

Fixing an insidious bug in the new Unix directed graph shell dgsh allowed me to demonstrate in practice 10 of the 66 principles, techniques, and tools I describe in the book Effective Debugging. Almost all steps are documented in the corresponding issue and commits. Here's a detailed retrospective.

Display Git's and Current Directory on Terminal Bar

I typically have more than ten windows open on my desktop and rely on their names to select them. Being a command-line aficionado, most of them are terminals. I have them configured to display the current directory by setting the bash PROMPT_COMMAND environment variable to 'printf "\033]0;%s:%s\007" "${HOSTNAME%%.*}" "${PWD/#$HOME/~}"'. The problem is that the directory I'm often in has a generic name, such as src or doc, so the terminal's name isn't very useful.

Impact Factor of Computer Science Journals 2016

Clarivate Analytics (ex Thomson Reuters, ex ISI) has published the 2016 InCites Journal Citation Reports. Following similar studies I have performed in the past, here is my analysis of the current status and trends for the impact factor (IF) of computer science journals.

Modular SQL Queries with Unit Tests

I'm sure I'm not the only person on earth facing a complex and expensive analytical processing task. The one I've been working on for the past couple of years runs on the GHTorrent 98.5 GB data set of GitHub process data. It comprises 99 SQL queries (2599 lines of SQL code in total) and takes more than 20 hours to run on a hefty server. To make the job's parts run efficiently and reliably I implemented simple-rolap, a bare-bones relational online analytical processing tool suite. To ensure the queries produce correct results, I wrote RDBUnit, a unit testing framework for relational database queries. Here is a quick overview of how to use the two.
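
The general idea of unit testing a query (load a small known input, run the query, compare the rows with the expected output) can be sketched with Python's built-in sqlite3 module. This is only an illustration of the principle and does not show RDBUnit's own syntax. The helper name, the commits table, and the query are all hypothetical.

```python
import sqlite3


def query_matches(setup_sql, query_sql, expected_rows):
    """Run query_sql on a fresh in-memory database populated by setup_sql
    and report whether it returns exactly expected_rows (in any order)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)
        actual_rows = conn.execute(query_sql).fetchall()
    finally:
        conn.close()
    return sorted(actual_rows) == sorted(expected_rows)


# Hypothetical input table, small enough to verify by hand.
SETUP = """
CREATE TABLE commits (project TEXT, author TEXT);
INSERT INTO commits VALUES ('alpha', 'ann');
INSERT INTO commits VALUES ('alpha', 'bob');
INSERT INTO commits VALUES ('beta', 'ann');
"""

# Hypothetical query under test: number of commits per project.
QUERY = "SELECT project, COUNT(*) FROM commits GROUP BY project"

test_passed = query_matches(SETUP, QUERY, [("alpha", 2), ("beta", 1)])
print("ok" if test_passed else "FAILED")
```

Keeping each test's input to a handful of rows makes the expected result easy to derive by hand, which is what allows a failure to point to the query rather than to the data.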

Open Collaboration at Eclipse

The International Conference on Software Engineering is the premier research conference on the topic. This year it began with a keynote address by the Eclipse Foundation Executive Director, Mike Milinkovich, on Open Collaboration: The Eclipse Way.

Unix Architecture Evolution Diagrams

Today I put online two diagrams depicting the architecture of the Unix operating system, one for the 1972 First Research Edition and one for FreeBSD, one of its direct descendants. Here are the details on how I created these diagrams.

