Java Stream Methods and Unix Pipeline Commands: A Dictionary

 

While preparing my class notes for functional programming in Java, I was struck by the neat correspondence between many Java Stream methods and Unix commands. I decided to organize the most common of these in dictionary form, mapping between the two. I’d very much welcome comments regarding common patterns that I’ve missed.

Java stream processing and Unix pipelines share many traits:

  • No intermediate storage is needed for processing data
  • Functional processing: the original data is not modified
  • Lazy operations: no more data than is needed is processed (on Unix this is made transparently possible through the SIGPIPE signal)
  • An indefinite (possibly infinite) number of elements can be processed
  • Operations can be parallelized
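As a rough illustration of the laziness and unbounded-input traits listed above, here is a minimal sketch. The class and method names are made up for this example. The stream source is infinite, yet the pipeline terminates because limit() stops pulling elements, much as head ends a Unix pipeline fed by an endless producer.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration
class LazyPipelineSketch {
    // Returns the first `count` powers of two from an infinite source
    static List<Integer> firstPowersOfTwo(int count) {
        return Stream.iterate(1, x -> x * 2)   // infinite source: 1, 2, 4, ...
                .limit(count)                  // stops consumption early, like head
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5)); // prints [1, 2, 4, 8, 16]
    }
}
```

Only the five requested elements are ever computed; the rest of the infinite sequence is never generated.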

Compared to Java streams, Unix pipelines have two advantages:

  • Operations on multiple streams are possible with commands such as join, comm, paste and shell extensions such as Bash’s process substitution and dgsh
  • Non-homogeneous binary data can be easily processed through toolsets such as SoX, NetPBM, FFmpeg, OpenSSL and diverse compression programs.

On the other hand, compared to Unix pipelines, Java streams can efficiently process homogeneous streams of binary objects through custom-built functions.
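To make the point about object streams concrete, here is a small sketch. The Reading type and the toFahrenheit function are invented for this example; they stand in for any application object and any custom-built function. The objects flow through the pipeline directly, with no text serialization or parsing between stages.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration
class ObjectStreamSketch {
    // Hypothetical element type: one temperature reading
    static final class Reading {
        final String sensor;
        final double celsius;

        Reading(String sensor, double celsius) {
            this.sensor = sensor;
            this.celsius = celsius;
        }
    }

    // Custom-built function applied to each object in the stream
    static double toFahrenheit(Reading r) {
        return r.celsius * 9 / 5 + 32;
    }

    static List<Double> fahrenheit(List<Reading> readings) {
        return readings.stream()
                .map(ObjectStreamSketch::toFahrenheit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Reading> readings = List.of(new Reading("a", 0.0), new Reading("b", 100.0));
        System.out.println(fahrenheit(readings)); // prints [32.0, 212.0]
    }
}
```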

Without further ado, here is the mapping between Java stream methods and Unix pipeline commands, divided into sources, intermediate methods (filters), and terminal methods.

Stream Sources

Java’s stream sources generate stream data from other objects. Many Unix commands produce an output stream from files, the filesystem, or databases.

Java Stream Methods           Unix Pipeline Commands
BufferedReader.lines()        cat or curl
Files.list()                  ls
Files.find(Path start, ...)   find
IntStream.range(int, int)     seq first last
Arrays.stream(Object[])       dd
JarFile.stream()              jar tv or tar tv
Random.ints()                 shuf -i
Collection.stream()           Database CLI, e.g. mysql
Stream.concat()               cat
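Two of these sources can be sketched as follows; the class and method names are made up for the example. One detail worth noting is that seq includes its last argument, whereas IntStream.range excludes its upper bound, so the sketch uses IntStream.rangeClosed to mimic seq exactly.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration
class SourceSketch {
    // Roughly: seq first last (both bounds inclusive)
    static List<Integer> seq(int first, int last) {
        return IntStream.rangeClosed(first, last)
                .boxed()
                .collect(Collectors.toList());
    }

    // Roughly: cat a b
    static List<String> cat(List<String> a, List<String> b) {
        return Stream.concat(a.stream(), b.stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(seq(1, 4));                            // prints [1, 2, 3, 4]
        System.out.println(cat(List.of("a", "b"), List.of("c"))); // prints [a, b, c]
    }
}
```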

Intermediate Stream Methods

Java’s intermediate stream methods are the equivalent of typical Unix filters: they process stream data, generating another stream as output.

Java Stream Methods              Unix Pipeline Commands
filter(Predicate predicate)      grep RE or awk 'predicate'
map(Function mapper)             sed 's/RE/text/' or awk '{print ...}' or tr or cut or recode or rev or …
distinct()                       uniq (on sorted input) or awk '!seen[$0]++'
sorted()                         sort
parallel()                       xargs -P or parallel
peek(Consumer action)            tee >(...)
limit(long maxSize)              head -n N
skip(long n)                     tail -n +K (K = n + 1)
takeWhile(Predicate predicate)   sed -n '/RE/!q;p' or awk '!(predicate) {exit} {print}'
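Chaining several of these methods gives something that reads much like a shell pipeline. The following sketch uses made-up class and method names, and the Unix equivalents in the comments are approximate. Note that takeWhile requires Java 9 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration
class FilterSketch {
    // Roughly: grep -v '^#' | tr a-z A-Z | sort | uniq | head -n max
    static List<String> process(List<String> lines, int max) {
        return lines.stream()
                .filter(line -> !line.startsWith("#")) // drop comment lines
                .map(String::toUpperCase)              // transform each element
                .sorted()
                .distinct()
                .limit(max)
                .collect(Collectors.toList());
    }

    // Skips a one-line header, then keeps lines up to the first empty one
    static List<String> body(List<String> lines) {
        return lines.stream()
                .skip(1)                               // like tail -n +2
                .takeWhile(line -> !line.isEmpty())    // stop at the first empty line
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("pear", "# note", "apple", "pear", "fig"), 2));
        // prints [APPLE, FIG]
        System.out.println(body(List.of("header", "a", "b", "", "c")));
        // prints [a, b]
    }
}
```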

Terminal Stream Methods

Java’s terminal stream methods consume a stream, generating a result. The corresponding Unix commands produce the result as a single output line, or, for Boolean values, as their exit code.

Java Stream Methods                          Unix Pipeline Commands
void forEach(Consumer action)                xargs or while read x ; do ... done
Object[] toArray()                           dd
Optional reduce(BinaryOperator accumulator)  uniq -c or awk '{...} END {print ...}'
long count()                                 wc -l
boolean anyMatch(Predicate predicate)        grep -q RE
boolean allMatch(Predicate predicate)        awk 'BEGIN {s = 0} ! predicate{s = 1} END {exit s}'
boolean noneMatch(Predicate predicate)       ! grep -q RE
Optional findFirst()                         sed -n '/RE/ { p; q; }'
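A few of the terminal methods can be sketched as follows, again with made-up class and method names and only approximate Unix equivalents in the comments. Where a Unix command reports a Boolean through its exit code, the Java method returns it directly. The sum uses the two-argument form of reduce, which takes an identity value and therefore returns a plain value rather than an Optional.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical class name, used only for this illustration
class TerminalSketch {
    // Roughly: wc -l
    static long count(List<String> lines) {
        return lines.stream().count();
    }

    // Roughly: grep -q -F text (true corresponds to exit status 0)
    static boolean anyContains(List<String> lines, String text) {
        return lines.stream().anyMatch(line -> line.contains(text));
    }

    // True only if every line contains the text
    static boolean allContain(List<String> lines, String text) {
        return lines.stream().allMatch(line -> line.contains(text));
    }

    // Roughly: awk '{s += $1} END {print s}'
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    // First element, or an empty Optional for an empty stream
    static Optional<String> first(List<String> lines) {
        return lines.stream().findFirst();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("alpha", "beta", "gamma");
        System.out.println(count(lines));            // prints 3
        System.out.println(anyContains(lines, "et")); // prints true
        System.out.println(allContain(lines, "et"));  // prints false
        System.out.println(sum(List.of(1, 2, 3)));    // prints 6
        System.out.println(first(lines).get());       // prints alpha
    }
}
```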



Last modified: Thursday, December 6, 2018 9:42 pm

Creative Commons Licence BY NC

Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.