Newsgroup: comp.risks


From: RISKS List Owner <risko@csl.sri.com>
Date: Tue, 27 Nov 2001 16:37:26 PST
precedence: bulk
Subject: Risks Digest 21.79
To: risks@csl.sri.com
Message-ID: <CMM.0.90.4.1006907846.risko@chiron.csl.sri.com>
Sender: risks-owner@csl.sri.com
RISKS-LIST: Risks-Forum Digest  Tuesday 27 November 2001  Volume 21 : Issue 79

   FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS (comp.risks)
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

***** See last item for further information, disclaimers, caveats, etc. *****
This issue is archived at <URL:http://catless.ncl.ac.uk/Risks/21.79.html>
and by anonymous ftp at ftp.sri.com, cd risks .

  Contents:
Harry Potter related risks (Richard Akerman)
Phone banking hiccups (Geoffrey Brent)
Risks of the space character in Unix filenames (Diomidis Spinellis)
FBI: home-grown terrorists (Scrounger)
Misdirected criticism of Google (Chris Adams, Gary McGraw)
Re: Mobile phone jamming (Markus Kuhn)
Re: Stupid virus filters (Leonard Erickson)
Re: Let's get really paranoid about e-mail and spam (Skip La Fetra)
REVIEW: "The CISSP Study Guide", Ronald L. Krutz/Russell Dean Vines 
  (Rob Slade)
Abridged info on RISKS (comp.risks)

----------------------------------------------------------------------
[...]

Date: Thu, 22 Nov 2001 23:50:39 +0300
From: Diomidis Spinellis <dds@aueb.gr>
Subject: Risks of the space character in Unix filenames

The root of the problem reported in the article "Glitch in iTunes Deletes
Drives" (Solomon, RISKS-21.74) is the default way the Unix shell handles
filenames with embedded spaces.  Although a space can legally appear in a
Unix filename, such an occurrence is unusual; Unix filenames tend to be
terse, often even shorter than a single word (e.g. "src", "doc", "etc",
"bin"), so they can be swiftly typed.  A number of more recent and supposedly
user-friendly operating systems, like the Microsoft Windows family and, I
understand, Mac OS, use longer and more descriptive file names
("Documents and Settings", "Program Files").  Many of these filenames
contain spaces; the ones I listed are used by default by Windows 2000 as the
locations to store user data and application files (the equivalent of
/home/username and /bin under Unix).

As Unix-style tools and relevant applications are increasingly ported to run
under Windows (see for example [1, 2, 3] and my Windows outwit tool suite
described in [4]) or natively run under Mac OS X, problems and associated
Risks arise.  The main reason is that some often-used Unix shell constructs
fail when applied to filenames containing a space character.  Unfortunately,
these constructs appear in many existing programs, and even in the writings
of the original system developers, who, in all fairness, could not have
foreseen how their tools would be used 25 years after their
conception.

Technically, the problem manifests itself when field splitting (the process
by which the shell splits input into words) is naively applied to the output
of an expansion that generates filenames with embedded spaces.  Consider the
following example, appearing on page 95 of one of the classic texts on Unix
programming [5]:
  for i in ch2.*
  do
    echo $i:
    diff -b old/$i $i
    echo
  done

The above code will compare all files matching the ch2.* pattern in the
current directory with copies presumably stored in the directory called
"old".  Consider what will happen when the code is applied to a file called
"ch2.figure 3.dot" (notice the space between the word figure and the "3").
The shell variable i will be set to the correct filename, but then the shell
will execute the "diff" command with the following argument list
(customarily passed to C programs in the argv array):
  argv[0] = "diff"
  argv[1] = "-b"
  argv[2] = "old/ch2.figure"
  argv[3] = "3.dot"
  argv[4] = "ch2.figure"
  argv[5] = "3.dot"

and diff will complain
  diff: extra operand
as more than two filenames were passed as arguments.  This happens
because words are expanded by most Unix shells in the following order:
  1. Parameter (including variable) expansion, command substitution.
  2. Field splitting.
As a result, the variable $i is first expanded into "ch2.figure 3.dot" and
then the result is split into fields for further processing or for
passing them as arguments to a command.
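
The effect of this ordering can be observed without involving diff at all.
The following is a minimal sketch, assuming a POSIX-style shell with the
default IFS; the helper name count_args is made up for illustration and
simply reports how many arguments it received.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints the number of
# arguments it was called with.
count_args() {
  echo $#
}

i="ch2.figure 3.dot"

# Unquoted: each expanded value undergoes field splitting.
unquoted=$(count_args old/$i $i)

# Quoted: each expansion remains a single word.
quoted=$(count_args "old/$i" "$i")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
```

The unquoted call should report four arguments, matching argv[2] through
argv[5] in the listing above, while the quoted call should report two.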

The most common dangerous constructs that can appear in step 1 are variable
references (e.g. $PATH, $word) and commands inside backquotes (e.g. `find
. -type f -name 'ch2.*'`).  These dangerous constructs are quite common,
appearing among other places in the original article describing the Bourne
shell [6] (for i in * do if test -d $d/$i [...]), in other scripts in the
book containing the original example [5 p. 141, 143], and even in quite recent
work by the same authors [7, p. 149].  They are also prevalent in existing
operating system tools; I counted 43 occurrences of one suspicious pattern
("$*") in a NetBSD source tree, 8 in a FreeBSD command path, and 49 in the
shell scripts of a Mandrake Linux distribution.  The Unix world is
definitely not ready to deal with filenames containing the space character.

Avoiding this problem is not trivial.  A radical solution would be to change
the value of the shell's "internal field separator" (IFS) variable.  This
variable contains the characters that the shell uses to split words.  Its
default value is "<space><tab><newline>".  This solution however would break
more things than it would fix, since most scripts expect words to be
separated by spaces.  As an example the construct "A='ls -l';$A" would not
work.  The most practical solution is to manually enclose variables inside
double quotes when using them in contexts where only a single word is
normally expected.  The shell will still expand the variable inside the
quotes, but will treat the result as a single word.  Thus the offending part
in the original example should have been written as:
  diff -b "old/$i" "$i"
In addition, whenever a shell script uses the variable $* to obtain the
values of all parameters passed to a script, the $* variable should be
replaced by the variable $@, again inside double quotes.  Thus the
common code pattern
  for arg in $*
should be written as
  for arg in "$@"
Interestingly, Kernighan and Pike were already aware of the $* problem and
the above solution in 1984; they aptly characterize the "$@" solution as
"almost black magic" [5 p. 161].
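
The difference between the two loop forms can be made visible by counting
iterations.  This is only a sketch, assuming a POSIX-style shell with the
default IFS; the function names loop_star and loop_at are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: each prints how many words its loop visits.
loop_star() {
  n=0
  for arg in $*; do      # unquoted $*: arguments are split again
    n=$((n + 1))
  done
  echo $n
}

loop_at() {
  n=0
  for arg in "$@"; do    # "$@": one word per original argument
    n=$((n + 1))
  done
  echo $n
}

# Two arguments, the first of which contains a space.
star=$(loop_star "Program Files" src)
at=$(loop_at "Program Files" src)

echo "loop over \$*:   $star iterations"
echo "loop over \"\$@\": $at iterations"
```

With these two arguments the $* loop should run three times (splitting
"Program Files" in two), whereas the "$@" loop should run twice.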

Still, these changes will not correctly handle filenames with embedded
whitespace returned from a command substitution.  In this case, temporarily
changing the IFS variable before executing a command may be the only
feasible solution.  The following example illustrates this approach:
  # Save original IFS
  OFS="$IFS"
  # Set IFS to newline only (the closing quote must start the next line,
  # otherwise the indentation becomes part of IFS)
  IFS='
'
  # The find command might output filenames with spaces
  wc -l `find . -type f`
  # Restore original IFS
  IFS="$OFS"
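
To see the effect of the temporary IFS change in isolation, the following
sketch counts the words produced by a command substitution under both
settings.  It assumes a POSIX-style shell and the widely available (but not
POSIX-specified) "mktemp -d" command; the helper count_args and the scratch
file are invented for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of arguments it received.
count_args() {
  echo $#
}

# Work in a scratch directory holding one file whose name contains a space.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
echo "digraph g {}" > "ch2.figure 3.dot"

# Default IFS: the single filename printed by find is split at the space.
split_default=$(count_args $(find . -type f))

# Newline-only IFS: each line of find output becomes exactly one word.
OFS="$IFS"
IFS='
'
split_newline=$(count_args $(find . -type f))
IFS="$OFS"

echo "default IFS: $split_default words"
echo "newline IFS: $split_newline words"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

The first count should be two and the second one.  Note that this approach
still assumes that no filename contains a newline or a shell pattern
character, since the result of the substitution is also subject to pathname
expansion.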

By searching existing shell scripts for the patterns I described and
applying the suggested changes, most problems can be solved.  Other scripting
languages like Tcl and, to a lesser extent, Perl may also have problems
dealing with filenames with spaces.  Similar approaches (appropriate quoting
in Perl "eval" blocks and use of the "list" command in Tcl) can be used to
avoid these problems.

References

[1] David G. Korn. Porting Unix to Windows NT. In Proceedings of the
USENIX 1997 Annual Technical Conference, Anaheim, CA, USA, January 1997.
Usenix Association.
[2] Geoffrey J. Noer. Cygwin32: A free Win32 porting layer for UNIX
applications. In Proceedings of the 2nd USENIX Windows NT Symposium,
Seattle, WA, USA, August 1998. Usenix Association.
[3] Stephen R. Walli. OPENNT: UNIX application portability to Windows NT
via an alternative environment subsystem. In Proceedings of the USENIX
Windows NT Symposium, Seattle, WA, USA, August 1997. Usenix Association.
[4] Diomidis Spinellis. Outwit: Unix tool-based programming meets the
Windows world. In USENIX 2000 Technical Conference Proceedings, pages
149-158, San Diego, CA, USA, June 2000. Usenix Association.
<http://www.spinellis.gr/pubs/conf/2000-Usenix-outwit/html/utool.html>
[5] Brian W. Kernighan and Rob Pike. The UNIX Programming Environment.
Prentice-Hall, 1984.
[6] S. R. Bourne. The UNIX shell. Bell System Technical Journal,
57(6):65-84 July/August 1978.  (Also appears in volume 2 of the Unix
Programmer's Manual and in AT&T, UNIX System Readings and
Applications, volume I. Prentice-Hall, 1987.)
[7] Brian W. Kernighan and Rob Pike. The Practice of Programming.
Addison-Wesley, 1999.

Diomidis Spinellis - http://www.spinellis.gr/
Athens University of Economics and Business (AUEB)

------------------------------
[...]

End of RISKS-FORUM Digest 21.79
************************




Creative Commons License Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-Share Alike 3.0 Greece License.