Newsgroup: comp.risks


Received: from quarter.csl.sri.com (quarter.csl.sri.com [130.107.1.30])
by aueb.gr (8.8.5/8.8.5) with ESMTP id PAA20820
for <dds@aueb.gr>; Thu, 21 Feb 2002 15:49:35 +0200 (EET)
Received: from quarter.csl.sri.com (localhost [127.0.0.1])
by quarter.csl.sri.com (8.12.1/8.12.1) with SMTP id g1L1cx4m017602;
Wed, 20 Feb 2002 18:07:40 -0800
Received: from chiron.csl.sri.com (chiron.csl.sri.com [130.107.15.74])
by quarter.csl.sri.com (8.12.1/8.12.1) with ESMTP id g1L1cL1k017552
for <risks-resend@csl.sri.com>; Wed, 20 Feb 2002 17:38:21 -0800
Received: (from risko@localhost)
by chiron.csl.sri.com (8.11.2/8.8.7) id g1L1cL304083
for risks-resend; Wed, 20 Feb 2002 17:38:21 -0800
From: RISKS List Owner <risko@csl.sri.com>
Date: Wed, 20 Feb 2002 17:38:21 PST
precedence: bulk
Subject: Risks Digest 21.92
To: risks@csl.sri.com
Message-ID: <CMM.0.90.4.1014255501.risko@chiron.csl.sri.com>
Precedence: bulk
Sender: risks-owner@csl.sri.com
Content-Type: text
RISKS-LIST: Risks-Forum Digest  Weds 20 February 2002  Volume 21 : Issue 92

   FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS (comp.risks)
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

***** See last item for further information, disclaimers, caveats, etc. *****
This issue is archived at <URL:http://catless.ncl.ac.uk/Risks/21.92.html>
and by anonymous ftp at ftp.sri.com, cd risks .

  Contents:
Patriot misses again (Lord Wodehouse)
Researchers claim to crack Wi-Fi security (Monty Solomon)
When machine metadata fails, address humans (Diomidis Spinellis)
Unwitting cell calls swamp 911 systems (Monty Solomon)
Abuse of intercept capabilities: 'Tampa' affair (Geoffrey Brent)
PayPal's tenuous situation (Jeff Jonas)
Ice-skating judging solution (Ken Knowlton)
Re: Miami-Dade OKs touchscreen voting (Alan Brain)
An unlocked system can be compromised quickly (Greg Searle)
Dangerous characters (Mark Lomas)
Computerized assistance with non-standard punctuation (David Piper)
Re: Homograph problems (Geoffrey Brent)
What's a buffer overrun problem? (William P. N. Smith)
Sorry, that number is now in service (Gene Spafford)
Re: Officer calls for refund of 'speeding' fines (Henry Baker)
Re: Social Security numbers on tax envelopes (Robert Ellis Smith)
The Security Risks of Programs That Automatically Update (Scott Schram)
New Security Conference - GOVSEC, Call for Presentations (Jack Holleran)
Abridged info on RISKS (comp.risks)

----------------------------------------------------------------------
[...]
Date: Tue, 19 Feb 2002 15:16:20 +0300
From: Diomidis Spinellis <dds@aueb.gr>
Subject: When machine metadata fails, address humans

The aggressive indexing of the Google search engine, combined with its
on-line caching of pages in the form they had when they were indexed, is
resulting in some perverse situations.

A number of RISKS articles have already described how sensitive data or
supposedly non-accessible pages have leaked from an organization's intranet
or web site to the world after being indexed by Google or other search engines.
Such problems can be avoided by not placing private information on a
publicly accessible web site, or by employing metadata such as the robot
exclusion standard to inform the various web-crawling spiders that specific
contents are not to be indexed.  Of course, adherence to the robot exclusion
standard is left to the discretion of the individual spiders, so the second
option should only be used for advisory purposes and not to protect
sensitive data.
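The robot exclusion standard works through a plain-text robots.txt file at the
site root. The following minimal Python sketch shows how a well-behaved crawler
might consult such a file with the standard library's urllib.robotparser module.
The file contents, the "ExampleBot" user-agent, the example.com URLs, and the
polite_can_fetch helper are all hypothetical. The check runs entirely inside the
crawler: nothing on the server stops a spider that simply skips it, which is why
robots.txt is advisory only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to keep out of /sqlconnect/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sqlconnect/
"""

def polite_can_fetch(robots_lines, agent, url):
    """Return True if a crawler that honours robots.txt may fetch url.

    Compliance is voluntary: a spider that never calls this
    function can still request any publicly reachable page.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse already-retrieved lines; no network I/O
    return parser.can_fetch(agent, url)

lines = ROBOTS_TXT.splitlines()
print(polite_can_fetch(lines, "ExampleBot", "http://www.example.com/sqlconnect/"))  # False
print(polite_can_fetch(lines, "ExampleBot", "http://www.example.com/index.html"))   # True
```

Passing the lines directly to parse() keeps the sketch offline; a real crawler
would typically call set_url() and read() to fetch the live file first.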

Today I came across a web page <http://www.rietta.com/sqlconnect/> with
metadata addressing the humans reading the page rather than the spiders.
From the company's point of view, the page had apparently been indexed by
Google inadvertently:

"NOTE: This page has been picked up by Google before we intended for it to
become visible.  The SQL Connect software is completed, but we still have to
finalize the documentation and this website in order to release it.  Please
check back soon for the download, or if you have questions, you can e-mail
products@rietta.com."

Worryingly, the same company also markets RoboGen, a product to manage the
robot exclusion specification file: "RoboGen allows you to easily manage a
robot exclusion file to control search engines indexing your website.
Featured in magazines and books, RoboGen is the most popular and easy to use
program for managing search engines that visit your website."

The moral?  The web has a long (and growing; see
<http://www.archive.org>) memory.  Information leaks due to incorrect
spider metadata and other errors can only be partially contained by
addressing new metadata to humans.

Diomidis Spinellis -     http://www.spinellis.gr/
Athens University of Economics and Business (AUEB)

------------------------------
[...]

End of RISKS-FORUM Digest 21.92
************************




Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-Share Alike 3.0 Greece License.