Documentation

Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Athens, Greece
dds@aueb.gr

Documentation Types

Shortcut for Code Understanding

Code

    line = gobble = 0;
    for (prev = '\n'; (ch = getc(fp)) != EOF; prev = ch) {
        if (prev == '\n') {
            if (ch == '\n') {
                if (sflag) {
                    if (!gobble && putchar(ch) == EOF)
                        break;
                    gobble = 1;
                    continue;
                }
                [...]
            }
        }
        gobble = 0;
        [...]
    }

Documentation

-s  Squeeze multiple adjacent empty lines, causing the output to be
    single spaced.
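
The documented behavior can be checked against a small stand-alone sketch. This is not the cat source: squeeze_blank_lines is a hypothetical helper that works on an in-memory string instead of a stream, and it only borrows the prev/gobble idea from the excerpt above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of cat): copy src to dst, collapsing
 * every run of adjacent empty lines into a single empty line, as the
 * -s documentation describes. dst must have room for strlen(src) + 1
 * bytes. Returns the number of bytes written, excluding the final NUL.
 */
size_t
squeeze_blank_lines(const char *src, char *dst)
{
    size_t n = 0;
    int prev = '\n';    /* treat the start of the input as a line start */
    int gobble = 0;     /* set once one empty line has been copied */

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; prev = *src++) {
        if (prev == '\n' && *src == '\n') {
            if (gobble)
                continue;   /* drop the second and later empty lines */
            gobble = 1;
        } else
            gobble = 0;
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```

For the input "a\n\n\n\nb\n" the helper produces "a\n\nb\n": three empty lines become one.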

Specifications for Code Inspection

Code

(Apache)
    switch (*method) {
        case 'H':
           if (strcmp(method, "HEAD") == 0)
               return M_GET; /* see header_only in request_rec */
           break;
        case 'G':
           if (strcmp(method, "GET") == 0)
               return M_GET;
           break;
        case 'P':
           if (strcmp(method, "POST") == 0)
               return M_POST;
           if (strcmp(method, "PUT") == 0)
               return M_PUT;
           if (strcmp(method, "PATCH") == 0)
               return M_PATCH;

Specification

(RFC-2068)
The Method token indicates the method to be performed on the
resource identified by the Request-URI. The method is
case-sensitive.

       Method         = "OPTIONS"                ; Section 9.2
                      | "GET"                    ; Section 9.3
                      | "HEAD"                   ; Section 9.4
                      | "POST"                   ; Section 9.5
                      | "PUT"                    ; Section 9.6
                      | "DELETE"                 ; Section 9.7
                      | "TRACE"                  ; Section 9.8
                      | extension-method
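
When inspecting code against a specification, it can help to restate the rule as a small table-driven check. The sketch below is an illustration, not Apache code: the enum, the table, and lookup_method are hypothetical names. It covers only the literal alternatives of the Method rule quoted above and treats every other token as an extension-method.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical method codes; these are not Apache's M_* constants. */
enum method {
    METHOD_OPTIONS, METHOD_GET, METHOD_HEAD, METHOD_POST,
    METHOD_PUT, METHOD_DELETE, METHOD_TRACE,
    METHOD_EXTENSION    /* any token not listed in the grammar */
};

/* One entry per literal alternative of the quoted Method rule. */
static const struct {
    const char *name;
    enum method code;
} methods[] = {
    { "OPTIONS", METHOD_OPTIONS },
    { "GET",     METHOD_GET },
    { "HEAD",    METHOD_HEAD },
    { "POST",    METHOD_POST },
    { "PUT",     METHOD_PUT },
    { "DELETE",  METHOD_DELETE },
    { "TRACE",   METHOD_TRACE },
};

enum method
lookup_method(const char *token)
{
    size_t i;

    assert(token != NULL);
    for (i = 0; i < sizeof(methods) / sizeof(methods[0]); i++)
        if (strcmp(token, methods[i].name) == 0)    /* case-sensitive */
            return methods[i].code;
    return METHOD_EXTENSION;
}
```

Because the comparison is case-sensitive, "get" does not match GET. Under the rule as quoted, PATCH is not a literal alternative either, so this sketch classifies it as an extension-method.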

Obtain System Structure

Sendmail Files

arpadate.c, clock.c, collect.c, conf.c, convtime.c, daemon.c, deliver.c,
domain.c, envelope.c, err.c, headers.c, macro.c, main.c, map.c, mci.c,
mime.c, parseaddr.c, queue.c, readcf.c, recipient.c, safefile.c,
savemail.c, srvrsmtp.c, stab.c, stats.c, sysexits.c, trace.c, udb.c,
usersmtp.c, util.c, version.c

Sendmail Documentation Headings

2.5.    Configuration file        readcf.c
3.3.1.  Aliasing                  alias.c
3.4.    Message collection        collect.c
3.5.    Message delivery          deliver.c
3.6.    Queued messages           queue.c
3.7.    Configuration             conf.c
3.7.1.  Macros                    macro.c
3.7.2.  Header declarations       headers.c, envelope.c
3.7.4.  Address rewriting rules   parseaddr.c

Understand Complicated Algorithms

Code

for (arcp = memp->parents ; arcp ; arcp = arcp->arc_parentlist) {
    [...]
    if ( headp -> npropcall ) {
        headp -> propfraction += parentp -> propfraction
                * ( ( (double) arcp -> arc_count )
                  / ( (double) headp -> npropcall ) );
    }
}

Documentation

Algorithm Documentation
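
Read in isolation, the loop in the code excerpt computes a weighted sum: each parent contributes its own propfraction, scaled by the share of the calls that arrive along its arc. The sketch below restates only that arithmetic. The flattened struct and the function name are hypothetical simplifications and do not reproduce the profiler's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, flattened stand-in for one arc arriving from a parent. */
struct parent_arc {
    double parent_propfraction; /* propfraction of the calling parent */
    long arc_count;             /* calls made along this arc */
};

/*
 * Sum parent_propfraction * (arc_count / npropcall) over all arcs.
 * A zero npropcall contributes nothing, mirroring the guard in the
 * excerpt.
 */
double
propagated_fraction(const struct parent_arc *arcs, size_t narcs,
    long npropcall)
{
    double fraction = 0.0;
    size_t i;

    assert(arcs != NULL || narcs == 0);
    if (npropcall == 0)
        return 0.0;
    for (i = 0; i < narcs; i++)
        fraction += arcs[i].parent_propfraction
            * ((double)arcs[i].arc_count / (double)npropcall);
    return fraction;
}
```

With two parents, one with propfraction 1.0 making 3 of 4 calls and one with propfraction 0.5 making the remaining call, the result is 1.0 * 0.75 + 0.5 * 0.25 = 0.875.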

Obtain the Meaning of Source Code Identifiers

Code

#define TCPS_ESTABLISHED 4 /* established */
(Note the useless comment: it merely repeats the identifier.)

Documentation

RFC-793

ESTABLISHED - represents an open connection, data received can be delivered to the user. The normal state for the data transfer phase of the connection.

                                   +---------+ ---------\      active OPEN  
                                   |  CLOSED |            \    -----------  
                                   +---------+<---------\   \   create TCB  
                                     |     ^              \   \  snd SYN    
                        passive OPEN |     |   CLOSE        \   \           
                        ------------ |     | ----------       \   \         
                         create TCB  |     | delete TCB         \   \       
                                     V     |                      \   \     
                                   +---------+            CLOSE    |    \   
                                   |  LISTEN |          ---------- |     |  
                                   +---------+          delete TCB |     |  
                        rcv SYN      |     |     SEND              |     |  
                       -----------   |     |    -------            |     V  
      +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
      |         |<-----------------           ------------------>|         |
      |   SYN   |                    rcv SYN                     |   SYN   |
      |   RCVD  |<-----------------------------------------------|   SENT  |
      |         |                    snd ACK                     |         |
      |         |------------------           -------------------|         |
      +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
        |           --------------   |     |   -----------                  
        |                  x         |     |     snd ACK                    
        |                            V     V                                
        |  CLOSE                   +---------+                              
        | -------                  |  ESTAB  |                              
        | snd FIN                  +---------+                              
        |                   CLOSE    |     |    rcv FIN                     
        V                  -------   |     |    -------                     
      +---------+          snd FIN  /       \   snd ACK          +---------+
      |  FIN    |<-----------------           ------------------>|  CLOSE  |
      | WAIT-1  |------------------                              |   WAIT  |
      +---------+          rcv FIN  \                            +---------+
        | rcv ACK of FIN   -------   |                            CLOSE  |  
        | --------------   snd ACK   |                           ------- |  
        V        x                   V                           snd FIN V  
      +---------+                  +---------+                   +---------+
      |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
      +---------+                  +---------+                   +---------+
        |                rcv ACK of FIN |                 rcv ACK of FIN |  
        |  rcv FIN       -------------- |    Timeout=2MSL -------------- |  
        |  -------              x       V    ------------        x       V  
         \ snd ACK                 +---------+delete TCB         +---------+
          ------------------------>|TIME WAIT|------------------>| CLOSED  |
                                   +---------+                   +---------+
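
Once the diagram has given the identifiers their meaning, a few of its edges can be written down directly. The sketch below models only the passive-close path and one active-close edge out of ESTAB. All names are hypothetical, and only the value 4 for the established state is taken from the code excerpt; the remaining values are arbitrary.

```c
#include <assert.h>

/* Hypothetical identifiers for a subset of the diagram's states. */
enum tcp_state {
    ST_CLOSED = 0,
    ST_ESTABLISHED = 4,     /* same value as TCPS_ESTABLISHED above */
    ST_CLOSE_WAIT,
    ST_FIN_WAIT_1,
    ST_LAST_ACK
};

/* Hypothetical identifiers for the events labeling the modeled edges. */
enum tcp_event { EV_CLOSE, EV_RCV_FIN, EV_RCV_ACK_OF_FIN };

enum tcp_state
next_state(enum tcp_state s, enum tcp_event e)
{
    assert(e == EV_CLOSE || e == EV_RCV_FIN || e == EV_RCV_ACK_OF_FIN);
    switch (s) {
    case ST_ESTABLISHED:
        if (e == EV_CLOSE)
            return ST_FIN_WAIT_1;   /* action: snd FIN */
        if (e == EV_RCV_FIN)
            return ST_CLOSE_WAIT;   /* action: snd ACK */
        break;
    case ST_CLOSE_WAIT:
        if (e == EV_CLOSE)
            return ST_LAST_ACK;     /* action: snd FIN */
        break;
    case ST_LAST_ACK:
        if (e == EV_RCV_ACK_OF_FIN)
            return ST_CLOSED;
        break;
    default:
        break;
    }
    return s;   /* edges not modeled here leave the state unchanged */
}
```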

Rationale Behind Nonfunctional Requirements

Code

if (newdp->d_cred > dp->d_cred) {
   /* better credibility.
    * remove the old datum.
    */
   goto delete;
}

Documentation

(P. Vixie's BIND Security Paper)

5.1. Cache Tagging

BIND now maintains for each cached RR a "credibility" level showing whether the data came from a zone, an authoritative answer, an authority section, or additional data section. When a more credible RRset comes in, the old one is completely wiped out. Older BINDs blindly aggregated data from all sources, paying no attention to the maxim that some sources are better than others.
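
The rule described in the paper can be sketched as a comparison on an ordered credibility level. Everything named below is hypothetical: the enum values, their ordering (assumed here to follow the paper's list, from zone data down to additional data), and cache_update are illustrations rather than BIND's definitions. The code excerpt does not show what happens when the new datum is equally or less credible, so this sketch simply keeps the cached entry in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical levels, least credible first (ordering is an assumption). */
enum cred {
    CRED_ADDITIONAL,    /* additional data section */
    CRED_AUTHORITY,     /* authority section */
    CRED_ANSWER,        /* authoritative answer */
    CRED_ZONE           /* zone data */
};

/* Hypothetical cache entry holding one datum and its credibility tag. */
struct cached_rr {
    const char *data;
    enum cred cred;
};

/*
 * Replace the cached datum only when the new one is more credible.
 * Returns 1 if the entry was replaced, 0 if it was kept.
 */
int
cache_update(struct cached_rr *entry, const char *data, enum cred cred)
{
    assert(entry != NULL);
    if (cred > entry->cred) {
        entry->data = data;     /* the old datum is wiped out */
        entry->cred = cred;
        return 1;
    }
    return 0;
}
```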

Design Intelligence

Case

Pike and Thompson on adopting UTF over the 16-bit Unicode representation in Plan 9:

Unicode defines an adequate character set but an unreasonable representation. The Unicode standard states that all characters are 16 bits wide and are communicated in 16-bit units.... To adopt Unicode, we would have had to convert all text going into and out of Plan 9 between ASCII and Unicode, which cannot be done. Within a single program, in command of all its input and output, it is possible to define characters as 16-bit quantities; in the context of a networked system with hundreds of applications on diverse machines by different manufacturers, it is impossible.


[...]

The UTF encoding has several good properties. By far the most important is that a byte in the ASCII range 0-127 represents itself in UTF. Thus UTF is backward compatible with ASCII.
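
The backward-compatibility property is easy to see in an encoder. The sketch below assumes that the UTF of the quote uses the byte layout now standardized as UTF-8 and handles only the 16-bit code points the quote talks about. utf_encode is a hypothetical name, and validation (for example of surrogate values) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode one 16-bit code point into out. Returns the number of bytes
 * stored (1 to 3).
 */
size_t
utf_encode(unsigned int c, unsigned char out[3])
{
    assert(c <= 0xFFFF);
    if (c < 0x80) {             /* ASCII range: the byte represents itself */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {            /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

The letter A (0x41) encodes as the single byte 0x41, so existing ASCII text is already valid. Every byte of a multibyte sequence has its high bit set, so it cannot be mistaken for an ASCII character.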

Internal Programming Interfaces

Examples:

Test Cases and Examples of Actual Use

Examples from the tcpdump documentation:

To print all ftp traffic through internet gateway snup: (note that the expression is quoted to prevent the shell from (mis-)interpreting the parentheses):

tcpdump 'gateway snup and (port ftp or ftp-data)'

To print the start and end packets (the SYN and FIN packets) of each TCP conversation that involves a non-local host.

tcpdump 'tcp[13] & 3 != 0 and not src and dst net localnet'
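
The term tcp[13] & 3 in the second example reads the byte at offset 13 of the TCP header, which holds the control flags, and masks the two low-order bits: FIN (0x01) and SYN (0x02). A minimal sketch of the same test in C follows. The macro and function names are hypothetical, and the buffer is assumed to start at the first byte of the TCP header.

```c
#include <assert.h>
#include <stddef.h>

#define TCP_FLAGS_OFFSET 13     /* byte index of the flags in the header */
#define TCP_FIN 0x01
#define TCP_SYN 0x02

/* Nonzero if the segment has SYN or FIN set. */
int
is_syn_or_fin(const unsigned char *tcp_header, size_t len)
{
    assert(tcp_header != NULL);
    if (len <= TCP_FLAGS_OFFSET)
        return 0;   /* too short to contain the flags byte */
    return (tcp_header[TCP_FLAGS_OFFSET] & (TCP_SYN | TCP_FIN)) != 0;
}
```

Note the explicit parentheses: in C, != binds more tightly than &, so the filter expression cannot be copied verbatim into C code.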

Implementation Problems and Bugs

at: limitations

At and batch as presently implemented are not suitable when users are competing for resources. If this is the case for your site, you might want to consider another batch system, such as nqs.

cat: caveats

Because of the shell language mechanism used to perform output redirection, the command
"cat file1 file2 > file1"
will cause the original data in file1 to be destroyed! This is performed by the shell before cat is run.

strftime: humor

There is no conversion specification for the phase of the moon.

ctags: bugs

Recognition of functions, subroutines and procedures for FORTRAN and Pascal is done in a very simpleminded way. No attempt is made to deal with block structure; if you have two Pascal procedures in different blocks with the same name you lose.

Development and Execution Environment Problems

// The following function is not inline, to avoid build (template
// instantiation) problems with Sun C++ 4.2 patch 104631-07/SunOS 5.6.
(Comments are often harsher.)

Trouble Spots

2001-09-17 Urban [...]
  * proc.c: Go back to the interruptible sleep as reconnects
    seem to handle it now.
[...]
2001-07-09 Jochen [...]
  * proc.c, ioctl.c: Allow smbmount to signal failure to reconnect
    with a NULL argument to SMB-IOC-NEWCONN (speeds up error
    detection).
[...]
2001-04-21 Urban [...]
  * dir.c, proc.c: replace tests on conn-pid with tests on state
    to fix smbmount reconnect on smb_retry timeout and up the
    timeout to 30s.
[...]
2000-08-14 Urban [...]
  * proc.c: don't do interruptable_sleep in smb_retry to avoid
    signal problem/race.
[...]
1999-11-16 Andrew [...]
  * proc.c: don't sleep every time with win95 on a FINDNEXT

Undocumented Features

Why?

Additional Documentation Sources

Common Open-Source Documentation Formats

Important: properly typeset the documentation for printing.

Further Reading

Exercises and Discussion Topics

  1. Select three large projects from the course's reference source code and classify the available documentation.
  2. Comment on the applicability of the documentation types we described in open-source development efforts.
  3. Present an overview of the source organization of the Apache Web server by examining the provided documentation.
  4. Locate one instance of a published algorithm reference in the course's reference source code. Map the published version of the algorithm against its implementation.
  5. Categorize and tabulate the types of problems described in the Bugs section of the Unix manual pages and sort them according to their frequency. Discuss the results you obtained.
  6. The course's reference source code contains over 40 references to undocumented behavior. Locate them and discuss the most common observed cause of documentation discrepancies.
  7. Compare the documentation formats we described in terms of usability, readability, features provided, and amenability to automated processing by ad-hoc tools.
  8. Locate and typeset on a high-quality output device each of the different documentation formats available in the course's reference source code in your local environment. Discuss the difficulties you encountered.