http://www.spinellis.gr/pubs/tr/softman/html/softman.html
This is an HTML rendering of a working paper draft that led to a publication. The publication should always be cited in preference to this draft.


This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.


The Software Management Process at ECRC

Diomidis Spinellis
January 1989

1 Introduction

1.1 ECRC

ECRC (European Computer-industry Research Centre) is an industrial research centre located in Munich and supported as a common resource by the European manufacturers Bull, ICL and Siemens. It is owned and financed equally by them. Its creation reflects current European industrial thinking, which aims at consolidating Europe's presence in the electronic data processing market. This consolidation is being achieved by addressing such topics as scientific cooperation, standardization and implementation of technical standards. ECRC parallels current European initiatives such as the European Community's ESPRIT program and the EUREKA project.

Basic research is carried out on behalf of the three member companies which use the results freely, all rights of research being shared by all three companies. ECRC has 16 different nationalities among its staff of 120. The average age of the research staff is around 30 and about half of them hold doctorates.

Research is structured around two key concepts:

  1. the fundamental role of knowledge bases in developing systems to assist decision making,
  2. the importance of logic programming in defining and implementing the concepts.
Coupled with these concepts are the issues of interacting with such complex systems during the decision making process, and the architectural questions that must be resolved to provide the speed and environment needed to make the systems attractive.

1.2 The structure of the centre

The research program of ECRC is divided into four groups.

The logic programming group focuses on implementation and efficiency techniques for logic programming languages. Research topics include the integration of more powerful programming and problem solving styles, such as object oriented programming and constraint propagation techniques; programming environment tools, such as debugging aids; incremental programming techniques; and the exploration of decision systems.

The knowledge bases group researches extensions of the relational model used in database systems to include deduction and semantic capabilities, in order to deal with a richer description of the world, as well as theoretical issues arising in such systems, such as consistency, integrity constraints and the completeness of the search.

The computer architecture group looks into the design of symbolic architectures adapted to the execution of the systems described above, such as parallel systems and coprocessors implementing parallel computational models.

Finally the human computer interaction group studies the components and strategies involved in computer assisted decision making systems, issues of presentation and techniques necessary to specify and develop graphic interfaces.

The work done in ECRC is not devoted to specific application development. It is more technology driven than application driven. This has important repercussions in the way software is designed and implemented; these will be discussed later.

Each group has a group leader and is subdivided into smaller teams devoted to specific projects. Each team has a team leader. The work is coordinated and supported by the administrative staff of ECRC. A managing director coordinates the research effort, while an assistant managing director is responsible for the day-to-day running of the centre. Three secretaries, an accountant and a person devoted to general clerical tasks supplement the lean administration. During my stay I was amazed by the efficiency and responsiveness of the administration to all problems facing the researchers. The administrative team essentially provided a firewall which enabled the researchers to work with the least number of interruptions and distractions. They solved problems ranging from liaising with the German bureaucracy, to finding accommodation, to ensuring that all the equipment needed would be there and working.

Part of the administration was the technical group, which was responsible for running the computer systems. Because of their importance to the software development process I will deal with this group in a later section.

The ECRC management reports to the Shareholders' Council which approves programs and budgets and monitors their execution. The council is comprised of representatives from each shareholder company.

A separate Scientific Advisory Committee advises the Shareholders' Council in determining future research directions. Its members are experienced members of academia and industry, usually independent of the shareholders, but approved by them.

In the following sections we will examine how software production was managed in ECRC. We will focus on the tools and methods used for managing a number of diverse research projects.

Clearly, software management in a research environment is radically different from the management needed in a production environment. The goals of the two environments are different and in many cases contradictory. Software that will be marketed must be correct, secure (penetration-proof) in its code, data and usage, must exhibit operational resilience, and must be fail-safe and cost effective. In a research environment the emphasis lies on the development of new approaches, methods, algorithms and techniques. Thus the production of software proceeds in an experimental and ad hoc way; some aspects of this research approach to software engineering are discussed below.

Furthermore, the end users of the software produced are different. In the production environment software is usually targeted towards an unknown user whose background and computer literacy are unknown. End-user product support is expensive and can be difficult to obtain, so the programs have to be well tested, documented and bulletproof. In a research environment the users of the programs are likely to be the same people who are writing them, or someone in the next office. Documentation can be substituted by reading the source code, and emphasis is placed upon the extensibility and novelty of the programs. Efficiency is also important, as it is often a research goal.

2 The working environment

An essential factor in the process of software development is the environment in which it takes place. I have already described the framework of the centre and the help provided by the administration. In the following two sections I will deal with the hardware that was used for software development and the software provided in general. More specific software issues will be dealt with in later chapters.

2.1 Hardware

Although ECRC is owned by three European computer firms, almost none of the equipment used was made by them. The hardware on which the software was developed and run consisted mainly of Sun 3/60 workstations. Almost all the workstations had a monochrome high-resolution screen; the management did not believe that offering colour monitors and display adapters would provide any tangible increase in productivity. Even the introduction of the workstations themselves was met with resistance from the management. The first workstation was purchased for the human-computer interaction group after considerable pressure from the group. The management's argument for not providing them with a workstation was that if they did, then everyone in ECRC would also ask for one. Indeed, what the management had prophesied happened. When I arrived there were already 60 workstations, which increased to about 80 in the six months I stayed. Almost all researchers had their own workstation. The workstations were initially bought with 4 MB of memory. As the operating system was upgraded it was found that the memory available was insufficient for the needs of the operating system, the windowing software and the application packages; the machines were becoming very slow due to excessive paging over the network. For this reason all machines were upgraded to 8 MB during the summer.

All workstations were diskless and relied on file servers for file storage and swapping. The file servers were Sun 3/280 machines with about 500 megabytes of file storage each. Each group had its own file server, which was also used as a fast machine for CPU-intensive tasks. However, each workstation had access to files from all other groups. For space efficiency the operating system executables and utilities (such as compilers, editors, etc.) were all stored on a single machine, which during the summer was supplemented by a duplicate in order to enhance reliability and throughput. This was done by mounting all the executable files through an automatic mounting program. Whenever a workstation demanded an executable it would be serviced by whichever server answered the request first. Thus if one of the two servers was down or overloaded, the other would take its place. This setup had some problems, but was still much more reliable than the one provided at Imperial College. Every night a program ensured that any updates made on one server were propagated to the other. One of the servers was also used for the library system, through which researchers could search for books and reports, and borrow and return them. No library officer existed, and the system maintenance was done to a very high standard by a secretary.
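
The nightly propagation job can be sketched in shell. This is a reconstruction under stated assumptions, not the actual ECRC script: the directory names are invented stand-ins for the real server file systems, a local copy stands in for the network transfer, and a setup of that era would more plausibly have used a tool such as rdist.

```shell
#!/bin/sh
# Toy sketch of a nightly job propagating updates from the primary
# executables server to its duplicate.  PRIMARY and DUPLICATE are
# hypothetical stand-ins for the real (unknown) server directories.
PRIMARY=/tmp/demo_primary
DUPLICATE=/tmp/demo_duplicate
mkdir -p "$PRIMARY" "$DUPLICATE"

# Pretend an executable was updated on the primary during the day.
echo 'cc wrapper (demo)' > "$PRIMARY/cc"

# Mirror the primary's tree onto the duplicate via a tar pipe.
(cd "$PRIMARY" && tar cf - .) | (cd "$DUPLICATE" && tar xf -)

ls "$DUPLICATE"
```

With both servers exporting identical trees, the automounter can then serve any request from whichever machine answers first.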

In addition to the Sun workstations and servers, a VAX 785 provided mail and USENET news service, service for administrative personnel, and some peripheral functions, such as an on-line internal telephone directory and the menu of the Umweltsministerium (ministry for the environment) canteen, which was also used by ECRC. The uptime record of the VAX was excellent: during May it provided continuous service for 41 days, a figure I have likewise never observed on any Imperial College UNIX facility. Other machines in use were a 12-processor parallel Sequent Symmetry machine, which was to be used for experimental purposes by the computer architecture group, three MicroVAXes, and some Bull SM machines, which were the only machines produced by any of the shareholders and were being sold off because they were very unreliable. When they were offered for sale to the ECRC employees they were described as 50 kg of extremely high quality metal. A number of Symbolics and XEROX LISP machines were also being phased out, signalling the failure of specialised architectures. Accounting was done on an IBM PC compatible machine, which was quite fortunate given the security problems of the UNIX installation, which will be described later.

Backup facilities were centralised. All the file servers were backed up each night by an automatic procedure. There were no computer operators working at ECRC. Backup was done by relying on the newly developed Exabyte technology whereby around two gigabytes of data can be stored on a single 8mm video tape. I never needed to have any of my files restored, but accounts of other people described the whole procedure as safe and practical.

The security of the setup was rather lax. Physical security against outsiders was acceptable, since entrance to ECRC and the machine room was controlled by a magnetic card system. Any ECRC worker could however access the machine room, a practice not frowned upon, as the line printer was in the machine room. What was abysmal was the software security. The technical support group had honourable intentions in that area and had even gone to the extent of modifying the system kernel on the VAX to disallow setuid shell scripts (a well-known security hole), and every night ran a script to search for such programs. However, during my first month I discovered that the password entry for the managing director of ECRC was blank on all Sun machines, allowing anyone to log in as the MD. In addition, in order to ease system administration, becoming superuser on one machine meant that one was superuser on all of them. The well-publicised security holes used by the November 1988 Internet worm were still open. Although process accounting was kept on all the servers, it was not kept for the workstations, which meant that any actions performed on them could not be traced. Moreover, accounting had been turned off for the UNIX utility used to connect to the outside world, although one of the management's security concerns was the unauthorised distribution of internally developed software to outsiders. Some programs developed by the technical group and run setuid root (i.e. with superuser privileges) were insecure and provided another route to becoming superuser. Furthermore, some people were in group kmem (the group privileged to read and modify kernel memory), and I found that one of them used that privilege (or some other method) to snoop on the account and mailbox of the managing director. I reported some of the holes to the technical group and some of them were fixed.
Their response to my mentioning the Internet worm problems was that they were irrelevant, since ECRC was not on the Internet!
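
The nightly hunt for setuid shell scripts can be sketched as follows. This is my reconstruction, not the technical group's actual script; it runs over a scratch directory for demonstration, whereas a real audit would scan from the root of the filesystem and restrict itself to files owned by root.

```shell
#!/bin/sh
# Reconstruction (not the original) of a nightly audit for setuid
# programs, flagging setuid shell scripts in particular.  A real
# audit would use something like: find / -user root -perm -4000
DIR=/tmp/suid_demo
mkdir -p "$DIR"

# Plant a deliberately bad example: a shell script with the setuid bit.
printf '#!/bin/sh\necho hi\n' > "$DIR/bad.sh"
chmod 4755 "$DIR/bad.sh"

find "$DIR" -type f -perm -4000 -print | while read f; do
  case "$(head -1 "$f")" in
    '#!'*) echo "WARNING: setuid shell script: $f" ;;
    *)     echo "setuid binary: $f" ;;
  esac
done
```

A setuid shell script is flagged specially because, on kernels of that era, it could be subverted via a race between the kernel's interpreter lookup and the script open, which is exactly why the VAX kernel was modified to refuse them outright.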

The machines were internally networked using thick-cable Ethernet; during my stay thin-cable Ethernet was also added. On top of that, TCP/IP and the Berkeley network applications were run despite the European and shareholders' commitment to the ISO protocols (to be fair, there was no other viable alternative). The network performed surprisingly well considering that there were about 80 machines on a single Ethernet segment. Externally the centre was connected to the Datex-P X.25 network provided by the Deutsche Bundespost (the German telecommunications authority (PTT)). Uucp (UNIX-to-UNIX copy) connections were established with the German and French EUNET (European UNIX Network) backbone sites (the University of Dortmund and INRIA, with Dortmund also being the German BITNET backbone), with the German CSNET backbone (the University of Karlsruhe), with Pyramid Corporation in the USA, and with some other sites. Electronic mail and USENET news were provided over these connections. The electronic mail service was excellent; cost considerations limited the number of USENET newsgroups provided, something resented by a number of researchers (some circumvented this by simply logging into other sites and reading news there, thus multiplying the costs). At the time I was leaving, ways for the centre to become even less dependent on the German backbone and telecommunications authority were being looked into. The centre was already bypassing the German backbone for USA mail, calling Pyramid in the States directly in order to minimise costs. The next step under examination was to stop using the German PTT's Datex-P X.25 network and to use extremely fast modems based on proprietary technology (TrailBlazers, by the US Telebit corporation) instead. One problem that often surfaced in ECRC was its lack of connectivity to EuInternet (the European Internet). Such a connection would have allowed remote login and file transfer to and from a huge number of sites in Europe and the USA.
There were valid technical and economic reasons for this, but a researcher at ECRC was still disadvantaged compared to colleagues in the States or the Scandinavian countries, in that he would have to wait weeks for a tape of software they could get in seconds. It is indeed a sad state of affairs that the solutions and standards offered by the international and European standards bodies and organisations are proving to be impractical, uneconomical and a hindrance to research by not providing the functionality needed.

Printing was provided by a line printer, which almost nobody used (I got some very strange looks when I walked the corridors with lined perforated paper in my hands), and by a number of laser printers. These were connected serially to the VAX. They were used for printing listings (although this was not done very often; I presume the introduction of workstations and windowing environments saved a significant number of trees) and research papers and reports. The speed of the serial link was clearly a bottleneck for the size of the files output by modern document preparation systems such as TeX. Ethernet-connected printers were being looked into.

I will end the description of the computer setup with the people who ran it: the technical group. Considering the number of different machines, operating systems and software, they performed marvellously. It was an extremely welcome surprise to find that the technical group actually tried to help the researchers instead of hindering them, as I found was often the case at Imperial College. The work of the group, although within a formal framework of meetings and objectives, was done in the ad hoc manner often found in successful technical support groups. Persons of varying skills and ability tackled a great variety of problems, ranging from hardware (installing the thin-cable Ethernet and memory expansions) and purchasing decisions, to system support, to supporting research activity by advising researchers on performance issues. In contrast to Imperial College there were no forms to fill in, nor user liaison officers and operators. There was a group leader and each member had a speciality, but in my many dealings with the group I was never referred to someone else as an excuse.

2.2 Software

At this point I will give a brief description of the software used within ECRC. Later in the report I will focus on the software tools that were used for development.

All the software development machines in ECRC were running varieties of the Berkeley version of the UNIX operating system. The VAX was running 4.3BSD, and the Suns were running SunOS 3.2 when I arrived and SunOS 4.0.1 when I left. The Sequent machine was running Sequent's UNIX, which was also set by default to emulate the Berkeley environment. The MicroVAXes were running 4.3BSD as provided by Mt. Xinu. Clearly AT&T's hype about the advantages of its System V release of UNIX was not taken seriously. Most of the machines were running Sun's NFS (Network File System). When I arrived almost all researchers were using Suntools as the windowing system on their workstation. During my stay the technical group tried to convince the researchers to switch to the X Window System. Because of my experience with the X Window System, which is widely used at Imperial College, I was asked by the technical group to provide some assistance during the switch-over. As a result I got first-hand experience of the problems the users faced during the switch-over, which, by the time I left, was far from complete. In summary, the X Window System, although it offers significant advantages compared to Suntools and other proprietary windowing systems, does not have the same quality of documentation, solidity of implementation and range of tools as the other window systems.

The source code for the operating system, its utilities and the X Window System was stored online and was open for reading. This proved very helpful when trying to solve difficult bugs or battling with badly documented utilities. In one case a mysterious bug appearing under seemingly irrelevant circumstances was traced to a race condition in the UNIX kernel. I do not think I would have been able to explain the bug without the kernel source available.

The standard UNIX compilers and debuggers were the ones used for all development, although the GNU C and C++ compilers and the gdb debugger were available. Editing was done using vi and GNU Emacs. The switch to GNU Emacs was made after it was decided to stop supporting the commercial Gosling Emacs in favour of the free GNU one. By the time I arrived, the initial opposition to the change had subsided.

In a research establishment like ECRC document processing is a very important area: a significant portion of the work done is judged by the reports and articles produced. The traditional document processing system used in ECRC was the commercially available Scribe system, which had been abandoned by the time I arrived in favour of the free LaTeX system. Diagrams were in great demand, but still very difficult to produce. The UNIX pic utility, the picture environment of LaTeX, and two Apple Macintoshes, together with a pair of scissors and glue, were being used when I arrived. I started using the X Window System interactive drawing editor, and by the time I left some had followed suit. Still, creating a diagram and fitting it into the text was not trivial. A number of researchers demanded the provision of commercial WYSIWYG desktop publishing software. This was under consideration by the management, although economic factors delayed the purchasing decision. As usual, management was ready to pay for hardware, but hesitant about software purchases.

A number of different systems were used for electronic communication. The most common was mail. Sending mail to the alias all resulted in mail being sent to everyone in ECRC; analogous aliases existed for specific groups. This system was sometimes distracting: at times one would receive five messages in an hour. It also had the funny effect of beeps progressing through the corridor as mail was delivered and people were progressively notified that new mail had arrived. Another system used was notes, a system analogous to USENET news. Private newsgroups for specific groups existed. This system was used for communicating specific software problems; it had the advantage of acting as a diary and repository of problems and solutions. In addition to the systems described above, USENET news was used as an open communications forum with the whole world, with groups like comp.lang.prolog being almost private groups, in the sense that the people writing in them were either ECRC researchers or well known to them. The UNIX write and talk utilities were also sometimes used, supplementing the internal telephone system, e.g. ``echo "Go for lunch ?" | write periklis''.
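
The alias mechanism described above is the standard sendmail-style aliases file, in which an alias expands to a list of further aliases or accounts. The entries below are invented for illustration (the real ECRC alias and account names are unknown); on a live system they would sit in /etc/aliases, with newaliases run to rebuild the database after editing.

```shell
#!/bin/sh
# Hypothetical alias entries resembling the ECRC setup: "all" fans out
# to group aliases, which fan out to individual accounts.
cat > /tmp/aliases.demo <<'EOF'
all: lp-group, kb-group, arch-group, hci-group
lp-group: periklis, diomidis
EOF

# Show one level of expansion for each alias.
awk -F': *' '{print $1 " -> " $2}' /tmp/aliases.demo
```

A message to all thus fans out recursively until only real mailboxes remain, which is what produced the wave of new-mail beeps moving down the corridor.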

3 Design and specifications

In this section we will examine the way software developed at ECRC was designed and specified. The review will be structured in the way a requirements specification is usually structured, starting from computer characteristics and hardware interfaces and continuing to software functions, various constraints and response to undesirable events.

3.1 Computer characteristics

During the design process in my group at ECRC I never witnessed hardware being mentioned as a starting point or guide for the design. I find this quite acceptable for a research environment, as hardware constraints could strangle an otherwise brilliant idea before it could flourish. Some implicit assumptions were made, restricting the design to the hardware available. Even that was not always the case, as the computer architecture group secured the purchase of a multi-processor machine in order to develop a system. Hardware properties were, however, taken into account when designing for efficiency. I will describe two such cases which came to my attention during my stay.

One case where hardware properties were taken into account was the decision about what code the Prolog compiler should generate. There is an argument in the logic programming community over the two options one has during code generation. One option is to emit native code for the machine on which the code is to be executed. The advantage of this method is its inherent speed, the disadvantage being the hidden costs which surface in the form of cache misses and frequent paging (due to the large size of the code). The other option is to emit some form of abstract code (such as Warren abstract machine instructions) and interpret that. The advocates of the latter method maintain that the interpreter can be held in the machine's cache and that, with the elimination of paging due to the greater expressive power of the abstract instructions, speed advantages are to be gained.

Another time the underlying hardware was taken into account was while I was designing the architecture of the database for a debugger. The database had to contain data about all program lines executed. The previous approach, and the one recommended to me, was to keep the database in a temporary file on disk. I thought that I could keep it in virtual memory. An examination of the temporary file partitions and the swap partitions of machine setups in ECRC revealed that most machines had more virtual memory than space for temporary files. Thus the database was stored in memory, with significant speed advantages. Furthermore, buffering was now handled by the paging algorithm of the operating system. This eliminated the buffering code that was used in the previous release, and with it a number of bugs whose source had never been traced. The new code was 10% of the size of the old code, with obvious advantages for maintenance. It was also 20 times faster.

Computer characteristics were naturally of particular importance to the computer architecture group. There the characteristics were the second most important factor in software design. During the period of my stay there was even a trend to make computer characteristics the guiding factor in design. I will elaborate a bit on the above. Initially the computer architecture group started with a software concept (such as parallel execution of Prolog programs) designed hardware that could efficiently implement that concept and finally tailored the software to the hardware. During my stay this approach was altered. It was decided that ECRC would not directly design hardware, but would use existing equipment. The purchase of the Sequent multi-processor machine and examination of the new SPARC chips fitted the new concept. Thus software would have to be designed tailored around existing hardware.

3.2 Hardware interfaces

Hardware interfaces were not an important issue in my group. The research done used only the standard hardware provided by the manufacturers, and thus all interfacing was done through the operating system. I understand that hardware interfaces were of some concern to the computer architecture group, although surprisingly enough not to the human-computer interaction and knowledge base groups.

I was expecting the human-computer interaction group to experiment with new methods and channels of interaction between the human and the machine, or to use computer-controlled experimental equipment to measure aspects of that interaction, and was disappointed to find that the devices used were colour screens, mice and keyboards. The knowledge base group could also profit from directly controlling the hardware interfaces between the processor and the direct access storage devices (disks). General-purpose operating systems, and UNIX in particular, provide very blunt tools for handling disks, as they do not allow control over the physical placement of data on disk. Furthermore, the data caching they provide introduces additional unwanted complications for database systems.

3.3 Software functions

The most important aspect of design is certainly the way software functions are decided upon. This was also the vaguest part of the specification process in ECRC. I cannot give a concrete procedure that was followed in order to arrive at a specific software functions requirements document; the main reason for this is the research nature of the work. I will however attempt to outline the process through which the software took shape.

Research directions were specified by the Shareholders' Council. These were further refined by the Scientific Advisory Committee, which in turn presented them to the ECRC management. The group leaders took these research goals and - I presume - after some brainstorming arrived at some software products which would meet them. At that point individual researchers would start implementing that software using rapid prototyping methods. Since only one person was involved in the design and the implementation, detailed requirements specification documents and design descriptions were not necessary. If at some point a project gained enough momentum that more support was needed, then some parts of it would be redesigned in a better way. At the same time the interfaces between the modules would be documented, so that more people could work on the project. I have the feeling that this was the way most ECRC projects were developed.

3.4 Timing constraints

Timing constraints were mostly irrelevant to the initial stages of design. Since the systems were not geared towards real-time control, the only timing constraint was that the system should be sufficiently fast to prevent its human user from falling asleep. This was not a goal that was consciously taken into account when designing. Any shortcomings were noticed at the testing or demo stage and were usually dealt with by faster machines (a Sun 4 was purchased in order to run an expert system, implemented using CHIP (Constraint Handling In Prolog), designed to manage the harbour of Hong Kong), additional memory (during my stay most researchers got a 4 MB upgrade on their systems because software was running incredibly slowly), and sometimes better-written software (this was especially true for systems whose design goal was speed). This brings us to the other major timing constraint I observed at ECRC: ``Our system must be faster than the competition.'' This is indeed a noble goal to aim for. Its major effect was that during the design stage the fastest method available for a specific task would be chosen, sometimes irrespective of other implications, such as relevance to the research goal or contribution to the total running costs. An example of this approach was that SEPIA (the Prolog system developed by ECRC) had the fastest existing compiler, but - this is a personal opinion - because of the complexity of the resulting system the speed of the code produced left a lot to be desired.

3.5 Accuracy constraints

I never witnessed accuracy constraints being discussed during the design process at ECRC.

3.6 Response to undesired events

The main undesired events that were relevant to the systems developed at ECRC were erroneous user input and exhaustion of finite computing resources such as memory and file space. I don't think that any of these events were taken into account during the design process.

Since the initial end users of the systems would be researchers, erroneous user input was not regarded as a serious issue. The systems implemented did of course perform some rudimentary checking of user input, but there was no determined design effort behind it. For that reason error messages were not organised and appeared in various formats and ways. As a project reached maturity and the intended audience widened, the disorganised response of the system to erroneous input became annoying and even hindered the wider acceptance and use of the product. I initially found it hard to learn the system I was supposed to modify because of its sometimes nonexistent or inexplicable error messages.

The problem of exhaustion of memory or file space is one that cannot be solved effectively unless a strategy is developed from the beginning of the product's lifetime. If such a strategy has not been designed into the product, then the only viable course to follow when a resource is exhausted is to print a message and exit the program. Adding a garbage collection or graceful termination strategy after the main product has been designed and implemented is next to impossible. This was the case with most projects in ECRC. Exhaustion of resources was dealt with in an ad hoc manner. Sometimes it was even assumed that infinite amounts of a resource existed, which resulted in mysterious program crashes and errors. I also have the impression that one of the reasons garbage collection was not implemented in SEPIA, but was constantly promised, was that it had not been designed in from the beginning.

From the two events described above I conclude that the software development process at ECRC would be enhanced if the exhaustion of finite resources were taken into account in the design phase. Checks for erroneous user input can be added at any point of a project's lifetime, since their effect is usually very narrowly localised. Memory exhaustion errors and recovery strategies, on the contrary, have to be built into the product from day one.

4 Development and tools used

During my stay at ECRC I had the chance to experience a range of problems that developers faced during software development. Probably due to my own interests, most of the problems I noticed are connected with the use of software engineering tools. Here I outline some observations I made during the software development process in connection with the tools being used.

4.1 Revision control systems

4.1.1 The problem

A well known problem in multi-person programming projects is that of coordination between the members of the team. An undisciplined mode of work is a sure recipe for disaster. Many modes of work lead to problems: developers can simultaneously edit the same module, make incompatible changes to different modules, or take private copies of modules, change them and after some time copy them back, thereby deleting all the changes that others have made in the meantime.

Version control systems like SCCS and RCS are supposed to help solve the problems listed above. The situation at ECRC suggested otherwise. Both systems were being used. At the same time, however, almost every researcher or team had in their directories private copies of the whole product being developed. That copy was being changed and developed without, in many cases, any attempt to reconcile it with the rest of the work. The result was that over a number of years programs had diverged to the extent of becoming totally different. In one case I witnessed two groups comparing their two programs, which had clearly evolved from the same system, as if comparing two products from competing organisations.

4.1.2 Problems with existing systems

From the description above it should be clear that the primitive version control systems provided in the UNIX environment, like SCCS and RCS, are not sufficient for adequate version control. The main reason for this is that they deal with a low level abstraction: the text file, which is in turn decomposed into lines. A user can only work at this level of granularity. Changes are internally specified as lines that are added, deleted or changed, and a user can only lock a specific file or number of files. In some cases the granularity of locking is too coarse. For example, a user might wish to lock just a small subroutine that exists in a file in order to try to optimise it. That process might take weeks, and it is clearly not desirable to have the whole file locked during that time. In other cases locking on files is too blunt an instrument. This is the case when a programmer needs to change the name of an identifier throughout the whole system: all modules need to be locked, thus preventing everyone else from working on the system.

More sophisticated software engineering tools exist. Their main problem is an ``all or nothing'' approach: the whole environment must be adopted before any of its benefits can be realised.

Recent approaches to software engineering environments, like PCTE (Portable Common Tool Environment), which started as an ESPRIT project in 1983, solve the problem of the ``all or nothing'' approach by providing a hosting structure designed to be the basis of a software engineering environment. Each PCTE-based environment is regarded as a collection of tools and services specific to a particular project life cycle model and application domain. I believe that some of the work I did used this approach in providing project-related environments, only at a much more rudimentary level. Instead of PCTE I used UNIX as the hosting structure, and the tools and services I provided were developed by improvising and following the users' needs rather than a formally developed framework.

4.1.3 My team

The team I was working with was part of the logic programming group and consisted of three members. The object of the team was research into an advanced debugging environment for logic programs. The existing software consisted of approximately 20000 lines of C and 30000 lines of Prolog. It was divided into about 40 files. Part of the software was based on a Prolog interpreter developed by Lee Naish in Australia. Some parts of the source were automatically generated by other programs that were also part of the system, a situation which contributed to a very complicated bootstrapping process. Initially the whole system was generated by a makefile which sometimes exhibited nondeterministic behaviour.

When I arrived I enquired about the revision management tool used and was told that RCS was being used, but that the team was trying to switch to SCCS, as this had been chosen as the revision control system to be used throughout the centre. I checked to see how the system was used and to my surprise found that the mode of use was rather idiosyncratic: all the files were checked out (that is, locked, so that they could be modified). From time to time (where time could be anything from a couple of months to a year) all the files were checked in and then immediately checked out and locked again. In addition, various directories contained the whole source tree at various stages of development. Conflict between the members of the group was avoided by having each member take a copy of the source, work on it, and then manually reconcile the resulting files.

I found the existing situation to be a challenge in managing a software production environment. There were definite problems, and the existing systems could not provide the answers. Since it was my responsibility to provide the new version of the software, I was in a position to provide a new version of the working environment as well.

While I worked on the implementation of my project I tried to find out what the problems with the existing systems were and how I could improve on them. My modest aim was, by the end of my placement, to have solved the particular software management problems of my team. The result of this exercise would be experience which could, if added to other experiences, yield some usable generalisations in the long run. I do not claim that the methods I used in that team are viable for other environments, and indeed I have since used other approaches in different situations.

4.1.4 User education

The first problem I encountered was that of user education. The way the existing tools were used was clearly incorrect. This was an excusable consequence of the way they were documented. Most UNIX commands are documented by a set of manual pages and a more descriptive paper, usually appearing in a ``Supplementary Documents'' collection. Usually only the manual pages appear on-line, and few users bother to read the supplementary documents. The manual pages, however, explain the mechanics of using a command, but not the philosophy behind the mechanics. That was exactly the situation in my team. The team leader had realised from reading the manual pages that RCS could be used to keep track of the versions of a software system. What was not apparent from the manual pages (and what I explained to her) was the way in which the system was meant to be used. During my stay at Imperial College I was fortunate enough to work with more experienced students who knew how to use the tools. It is sad that the use of these tools was not part of, or a prerequisite for, any course until the second term of the final year.

The practice within the team was to have all the files checked out. The correct use of the system is, on the contrary, to have all the files checked in. Whenever users want to change a file, they check it out, edit it, debug it and then check it in again. After I explained this, we made a snapshot of the existing situation (I was not trusted to keep the integrity of the environment) and checked all the files in.

RCS was not provided on the machine we were working on, so I compiled it for the private use of the team. I compiled it with a snoop feature which records all actions on the files in a special log file, and informed the other team members of the fact. From time to time I checked the contents of the log file and sent mail to members of the team if a file was being used in an incorrect way. With surprise and satisfaction I found that the system was being used almost perfectly and that the members of the team were very pleased with it.

4.1.5 Independent development and testing

A plain revision control system does not provide a framework for a multi-member development effort. If all members work in a common directory and use the revision control system to avoid simultaneously editing the same file, they still have the problem that while a file is being edited the rest of the team cannot be sure that they have an error-free system to successfully compile and check their modules against. For that reason I developed the following structure:

A common directory was used as a repository for the revision control system. There a single copy of every file used for the system was kept, together with backward deltas to all its previous versions. Each member of the team had a symbolic link to that directory in their own private directory. Whenever they wanted to change something they would check out the file and lock it. Then they would edit the file and check the system until it all worked in their own directory. Once the system worked they would return the module to the common source tree. It is clear from the above description that the same module cannot be edited by two people at the same time. This, however, was not a problem in our team, because different members were working on different parts of the system. Furthermore, a member usually had less than 10% of the modules locked for their own use. Typically one member would work on the Prolog part of the project, while another would work on the C part.
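The arrangement described above can be sketched in shell terms as follows. All directory and member names are hypothetical, and the actual RCS commands (co -l to check out and lock, ci to check in) appear only in comments, since the sketch merely sets up the layout:

```shell
#!/bin/sh
# Sketch of the team layout described above (all names hypothetical).
# A single shared directory holds the version archives; each member's
# private directory reaches it through an RCS symbolic link, which is
# where the RCS commands co(1) and ci(1) look by default.
set -e
mkdir -p project/repository            # the common file repository
for member in alice bob carol; do      # one private directory per member
    mkdir -p "project/$member"
    ln -s ../repository "project/$member/RCS"
done
# Each member then works entirely inside their own directory:
#   cd project/alice
#   co -l main.c      # check out and lock against the shared repository
#   ...edit, compile and test in the private directory...
#   ci -u main.c      # return the module to the common source tree
```

Because every RCS link points at the same repository, a lock taken by one member is immediately visible to all the others.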

4.1.6 Release control

An additional requirement from the team was that of release control. What they wanted was the ability to recreate a specific release from the set of files. As development continued, they wanted to be able to return to a specific debugged release on which they could run demo programs, or which they could use as a point to benchmark a new system against. The method that had been used was that of keeping a copy of the source tree in another directory and sometimes on tape.

Although RCS allows getting a specific revision of a file, it does not provide a method for managing a collection of files as a specific release. Reading the USENET newsgroup on software engineering (comp.soft-eng) at that time, I realised that a number of people were experiencing the same problems. One solution proposed was that of storing a release number in the RCS state variable. That, however, was a misuse of a feature which had clearly been provided for another purpose. Another attractive solution discussed in the group was offered by the concept of s-lists.

An s-list is short for state list. It describes the versions of all modules used to build a specific release of the system. An example of an s-list might be:

Release 1.10
main.c 1.4
symtab.c 2.3
memory.c 1.12
kernel.p 2.3
fget.p 1.4
makefile 1.3
By retrieving the specific versions of the files given in the s-list, a release can be accurately reconstructed. Based on the idea of s-lists I wrote a range of tools to create an s-list given a set of file names, create the s-list of the latest files in a system, and, given an s-list, retrieve the appropriate versions from the file repository. These tools were integrated into the makefile for the system. When users had tested a release and wanted to keep it as a stable release, they would say ``make s-list''. The system would then prompt for a comment for that release and display the number of the release (say 1.5). If at a later point (and several releases later) they wanted to recreate that release, they would only have to say ``slco 1.5''. The files associated with that release would be automatically checked out. Typing ``make'' would then automatically generate that release.
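The original s-list tools are not preserved; the following is a minimal sketch of how one of them, the s-list generator, might have worked. It relies on the fact that RCS records the latest revision number on the ``head'' line at the top of every ,v archive file. The repository entries below are fabricated stand-ins:

```shell
#!/bin/sh
# Sketch of an s-list generator: list the head revision of every file
# in the RCS repository.  The ,v files here are fabricated stand-ins;
# real RCS archives carry much more after the head line.
set -e
mkdir -p RCS
printf 'head\t1.4;\n' > RCS/main.c,v
printf 'head\t2.3;\n' > RCS/symtab.c,v

mkslist() {
    echo "Release $1"
    for f in RCS/*,v; do
        # the head line of a ,v file reads, e.g., "head<TAB>1.4;"
        rev=$(sed -n '1s/^head[[:space:]]*\(.*\);.*/\1/p' "$f")
        echo "$(basename "$f" ,v) $rev"
    done
}
mkslist 1.10 > s-list
cat s-list
```

A retrieval tool like the ``slco'' mentioned above would then read such a file and run the RCS check-out command with each recorded revision.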

4.1.7 Information on locked files

A feature which RCS lacks is information about which files in a system are being used and by whom. This is to be expected, since RCS works at the file level and not at the system level. I solved that problem by writing a shell script called rcstell which listed the names of all modules checked out as locked and, optionally, their versions and the names of the people who had checked them out. The script was relatively easy to write, as it was based on existing UNIX and RCS utilities. Three months later members of the team complained that the command, although very useful, was too slow. By that time the initial version had changed to accommodate the needs of the group. I rewrote the program from a shell script into a C program. The rewriting process was easy, as I could base it on the structure and user interface of the shell script and a template C program written by Henry Spencer. The new version of the command was more than three times faster than the old one, and for me it was the first time I turned a utility developed by rapid prototyping methods into a tool frozen in efficient code.
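The rcstell script itself has not survived; a rough reconstruction of its core idea follows. It scans the ``locks'' field that RCS keeps near the top of every ,v archive. The two archive files below are fabricated, showing one locked and one unlocked module:

```shell
#!/bin/sh
# Rough reconstruction of rcstell: report module, revision and holder
# for every lock recorded in the RCS ,v archives.  The archives here
# are fabricated; only the header fields that matter are present.
set -e
mkdir -p RCS
printf 'head\t1.5;\naccess;\nsymbols;\nlocks\n\talice:1.5; strict;\n' > RCS/main.c,v
printf 'head\t2.3;\naccess;\nsymbols;\nlocks; strict;\n' > RCS/symtab.c,v

rcstell() {
    for f in RCS/*,v; do
        awk -v mod="$(basename "$f" ,v)" '
            /^locks/ { inlocks = 1 }
            inlocks && match($0, /[A-Za-z_][A-Za-z0-9_]*:[0-9.]+;/) {
                lock = substr($0, RSTART, RLENGTH - 1)   # e.g. alice:1.5
                split(lock, part, ":")
                printf "%s\t%s\t%s\n", mod, part[2], part[1]
            }
            /strict;/ { inlocks = 0 }
        ' "$f"
    done
}
rcstell
```

Here the output is one line per locked module, giving the module name, the locked revision and the holder of the lock.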

4.1.8 Dependency generation

After about a month of using the environment I developed, we came across some strange problems. At some point a member of the team discovered that she was editing files that were clearly out of date. Furthermore, some files I updated removed changes she had introduced. Fortunately the use of RCS enabled us to find the nature of the problems and fix them. The cause of the problem was that of dependency generation and getting the latest modules. As the system was installed, unless users explicitly asked to check a module out, they would not get its latest version. This resulted in members of the team using out of date versions of files.

The solution to that problem was to provide a way to automatically get all the latest versions of the modules. Getting them from the file repository every time one compiled was clearly inefficient and undesirable. What was needed was a facility to check a newer version of a module out only when one in fact existed. The facility of performing an action only if a file is older than another is provided by the make program. The problem with make is that a makefile describing these dependencies is needed, and the makefile must be kept up to date, or chaos ensues. At that time ECRC received the tape containing the software that had been released as non-proprietary by the University of California at Berkeley. On that tape was a tool called mkdep that automatically creates and maintains module dependency rules in makefiles. These are appended to the end of a makefile (after the old ones are removed). I modified that tool and added to it the capability to automatically create dependency rules on the files kept in the RCS file repository. That is, if a file main.c existed in the working directory and a corresponding file RCS/main.c,v existed in the RCS repository, a rule of the form ``main.c: RCS/main.c,v'' would be added to the makefile. A rule at the beginning of the makefile instructed make to use the check-out program of RCS to retrieve files from the repository. Thus whenever RCS/main.c,v was newer than the copy in the working directory, the newest version would be automatically retrieved.
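The modified mkdep is likewise not preserved, but the rule-generation step it performed can be sketched as below. The file names are fabricated, and the rule that actually invokes the RCS check-out command appears only in a comment, because its exact form depends on the make dialect:

```shell
#!/bin/sh
# Sketch of the dependency-rule generation step: for every archive in
# the RCS repository, append a rule making the working file depend on
# its ,v counterpart, so that make retrieves any module whose archived
# revision is newer than the working copy.  File names are fabricated.
set -e
mkdir -p RCS
touch RCS/main.c,v RCS/kernel.p,v
cat > makefile <<'EOF'
# A rule here would tell make how to produce a working file from its
# archive, e.g. by running the RCS check-out command co(1).
# DO NOT DELETE THIS LINE -- generated dependencies follow
EOF
for v in RCS/*,v; do
    echo "$(basename "$v" ,v): $v" >> makefile
done
cat makefile
```

Rerunning the script after a change to the repository regenerates the dependency section, which is what keeps the makefile from drifting out of date.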

As the tools I developed provided solutions to real problems, which were also relevant to discussions taking place in the USENET software engineering newsgroup, I obtained permission from the ECRC management and posted these programs to the comp.soft-eng newsgroup. Three months later someone reposted them in the alt.sources newsgroup. I take the repost as an indication that they were indeed found to be useful.

4.2 Tag generators

A problem one has when working on code used by other people is that of finding out what each procedure is supposed to do. Unless a data dictionary exists and is always kept up to date, the only alternative (other than a guess based on the name of the procedure) is to look at the source of that procedure. This involves going through all the modules searching for it. Fortunately the UNIX system provides a tool which goes through all the files of a system and creates a tag file giving the file name and a search pattern for each procedure. Editors can make use of that file: by pressing a suitable key sequence while the cursor lies on an identifier, the editor transfers context to the place where the identifier is defined.

Unfortunately the utility can create a tag file for C, FORTRAN, Lisp and Pascal code, but not for Prolog programs. I found this restriction quite annoying, as a lot of the code I dealt with was badly documented Prolog code. I modified the utility for my personal use to understand Prolog programs. After I had used it for a while I told other people at ECRC of its existence. During the same day I received mail from many different people in different groups about Prolog programming styles, editors and CPU architectures the utility did not support. Clearly the utility was proving to be very popular. As at that time I was well within schedule in my regular work, I worked on providing a version of the utility for the Emacs editor, compiled it for different CPU architectures, and also modified it to deal with different styles of programming.

This was not trivial, as there is no notion in Prolog of a single definition of a procedure. A term can be defined a number of times and can also be dynamically asserted. The syntax of Prolog makes the definition of a term difficult to trace, as a term usage and a term definition differ only in ending with . or , instead of :- . Heuristics were used, and after some experimentation a set that closely matched the pattern of use at ECRC was found. I was surprised that such a utility did not already exist, and delighted by the positive response I received.
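The Prolog-aware version of the tag generator is not preserved either; the toy sketch below shows the flavour of one plausible heuristic: treat a lowercase atom at the start of a line as a clause head and tag only its first clause, emitting lines in the ``name file /pattern/'' format that vi-style tag files use. The sample Prolog file is fabricated:

```shell
#!/bin/sh
# Toy sketch of a Prolog tag heuristic: a lowercase atom at the start
# of a line is taken to be a clause head; only the first clause of each
# predicate is tagged.  Output follows the "name file /pattern/" tag
# file format.  The sample source file is fabricated.
set -e
cat > fget.p <<'EOF'
fget(File, Term) :-
    see(File),
    read(Term).
fget(_, end_of_file).
max(X, Y, X) :- X >= Y.
EOF
awk '
    match($0, /^[a-z][a-zA-Z0-9_]*/) {
        name = substr($0, RSTART, RLENGTH)
        if (!(name in seen)) {            # tag the first clause only
            seen[name] = 1
            printf "%s\t%s\t/^%s/\n", name, FILENAME, name
        }
    }
' fget.p | sort > tags
cat tags
```

The second clause of fget is correctly skipped; a real implementation would also have to cope with operators, dynamic assertions and the various layout styles mentioned above.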

4.3 External name checkers

While cleaning up the code of our project I found that the people who had written it had been rather economical in their use of the C keyword static. In other words, almost all identifiers were made globally visible throughout the system, even when they were not used by any other module. The reason I was given for this was that the implementor thought that global variables were more efficient than local ones. This notion was probably the result of confusing the scope of variables with their storage class, as in many architectures variables that are stored on the stack are more expensive to access than those stored in the statically allocated data area. Keeping all identifiers global is bad software engineering practice, as it increases the probability of name clashes by polluting the name space. Furthermore, because the linker does not check the types of the identifiers, variables of different types could be placed in the same memory locations with disastrous results. The lint program checker warns about such problems, but the source I dealt with was initially so carelessly written that it produced about 5000 error messages. Dealing with each message individually was prohibitively time consuming. By automating the process and using methods like the one described below, by the time I left the lint error messages had been reduced to 30.

In order to fix the problem of unnecessarily globally defined variables and procedures, I developed a tool that went through the code and printed the name of each module together with the variables and procedures that should have been declared static in the module, but were not. One approach for doing this is to go through the source code, parse it, and build a database of definitions and uses for each globally defined variable. This involves considerable effort, and thus a simpler and more general, although less portable, strategy was chosen. Each object file contains a list of the global identifiers it defines and a list of the global identifiers it uses. The tool first went through all the object files and created a list of all identifiers that were used in a module other than the one in which they were defined. Then for each module it went through the list of identifiers that were defined and printed out those that did not exist in the second set. This was done extremely fast by using the UNIX fgrep utility, which can search for a large number of alternative fixed strings in a file using the Aho-Corasick algorithm. The tool was later extended to create a shell script which started up consecutive editing sessions, each time placing the editor cursor on the item that was erroneously declared. In this way 115 global variables and procedures were converted to local ones within a single day.
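The checker itself is not preserved; the sketch below reconstructs its two passes over nm(1)-style symbol listings. A real run would examine the object files themselves with ``nm -g *.o''; here the listings are fabricated text files. A symbol a module uses but does not define appears in nm output with a U tag, so U entries are exactly the cross-module uses:

```shell
#!/bin/sh
# Sketch of the static-candidate checker.  Pass 1 collects every global
# symbol used outside its defining module (nm marks these U; a U entry
# is by definition not defined in that object file).  Pass 2 reports
# defined symbols (T or D) missing from that list.  The nm listings
# below are fabricated; a real run would capture "nm -g *.o" output.
set -e
cat > main.nm <<'EOF'
00000000 T main
         U parse
EOF
cat > parse.nm <<'EOF'
00000000 T parse
00000004 D parse_depth
00000008 T parse_helper
EOF
# Pass 1: every symbol referenced outside its defining module.
awk '$1 == "U" { print $2 }' *.nm | sort -u > used
# Pass 2: defined symbols absent from that list are static candidates
# (main is excluded as the entry point; the original used fgrep here
# for speed over thousands of names).
for m in *.nm; do
    awk '$2 ~ /^[TD]$/ && $3 != "main" { print $3 }' "$m" |
    while read sym; do
        grep -qx "$sym" used || echo "$m: $sym could be static"
    done
done > report
cat report
```

In the fabricated example, parse is used by main.o and so stays global, while parse_depth and parse_helper are reported as candidates for static.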

4.4 Documentation

As I have indicated before, a lot of the software developed at ECRC had sparse user documentation. To be fair, I have to admit that there was a wealth of technical documentation in the form of reports and papers for most of the work done. Still, the end user had to rely on either guessing the way a system was supposed to behave or trying to extract the information from examples presented in the technical reports. Some more mature products, like SEPIA, had a user manual associated with them, which constantly had to be kept up to date.

A possible reason for the lack of up to date documentation is the research orientation of the work. The constant pressure for producing new systems did not allow the researchers to stand back and produce things like user documentation. During my work I came across a novel way of producing user documentation: it was generated automatically. Each new command for the debugging system had to have, as part of its definition, a string that explained how it was to be used. This string was never left blank, as it also served as a comment inside the code explaining the purpose of the procedure. A program could then extract those strings and the associated command names and convert them into suitable LaTeX macros that, when processed by LaTeX, produced a very nice looking user manual. I used the manual while learning the debugger. The manual was of course styled in a reference-like fashion, which meant that I did not know where to start from, but after the initial problems were overcome I found it useful.
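The extraction program is not preserved, and the exact convention for the help strings is unknown; the sketch below therefore invents a ``%doc'' comment convention and a sample source file purely to show the general idea of turning such strings into LaTeX list entries:

```shell
#!/bin/sh
# Sketch of documentation extraction.  The "%doc name: text" comment
# convention and the sample source are invented; the real system kept
# the help string inside each command definition.  Matching lines are
# turned into LaTeX \item entries for a reference manual.
set -e
cat > commands.p <<'EOF'
%doc spy: set a spy point on the given predicate
spy(Pred) :- add_spypoint(Pred).
%doc nospy: remove the spy point from the given predicate
nospy(Pred) :- del_spypoint(Pred).
EOF
sed -n 's/^%doc \([a-z_]*\): \(.*\)/\\item[\\texttt{\1}] \2/p' commands.p > manual.tex
cat manual.tex
```

Wrapping the generated items in a LaTeX description environment then yields the reference-style manual described above, regenerated from the source on every build so it cannot drift out of date.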

5 Testing

Attitudes towards testing at ECRC varied. Sometimes testing procedures were very informal. In our team, if the system was able to bootstrap and the debugger able to trace through the execution of the ``true'' Prolog predicate, then the code was probably functioning correctly. In most cases this was true. Whenever a particular part of the code was changed, specific tests were devised for that part. These tests were not recorded or added to a regression testing suite. This cavalier attitude towards testing was not as bad as it sounds. For all the foreseeable future the only users of the system would be the members of the group. Time spent on testing could instead be spent on using the system or on further development. If an annoying bug appeared during use, then it would be fixed. In fact, when I started work on the project I was told about a number of known bugs. The members of the team either worked around them or ignored them.

In the team developing the SEPIA Prolog system, on the other hand, a regression test suite was automatically run every night and the results were mailed back to the team leader. He in turn would redistribute any errors found to the members responsible for the maintenance of the code that had failed.

I find both approaches to testing reasonable. The final output of our team was to be technical reports and papers about debugging. The team was prototyping; thus testing was minimised. On the other hand, the SEPIA team had eventually to deliver a product. The rigorous testing procedure ensured that the constant updates and changes did not introduce more bugs than the ones they were removing.

6 Conclusions

In this final section I will try to give a list of software engineering and management related lessons that I learned during my industrial experience, and a very brief assessment of the relevance of the undergraduate course to the problems faced during it. Concluding, I have to say that, contrary to the impression one might get from the previous pages, the education provided in the undergraduate course at Imperial College proved to be extremely useful in tackling the varied problems that I had to face during my work at ECRC. Although at no point could I use a prepackaged solution from the course, the exposure to a very wide variety of methods, techniques and modes of thinking allowed me to adapt to the various problems and each time use the most appropriate paradigm or method.