System administration stories: The Revolt

 

Can a small embedded system the size of a paperback lead a group of machines into revolt? Apparently yes.

This week the power company (DEI/PPC) graced us with a power failure

Sep  3 02:53:19 spiti upsmon[224]: UPS spiti-ups@localhost on battery
which lasted longer than the UPS batteries could hold
Sep  3 03:20:07 spiti upsmon[224]: UPS spiti-ups@localhost battery is critical
so the main system providing DHCP, DNS, mail, and bootp services was shut down
Sep  3 03:20:07 spiti upsmon[224]: Executing automatic power-fail shutdown
until power came back, almost an hour later
Sep  3 03:42:48 spiti /kernel: FreeBSD 4.10-STABLE #4: Tue Aug 31 02:41:28 EEST
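
The timing can be read straight off the log timestamps. The following is a minimal sketch of that arithmetic, assuming the year 2004 (syslog lines carry no year; it is taken from this page's modification date) and treating the kernel boot message as the moment power returned, which is only approximately true. The helper name `parse_syslog_stamp` is made up for this illustration.

```python
from datetime import datetime

def parse_syslog_stamp(stamp, year=2004):
    """Parse a 'Mon DD HH:MM:SS' syslog timestamp; the year is an assumption."""
    # Collapse the padding that syslog uses for single-digit days ("Sep  3").
    normalized = " ".join(stamp.split())
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")

# Timestamps copied from the log lines quoted above.
on_battery = parse_syslog_stamp("Sep  3 02:53:19")
battery_critical = parse_syslog_stamp("Sep  3 03:20:07")
kernel_boot = parse_syslog_stamp("Sep  3 03:42:48")

battery_runtime = battery_critical - on_battery   # how long the UPS held out
server_downtime = kernel_boot - battery_critical  # shutdown until reboot
total_outage = kernel_boot - on_battery           # power failure until reboot

print("UPS battery runtime:", battery_runtime)
print("Server downtime:    ", server_downtime)
print("Total outage:       ", total_outage)
```

The server was therefore off for a bit over twenty minutes, which was plenty of time for the router to come up first.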

In the morning I found that all diskless machines (the DNARD Shark and a 133 MHz Pentium MP3 player), the WaveLAN bridge, and the SpeedTouch ADSL router had disappeared from the network. The link lights on the hub were lit, but the devices would not respond to pings. Thinking the hub had failed, I started patching them together, to no avail. I suspected a bad network card on the server, but the same problem occurred when pinging from other machines as well.

Suddenly the solution dawned on me like a flash. The ADSL router contains an embedded DHCP server, which, helpfully, is automatically disabled if it finds another one on the network. When the power came up, the ADSL router was running long before the normal server had a chance to boot. Finding no other DHCP server, its own started distributing IP addresses to the diskless machines from its 10.0.0.* pool. The server, having a 192.168.* address, was thus unable to reach the revolting group. Rebooting each and every diskless machine solved the problem.
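
To see why the two groups could not talk even though they shared a hub, here is a small sketch using Python's standard `ipaddress` module. The prefix lengths (/24 for the router's pool, /16 for the server's network) and all the concrete host addresses are assumptions made for illustration; the story only gives 10.0.0.* and 192.168.*. It is also a simplification: a host normally talks directly only to addresses inside its own subnet and sends everything else to its default gateway.

```python
import ipaddress

# Assumed networks; only the 10.0.0.* and 192.168.* patterns come from the story.
ROUTER_POOL = ipaddress.ip_network("10.0.0.0/24")
SERVER_NET = ipaddress.ip_network("192.168.0.0/16")
NETWORKS = [ROUTER_POOL, SERVER_NET]

def directly_reachable(host_a, host_b, networks=NETWORKS):
    """Return True if both addresses fall inside the same known subnet."""
    a = ipaddress.ip_address(host_a)
    b = ipaddress.ip_address(host_b)
    return any(a in net and b in net for net in networks)

# Hypothetical addresses, invented for this example.
server = "192.168.1.1"
shark_after_outage = "10.0.0.5"       # lease handed out by the router
shark_after_reboot = "192.168.1.20"   # lease from the normal DHCP server

print(directly_reachable(server, shark_after_outage))   # different subnets
print(directly_reachable(server, shark_after_reboot))   # same subnet again
```

Under these assumptions the first check fails and the second succeeds, which matches what happened: the machines became reachable only after a reboot made them ask for a new lease.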

You might ask, why was the router configured as a DHCP server? There is an interesting and simple answer to that. When I installed the SpeedTouch 530 router and tried to disable the built-in DHCP server I found that the corresponding command

dhcp server config state=disabled
would crash the router. I left it at that, believing that the router's automatic enabling and disabling of its DHCP server was adequate, but apparently it isn't. This is the second bug I have hit on this router in less than a month.

Following this incident, and unsuccessful attempts to get support from Thomson and the local PTT (OTE) that sold me the device, I was able to set up a workaround by adding the following lines to a configuration file I uploaded:

[ dhcp.ini ]
config autodhcp=on scantime=10 state=disabled trace=off



Last modified: Saturday, September 11, 2004 10:47 pm


Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.