A few years in the planning, with several false starts, the dream of a smarter computing environment is taking shape. Like many small businesses, ours has a central point of data that is crucial to the business’s survival. Financial data, correspondence, photos, the publications we generate, and contact databases are our office’s lifeblood, and reside on a shared drive on a Windows computer.
Backing up this drive was initially a nightmare. A consultant installed a tape drive with terrible software, and it was my job to change the tapes daily. This wouldn’t have been so bad, but restoring any file took about two hours — if I was lucky. My recovery rate was around 50%. I suppose that if I had spent another $500 of the company’s cash on a training session for the software, I might have improved my efficiency somewhat.
A couple of years later, I switched my PC to Ubuntu Linux. After some research, I settled on a backup tool called rsnapshot. Once I started using that software, my nightmare went away. I was able not only to back up our drive, but to do it in a fashion that was automatic and very easy to live with. It stores not only the files, but also the changes to them, so I can easily go back to previous edits. The backups are on my hard disk, which I trust much more than tapes that stretch and wind and grind. Restoring is super-simple — I just go into the appropriate directory and copy a file; there’s no special restore software to run.
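The trick behind rsnapshot’s space-efficient history can be sketched with nothing but GNU cp: each rotated snapshot is a plain directory, and files that didn’t change between snapshots are hard links to the same data on disk. The directory and file names below are invented for illustration; rsnapshot does the real rotation (with rsync), but the effect is the same:

```shell
set -e
demo=$(mktemp -d)
mkdir -p "$demo/source"
echo "version 1" > "$demo/source/notes.txt"

cp -a  "$demo/source"  "$demo/daily.0"   # first snapshot: a full copy
cp -al "$demo/daily.0" "$demo/daily.1"   # rotate: hard links, near-zero extra space

# Update the newest snapshot. Break the hard link first, or the
# old snapshot would silently change along with it.
rm "$demo/daily.0/notes.txt"
echo "version 2" > "$demo/daily.0/notes.txt"

# "Restoring" a previous edit is a plain copy out of a snapshot directory
cp "$demo/daily.1/notes.txt" "$demo/restored.txt"
```

Because unchanged files share disk blocks, keeping weeks of history costs little more than one full copy; and because each snapshot is just a directory, a restore is an ordinary copy.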
As time wore on, our computers started wearing out. A power supply here, a hard drive there; entropy was catching up with us. Our boss was reluctant to shell out for new hardware, and I was reluctant to keep installing software onto new computers. I’d spend two days reinstalling Windows and all the software we use, only to wind up with an environment that wasn’t an exact clone of what the worker had before — so they’d spend at least a couple of days’ worth of work getting used to it.
I started thinking again about that shared drive, and realized that if the machine it lived on went south, it would mean hours of downtime to restore it. Much of the time that would not affect our bottom line much; but there are hectic moments when downtime would be disastrous. At some point, and I’m not sure when, computers turned from an efficiency tool into a production mechanism. And we need a production-grade environment.
Doing some heavy research on disaster recovery and high-availability scenarios, I found that small businesses are greatly under-served by commercial computing vendors. The marketing teams generate huge amounts of hype for bad products that don’t work correctly, or try to take a product made for larger businesses and shoehorn it onto cheap hardware. They didn’t seem to take value into account at all.
Like many in my situation, I played with the idea of getting a big box and using RAID1 or RAID5; that is, multiple drives arranged so that one can crash and the machine keeps working. It seemed workable to me. Then we lost a computer entirely to a power spike. Not only did the power supply burn out, the drive was fried as well. A few weeks later, our huge web hosting provider went down. The reason? A RAID controller in one of their servers burned out, and they had to wait for a delivery from the other side of the state to get back up and running. All the while, they had been touting the safety and redundancy of their service. I was disgusted.
It was apparent that we needed to not only protect the drive itself, but the whole environment. Single points of failure are not a permanent option for a business that provides my livelihood. Back to scouring the internet for solutions, I came across the Linux-HA project, which provides tools called drbd and heartbeat. These tools make two drives on two separate computers appear as one drive to the rest of the machines. If one of the computers goes bad, the other takes over in about 15 seconds. The only downtime is that the worker would probably have to restart their machine. This was what I was looking for!
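As a rough sketch of what such a pair looks like in configuration (the host names, addresses, device names, and the shared service below are all invented for illustration, not taken from our actual setup):

```
# /etc/drbd.conf (excerpt) -- mirror one partition between two machines
resource r0 {
  protocol C;                     # fully synchronous: both disks confirm each write
  on server1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;          # the big data partition
    address   192.168.1.11:7788;
    meta-disk internal;
  }
  on server2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.12:7788;
    meta-disk internal;
  }
}

# /etc/ha.d/haresources (heartbeat v1 style) -- who owns the drive and the
# shared address while both machines are healthy
server1 IPaddr::192.168.1.20 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 samba
```

Heartbeat watches its partner over the network; if server1 stops answering, server2 promotes its copy of the DRBD device, mounts it, takes over the shared address, and starts serving the same files.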
After making a few sales pitches to my boss, he agreed to let me construct a new server setup. I purchased three new identical machines, each with an additional large hard drive and 4 gigabytes of RAM. So there they were, sitting on my newly-cleared-off workbench in the basement — and then I had to wait. Other projects screamed for my attention, and the staff was going through some changes. So there they sat, almost untouched, for a full year. I am very thankful for the patience my life experiences have trained into me; yet this project stayed stuck in my mind, and I knew I couldn’t build it in half-distracted moments.
Setting up my computer
Last week I was finally able to start on the project. Ubuntu Linux has progressed to the point where I will have support for the operating system, without massive changes, for another four years. The hardware has proven to be wonderful; I had already taken one of the boxes and used it to run two virtualized copies of Windows with fair success. And I had done plenty of research, so I understood at a management level what I wanted out of this — but now for the implementation.
The primary focus of this project is simplicity. Taking complex situations and boiling them down to their underlying substance is the kind of puzzle I thrive on. So what I want to do with this setup is to turn each piece into an appliance, which can be fixed and maintained with a minimum of instruction.
So I’ve started by turning my work computer into a “netboot installer”. This is probably the most magical thing I’ve ever seen a computer do in all the years I’ve worked with them. I take one of the new computers, plug it into the network, and reboot it. When it starts up, I press a key (the F8 key on these) and tell it to boot from the network. It finds my work computer, and automatically wipes out everything on the drive, installs the operating system (plus any additional software I want loaded), and reboots. When it’s done, it’s completely loaded with the most current versions of everything; I don’t even need to go through another long round of updates!
Setting up this kind of environment wasn’t simple, of course. There are four pieces of software that I had to install and configure the hell out of on my work computer: dhcpd3 (the network magic), tftpd-hpa (which serves the configuration files and kernel for the installer), apt-cacher (which temporarily stores the operating system files) and apache2 (a web server for the operating system files).
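For a flavor of the dhcpd3 side, here is a sketch of the netboot-relevant part of its configuration; the addresses and the MAC are examples, not our real ones:

```
# /etc/dhcp3/dhcpd.conf (excerpt)
subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.100 192.168.1.149;
  option routers 192.168.1.1;
  next-server 192.168.1.10;       # my work computer, running tftpd-hpa
  filename "pxelinux.0";          # boot loader the new machine fetches over TFTP
}

host newserver1 {
  hardware ethernet 00:11:22:33:44:55;  # made-up MAC address
  fixed-address 192.168.1.21;           # so each box gets a predictable address
}
```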
So we take the new machine and plug it into the network. Start it up, and when it first beeps, mash the F8 key and tell it to boot from the network card. It’s hands-off from then on if the configuration is right. The new machine broadcasts a request for a DHCP server. When mine answers, the machine downloads the installer operating system and starts it up. This temporary installer then asks the DHCP server again, and gets its first instructions based on the new machine’s network hardware address. It then downloads the “preseed”: a series of answers to the questions the installer normally asks during installation.
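Those first instructions live in a small menu file that tftpd-hpa serves alongside the boot loader. A sketch, with an invented server address (the kernel and initrd paths follow the stock Ubuntu netboot layout):

```
# tftpboot/pxelinux.cfg/default (sketch)
default install
label install
  kernel ubuntu-installer/amd64/linux
  append initrd=ubuntu-installer/amd64/initrd.gz auto=true priority=critical preseed/url=http://192.168.1.10/preseed.cfg
```

The preseed/url points at the apache2 server on my work computer; auto=true and priority=critical tell the installer to take its answers from the preseed instead of prompting.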
I have the preseed set up to reformat the hard drive and set up new partitions. It also does many other things automatically, like telling the computer to use English, a US keyboard, and set the time zone.
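A preseed is just a text file of answers, one per line. An excerpt of the kind of thing mine does (the disk, recipe, and time-zone values are examples, and the exact keys vary a little between Ubuntu releases):

```
# preseed.cfg (excerpt)
d-i debian-installer/locale string en_US
d-i keyboard-configuration/xkb-keymap select us
d-i time/zone string US/Eastern
d-i clock-setup/utc boolean true

# Wipe the drive and build fresh partitions without asking
d-i partman-auto/disk string /dev/sda
d-i partman-auto/method string regular
d-i partman-auto/choose_recipe select atomic
d-i partman/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
```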
It then starts to download the operating system packages from the internet. Since this normally takes more than an hour for a server (even longer for a desktop), we have it download through apt-cacher, which automatically stores each package locally on my work computer so it won’t have to be downloaded again unless a new update appears. Once you’ve done one install, subsequent installs take only a few minutes; I’m down to seven minutes for a subsequent server install, and I believe a desktop install (with graphics) will take about twenty, based on a test I did last year. So far, I’ve only done the first desktop install on this new setup, as that isn’t the current project — but I may need it for another computer at some point.
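Pointing the installer at the cache is one more preseed line; the address here stands in for my work computer’s, and 3142 is apt-cacher’s default port:

```
# preseed.cfg (excerpt) -- fetch packages through the local cache
d-i mirror/http/proxy string http://192.168.1.10:3142/
```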
Once the new computer has been installed, it reboots itself and loads the new operating system from the hard drive. So if I had to talk the boss through re-installation of a computer over the phone, it would go like this:
- Plug the computer into the network jack.
- Start it up, and press F8 (several times if you want) at the first beep.
- Press the down arrow until you reach the install entry in the menu.
- Watch for half a minute to make sure nothing gets stuck. If it does get stuck in that time, restart.
- Go do something else for a while. If you hear another beep from the computer, it’s done and will show a login screen.
The workbench is in the basement with the new computers on it, but my office is upstairs. The first day I was working on this, I was running up and down two flights of stairs to reboot machines, adding to the distraction of my co-worker at the front desk. I did move the hardware upstairs on the second day, though, so that’s much better.
My next step was to check out the new operating system environment and see how the installation went. So I added a remote-control program called openssh-server to the setup. Now once I hear the second beep (much easier now that the machines are upstairs), I can log into the new computer and look around.
The installer configuration lets me do some of the customization I need to do to the operating system, but not everything. I can add packages to the setup, as I did with openssh-server. But beyond some very simple options, it installs everything with the defaults, which in some cases means that a program I want started automatically is not set up to start. So I need to customize, and with that comes a problem.
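For the record, adding a package this way is a one-liner in the preseed:

```
# preseed.cfg (excerpt) -- extra packages to install on every machine
d-i pkgsel/include string openssh-server
```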
In the many years I’ve worked with computers, I’ve seen a lot of installations. The tech setting them up would load the operating system, then play around with the settings until they get it the way they want. Most times, they would only document what they believe wasn’t obvious, invariably leaving out an important detail or two. Subsequent installs meant doing the same thing, relying on notes or memory from previous times.
This is no way to ensure a quality installation every time. But we have another option, which is to customize the software packages themselves. With the high-quality package management that Ubuntu inherited from the Debian project, I can make my own repository of customized packages that will seamlessly integrate with the setup. So one package at a time, I’ll unpack it, tweak it, package it back up, and test it. If I hose the whole computer doing so, oh well — it will cost me another twenty minutes (or less) to reinstall the whole machine. Since I have two machines to test with right now, I just work on the other one while the reinstall is going on.
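The unpack/tweak/repack cycle itself is just a few dpkg-deb commands. To keep this sketch self-contained, it first builds a tiny throwaway package to operate on; in real life the starting point would be a package from the Ubuntu archive, and the package name, file, and setting here are all invented:

```shell
set -e
work=$(mktemp -d)

# Build a minimal package to play with (stand-in for a real downloaded .deb)
mkdir -p "$work/mypkg/DEBIAN" "$work/mypkg/etc"
printf 'Package: mypkg\nVersion: 1.0\nArchitecture: all\nMaintainer: Me <me@example.com>\nDescription: demo package\n' > "$work/mypkg/DEBIAN/control"
echo "setting=default" > "$work/mypkg/etc/mypkg.conf"
dpkg-deb --build "$work/mypkg" "$work/mypkg_1.0_all.deb"

# Unpack: -x extracts the file system half, -e the control half
dpkg-deb -x "$work/mypkg_1.0_all.deb" "$work/unpacked"
dpkg-deb -e "$work/mypkg_1.0_all.deb" "$work/unpacked/DEBIAN"

# Tweak a default, and bump the version so apt prefers our build
sed -i 's/setting=default/setting=ours/' "$work/unpacked/etc/mypkg.conf"
sed -i 's/^Version: 1.0$/Version: 1.0custom1/' "$work/unpacked/DEBIAN/control"

# Repack
dpkg-deb --build "$work/unpacked" "$work/mypkg_1.0custom1_all.deb"
```

The rebuilt package installs like any other; dropping it into a local repository with a higher version than the stock one is what makes the customization stick across reinstalls.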
Time to back up
As I was driving home after the second day, I was quite pleased that I had gotten the bugs out of the installer setup (there had been many, almost all my fault for not understanding the manuals) and was looking forward to starting the package customization portion. Then it dawned on me that the setup on my work computer wasn’t backed up! First thing the next workday, I installed a simple backup program. Now after I test changes to the setup, I can press a button and make a new backup of my installer setup. Whew!
On the third day I needed to get some other work done, but I was able to research how to make a simple “software repository” for my customized packages. It turns out to be quite a simple thing to do, but there was very little on the internet pointing to the right documentation, so the research took more time than I expected.
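For what it’s worth, the “trivial” form of a repository boils down to one command and one client line; the host name and directory are invented, and dpkg-scanpackages comes from the dpkg-dev package:

```shell
# Run on my work computer, in the directory apache2 serves:
cd /var/www/repo
dpkg-scanpackages . /dev/null | gzip -9 > Packages.gz

# Each client (or the preseed) then adds one line to /etc/apt/sources.list:
#   deb http://workpc/repo ./
```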
Now I have two major tasks ahead of me: setting up drbd and heartbeat on two of the computers to test them out, and cloning my installer setup to the new computers. These are both fairly major undertakings, since the software is new to me; but I believe the methods I’ve used so far will let me plug away at them during the day while doing other work as well. I’m also going to install two monitoring tools: a generalized package called “monit” that can test for all sorts of things, and the specialized “smartmontools”, which will notify me of potential hard drive problems before anything actually goes bad.
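As a preview of the monitoring side, both tools are driven by short configuration files. A sketch with invented services and addresses (the grammar is real monit and smartd syntax, but every value here is an example):

```
# /etc/monit/monitrc (excerpt) -- restart the file service if it stops answering
set daemon 120                    # poll every two minutes
check process smbd with pidfile /var/run/samba/smbd.pid
  start program = "/etc/init.d/samba start"
  stop  program = "/etc/init.d/samba stop"
  if failed host 127.0.0.1 port 445 then restart

# /etc/smartd.conf -- health-check every drive, email me on trouble
DEVICESCAN -H -m admin@example.com
```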
I’m so looking forward to this work!