A grand computing scheme

The plan

After a few years of planning and several false starts, the dream of a smarter computing environment is taking shape. Like many small businesses, ours has a central store of data that is crucial to the business’s survival. Financial data, correspondence, photos, the publications we generate, and contact databases are our office’s lifeblood, and they reside on a shared drive on a Windows computer.

Backing up this drive was initially a nightmare. A consultant installed a tape drive with terrible software, and it was my job to change the tapes daily. This wouldn’t have been so bad, but restoring any file took about two hours, if I was lucky; my recovery rate was around 50%. I suppose that if I had spent another $500 of the company’s cash on a training session for the software, I might have improved my efficiency somewhat.

A couple of years later, I switched my PC to Ubuntu Linux. After some research, I settled on a backup tool called rsnapshot. Once I started using it, my nightmare went away. I was able to back up our drive automatically, in a fashion that was very easy to live with. It stores not only the files but also the changes to them, so I can easily go back to previous edits. The backups are on my hard disk, which I trust much more than tapes that stretch and wind and grind. Restoring is super-simple: I just go into the appropriate directory and copy a file, with no need to run any special software.
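For the curious, the heart of an rsnapshot setup is one small configuration file. This is a minimal sketch, not our actual file; the paths and intervals are made up, and rsnapshot insists on tabs between fields:

```
# /etc/rsnapshot.conf (sketch) -- fields are tab-separated in the real file
snapshot_root   /backup/snapshots/

# keep seven daily and four weekly snapshots
interval        daily   7
interval        weekly  4

# back up the shared drive, mounted locally
backup          /mnt/shared/    localhost/
```

A cron job then runs `rsnapshot daily`, and each snapshot shows up as a complete directory tree, which is why a restore is just a copy.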

As time wore on, our computers started wearing out. A power supply here, a hard drive there; entropy was starting to catch up with us. Our boss was reluctant to shell out for new hardware, and I was reluctant to keep installing software onto new computers. I’d spend two days reinstalling Windows and all the software we use, only to wind up with an environment that wasn’t an exact clone of what the worker had before, so they’d spend at least a couple of days’ worth of work getting used to it.

I started again to think about that shared drive, and realized that if the machine it was on went south, it would mean some hours of downtime to restore it. Much of the time that would not affect our bottom line much; but there are hectic moments where downtime would be disastrous. At some point, and I’m not sure when, computers turned from an efficiency tool into a production mechanism. And we need a production-grade environment.

After some heavy research on disaster recovery and high-availability scenarios, I found that small businesses are greatly under-served by commercial computing vendors. The marketing teams generate huge amounts of hype for bad products that don’t work correctly, or try to take a product made for larger businesses and shoehorn it onto cheap hardware. They didn’t seem to take value into account at all.

Like many in my situation, I played with the idea of getting a big box and using RAID 1 or RAID 5; that is, multiple drives arranged so that one can crash and the system keeps working. It seemed workable to me, and then we lost a computer entirely to a power spike. Not only did the power supply burn out, but the drive was fried as well. A few weeks later, our huge web hosting provider went down. The reason? A RAID controller in one of their servers burned out, and they had to wait for a delivery from the other side of the state to get back up and running. All the while, they had been touting the safety and redundancy of their service. I was disgusted.

It was apparent that we needed to protect not only the drive itself, but the whole environment. Single points of failure are not a permanent option for a business that provides my livelihood. Back to scouring the internet for solutions, I came across the Linux-HA project, which provides tools called drbd and heartbeat. These tools make two drives on two separate computers appear as one drive to the rest of the machines. If one of the computers goes bad, the other takes over in about 15 seconds. The only downtime is that a worker would probably have to restart their machine. This was what I was looking for!
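To give a flavor of how drbd pairs the two drives, here is a rough sketch of a resource definition; the hostnames, devices, and addresses are invented, and the real file has to match your machines:

```
# /etc/drbd.conf (sketch) -- one mirrored resource across two machines
resource r0 {
    protocol C;                    # synchronous replication
    on server1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.10:7788;
        meta-disk internal;
    }
    on server2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.11:7788;
        meta-disk internal;
    }
}
```

heartbeat then watches both machines and makes /dev/drbd0 available on whichever one is currently primary.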

After a few sales pitches, my boss agreed to let me construct a new server setup. I purchased three new identical machines, each with an additional large hard drive and 4 gigabytes of RAM. So there they were, sitting on my newly-cleared-off workbench in the basement; and then I had to wait. Other projects screamed for my attention, and the staff was going through some changes. So there it sat for nearly a full year, almost untouched. I am very thankful for the patience my life experiences have trained into me; yet this project stayed stuck in my mind, and I knew that distraction wouldn’t cut it.

Setting up my computer

Last week I was finally able to start on the project. Ubuntu Linux has progressed to the point where I will have support for the operating system, without massive changes, for another four years. The hardware has proven to be wonderful; I had taken one of the boxes and used it to run two virtualized copies of Windows with fair success. And I had done plenty of research, so I understood at a management level what I wanted out of this; now for the implementation.

The primary focus of this project is simplicity. Taking complex situations and boiling them down to their underlying substance is the kind of puzzle I thrive on. So what I want to do with this setup is to turn each piece into an appliance, which can be fixed and maintained with a minimum of instruction.

So I’ve started by turning my work computer into a “netboot installer”. This is probably the most magical thing I’ve ever seen a computer do in all the years I’ve worked with them. I take one of the new computers, plug it into the network, and reboot it. When it starts up, I press a key (the F8 key on these) and tell it to boot from the network. It finds my work computer, and automatically wipes out everything on the drive, installs the operating system (plus any additional software I want loaded), and reboots. When it’s done, it’s completely loaded with the most current versions of everything; I don’t even need to go through another long round of updates!

Setting up this kind of environment wasn’t simple, of course. There are four pieces of software that I had to install and configure the hell out of on my work computer: dhcpd3 (the network magic), tftpd-hpa (serves the configuration files and kernel for the installation software), apt-cacher (stores the operating system files locally) and apache2 (a web server for the operating system files).

So we take the new machine and plug it into the network. Start it up, and when it first beeps, mash the F8 key and tell it to boot from the network card. It’s hands-off from then on if the configuration is right. The new machine looks for the DHCP server that’s broadcasting on a special address. When it finds it, it downloads the installer operating system and starts it up. This temporary installer then looks again at the DHCP server, and gets its first instructions from it based on the new machine’s network hardware address. It then downloads the “preseed”: a series of answers to the questions the installer normally asks during installation.
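The dhcpd side of that conversation boils down to a few lines. This is a sketch with made-up addresses; the filename points at the pxelinux boot loader that tftpd-hpa serves:

```
# /etc/dhcp3/dhcpd.conf (sketch) -- point PXE clients at the work computer
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.100 192.168.1.150;
    next-server 192.168.1.5;      # the work computer, running tftpd-hpa
    filename "pxelinux.0";        # the network boot loader to download
}
```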

I have the preseed set up to reformat the hard drive and set up new partitions. It also does many other things automatically, like telling the computer to use English, a US keyboard, and set the time zone.
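A preseed file is just a list of answers, one per line. A fragment along these lines covers the items above; exact keys vary a bit between releases, so treat this as a sketch:

```
# preseed fragment (sketch)
d-i debian-installer/locale string en_US
d-i console-setup/layoutcode string us
d-i time/zone string US/Eastern

# wipe the drive and lay down a standard partition scheme
d-i partman-auto/method string regular
d-i partman/confirm boolean true
```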

It then starts to download the packages of the operating system from the internet. Since this normally takes more than an hour for a server (even longer for a desktop), we have it download through apt-cacher. What this does is to automatically store each package locally on my work computer, so it won’t have to be downloaded again unless it gets a new update. Once you’ve done one install, subsequent installs take only a few minutes; I’m down to seven minutes for a subsequent server install, and I believe a desktop install (with graphics) will take about twenty based on a test I did last year. So far, I’ve only done the first desktop install on this new setup, as I realize this isn’t the current project — but I may need to use it for another computer at some point.
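Pointing the installer at apt-cacher can be done with a single preseed line; apt-cacher listens on port 3142 by default (the server address here is made up):

```
d-i mirror/http/proxy string http://192.168.1.5:3142/
```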

Once the new computer has been installed, it reboots itself and loads the new operating system from the hard drive. So if I had to talk the boss through re-installation of a computer over the phone, it would go like this:

  1. Plug the computer into the network jack.
  2. Start it up, and press F8 (several times if you want) at the first beep.
  3. Press the down arrow until you reach the menu item “NVIDIA Boot Agent”.
  4. Watch for half a minute to make sure nothing gets stuck. If something does get stuck within that time, restart.
  5. Go do something else for a while. If you hear another beep from the computer, it’s done and will show a login screen.


The workbench is in the basement with the new computers on it, but my office is upstairs. The first day I was working, I was running up and down two flights of stairs to reboot machines, adding to the distraction of my co-worker at the front desk. I moved the hardware upstairs on the second day, so that’s much better.

My next step was to check out the new operating system environment to see how the installation went. So I added a remote-control program called openssh-server to the setup. Now once I hear the second beep (much easier now that it’s upstairs) I can log into the new computer and check it out.

The installer configuration lets me do some of the customization I need to the operating system, but not everything. I can add packages to the setup, like I did with openssh-server. But beyond some very simple options, it installs everything with the defaults, which in some cases means that a program I want started automatically is not set up to start. So I need to customize, and with that comes a problem.

In the many years I’ve worked with computers, I’ve seen a lot of installations. The tech setting them up would load the operating system, then play around with the settings until they get it the way they want. Most times, they would only document what they believe wasn’t obvious, invariably leaving out an important detail or two. Subsequent installs meant doing the same thing, relying on notes or memory from previous times.

This is no way to ensure quality installation every time. But we have another option, which is to customize the software packages. With the high quality package management that Ubuntu inherited from the Debian project, I can make my own repository of customized packages that will seamlessly integrate with the setup. So one package at a time, I’ll unpack it, tweak it, package it back up, and test it. If I hose the whole computer doing so, oh well — it will cost me another twenty minutes (or less) to reinstall the whole machine. Since I have two machines right now to test the installation, I just work on the other one while the reinstall is going on.
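In practice the tweak-and-repackage loop is short. This is a sketch with a hypothetical package name; dpkg-deb -R extracts the package contents along with its control files, and -b builds the directory back into a package:

```
dpkg-deb -R somepackage_1.0_all.deb work/
# edit work/DEBIAN/control, maintainer scripts, config files under work/ ...
dpkg-deb -b work/ somepackage_1.0local1_all.deb
```

The rebuilt package then goes into the local repository so the installer picks it up automatically.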

Time to back up

As I was driving home after the second day, I was quite pleased that I had gotten the bugs out of the installer setup (there had been many, almost all my fault for not understanding the manuals) and was looking forward to starting the package customization portion. Then it dawned on me that the setup on my work computer wasn’t backed up! First thing the next workday, I installed a simple backup program. Now after I test changes to the setup, I can press a button and make a new backup of my installer setup. Whew!

On the third day I needed to get some other work done, but I was able to research how to make a simple “software repository” for my customized packages. It turns out to be quite a simple thing to do, but there was very little on the internet pointing to the right documentation, so the research took more time than I expected.

Next steps

Now I have two major tasks ahead of me: setting up drbd and heartbeat on two of the computers to test them out, and cloning my installer setup to the new computers. These are both fairly major undertakings, since the software is new to me; but I believe the methods I’ve used so far will let me plug away at it during the day while doing other work as well. I’m also going to install two monitoring tools: a generalized package called “monit” that can test for all sorts of things, and the specialized “smartmontools” that will notify me of potential hard drive problems before anything actually goes bad.

I’m so looking forward to this work!


Almost everything has a purpose, including Microsoft Windows

I work with two computer operating systems every day, Ubuntu Linux and Microsoft Windows. I post new things I find on web forums, both problems I’m having and tips for others. Whenever I get help from someone for my problems, I try to help at least one other. Once in a while, I hang out in IRC channels for a quicker fix of my community addiction.

A good number of the people I help and get help from aren’t blinded by loyalty to what others like to use for an operating system. It’s difficult for us practical people, though, because there are many others out there who shout to promote their favourite toys and disparage those who use other, competing wares. You would think this noise would be easy to tune out, yet an interesting article will draw huge strings of posts fuelled by these “fanbois”, making the few interesting comments that practical people offer difficult to find.

In my view, an operating system is only a means to an end: tools to manipulate some bits into, hopefully, doing what you want. At this point, and for what I can see will be the next few years, there is no one affordable operating system that can do everything a person needs. (A note to Mac fanbois: your financial priorities are much different from mine.) I use Ubuntu exclusively at home, and for the majority of my work. However, there are a few Windows programs that I can’t do without, such as QuickBooks, MS Access, MS Excel, and IrfanView. Coworkers I support also have a few other programs that they wouldn’t want to swap out; not being wildly computer-oriented, they also dislike changes in what they see on the screen.

Microsoft Windows (and by that I mean the affordable XP Home, XP Pro, Vista Basic) is best for:

  • running software made for Microsoft Windows;
  • enabling the use of hardware devices that have only been on the market for a few months;

but it doesn’t:

  • give a simple way to seamlessly export its windows to other machines*;
  • provide a set of signed repositories that includes most software a person needs;
  • allow a technician to update or configure the software without a big production of user distraction and downtime;
  • stay easy for a person to keep updated and secure;
  • make it simple and frugal to use in a multi-user networked environment.

So what I envision as a workable solution is a combination of machines that offer the best of both operating systems. In an office or large household, it’s quite normal for there to be several machines; and that can be a huge advantage.

Newer machines with lots of RAM can run virtualization software (KVM, VirtualBox, VMware). This allows multiple operating systems (“virtual machines”) to run on one piece of hardware. For these machines, I’ve been suggesting AMD over Intel, as it’s simpler to know in advance of purchase that virtualization is fully supported by the hardware. However, I believe most newer computers either have no problem doing this or need only quick adjustments to the system.
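For what it’s worth, on Linux there is a quick way to check a processor before committing to it: hardware virtualization shows up as the vmx flag on Intel chips and the svm flag on AMD chips (though the BIOS can still have it disabled):

```
grep -E 'vmx|svm' /proc/cpuinfo
```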

Older machines, regardless of the amount of RAM, can be used as thin clients to the larger machine. I’m appalled by the number of computers people discard as junk that could be used for this purpose. Also, I’ve set up and used some very nice new specialized thin-client boxes that use little electricity and are super-nice to the wallet.

A combination of those machines takes care of the first requirement, running software made for Microsoft Windows.

Older machines that have a good amount of RAM can be used to meet the second requirement, enabling the use of hardware devices that have only been on the market for a few months.


Teaching a machine

A relationship with the truth



Once you come to the realization that you’ve committed a falsehood into your knowledge repository, and your basis for an assertion is junk, then it would be very helpful to know the source of that basis. If one thing from that source is wrong, why wouldn’t its other assertions be wrong too?

When I think back
On all the crap I learned in high school
It’s a wonder
I can think at all — Rhymin’ Simon

Good enough for this purpose

Creeping Doubt


The Smallest Seeds of Knowledge

The Polluted Namespace of Database Applications

Wikipedia will tell you about relational databases with terminology that will confuse any person used to the English language. Drilling down to the source of the term “relational”, we find in the article Relational model that

“Relation” is a mathematical term for “table”, and thus “relational” roughly means “based on tables”.

You can keep drilling down through the articles until you get into set theory and start approaching the human usage of the word “relation”, which I understand to be: A logical or natural association between two or more things; relevance of one to another. (From relation.)

I don’t want to knock the work of Dr. E.F. Codd; his invention of the relational database has helped support my livelihood for half my lifetime. All I take issue with is the terminology; understandably, when the terms were coined, there wasn’t yet the familiarity with how this invention would be used that would have lent it better wordage.

More Questions than Answers

Anyone who has used a table of data knows that there’s a lot of knowledge being presented in a small space. The very simple table below seems at first glance to show the most basic of relationship representation:

   id    name     team
   1     Amy      Blues
   2     Bob      Reds
   3     Chuck    Blues
   4     Dick     Blues
   5     Ethel    Reds

But the apparent simplicity actually assumes a greater knowledge. Boiling it down to pieces, we find that:

  • id is probably a unique integer the database uses to keep track of the link between name and team
  • name is the first name of a person
  • team is the name of a group of people

And so when we use the table, we have lots of questions.

  • Can a person be on more than one team?
  • Can a team exist if there are no people assigned to it?
  • What do we do if there is more than one person with the same first name? Do we put the last name in the name field, or do we make a separate field for a last name? And if there are two people with the same first and last name, what do we do then? If a team changes its name, how does that change the assignments?
  • Who recorded this information? Did they spell everyone’s name right? Did they make any mistakes about which team each person was assigned to?
  • Who actually assigned these people to a team? What was their basis for assigning these people to one team over another?
  • Are there constraints to how many people are to be on a team? Do you have a minimum number in order to play? Do you have a maximum to play at a time? What do you do with additional people?

The Start of a Good Relationship

One thing that’s obvious to start with is that there are two teams. Those teams could, and might very well want to, change their names to something more sporty; let’s say the Lobsters and Clams. So there is a relationship here between the name of the team and the team itself.
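Sticking with the plain-table style above, one way to capture that relationship is to split the data in two, so the team name becomes an attribute of a team record rather than a repeated label:

```
   team_id   team_name
   1         Lobsters
   2         Clams

   id    name     team_id
   1     Amy      1
   2     Bob      2
   3     Chuck    1
```

Renaming the Blues to the Lobsters now touches one row, and none of the assignments change.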

Unique Identifiers – Part 3, Values

So far in this series, I’ve been describing a scheme for Unique ID’s. We’ve gone through the mental gyrations of what the parts of an ID should consist of, but haven’t yet come to a conclusion about the final format. As you’ve seen by now, the more characters we have available, the more comprehensive our ID format can be. With registration, we can also cut the size down to a short string.

Before we cut it down to its minimum, though, let’s look at values. I’d like the format to be able to store not only unique ID’s, but also values, so that we can map relationships against a number. Once we find a minimal length for useful numbers, then we can determine how to wedge our ID scenario into that space.

Through the many years I’ve been crunching numbers, I’ve been bitten so many times by software, especially Microsoft Excel, using floating point values. For the unaware: when a program stores a value, it has a choice. It can either store it as an integer, or use less space by storing an approximation. The software very seldom tells you this, and if you rely on your numbers, your computer lies to you. I just can’t describe how wrong that situation is. By the time you notice the tiny bit off you are, you’re already layers deep in hundreds of calculations. Good accounting programs store numbers as integers, and even in MS Access, currency is normally stored as an integer with four implied decimal places.
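A tiny experiment makes the lie visible, along with the integer-storage fix that good accounting programs use:

```python
# Adding a dime ten times should give exactly one dollar...
total = 0.0
for _ in range(10):
    total += 0.1
print(total)         # 0.9999999999999999 on IEEE-754 doubles
print(total == 1.0)  # False

# Storing currency as an integer count of ten-thousandths of a
# dollar (the MS Access style mentioned above) keeps it exact.
exact = sum(1000 for _ in range(10))  # ten dimes, each 1000 units
print(exact == 10000)                 # True: exactly 1.0000 dollars
```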

Hey, I don’t mind if a computer program can’t store pi accurately, but I want it to tell me that what it displays is an approximation if I need to know that for my purposes. So if it’s an approximation, how was it done? Was it truncated, rounded up, rounded down, an exact half? If a computer makes a mess of things on my behalf, I want to know about it.

When computers give you a number, such as 5,000, there is no information as to how many significant digits there are in that number. If it represents 5 boxes of 1,000 paper clips, you can be pretty sure that each box is almost exact, maybe with one or two extra clips in it. But I could also have derived it from the number of feet I guessed was equal to one inch I measured on a map. If I make a mess of things, I want to be able to document it.

If I know that a value is wonky, I can, if needed, use techniques such as significance arithmetic to make known how much an end result is off. It makes a big difference if the basis of the number is measured or counted.

Now we can start to enumerate the relationship of a value to a stored number in a computer: the basis of how it was derived, the number of significant digits, the accuracy of the final digit, its sign, and its exponent. Some of these can be encapsulated in the way most computers store a floating point number; sign, exponent, and fraction are packed into 32 or 64 bits of binary data.
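Those last pieces can be pried out of a stored number directly; this sketch unpacks the standard 64-bit IEEE-754 layout (1 sign bit, 11 exponent bits, 52 fraction bits):

```python
import struct

def float_parts(x: float):
    # Reinterpret the 8 bytes of a double as one 64-bit integer
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # biased by 1023
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction

# -2.5 is -1.01 (binary) * 2**1: sign 1, biased exponent 1024,
# and fraction bits ".01", which is 2**50 as a raw integer.
print(float_parts(-2.5))
```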

Unique Identifiers – Part 2, About Time

Time pervades all things. I remember my first geometry class, where the teacher talked about three dimensions, and time being the fourth. This concept bothered me for months; it just didn’t seem right to me. A line, for instance, cannot be if it doesn’t exist for a period of time. And when old teach was trying to describe to us what a point was, the geometric concept of it, my confusion cleared up. A point, he said, had no length, breadth, or height; it was only a point. At which point one of my little imaginary friends whispered in my ear, “But it has time. It is a point in time.”

I digress from the subject at hand to make my own point. Mankind’s understanding of the true nature of time is weak. We may think we can go back and forth in time if we can just find a De Lorean and build a flux capacitor out of junk parts in the garage. Or we can accept Stephen King’s explanation of the Langoliers, that eat away the past after the present is gone.

In my previous missive on Unique ID’s, I went through all the mental hoops I could to describe how an ID generator could itself be identified by thread, process, computer, and public parts. Now we only require a series of numbers to finish off the Unique ID. That could be as simple as a counting loop within the thread of execution, and we might want to keep that in mind.

But usually, computer programs have a very simple way to get a time value, and that can come to our aid. At the risk of defying the You Ain’t Gonna Need It (YAGNI) concept, we’ll go on to figure out how we can incorporate time into our Unique ID’s, or at least leave room for implementation if we want.

Suppose we have gone through all the hoops to figure out “the best thing to do” for our computer process, and we have two or more items of equal weight (whatever we’ve defined that to be) in contention to be the winner. We could make a random choice, or as random as our machine will give us, and hope that luck will lead us clearly to our goals. But if we keep arriving at this same decision, we could end up with a value that is never chosen, our luck gone sour and karma caught up with us. So one way out of this mess is to choose the oldest value.

Having a timestamp at this point in the consideration process is extremely handy, in that we don’t need to implement another decision level. We just pick the ID with the smallest time value. Next time we’re at this juncture, our old value will be gone, and we’ll pick the oldest ID then available.
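As a sketch of what that comparison buys us: if the leading characters of each ID are a fixed-width, zero-padded base-32 timestamp (these IDs are invented for illustration), plain string comparison finds the oldest one:

```python
# Hypothetical IDs: seven zero-padded base-32 time characters, then the rest.
candidates = ["01m2ccx-aaa", "01m2bbx-bbb", "01m2ddx-ccc"]

# Fixed-width, zero-padded digits sort lexically in time order,
# so the minimum time prefix belongs to the oldest ID.
oldest = min(candidates, key=lambda ident: ident[:7])
print(oldest)  # 01m2bbx-bbb
```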

One of the downsides of doing this is that we’re now shifting our decision process from a comparison of whole ID’s to a relation of embedded values. That will add additional complexity to our program thread, and we’ll only be able to determine its effectiveness when we finally get to implementation.

If we look back at the original spec for UUID’s, we see that time is captured at 100-nanosecond intervals; that is, our granularity is 10 million of these values per second. When programming, though, we’re very lucky if we can get anything better than granularity to the second without using an external library or a weird procedure. I’ll suggest, then, that we keep our programming simple and call seconds “close enough” for our purpose. This will probably bite me in the butt later as well, because YAGNI.

So let’s look at the value that gets returned when we ask for a POSIX time value. It is “the number of seconds elapsed since midnight UTC of January 1, 1970, not counting leap seconds.” If we store this value in 32 bits of information, we’ll run out of space on January 19th, 2038. But this doesn’t bother us, as we’re adamant about storing our ID in base 32, and that will get us through the rest of this millennium with seven characters.
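Both of those claims are easy to check; here is a quick verification in Python:

```python
from datetime import datetime, timedelta, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# A signed 32-bit counter of seconds rolls over here:
rollover_32 = epoch + timedelta(seconds=2**31 - 1)
print(rollover_32.date())  # 2038-01-19

# Seven base-32 characters hold 32**7 = 2**35 seconds:
rollover_b32 = epoch + timedelta(seconds=32**7 - 1)
print(rollover_b32.year)   # 3058: the rest of this millennium and then some
```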

Something cropped up when I was originally researching UUID’s for this project a few years back: what do you do when someone sets the clock backwards? Well, we could just ignore it if we trusted our system to check for duplicate values. That’s not in my nature, though, as we should always try to be doing the right thing when dealing with lower-level functionality.

The UUID spec says to use a clock ID set at a random value, then increment it by one if a reset is detected. Well, that’s fine; it keeps the ID unique. But it doesn’t help with the sorting, because you’d have to intersperse the clock ID into the right spot in the series somehow. Thankfully, these resets are few and far between, and hopefully the clock gets synchronized on a fairly regular basis.

That just about wipes me out as far as using time as part of our Unique ID scheme. I ain’t gonna use it, mostly because of the shifting context problem. We’ll think about keeping space for it in our final version, as we can capture it with eight characters.

Unique Identifiers – Part 1, Identifying the Generator

Is a UUID really unique? This post seems to think there are problems. http://blog.joeware.net/2005/06/19/42/

There are several types of UUID’s, the most common being (supposedly) randomly generated ones, and another based on time and MAC addresses. There are problems with each of these.

The post linked above has good opinions on the fallacy of randomly generated ID’s. Besides bad implementations, there is also a great reason for not using them: they are so damn unique that there is no way to look them up unless you have a database of all that are in use. You can’t specify ranges where you can divide the database into segments.

MAC addresses have nothing to do with Apple Macintosh computers, although those computers have MAC addresses in them. MAC addresses are unique identifiers burned into read-only memory on your computer’s network card. Ranges of MAC addresses are doled out by a central authority to network card manufacturers.

There is something obviously wrong with that scheme: we’re expecting the network card manufacturers to police themselves to ensure that addresses aren’t duplicated. When you pay sixteen dollars for a piece of hardware, made in the poorest manufacturing environments in the world, do you really feel that good that the manufacturer has paid their dues and complied with a directive that isn’t even backed by law? OK, but just be sure you take that painted wooden play block out of the baby’s mouth.

If that wasn’t enough, there’s a real possibility that someone has set up duplicate MAC addresses on their LAN intentionally. Some ISP’s (Internet Service Providers) and wireless configurations use MAC addresses to allow access to the next network over. Of course, you and I wouldn’t come up with a scheme like that since it relies only on the obscurity of the address, and not a shared secret or hashed password. But that’s the nature of the market.

So what happens is that the customer registers his MAC address with the ISP, and is then allowed onto the network as long as he’s using that ethernet card. Later on, the customer trades in his computer, or replaces the ethernet card that’s gone bum, or sticks a router in between his computer and the internet. Whatever he’s done, he needs to change his MAC address: either register the new one with the ISP, or change the new hardware back to the old MAC address. Given the choice, he will probably not want to spend quality time listening to muzak while waiting for customer service to answer the phone, and then try to find someone at the ISP who has a clue.

So, especially with the “insert a router in the mix” scenario, there’s a good possibility that there is a duplicate MAC address on the LAN. Intentionally done, with good intentions. The intentions don’t even have to be good, as one could use this scheme to leech access from the ISP from multiple locations.

Finally, MAC addresses are a lousy way to generate unique ID’s that represent real-world data. That’s because in order to look up an unknown piece of information, you need a map of MAC addresses to computers. When collaborating with others outside your LAN, you’d have to remap your UUID’s to some central MAC address.

You also have a problem within one computer itself. You may want multiple programs to be able to generate unique ID’s, but there is only one MAC address per computer. So, you would have to have some locking mechanism so that more than one process can’t have access to the ID generator at a time. This scenario is even specified in the UUID specification document, www.opengroup.org/dce/info/draft-leachuuids-guids-01.txt

Providing a system-wide lock makes programming a tool harder than it has to be. At a particular time, a process has a unique process id, which the computer uses to keep track of and schedule each thing that’s going on. Well, with Linux 2.6 (the current version), there is a maximum of 1,073,741,824 process id’s. Encoding that many numbers in base 32 takes six characters. And in the future, Linus and crew might up that number; you would think a billion things going on in a computer at once would be enough for anybody!
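A fixed-width base-32 encoder is only a few lines; the alphabet here is my own assumption (digits plus lowercase letters, minus the easily-confused i, l, o, and u):

```python
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"  # 32 symbols, an assumption

def to_base32(n: int, width: int) -> str:
    """Encode n as a fixed-width, zero-padded base-32 string."""
    chars = []
    for _ in range(width):
        chars.append(ALPHABET[n % 32])
        n //= 32
    return "".join(reversed(chars))

# Six characters cover every one of the 2**30 possible process id's:
print(32**6 == 2**30)       # True
print(to_base32(12345, 6))  # 000c1s
```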

So it’s probable that when implementing a unique ID with a process ID incorporated, we’ll need to cut this number down to size. That would mean some extra programming in order to have a registry of processes allowed to give out an ID. So we’re back to a locking mechanism, but arguably more effective, since you only need to register the process one time, and provide some way to clean up the mess after you’re done.

But let’s say we didn’t want to make a registry, and just confine our ID’s to the computer we’re using right now. After all, we don’t need to come up with an ID for the computer until we’re ready to share the information outside of it, right? So we can flag the ID with a code that says it’s only valid on the current computer, then list the process ID after it. We could even have a multithreaded program give a short ID to each thread; creating an in-memory mapping within a single program would be fairly easy, since you’d just map the thread id when creating the thread, if mapping is really needed.
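As a rough sketch of that scheme (the flag character and the pid-plus-thread-slot layout here are purely assumptions, not anything from the spec):

```python
import os
import threading
import itertools

LOCAL_FLAG = "@"  # hypothetical marker meaning "valid on this computer only"

_thread_ids = {}            # in-memory map: thread -> short slot number
_counter = itertools.count()
_lock = threading.Lock()

def local_id():
    """Return a short ID flagged as local: marker, process id, thread slot."""
    me = threading.get_ident()
    with _lock:
        if me not in _thread_ids:
            _thread_ids[me] = next(_counter)
    return f"{LOCAL_FLAG}{os.getpid()}.{_thread_ids[me]}"
```

The mapping costs one dictionary entry per thread and nothing at all if the program never asks for an ID.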

Now, if we’re not using MAC addresses as the primary way to identify the computer, what’s the best way to go about it? More registries, please! We’ll want to make some kind of global registry so that an unknown value can be looked up, and if the author wants to share that information, then you’d be able to get its definition.

At the public side, we may want to consider a range of five characters, which our base32 converter tells us will provide over 33 million identifiers (32^5 = 33,554,432). So that’s how many public domains would be available, and it’s still in the realm of a Sqlite database lookup without making a small computer go dizzy. Five’s a good number, since a sequence of five characters can be read off over the phone and copied down without the recipient losing track of the characters spoken.

Now we have to have some layers of redirection from the public identifier to the individual computer. For one, this gives us some sense of privacy. But even more important, this allows a large organization to give a level of autonomy to its divisions.

Two base32 characters provide up to 1024 values, while one character is 32. So with five more characters, we can allocate one character to the process id registration (32 processes per computer), 1024 computers per division, then 1024 divisions per public identifier. Now, I’m curious if that would be a good spread. What do you think?

Another way would be to have 1024 process id’s per computer, using two characters, then let the public ID have 32,768 computer id’s to dole out as they see fit. Any more than that, they’d just need to get another public ID. Now I feel better about using that spread.
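For the record, here’s the arithmetic behind both spreads (the helper name is my own):

```python
def capacity(chars):
    """How many values a run of base32 characters can hold."""
    return 32 ** chars

# Spread 1: process(1 char) / computer(2) / division(2)
spread_1 = [capacity(1), capacity(2), capacity(2)]   # [32, 1024, 1024]

# Spread 2: process(2 chars) / computer(3), no division layer
spread_2 = [capacity(2), capacity(3)]                # [1024, 32768]

print(spread_1, spread_2)
```

Either way the five characters are fully spent; the only question is where the boundaries fall.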

If we wanted to go whole hog, then five characters would be enough to directly convert internal IP addresses to computer id’s, no matter if they used the 10.x.x.x, 172.16.x.x or 192.168.x.x scheme, or even a combination of those. But now that we’re at our five-character maximum, we have to allocate even more characters for our process id. And if the company started using sub-sub-nets, which may become common with virtualized OS’s, internal IP addresses can very possibly be duplicated, as I’ve seen with VirtualBox.

For giggles, the full range of IPv4 address space comes to 4000000 in base 32. Some expect exhaustion of the address range as early as two years from now, but I just don’t think it’ll happen for a while. And if the world moves to IPv6, that’s over ten with 38 zeroes after it — way too big for us to make a usable ID format out of anyway.
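That base 32 figure checks out; a few lines verify it (the digit alphabet here is arbitrary, just any 32 symbols in order):

```python
DIGITS = "0123456789abcdefghijklmnopqrstuv"  # any 32-symbol alphabet will do

def to_base32(n):
    """Render a non-negative integer in base 32."""
    if n == 0:
        return DIGITS[0]
    out = []
    while n:
        n, r = divmod(n, 32)
        out.append(DIGITS[r])
    return "".join(reversed(out))

print(to_base32(2 ** 32))   # "4000000": the whole IPv4 space in 7 characters
```

It works out neatly because 2^32 is exactly 4 × 32^6.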

Well, I think that about covers all the scenarios where we’d want to identify who generated the ID. Note that I am not making any effort to provide anonymity; if one wants to keep the data anonymous, they will have to arrange for an anonymous computer to do their dirty work. I would rather forsake simple anonymity than make our ID’s incompatible with any kind of accountability.

Now we need a series of unique numbers for our ID generator to use. Which will bring me to my next post, about the strange concept we call time ….

Generic Program

The ultimate flexible program starts out totally ignorant. It uses a database to determine “the best thing to do at a particular time.”

The database returns: what to do (action), and how to do it (method).

How to do it, also known as the method:

  • Do it and report success or failure.
  • Do it and report what happened at each step.
  • Do it and report how long it took, along with success or failure.

Success or Failure reports include:

  • I don’t know how to do that action.
  • I don’t know the method in which you want me to do that.
  • I don’t know how to do that action, nor do I know the method in which you want me to do that.
  • I tried to do that action, but it was unsuccessful. The method was successful.
  • I tried to do that action using the method you wanted me to, but the method was unsuccessful.
  • I tried to do that action using that method, but something went wrong. I don’t know if it was the action, or the method used.
  • I did it. Success.
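A minimal sketch of such a loop might look like this; the table layout, the report strings, and the function names are all my own assumptions, not a real design:

```python
import sqlite3

# Hypothetical schema: the database tells the program what to do and how.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tasks (action TEXT, method TEXT)")
db.execute("INSERT INTO tasks VALUES ('greet', 'report_success')")

# The only things this program knows how to do; everything else is a report.
ACTIONS = {"greet": lambda: None}

def run_once(db):
    """Fetch the best thing to do, attempt it, and report the outcome."""
    row = db.execute("SELECT action, method FROM tasks LIMIT 1").fetchone()
    if row is None:
        return "nothing to do"
    action, method = row   # this sketch only honors the simplest method
    if action not in ACTIONS:
        return f"I don't know how to do {action}"
    try:
        ACTIONS[action]()
    except Exception:
        return "I tried to do that action, but it was unsuccessful"
    return "I did it. Success."

print(run_once(db))   # "I did it. Success."
```

The point is that the program body stays tiny; every new capability is a row in the database plus a registered action.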

Software Limitations

Over the many years I’ve been using computer programs, I’ve been disheartened by how limiting most databases are. A programmer’s job is to make a useful tool, and if that tool is supposed to model the physical world, it needs to be able to capture more than just enough to supposedly do the work it set out. Even if the tool doesn’t use the data, the user always has a need to record their thoughts as they are working on something, and it’s not very expensive to at least allow a “comments” area for anything they may happen to dream up while using the tool.

It compounds the frustration when people using a program find it’s inadequate for their needs after they’ve already invested a load of data entry into it. We’ll use a contact database as an example. Let’s say that one of the co-workers is going on vacation, and some new vacation contact information will only be good for a set duration, perhaps a phone number they use. What would be wonderful is if, during that period, the database gave out the vacation number when they wanted to call someone, but after the vacation’s over it reverted to the old number. But if putting in that functionality is so difficult, why not give the database the tools to capture a message, and possibly incorporate an email reminder system?
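Here is a sketch of how a database could capture that vacation-number behavior, assuming a hypothetical schema where a phone number carries an optional validity window:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: a NULL window means "always valid".
db.execute("""CREATE TABLE phone (
    person TEXT, number TEXT,
    valid_from TEXT, valid_to TEXT)""")
db.execute("INSERT INTO phone VALUES ('Pat', '555-0100', NULL, NULL)")
db.execute("INSERT INTO phone VALUES ('Pat', '555-0199', '2009-07-01', '2009-07-14')")

def current_number(db, person, today):
    """Prefer a number whose validity window covers today, else the default."""
    row = db.execute("""
        SELECT number FROM phone
        WHERE person = ?
          AND (valid_from IS NULL OR (valid_from <= ? AND ? <= valid_to))
        ORDER BY valid_from IS NULL      -- dated rows sort first
        LIMIT 1""", (person, today, today)).fetchone()
    return row[0] if row else None

print(current_number(db, 'Pat', '2009-07-05'))  # 555-0199 during the vacation
print(current_number(db, 'Pat', '2009-08-01'))  # 555-0100 after it's over
```

Nothing gets deleted when the vacation ends; the old number simply takes over again because the dated row no longer matches.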

As needs get more complex, we wind up with multiple records of the same thing across many databases. A person’s records in our office are in our contacts database, and in our accounting system (up to three times, if they have multiple relationships to the business, such as customer, vendor, and employee). They might also be a member of one of the organizations we do staff support for, hence a record in yet another database that provides functionality our accounting system doesn’t. When that person calls in to be helped, there are far too many places we have to look to retrieve information on them, and it is a burden to update if they have a change.

The solution is to recognize that these databases don’t have a record of the person themselves; they have a record of the relationship between that person and “yourself”, as defined by the program. In the physical plane, your needs are greater than one programmer’s vision of what their tool is going to provide. They don’t recognize that, and ego plays a large part in it — they feel that somehow their tool is going to give you everything you need to manage what you have going on. Yet almost all programs capture very little of your own information; for some reason they objectify you as the luser, and some even punish you for all the trauma that programming for your needs has cost them as a programmer.

Programming tools

I’m trying to come up with the ultimate simple programming environment. I’d like to be able to support most any type of user interface, such as CLI (command line), ncurses (maybe via dialog), zenity, tk (I think it’s ugly, but might make a good fallback) and wxwidgets. And I’d like to be able to daemonize it simply and use it as a web server, or have it be called by one.

I’ve been debating TCL vs. Python. Python is very nice, but it seems that I would have trouble making small executables with it. One of my main goals is for the deliverables to remain as small as possible. A program called freewrap is advertised as making small executables, with which I could make TCL programs and (optionally) a TK interface. Unfortunately, freewrap doesn’t appear to work with the newest 8.5 Tcl/Tk, which means an ugly GUI without Tk’s new theming. But maybe it’s still possible, or will be by the time I need it.

Since I want to make a generic view (from the Model, View, Controller pattern), I might still be able to use wxwidgets (and probably Python) if I want to deliver a beautiful interface — at the expense of deliverable size. How I’d wrap this into an installable deliverable I don’t know yet; that’s for later, anyway.

Sqlite and Tcl seem to be a very nice fit for what I want. D. Richard Hipp has some wonderful ideas about using Sqlite for a generic storage backend that would replace use of arrays; becoming part of the code itself. This goes along with my philosophy for a super-generalized computer program.

The program itself would know very little, not even its own name. Its function is to run a loop: figure out the best thing to do, then do it. It would provide error messages if it couldn’t initialize. And that’s about it. The database would load code as needed.

I spent a good part of this morning reviewing Expect, as it has signal handling functionality. It’s important that anything I write (as should be anything I do) maintain its cleanliness. So if there’s something dirty that should be cleaned up before exiting, Expect can trap SIGTERM (hopefully on Windows too) and run a cleanup function. It looks now as if there’s nothing dirty about opening the database, even with a table that’s only in memory. If so, I could load the cleanup functionality from the database, and keep it very simple.
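For comparison, the same trap-and-clean-up idea is available in Python’s standard signal module (the cleanup body here is just a placeholder; in the scheme above it would be loaded from the database, and note that SIGTERM handling on Windows is limited):

```python
import signal
import sys

def cleanup(signum, frame):
    """Run whatever teardown is needed, then exit cleanly."""
    print("cleaning up")
    sys.exit(0)

# Trap SIGTERM so a polite kill still runs the cleanup.
signal.signal(signal.SIGTERM, cleanup)
```

Once registered, the handler fires on `kill <pid>`, so the process gets a chance to tidy up instead of just vanishing.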

Data would be stored as either a unique ID or as a value. From the past research I’ve done (which I seem to have misplaced at the moment), I came up with a scheme using 32 numbers and letters, which I called ZILO encoding. I named it that, since those would be the letters I would omit, since they look like other letters or numbers. Using 128 bits plus a few more, I’d end up with (if I remember right) 30 digits that could be handwritten and reread without error. This would store a GUID, a big integer, or just raw data.
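Going by the name, the alphabet would be the ten digits plus the letters minus Z, I, L and O, which does come out to exactly 32 symbols; the exact ordering below is my own assumption:

```python
# Ten digits plus the alphabet minus the four look-alikes I, L, O, Z.
ZILO = "0123456789ABCDEFGHJKMNPQRSTUVWXY"
assert len(ZILO) == 32

def zilo_encode(n):
    """Encode a non-negative integer with the look-alike-free alphabet."""
    if n == 0:
        return ZILO[0]
    out = []
    while n:
        n, r = divmod(n, 32)
        out.append(ZILO[r])
    return "".join(reversed(out))

def zilo_decode(s):
    n = 0
    for ch in s:
        n = n * 32 + ZILO.index(ch)
    return n

# A 128-bit value fits in 26 characters; 30 characters cover 150 bits.
print(len(zilo_encode(2 ** 128 - 1)))   # 26
```

Since each character carries five bits, 30 handwritten characters would carry 150 bits: a 128-bit GUID with room to spare, consistent with the “128 bits plus a few more” recollection above.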

Part of the ZILO encoding is the ability to sort data, which the default GUID format doesn’t lend itself to. Important data that’s part of the GUID should be available: where the ID came from, and what time it was when formed. Note I don’t subscribe to random GUID’s as MS devs do, as they are not truly unique.

The basic table format I described came in three parts: an ID, a subject, and an object. A representation of any unique or commodity object would get a GUID. Values stored in big integers would play a big part. Every object would be a class, an individual being a class with one object.

Subsequent entries would compound upon one another, and a freeform database (within a database) would emerge. Every assumption made would be linked to a basis, which can be revised. Revisions would stay within the database, and bogus assumptions would be culled by a garbage collection routine. So there are no Updates or Deletes involved, just Inserts and Selects. The most valid truth would simply come from the relationship with the most recent good basis.
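A toy sketch of that insert-only idea (the table and column names are made up, and a simple autoincrementing id stands in for the GUID scheme):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One table: every fact is (id, subject, object). Nothing is ever
# updated or deleted, so the full history stays queryable.
db.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, subject TEXT, object TEXT)")

def assert_fact(db, subject, obj):
    db.execute("INSERT INTO facts (subject, object) VALUES (?, ?)", (subject, obj))

def current_value(db, subject):
    """The most valid truth is simply the most recent insert for a subject."""
    row = db.execute(
        "SELECT object FROM facts WHERE subject = ? ORDER BY id DESC LIMIT 1",
        (subject,)).fetchone()
    return row[0] if row else None

assert_fact(db, "sky:color", "blue")
assert_fact(db, "sky:color", "grey")          # a revision, not an update
print(current_value(db, "sky:color"))         # grey
```

The superseded “blue” row is still there for anyone who asks for history; a garbage collector could prune rows whose basis has gone bad, but nothing forces it to.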
