Mistweb Ltd - Internet software

The planks of modern software development

Of course, everything I say in these pages reflects my own opinions. As soon as you express an opinion you invite others to disagree, but that's life. These pieces are probably among the more debatable, so if you find these comments helpful, that's fine; if not, well, at least I tried to stimulate some discussion.

Domain-driven design

There are two books that taught me about this essential: Object Design - Roles, Responsibilities, and Collaborations by Rebecca Wirfs-Brock and Alan McKean, Addison Wesley, 2003, and Domain-Driven Design - Tackling Complexity in the Heart of Software by Eric Evans, Addison Wesley, 2004. The subtitle of the second book gives the clue as to why this is so important: developing modern software systems is hard. The first steps in digital computing were taken by people like Alan Turing and John von Neumann. These people were geniuses. For many years the main achievements in software systems were driven by a small group of very clever people indeed. In the modern world, software systems are more complex than ever before and we are not geniuses - far from it. Software has to be developed, extended and maintained by ordinary people with ordinary mental abilities and capacities. How can this be possible?

It is vitally important to keep all software absolutely as simple as possible. If it is complex, convoluted or intricate on the day it is written, it will most likely become impossible to work with a very short time after that. A few months later, even the person who wrote it probably won't be able to find their way around it. At that point, no-one will really want to make any changes or even touch it for fear that, once broken, it will never work again. That software is worthless in the commercial world, and so writing it in the first place was largely a waste of time and money.

When I was first learning to write code in Java (around JDK version 1.1.6), I was able to look at some of the source code of the JDK. I remember looking at dozens of simple methods with one- and two-line implementations that obviously did exactly what the method name described. I was thinking, 'Yes, but where's the hard stuff? Where's the magic?' Look as I might, the only complex-looking bits I could find were wrapped in comments like '//Fix for bug in Windows 98...', which made me wince: 'You poor devils'. It has taken me years to realize that I was looking at the best quality code I was to see for a long time: nothing complex, everything doing exactly what it said on the tin, in the simplest possible way, and yet adding up to complex and sophisticated toolkits like the AWT or Swing.

There has been a series of essential advances in software technology that have enabled us to reach the point we are at today. After the freedom to name all variables with descriptive names and the introduction of control structures like loops and IFs, the next main one was the introduction of top-level procedures and functions. These enabled us to untangle the webs of jumps and GOTOs that previously turned anything more than a few dozen lines long into a complex jumble. They are still useful and should not be overlooked: if any method has so many parts that it does not fit on one screen and cannot be comprehended in one thought, then an 'extract method' refactor is probably called for (see Refactoring - Improving the Design of Existing Code by Martin Fowler, Addison Wesley, 1999).
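
As a rough illustration of what an 'extract method' refactor looks like, here is a small Java sketch. The OrderReport class, its discount rule and all of its method names are invented for this example and do not come from Fowler's book. The summaryBefore() method does everything in one place; summary() does the same job as three named steps.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical example: OrderReport and its method names are invented for illustration.
class OrderReport {
    private final List<Double> prices;

    OrderReport(List<Double> prices) {
        this.prices = prices;
    }

    // Before: summing, discounting and formatting are tangled together in one method.
    String summaryBefore() {
        double total = 0.0;
        for (double price : prices) {
            total += price;
        }
        if (total > 100.0) {
            total = total * 0.9;
        }
        return String.format(Locale.ROOT, "Total: %.2f", total);
    }

    // After: the same behaviour, but the method now reads as three named steps.
    String summary() {
        return format(applyDiscount(subtotal()));
    }

    // Adds up the raw prices.
    private double subtotal() {
        double sum = 0.0;
        for (double price : prices) {
            sum += price;
        }
        return sum;
    }

    // Takes 10% off any amount over 100 (an arbitrary rule for the example).
    private double applyDiscount(double amount) {
        return amount > 100.0 ? amount * 0.9 : amount;
    }

    // Renders the total to two decimal places.
    private String format(double total) {
        return String.format(Locale.ROOT, "Total: %.2f", total);
    }
}
```

Both methods return the same string for the same prices; the difference is only that each extracted method can be read, named and tested on its own.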

Following this, the next most important step was the introduction of Object Oriented design. The concept of grouping functions and procedures into modules or libraries had already come along, but OO design is fundamentally more than this, although many developers still seem to miss the point. An object or class called 'SqlHelper' or 'DisplayManager' is entirely useless from a design perspective - these names describe what the code inside them does, such as helping you with your SQL database access, or managing what happens on the display. These are modules of procedural code, and this is not the 1980s.

The point about OO design is that we use software objects to model the problem domain - the business or technical endeavour that we are trying to engage with. If we are creating a system that deals in messaging, then we must have a Message class. If there are a few types of message, then we might have sub-classes of Message with names like SecureMessage, ChangeOfAddressMessage etc. These model the problem domain. If there is a SQL database that is going to store the messages, then we might have a class called Database or MessageDatabase.

The next step is to add the data and the methods to these classes of objects. A Message might have a Subject, a Body, a Recipient and a Sender, probably a SentDate etc. One sure way for the developer to wreck good domain-driven OO design would be to make all of these data members public and write a MessageSender or MessageManager class that has total control of filling the class with data, altering values as it sees fit and then passing the thing to the DatabaseManager to save it. Object-oriented design begins to earn its keep in the battle to keep code simple when we introduce 'encapsulation'. This means exposing only a few useful-sounding methods that do things that are directly relevant to building a software model of the problem domain. In the case of a Message, these might include a constructor that takes two parameters: a reference to a previously-set-up Database object, and a message's primary key. Using these two, the Message can ask the Database to retrieve all the relevant data for itself, so that it comes into the world fully ready to take its place in a community of other useful, intelligent and fully-formed objects. A Save() method might allow the Message to look at its own internal state, decide if any changes have been made that warrant a database call, then, depending on whether it has already been saved in the past, give that same Database reference enough information, in its own terms, to perform either a SQL INSERT or an UPDATE on its behalf.
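
The Message described above might be sketched in Java along the following lines. This is only a minimal sketch under stated assumptions: the Database interface, its load/insert/update methods, the in-memory stand-in and the retitle() method are all invented here for illustration, and a real Database would issue SQL rather than keep rows in a map. Only Subject and Body are modelled, to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage abstraction; a real implementation would run SQL SELECT/INSERT/UPDATE.
interface Database {
    Map<String, String> load(int key);
    int insert(Map<String, String> fields);          // returns the new primary key
    void update(int key, Map<String, String> fields);
}

// In-memory stand-in so the sketch runs without a real database; it counts calls for inspection.
class InMemoryDatabase implements Database {
    private final Map<Integer, Map<String, String>> rows = new HashMap<>();
    private int nextKey = 1;
    int inserts = 0;
    int updates = 0;

    public Map<String, String> load(int key) {
        Map<String, String> row = rows.get(key);
        if (row == null) {
            throw new IllegalArgumentException("No message with key " + key);
        }
        return new HashMap<>(row);
    }

    public int insert(Map<String, String> fields) {
        int key = nextKey++;
        rows.put(key, new HashMap<>(fields));
        inserts++;
        return key;
    }

    public void update(int key, Map<String, String> fields) {
        rows.put(key, new HashMap<>(fields));
        updates++;
    }
}

class Message {
    private final Database database;
    private Integer key;        // null until the message has been saved once
    private String subject;
    private String body;
    private boolean dirty;      // true when there are unsaved changes

    // Loads an existing message: the object arrives fully formed.
    Message(Database database, int key) {
        this.database = database;
        Map<String, String> row = database.load(key);
        this.key = key;
        this.subject = row.get("subject");
        this.body = row.get("body");
        this.dirty = false;
    }

    // Creates a brand-new message that has not been saved yet.
    Message(Database database, String subject, String body) {
        this.database = database;
        this.subject = subject;
        this.body = body;
        this.dirty = true;
    }

    String subject() {
        return subject;
    }

    int key() {
        if (key == null) {
            throw new IllegalStateException("Message has not been saved yet");
        }
        return key;
    }

    // A domain-level change; the message tracks for itself whether anything really changed.
    void retitle(String newSubject) {
        if (!newSubject.equals(subject)) {
            subject = newSubject;
            dirty = true;
        }
    }

    // The message decides whether a database call is needed, and which kind.
    void save() {
        if (!dirty) {
            return;
        }
        if (key == null) {
            key = database.insert(fields());
        } else {
            database.update(key, fields());
        }
        dirty = false;
    }

    private Map<String, String> fields() {
        Map<String, String> fields = new HashMap<>();
        fields.put("subject", subject);
        fields.put("body", body);
        return fields;
    }
}
```

Nothing outside Message can see or set the dirty flag or the key, so callers only ever ask for things that make sense in the problem domain: create a message, retitle it, save it.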

The most important words so far were 'to come into the world fully ready to take its place in a community of other useful, intelligent and fully-formed objects'. This, to me, is the essence of good domain-driven OO design - each object has all the built-in data and know-how to perform its own part of the whole. The intelligence is equally split among all the objects and all their methods. Then a whole load of extra intelligence appears (or maybe emerges) just from the way that the objects naturally interact with each other to get each job done. There are no complex controller classes, and no massive management modules, but all the cooperating entities get the job done between them. No single method is more than a few lines long, and each just does one job, and does it very well, without side-effects.

I have a number of stories in my head that help me with all this. I read a book many years ago called The Planiverse, a novel by A. K. Dewdney, 1984. The basic idea was that, in the flat two-dimensional land represented by a computer screen, there were two-dimensional creatures going about their lives in real time. In a modern idiom, you could effectively 'right-click' on any of these objects and ask it questions like, 'What are you doing?'. 'Looking for food', it might reply, or 'Resting'. No-one tells these things what to do next: they have timers and state-of-health variables, and they decide what to do next in a way that models the way a real animal might. That's what our objects should be like - they find themselves in their usual community, various stimuli such as user mouse-clicks and key-presses come their way and they all run around doing just what we've programmed them to do in a fairly orderly way until the job is done.

Another is a simulation I once saw of cars on a small grid of roads. I think it was a Java Applet; it may have been someone's A-level project. Each car is an instance of the same Car class and it has a small set of rules about driving down the correct side of the road, being careful at junctions and avoiding anything that comes near. You set them running and, lo and behold, they all behave nicely and drive around looking for all the world as if they know where they're going. Each one applies the rules, makes random turns at the junctions when they are clear, and they all get along fine.

The last one for now is swarm computing. This is a current research area, prompted by the fact that Moore's Law seems to have broken down with regard to computer clock speeds: if we are to continue to see increases in computer hardware power, it will be by multiplying the number of processors we use rather than their speed. We have seen computers go from 1 MHz to 3 GHz clocks in the course of about 20 years (a 3,000-fold increase). Now that most of us are going from single- to dual-core processors in our personal machines, someone has to be planning for a style of software development that will run not just on 4 or 8 processors, but will have to keep hundreds or thousands of processors simultaneously busy within a lifetime. Multithreading on this scale is really not possible with conventional languages and conventional thinking, so one idea that some boffins and geeks are looking into is: what if we tackle hard problems the way a swarm of ants or bees would? How would 10,000 ants set about sorting a set of data into alphabetical order? Each one has only a few simple abilities, but put them all together and almost everything is possible. This is still far-fetched at the moment, but interesting nonetheless.
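
To make the ant question a little more concrete, here is a sketch of one long-established scheme with that flavour, odd-even transposition sort. It is offered only as an illustration of 'many workers, one simple local rule' and is not a claim about what swarm researchers actually do. The AntSort name is invented for this example. Imagine one ant standing at each adjacent pair of items, with a single ability: swap the pair if it is out of order. The code below runs the ants one after another, but within a round no two ants touch the same item, so in principle they could all act at once.

```java
// Hypothetical illustration: AntSort is an invented name for odd-even transposition sort.
class AntSort {
    // Sorts the array in place, in ascending alphabetical order.
    static void sort(String[] items) {
        int n = items.length;
        // n rounds are enough to sort n items with this scheme.
        for (int round = 0; round < n; round++) {
            // Even rounds use pairs (0,1), (2,3), ...; odd rounds use (1,2), (3,4), ...
            // The pairs within a round do not overlap, so they are independent of each other.
            for (int i = round % 2; i + 1 < n; i += 2) {
                swapIfOutOfOrder(items, i);
            }
        }
    }

    // The only thing a single 'ant' knows how to do.
    private static void swapIfOutOfOrder(String[] items, int i) {
        if (items[i].compareTo(items[i + 1]) > 0) {
            String tmp = items[i];
            items[i] = items[i + 1];
            items[i + 1] = tmp;
        }
    }
}
```

No ant ever sees more than two items, and nothing in the code coordinates the sort as a whole beyond counting rounds, yet the array ends up in order.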

8 Jul 2008

Automated testing

To come...

Agile methodology

To come...

Design patterns

To come...