Roger Sessions, CTO of ObjectWatch and an expert in software architecture, argues that the increasing complexity of our IT systems will be our undoing. In fact, he just recently got a patent for a methodology that helps deal with complex IT systems. Network World Editor in Chief John Dix recently caught up with Sessions to get his take on the extent of the problem and possible solutions.

The 10 dumbest mistakes network managers make

Outline the IT complexity problem as you see it.

The basic problem is the larger and more expensive an IT project is, the more likely it is to fail. You can do a lot of analysis as to why that is. You can say maybe we're not using the right methodology, or communications is failing, or any number of things. But ultimately the only variable that appears to correlate closely with failure is complexity.So my basic proposal is that as systems get bigger and more expensive they get more complex and complex things are harder to deal with and therefore more likely to fail. So if the system is under, say $750,000, it has a good chance of succeeding. Once it approaches $2 million it has less than a 50 per cent chance of succeeding. And by the time it gets much larger than that, the chances of success drop to near zero.

Clarify what you mean by failure?

The way I define failure is that, if at the end of a project the business looks back and concludes it would not have taken on the project knowing what it knows now, then the project is a failure.

How about some examples.

It could be a system that was abandoned because it got too complex or people got confused and started going off on tangents. An example would be the National Program for Information Technology in the U.K. It was an attempt to build an IT infrastructure for their healthcare system. It was probably a $20 billion effort, and it was abandoned and now they're starting from scratch.

$20 billion?

Yeah. It's a system I wrote about in my last book three years ago. At the time it hadn't yet failed. But the system was huge, highly complex, and there were no policies in place to deal with the complexity. I predicted that it could not succeed. And just in the last couple of weeks they've announced that they're discontinuing the system.

That's an extreme example. Most failures are in the $2 million to $4 million range. If you look at studies of how many systems in this range are successful, the number is less than 50 per cent.

So the question is not, are we failing? We know we're failing. The question is, why aren't we doing something about it? Why do we keep doing the same thing over and over again if we can see very clearly how unsuccessful it is?

Before we discuss that, those numbers are based on your own research or other people's research?

Other people's research. The Communications of the ACM in '07, for example, found that systems significantly under a million dollars have a better than 75 per cent chance of success. $2 million to $3 million it drops down to 50 per cent to maybe 40 per cent success. $10 million plus it's under 10 per cent success.

OK. So what do we do about it?

Well, the obvious answer is not to do big systems, do small systems. And to some extent, that's the approach I advocate. You've probably seen the anticomplexity patent we recently got for the Simple Iterative Partitions (SIP) methodology. That patent was interesting because it's the first patent that's ever been granted for a methodology to simplify either business or IT systems, and we actually tackle both since you can't simplify IT unless you simplify the business process on which IT is based.

As you break a big system down into a number of smaller systems, you reduce the functionality, the complexity, and the cost of those smaller systems. So in theory you're getting your system size down to a reasonable size that yields at least 75 per cent success rate. Unfortunately as you minimize the complexity of the system by breaking it down, you increase the complexity of those system interdependencies.

So, on the one hand, you need to break down big systems into small systems to reduce the complexity. On the other hand, as soon as you do that you increase the intersystem dependencies, which increases the complexity. So you're in a no-win situation.

Our methodology uses a mathematical approach to finding the best possible balance between those two tradeoffs, that is, between making the system small and minimizing functionality related complexity and keeping the system large and minimizing dependency related complexity.

How do you apply a mathematical model to this kind of stuff? Walk me through the approach.

We are actually already using mathematical models without realizing it. For example, Service-Oriented Architectures (SOA) are mathematically described as partitions, which is part of set theory. But typically we build SOA without understanding how partitions behave mathematically. So we don't use a mathematical approach to find the best possible partitions and then translate those partitions into SOAs. Instead we design our SOAs through decompositional design. Decompositional design is highly arbitrary. It is a process that is mathematically defined as irrational. There are literally trillions of ways of decomposing a problem and the vast majority of these are sub optimal. So many large SOAs end up in the failure bin.

This patent also uses partitioning to build an SOA, but not with highly arbitrary decompositional design. Instead we drive the partitioning with equivalence relations. Unlike decompositional design, with its effectively random results, equivalence relation analysis is a highly directed process. It leads you to one and only one solution. And we can show mathematically that this solution is the simplest possible solution. It will also be the cheapest solution and the solution that will most likely line up with the business needs.

How does this work in practice? Do companies that have a big project call you in to help them figure out the best way to tackle it?

Exactly. That's our specialty. An example of which would be even a small system for a state government. Let's say you've got a motor vehicles department that wants to replace just that part of the system that tracks drivers' licenses and makes sure people have done the right thing to get a license. Though that's only a small part of the functionality that's in motor vehicles, that's easily going to be a $3 million or $4 million system.

Once you realize it's a $3 million or $4 million system you realize you've got less than a 25 per cent chance of success unless you do something different. You can use decompositional design to carve the system up into four small systems, but the chances are that you're going to end up with a worse situation. So instead, we come in and do the pre-planning, which is before the system architecture is created. We figure out the optimal way to take the large collection of functionality and carve it into smaller projects, so that each one of those projects is as simple as it can possibly be. It may be four subprojects. It may be eight. It will be whatever the mathematical analysis tells us is the best way to do it.

There are a lot of benefits to doing this. First you've greatly increased the chances of success, because each one of those projects is small enough so it has probably an 80 per cent chance of success. You've also reduced the cost of the project, because cost is directly proportional to overall complexity. And you've made it much easier to determine what functionality needs to be in each one of the pieces.

It is a multi-stage process. First we identify the business functions. Then we do synergistic analysis, which tells us which functions need to live in which subsets. And those subsets eventually turn into independent projects. So it's not a short process, but relative to the cost of doing the entire project it's very fast and it's more than paid back by the fact that you reduce the complexity of the overall system and therefore reduce the cost and the failure rate.

Do you mostly work in SOA environments?

In general we are agnostic about SOA. SOA is a good way to implement these subsets so that that the technical architecture is closely aligned with the business architecture. So I like SOAs from this perspective. But it's really at a lower level than what we do. What we do is identify the optimal configuration of subprojects. We answer the question of how can you break this big project up into the best possible configuration of smaller projects? How you implement those projects, we largely leave to the implementation teams.

Does the arrival of cloud computing help simplify the world or just add another layer of complexity?

In my opinion, it adds another layer of complexity. I like the idea of cloud architectures, not for everything, but for many things. But it is a serious mistake to put a large, complex system on the cloud. That's going to be very difficult to do, very difficult to manage. And if you're successful, you'll be trapped on that particular vendor.

So what you want to do is break these systems up into smaller pieces. And then for each one of the pieces make the best possible implementation deployment decision, such as should we use service-oriented architectures to implement it? Should we use the cloud as the deployment platform? But you do that after you've removed as much of the complexity as you can by splitting up the project into smaller projects.

So far we've been talking about green field, starting from scratch. Could your approach help a company get a handle on a big existing system that breaks too often?

Yeah, it can. Because in many cases what you find in these very large systems are a few complexity hotspots, or what I call complexity knots, that one part of the system where everything seems to break. Or, if it's interacting with some other system, the interactions fail. A lot of times the problem is that one function has been placed in the wrong place. And if we move it from one system to another it can dramatically reduce the complexity. You can really extend the life of a system by doing something like that.

We had a situation like that where we were looking at two insurance systems that had to work together. And every time information was passed from one to the other there was a high probability of failure. We showed through complexity analysis that the problem was not how they were packaging the information or that one group was not dealing with the information correctly. The problem was that they shouldn't have been passing that information in the first place. If they slightly adjusted the functionality so the right system was doing the right analysis, the whole problem went away.

That's not uncommon. Now, of course, you'd like to do this with a new system, because the more you can reduce complexity in the first place the better long-term payback you're going to have.

You're known to say IT is facing a meltdown. Explain.

The meltdown I see is a complexity problem. Systems are getting larger and larger and they're already to the point where they have a very high probability of failure.

So we have a cycle. The cost of the system goes up, the cost of the failures goes up, and the chances of success drop. As the chances of success drop, the cost of the system goes up more. And as the cost of the system goes up more, the failure rate increases.

So, you have this cycle and you have to ask yourself, "How much money are we spending on failed IT systems before you can call it a meltdown?"

Last question. Short of hiring your firm to come and solve the problem for mankind, any promising IT developments you see that would help address this?

Well, there are lots of interesting ideas, like cloud, but unfortunately none address the kind of coarse-grained complexity issue that we're looking at. So, for somebody who's looking at taking on, let's say, a $10 million-plus system, short of hiring us to come in and help them pre-plan it, I guess my advice would be not to do it because the chances of being successful are just too low.

Figure out what you can do for under a million dollars, because you're better off getting something working for a million dollars than you are getting nothing working for $10 million.

Read more about infrastructure management in Network World's Infrastructure Management section.