Information System Modernization – The Ship of Theseus?

The other day I was listening to a podcast by Malcolm Gladwell. It was about golf clubs (which he hates). Living next to two golf courses and frequently running past them, this was something I could relate to. The issue he had was that they do not pay proper tax. This is due to a California rule that freezes property tax at pre-1978 levels unless more than 50% of the ownership has changed.

The country clubs own the golf courses and the members own the country clubs. Naturally, more than 50% of the members have changed since then. However, according to the tax authorities, this does not mean that 50% of the ownership has changed. The reason is that the gradual change of membership means that the identity of the owning body has not changed. To some this is a peculiar philosophical stance, but it is not without precedent. It is known through the ancient Greek writer Plutarch as the paradox of the Ship of Theseus, here quoted from Wikipedia:

“The ship wherein Theseus and the youth of Athens returned from Crete had thirty oars, and was preserved by the Athenians down even to the time of Demetrius Phalereus, for they took away the old planks as they decayed, putting in new and stronger timber in their places, in so much that this ship became a standing example among the philosophers, for the logical question of things that grow; one side holding that the ship remained the same, and the other contending that it was not the same.”

For Gladwell it was not clear that the gradual replacement of members in a country club constituted no change in ownership. Be that as it may, the story made me think about information system modernization, which typically makes up a huge part of enterprise and government IT project portfolios. Information systems are like the Ship of Theseus: you want to keep them afloat, but you also want to maintain and improve them. The question is just: is information system modernization the Ship of Theseus?

The Modernization effort

Usually a board, CEO, CIO, commissioner or other body with responsibility for legacy systems realizes that it is time to do something about them. Maybe the last person who knows the system is already long overdue for retirement, operational efficiency has significantly declined, costs have risen, or the market demands features that cannot easily be implemented in the existing legacy system. Whatever the reason, a decision to modernize the system is made: retire the old and replace it with the new.

Now this is where it gets tricky, because what exactly should the new be? Do we want a car or a faster horse? For many, the task turns into building a faster horse by default. Because we know what the system should do, right? It just has to do it a bit faster or a little bit better. The problem is that we sometimes build Theseus a new rowboat with carbon-fiber planks where we could instead have gotten a speedboat with an outdoor kitchen and a bar.

When embarking on a legacy modernization project, there are a few things I believe we should observe. I will use a recent project we did to modernize the architecture of a central integration solution at the New York City Department of IT and Telecommunication. This legacy system is itself a modernization of an earlier mainframe-based system (yes, things turn legacy very fast these days).

Some of the things to be conscious of, in order not to fall into the trap of Theseus' ship when modernizing systems, are the following.

Same or Better and Cheaper

A modernized legacy system has to fulfill three general criteria: it should do the same as or more than today, with the same or better quality, at a cheaper price. It is that simple. When I say it should do the same as today, I would like to qualify that: if the system today sends sales reports to matrix printers and fax machines around the country, we probably don't need that, even if it is a function today. The point is that all important functions and processes that are supported today should also be supported.

When we talk about quality, we mean the traditional suite of non-functional requirements: security, maintainability, resilience, supportability, etc. Quite often it is exactly the non-functional requirements that need to be improved, for example maintainability or supportability.

"At a cheaper price" is pretty straightforward. It is not always possible, such as when you are replacing a custom-coded system with a modern COTS or SaaS solution. Nevertheless, I think it is an important and realistic ambition, because most legacy technology that used to be state of the art is now a commodity, thanks to open source and general competition. An example is message queueing software. It used to be offered at a premium by legacy vendors, but due to open-source products like ActiveMQ and RabbitMQ, as well as cost-efficient cloud offerings, it has become orders of magnitude cheaper.

Should the system even be doing this in the new version?

Often there is legacy functionality that has become naturally obsolete. One example I found illustrates this. The integration solution we had is based on an adapter platform that takes data from a source endpoint, formats it and puts it on a queue. At the center, a message broker routes it to queues that are read by other adapter platforms, which then format and write the messages to the target endpoint. This is a fine pattern, but if you want to move a file it is not necessarily the most efficient way, since the file has to be parsed into multiple chunks to be put on the queue and then assembled again on the other side. This process can easily go wrong if one message fails or arrives out of order, so multiple checks and operational procedures need to be in place. Rather than having the future solution do this, one could look at whether other existing solutions are more appropriate, such as a managed file transfer solution. Similarly, when the system merely wraps web calls, an API management solution may be a better fit.
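To see why file transfer over a queue is fragile, here is a minimal sketch (all function names are made up for illustration) of the chunk-and-reassemble dance the adapters have to perform. One missing or out-of-order chunk breaks the whole transfer, which is exactly why those extra checks and procedures exist:

```python
def split_into_messages(data: bytes, chunk_size: int = 4):
    """Split a file payload into (sequence_offset, chunk) messages."""
    return [(i, data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def reassemble(messages):
    """Reassemble chunks; fail loudly if any chunk is missing."""
    parts = []
    expected = 0
    for offset, chunk in sorted(messages):
        if offset != expected:
            raise ValueError(f"missing chunk at offset {expected}")
        parts.append(chunk)
        expected = offset + len(chunk)
    return b"".join(parts)

payload = b"quarterly-sales-report"
msgs = split_into_messages(payload)
assert reassemble(msgs) == payload   # happy path: all chunks arrive

# Drop a single message and the whole file transfer fails:
try:
    reassemble(msgs[1:])
except ValueError as e:
    print("transfer failed:", e)
```

A managed file transfer tool moves the file as one unit and makes this whole failure mode disappear.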

Why does the system do it in this way?

Was this due to technological or other constraints at the time it was built? When modernizing, it can pay off to look at each characteristic of the legacy system and understand why it is implemented that way, rather than just copying it. For example, our integration solution puts everything on a queue. Why is that? It may be because we want guaranteed delivery.

This is a fair answer, but also a clue to how we can make it better, because what better way is there to make sure you don't lose a message than to store it in an archive for good as soon as you get it? In a queue, the message is deleted once consumed. This presumably has to do with message queueing's origins on the mainframe, where memory was a scarce resource.

That is no longer the case. So rather than use a queue, let's just store the message, publish an event on a topic, and let subscribers to the topic process it at their convenience. This way the integration can also be rerun even if a downstream process fails, such as the target adapter writing to a database. In a queue-based integration the message would be lost, because it would have been deleted off the queue. With this architecture, any process can access the message again at any time. Now a message is truly never lost.
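The store-then-publish pattern can be sketched in a few lines. This is a minimal in-memory illustration with hypothetical class names, not the actual solution: the message is archived first and never deleted, and the topic only fans out an identifier pointing at the archived message, so any subscriber can reprocess it later:

```python
import itertools

class MessageStore:
    """Append-only archive: messages are written once, never deleted."""
    def __init__(self):
        self._archive = {}            # message_id -> payload, kept for good
        self._ids = itertools.count()

    def put(self, payload):
        message_id = next(self._ids)
        self._archive[message_id] = payload
        return message_id

    def get(self, message_id):
        return self._archive[message_id]

class Topic:
    """Publishes an event (the message id) to all subscribers."""
    def __init__(self, store):
        self._store = store
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, payload):
        message_id = self._store.put(payload)  # archive first...
        for callback in self._subscribers:     # ...then notify
            callback(message_id)
        return message_id

store = MessageStore()
topic = Topic(store)
seen = []
topic.subscribe(lambda mid: seen.append(store.get(mid)))
mid = topic.publish({"order": 42})

# Even if a downstream write fails, the message can be fetched again:
assert store.get(mid) == {"order": 42}
```

In a real deployment the store would be durable object or blob storage and the topic a pub/sub service, but the key design choice is the same: delivery and retention are decoupled, so a failed consumer never destroys the message.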

What else can the system do going forward?

Keep an eye out for the opportunities that present themselves when rethinking the architecture with the possibilities of modern technology in mind. To continue our example with the message store: we can now use the message archive for analytical solutions, by subsequently transforming the messages from the archive into a data warehouse or data mart. This is also known as an ELT (Extract, Load, Transform) process.

Basically, we have turned our legacy queue-based architecture into a modern ELT analytics architecture on the side. What's more, we can even query the data in the message store with SQL; one way is to make it accessible as a Hive table. Imagine what that would take in the legacy world: for every single queue we would have had to build an ETL process and load it into a new schema that would have to be created in advance.
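The ELT idea can be shown with a toy example. Here SQLite stands in for the warehouse or Hive layer, and the message payloads and table names are invented for illustration: the raw archived messages are loaded as-is, and the transformation happens afterwards, inside the store, with SQL (this relies on SQLite's JSON functions, which ship with most modern builds):

```python
import json
import sqlite3

# Pretend these are messages sitting in the archive.
archive = [
    json.dumps({"id": 1, "amount": 100, "region": "NY"}),
    json.dumps({"id": 2, "amount": 250, "region": "NJ"}),
    json.dumps({"id": 3, "amount": 75,  "region": "NY"}),
]

conn = sqlite3.connect(":memory:")

# "EL": land the raw messages untransformed, no per-queue schema needed.
conn.execute("CREATE TABLE raw_messages (payload TEXT)")
conn.executemany("INSERT INTO raw_messages VALUES (?)",
                 [(m,) for m in archive])

# "T": shred and aggregate the payloads with SQL at query time.
rows = conn.execute("""
    SELECT json_extract(payload, '$.region') AS region,
           SUM(json_extract(payload, '$.amount')) AS total
    FROM raw_messages
    GROUP BY region
    ORDER BY region
""").fetchall()

print(rows)  # [('NJ', 250), ('NY', 175)]
```

Note that the schema of the analytical query lives in the SQL, not in the landing table, which is what spares us from building an ETL process and a pre-defined schema for every single queue.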

Being open-minded and keeping a view to adjacent or related use cases is important in order to spot these opportunities. This may take a bit of working around the institutional silos, if such exist. That is just another type of constraint, a non-technical one, which is often tacitly built into the system.

 

Remember that we wanted the modernized system to be "same or better, and cheaper". We still get all the functional benefits of a queue, only better, since we can always find a message again. On top of that, we have added useful new functionality in an analytics solution that is essentially a by-product of the new architecture. Deploying it in the cloud gives us better resilience, performance, monitoring and even security. Add to that the cost, which is guaranteed to be significantly less than what we were paying for our legacy vendor's proprietary message queueing system.

