AI is Easy – Life is Hard

Artificial Intelligence is easy. Life is hard. This simple insight should temper our collective expectations. When we look at Artificial Intelligence and the amazing results it has already produced, it is clear that getting there has not been easy. The most iconic victories are as follows.

  • Deep Blue beats Kasparov
  • Watson beats champions Brad Rutter and Ken Jennings in Jeopardy!
  • Google beats world champion in Go

Today Autonomous Vehicles (AV) manage to stay on the road and go where they are expected to. Add to this the many implementations of face recognition, speech recognition, translation and so on. What more could we want to show us that it is just a matter of time before AI becomes truly human in its ability? Does this not reflect the diversity of human intelligence and how it has been truly mastered by technology?

Actually, no, I don’t think so. From a superficial point of view it could look like it, but deep down all these problems are, if not easy, then hard in an easy way, in the sense that there is a clear path to solving them.

 

AI Is Easy 

The one thing that holds true for all of these applications is that the goals are very clear. Chess, Jeopardy and Go: you either win or you don’t. Facial, speech and any other kind of recognition: you recognize something or you don’t. Driving an autonomous vehicle: It either drives acceptably according to the traffic rules or it doesn’t. If only human life were so simple.

Did you know when you were born what you wanted to work with? Did you know the precise attributes of the man/woman you were looking for? Did you ever change your mind? Did you ever want to do two or more mutually exclusive things (like eating cake for breakfast and living a healthy life)?

Humans are so used to constantly evaluating trade-offs, with unclear and frequently changing goals, that we don’t even think about it.

 

An AI Thought Experiment 

Let me reframe this in the shape of a tangible existing AI problem: Autonomous Vehicles (AV). Now that they are very good or even perfect at always staying within the traffic rules, how do they behave when conditions are not as clear? Or even in situations where the rules might be conflicting?

Here is a thought experiment: a self-driving car is driving through the streets of New York on a sunny spring afternoon. It is a good day and the car is able to keep a good pace. On its right is a sidewalk with a lot of pedestrians (as is common in New York); on its left is a lane of traffic going in the opposite direction, as on any two-way street (which are rarer but not altogether absent). Now suddenly a child runs out into the road in front of the car, and it is impossible for the car to brake in time. The autonomous vehicle needs to make a choice: it either runs over the child, or makes an evasive maneuver to the right, hitting pedestrians, or to the left, hitting cars going in the other direction.

How do we prepare the AI to make that decision? Now the goals are not as clear as in a game of Jeopardy. Is it more important not to endanger children? Let’s just say for the sake of argument that this was the key moral heuristic. The AI would then have to calculate how many children were on the sidewalk and in a given car on the opposite side of the road. It may kill two children on the sidewalk or in another car. What if there were two children in the Autonomous Vehicle itself? Does age factor into the decision? Is it better to kill old people than younger ones? What about medical conditions? Would it not be better to hit a terminal cancer patient than a healthy young mother?

The point of this thought experiment is just to highlight that even if the AI could make an optimal decision, it is not obvious what optimal means. It may indeed differ across people, that is, the regular human beings who would be judging it. There are hundreds of thousands of similar situations where, by definition, there is no one right solution, and consequently no clear goal for the AI to optimize towards. What if we had an AI as the next president? Would we trust it to make the right decisions in all cases? Probably not: politics is about sentiment, subjectivity and hard solutions. Would we entrust fair rulings and sentencing to an AI that was able to go through all previous court cases, statistics and political objectives? No way, although it probably could do it.

 

Inside the AI Is the Programmer 

As can be seen from this, the intelligence in an AI must be explained by another intelligence. We would still have to instill the heuristics and the trade-offs in the AI, which then leads back to who programs the AI. This means that suddenly we will have technology corporations and programmers making key moral decisions in the wild. They will be the intelligence inside the Artificial Intelligence.

In many ways this is already the case. A more peaceful case in point is online dating: a programmer has essentially decided who should find love and who shouldn’t through the matching algorithm and the input used. Inside the AI is the programmer, making decisions no one ever agreed they should make. Real Artificial Intelligence is as elusive as ever, no matter how many resources we throw at it. Life will throw us the same problems as it always has, and at the end of the day the intelligence will be human anyway.

AI and the City

Artificial Intelligence is currently being touted as the solution to most problems. Most if not all energy is put into conjuring up new and ever more exotic machine learning models and ways of optimizing them. However, the primary boundary for AI is currently not technical, as it used to be. It is ecological. Here I am not thinking about the developer ecosystem, but about the ecosystem of humans who have to live with the consequences of AI and interact with machines and systems driven by it. While AI lends itself beautifully to the concept of smart cities, this is also one of the avenues where this will most clearly play out, because the humans who stand to benefit and potentially suffer from the consequences of AI are also voters. Voters vote for politicians, and politicians decide to fund AI for smart cities.

How Smart is AI In A Smart City Context?

At a recent conference I had an interesting discussion where we were talking about what AI could be used for. Someone suggested that Machine Learning and AI could be used for smart cities. Working for a city and having worked with AI for a number of years, my question was “for what?” One suggestion was regulating traffic.

So, let us think through this. In New York City we have on occasion a lot of traffic. Let us say that we are able to construct a machine learning system that could indeed optimize traffic flow through the city. This will not be simple or easy, but it is not outside the realm of the possible. Let us say that all intersections are connected to this central AI algorithm, which provides the city as a whole with optimal traffic conditions. The algorithm works on sensor input that counts the number of cars at different intersections based on existing cameras. This will probably not mean that traffic always flows perfectly, but on average it will certainly flow better.

Now imagine that during one of these periods of congestion a fire erupts in downtown Manhattan and fire trucks are delayed in traffic. 50 people die. The media then find out that the traffic lights are controlled by an artificial intelligence algorithm. They ask the commissioner of transportation why 50 people had to die because of the algorithm. This is not a completely fair question, but media have been known to ask such questions. He tries to explain that the algorithm optimizes the overall flow of traffic. The media are skeptical and ask him to explain how it works. This is where it gets complicated. Since this is in part a deep learning algorithm, no one can really tell how it works or why there was congestion delaying the fire trucks at that particular time. The outrage is palpable and headlines read “City has surrendered to deadly AI” and “Incomprehensible algorithm leads to incomprehensible fatalities”.

Contrast this to a simple algorithm that is based on clear and simple rules that are not as effective overall but work along the lines of 30 seconds one way 30 seconds another way. Who would blame the commissioner of transportation for congestion in that case?
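Part of the appeal of the simple rule is that it fits in a few lines anyone can read. A minimal sketch, assuming a two-phase intersection and the 30-second split mentioned above; the phase names and the function are hypothetical illustrations, not an actual signal controller.

```python
# Fixed-time signal plan: 30 seconds one way, 30 seconds the other.
# Phase names and the function itself are illustrative assumptions.
GREEN_SECONDS = 30
PHASES = ("north-south", "east-west")

def green_direction(seconds_since_start: int) -> str:
    """Return which direction has the green light at a given second."""
    cycle_length = GREEN_SECONDS * len(PHASES)
    position = seconds_since_start % cycle_length
    return PHASES[position // GREEN_SECONDS]

for t in (0, 29, 30, 59, 60):
    print(t, green_direction(t))
```

Every decision this rule makes can be explained by pointing at the clock, which is exactly what the learned system cannot offer.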

Politics And Chaos

Media aside, there could be other limiting factors. Let us stay with our idea of an AI system controlling the traffic lights in New York City. Let us further assume that the AI system gets a continuous input about traffic flow in the city. Based on this feed it can adapt the signals to optimize the flow. This is great, but because we have now coupled the system with thousands of feedback loops, it enters the realm of complex or chaotic systems and will start to exhibit properties that are associated with such systems. Typical examples of such properties are erratic behavior, path dependency, and limited possibility for prediction.
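To make "limited possibility for prediction" concrete, here is a small sketch using the logistic map, a textbook feedback loop. It is only a stand-in chosen for illustration and in no way a traffic model; the starting values and step count are arbitrary assumptions.

```python
# Logistic map: each output is fed back in as the next input.
# Used purely as a stand-in for a system with feedback; not a traffic model.
def logistic_trajectory(x0: float, steps: int, r: float = 4.0) -> list:
    values = [x0]
    for _ in range(steps):
        x = values[-1]
        values.append(r * x * (1.0 - x))
    return values

a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2 + 1e-9, 60)  # almost identical starting point
gaps = [abs(x - y) for x, y in zip(a, b)]
print(f"gap at step 0: {gaps[0]:.1e}, largest gap: {max(gaps):.2f}")
```

Two runs that start a billionth apart end up nowhere near each other, which is why a simulation before go-live tells us less than we would like.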

Massively scalable AI cannot counteract these effects easily, and even if it could, the true system dynamics would not be known until the system goes live. We would not know how many cars would run red lights or speed up or slow down compared to today. Possibly the system could be tuned and made to behave, but then we run into basic politics. Which responsible leader would want to experiment with the daily lives of a city of more than 10 million people? Who would want to face these people and explain to them that the reason they are late for work or for their son’s basketball game is the tuning of an AI algorithm?

The Limits Of AI

So, the limits to AI may not be primarily of a technical nature. They may have just as much to do with how the world behaves and what other humans, those who are not data scientists, will accept. Even if it is better to lose 50 people in a fire in Manhattan once every ten years while reducing the number of traffic deaths by 100 every year, the stories written are about the one tragic event, not about the general trend. Voters remember the big media stories and will never notice a smaller trend. Consequently, regardless of the technical utility and precision of AI, there will be cases where the human factor will constrain the solutions more than any code or infrastructure.
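The trade-off in this paragraph is simple arithmetic. A small sketch using only the hypothetical figures from the thought experiment; the variable names are illustrative.

```python
# Figures taken from the thought experiment in the text (hypothetical).
fire_deaths_per_event = 50
years_between_events = 10
traffic_deaths_avoided_per_year = 100

expected_fire_deaths_per_year = fire_deaths_per_event / years_between_events
net_lives_saved_per_year = traffic_deaths_avoided_per_year - expected_fire_deaths_per_year
print(expected_fire_deaths_per_year, net_lives_saved_per_year)
```

On these numbers the system is far ahead on average, yet the average is not what ends up in the headlines.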

Based on this thought experiment, I think the most important limits to adoption of AI solutions at city scale are the following:

  • Unclear benefits – what are the benefits of leveraging AI for smart cities? We can surely think up a few use-cases but it is harder than you think. Traffic was one but even here the benefits can be elusive.
  • Algorithmic transparency – if we are ready to let our lives be dominated by AI in any important area, citizens who vote will want to understand precisely how the algorithms work. Many classes of AI algorithms are incomprehensible by nature and constantly changing. How can we prove that no one tampered with them in order to gain an unfair advantage? Real people who are late for work or are denied bail will want to know that, and sometimes the Department of Investigation will want to know as well.
  • Accountability – whatever an algorithm is doing, people will want to hold a person accountable for it if something goes wrong. Who is accountable for malfunctioning AI? Or even well-functioning AI with unwanted side effects? The buck stops with the person responsible at the top, the elected or appointed official.
  • Unacceptable implementation costs – real-world AI in a city context can rarely be adequately tested in advance the way we are used to for enterprise applications. Implementing and tuning a real-world system may have too many adverse effects before it starts to be beneficial. No matter how much we prepare, we can never know exactly how the human part of the system will behave at scale until we release it into the wild.

Artificial Intelligence is a great technological opportunity for cities, but we have to develop the human side of AI in order to arrive at something that is truly beneficial at scale.

 

Data Is the New Oil – Building the Data Refinery

“Data Is the New Oil!”

Mathematician and IT architect Clive Humby seems to have been the first to coin the phrase, in 2006, when he helped Tesco develop from a fledgling UK retail chain into an intercontinental industry titan rivaled only by the likes of Walmart and Carrefour, through the use of data from the Tesco reward program. Several people have reiterated the concept since. But the realization did not really hit prime time until The Economist claimed in May 2017 that data had surpassed oil as the most valuable resource.

Data, however, is not just out there and up for grabs. Just as you have to get oil out of the ground first, data poses a similar challenge: you need to get it out of computer systems or devices first. When you do get the oil out of the ground it is still virtually useless. Crude oil is just a nondescript blob of black goo. Getting the oil is just a third of the job. This is why we have oil refineries. Oil refineries turn crude oil into valuable and consumable resources like gas or diesel or propane. They split the raw oil into different substances that can be used for many different products like paint, asphalt, nail polish, basketballs, fishing boots, guitar strings and aspirin. This is awesome; can you imagine a world without guitar strings, fishing boots or aspirin? That would be like Harry Potter just without the magic…

Similarly, even if we can get our hands on it, raw data is completely useless. If you have ever glanced at a web server log, a binary data stream or other machine-generated output, you can relate to the analogy of crude oil as a big useless blob of black goo. All this data does not mean anything in itself. Getting the raw data is of course a challenge in some cases, but making it useful is a completely different story. That is why we need to build data refineries: systems that turn the useless raw data into components that we can build useful data products from.

Building the data refinery

For the past year or so, we have worked to design and architect such a data refinery for New York City. The “Data as a Service” program is the effort to build this refinery for turning raw data from the City of New York into valuable and consumable services to be used by City agencies, residents and the rest of the world. We have multiple data sources in systems of record, registers, logs, official filings and applications, inspections and hundreds of thousands of devices. Only a fraction of this data is even available today. When it is available, it is hard to discover and use. The purpose of Data as a Service is to make all the hidden data available and useful. We are turning all this raw data into valuable and consumable data services.

A typical refinery processes crude oil. This is done through a series of distinct processes and results in distinct products that can be used for different purposes. The purpose of the refinery is to break down the crude oil to distinct useful by-products. The Data as a Service refinery has five capability domains we want to manage in order to break the raw data down into useful data assets:

  • Quality is about the character and validity of the data assets
  • Movement is how we transfer and transform data assets from one place to another
  • Storage deals with how we retain data assets for later use
  • Discovery has to do with how we locate the data assets we need
  • Access deals with how we allow users and other solutions to interact with data assets

Let us look at each of these in a bit more detail.

Quality

The first capability domain addresses the quality of the data. The raw data is initially of low quality like the crude oil. It may be a stream of bits or characters, telemetry data, logs or CSV files.

The first thing to think about in any data refinery is how to assess and manage the quality of the data. We want to understand and control the quality of data. We want to know how many data objects there are, whether they are in the right format and whether they are corrupted. Simple descriptive reports like the number of distinct values, type mismatches, number of nulls etc. can be very revealing and important when considering how the data can be used by other systems and processes.
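The descriptive reports mentioned above can be sketched in a few lines. This is a minimal illustration only, assuming a small CSV extract; the column names, sample values and the profile function are all made up for the example.

```python
import csv
import io

# Hypothetical sample extract; column names and values are made up.
RAW = """permit_id,issued,fee
1001,2017-03-01,250
1002,,abc
1002,2017-03-05,
"""

def profile(rows: list, column: str, expected_type=str) -> dict:
    """Count rows, nulls, distinct values and type mismatches for one column."""
    values = [row[column] for row in rows]
    present = [v for v in values if v != ""]
    mismatches = 0
    for v in present:
        try:
            expected_type(v)
        except ValueError:
            mismatches += 1
    return {"rows": len(values), "nulls": len(values) - len(present),
            "distinct": len(set(present)), "type_mismatches": mismatches}

rows = list(csv.DictReader(io.StringIO(RAW)))
report = {"permit_id": profile(rows, "permit_id", int),
          "fee": profile(rows, "fee", int)}
print(report)
```

Even on three rows the report surfaces a duplicated identifier, a missing fee and a fee that is not a number.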

Once we know the quality of the data we may want to intervene and do something about it. Data preparation formats the data from its initial raw form. It may also validate that the data is not corrupted and can delete, insert and transform values according to preconfigured rules. This is the first diagnostic and cleansing of the data in the DaaS refinery.

Once we have the initial data objects lined up in an appropriate format, Master Data Management (MDM) is what allows us to work proactively and reactively on improving the data. With MDM we will be able to uniquely identify data objects across multiple different solutions and format them into a common semantic model. MDM enables an organization to manage data assets and produce golden records, identify and eliminate duplicates, and control which data entities are valid and invalid.
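As a rough illustration of duplicates being merged into a golden record, here is a sketch under strong simplifying assumptions: the records, the crude name-matching rule and the "newest non-empty value wins" policy are all hypothetical, and real MDM products use far more careful matching.

```python
# Hypothetical records for the same business held by two systems.
records = [
    {"source": "licensing", "name": "ACME Deli Inc.", "phone": "", "updated": "2016-05-01"},
    {"source": "inspections", "name": "Acme Deli", "phone": "212-555-0100", "updated": "2017-02-11"},
    {"source": "licensing", "name": "Bob's Bikes", "phone": "718-555-0199", "updated": "2017-01-20"},
]

def match_key(name: str) -> str:
    """Crude matching rule (an assumption): lowercase, drop punctuation and 'inc'."""
    words = "".join(c for c in name.lower() if c.isalnum() or c == " ").split()
    return " ".join(w for w in words if w != "inc")

def golden_records(rows: list) -> dict:
    merged = {}
    for row in sorted(rows, key=lambda r: r["updated"]):  # oldest first
        golden = merged.setdefault(match_key(row["name"]), {})
        for field, value in row.items():
            if value:  # newer non-empty values overwrite older ones
                golden[field] = value
    return merged

golden = golden_records(records)
for key, record in golden.items():
    print(key, record)
```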

Data movement

Once we have made sure that we can manage the quality of the data, we can proceed to the next phase. Here we will move and transform the data into more useful formats. We may, however, need to move data in different ways. Sometimes it is fine to move it once a day, week or even month, but more often we want the data immediately.

Batch is the movement and transformation of large quantities of data from one form and place to another. A typical batch program is executed on a schedule and goes through a sequence of processing steps that transform the data from one form into another. It can range from simple formatting changes and aggregations to complex machine learning models. I should add that what is sometimes called Managed File Transfer, where a file is simply moved without being transformed, can be seen as a primitive form of batch processing. In this context, however, it is considered a way of accessing data and is described below.

The Enterprise Service Bus (ESB) is a processing paradigm that lets different programmatic solutions interact with each other through messaging. A message is a small, discrete unit of data that can be routed, transformed, distributed and otherwise processed as part of the information flow in the Service Bus. This is what we use when systems need to communicate across city agencies. It is a form of centralized orchestration.
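The routing and transformation of messages can be illustrated with a toy in-memory bus. This is only a sketch; the message types, queue names and the dead-letter fallback are assumptions for the example and do not describe the City's actual service bus.

```python
from collections import defaultdict

# Hypothetical routing table: message type -> subscribing agency queues.
ROUTES = {
    "permit.issued": ["buildings", "finance"],
    "inspection.failed": ["health"],
}

queues = defaultdict(list)

def transform(message: dict) -> dict:
    """Example transformation step: normalize the payload's keys to lowercase."""
    return {**message, "payload": {k.lower(): v for k, v in message["payload"].items()}}

def route(message: dict) -> int:
    """Deliver a transformed copy to every queue registered for the message type."""
    targets = ROUTES.get(message["type"], ["dead-letter"])
    for target in targets:
        queues[target].append(transform(message))
    return len(targets)

route({"type": "permit.issued", "payload": {"PermitId": 1001}})
route({"type": "unknown.type", "payload": {}})
print(dict(queues))
```

The sender only knows the message type; the central routing table decides who receives it, which is the centralized orchestration mentioned above.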

But some data is not as nicely and easily managed. Sometimes we see use cases where the processing can’t wait for batch processing and the ESB paradigm does not scale with the quantities. Real-time processing works on data that arrives in continuous streams. It has limited routing and transformation capabilities, but is especially geared towards handling large amounts of data that arrive continuously, whether to store, process or forward them.
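A minimal sketch of the streaming style, assuming a feed of numeric sensor readings: each value is handled as it arrives and only a small window is kept in memory. The feed, the window size and the function name are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator

def rolling_average(readings: Iterable[float], window: int = 3) -> Iterator[float]:
    """Yield the average of the most recent `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # older readings fall out automatically
    for value in readings:
        recent.append(value)
        yield sum(recent) / len(recent)

# Hypothetical sensor feed; in practice this would be an unbounded stream.
feed = iter([10.0, 20.0, 30.0, 40.0])
averages = list(rolling_average(feed))
print(averages)
```

Unlike a batch job, nothing here waits for the full data set, which is what makes the approach workable when the data never stops arriving.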

Storage

Moving the data naturally requires places to move it to. Different ways of storing data have different properties and we want to optimize the utility by choosing the right way to store the data.

One of the most important and widespread ways to store data is the Data Warehouse. This is a structured store that contains data prepared for frequent ad hoc exploration by the business. It can contain pre-aggregated data and calculations that are often needed. Schemas are built in advance to address reporting needs. The Data Warehouse focuses on centralized storage and consequently on data that has utility across different city agencies.

Whereas Data Warehouses are central stores of high-quality validated data, Data Marts are local data stores. They are similar to Data Warehouses in that the data is prepared to some degree, but the scope is more local, for an agency to do analytics internally. Frequently the data schemas are also of a more ad hoc character and may not be designed for widespread consumption. The Data Mart also serves as a user-driven test bed for experiments. If an agency wants to create a data source and figure out whether it has any utility, the Data Mart is a great way to create value quickly, in a decentralized and agile manner.

Where Data Warehouses and Data Marts store structured data, a data lake is primarily a store for unstructured and semi-structured data, like CSV, XML and log files, as well as binary formats like video and audio. The data lake is a place to throw data first and then think about how to use it later. There are several zones within the data lake with varying degrees of structure, such as the raw, analytical, discovery, operational and archive zones. Some parts, like the analytical zone, can be as structured as Data Marts and be queried with SQL or a similar syntax (HiveQL), whereas others, like the raw zone, require more programming to extract meaning. The data lake is a key component in bringing in more data and transforming it into something useful and valuable.

The Operational Data Store is in essence a read replica of an operational database. It is used in order not to unnecessarily tax an operational, transactional database with queries.

The City used to have real warehouses filled with paper archives, and they burned down every now and then. The warehouses existed because all data has a retention policy that specifies how long it should be stored. This need is still there when we digitize data. Consequently we need to be in complete control of the lifecycle of all data assets. The archive is where data will be moved when there is no more need to access it frequently. Consequently, data access can have a long latency. Archives are typically used in cases where regulatory requirements warrant data to be kept for a specific period of time.
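A retention policy can be thought of as a small rule applied to every data asset. The sketch below is illustrative only: the asset kinds, the retention periods, the idle threshold and the assumption that data is deleted once its retention period expires are all hypothetical, since real schedules are set by regulation.

```python
from datetime import date

# Hypothetical retention rules, in years; real schedules are set by regulation.
RETENTION_YEARS = {"inspection_report": 7, "web_log": 1}
ARCHIVE_AFTER_IDLE_DAYS = 365  # assumed threshold for "no longer accessed frequently"

def lifecycle_action(kind: str, created: date, last_accessed: date, today: date) -> str:
    age_years = (today - created).days / 365.25
    if age_years >= RETENTION_YEARS[kind]:
        return "delete"   # retention period has expired (assumed policy)
    if (today - last_accessed).days > ARCHIVE_AFTER_IDLE_DAYS:
        return "archive"  # must still be kept, but is rarely read
    return "keep"

today = date(2018, 1, 1)
print(lifecycle_action("inspection_report", date(2015, 1, 1), date(2016, 6, 1), today))
print(lifecycle_action("web_log", date(2016, 6, 1), date(2016, 6, 2), today))
```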

Discovery

Now that we have ways to control the quality, move the data and store it, we also need to be able to discover it. Data that cannot be found is useless. Therefore we need to supply a number of capabilities for finding the data we need.

If the user is in need of a particular data asset, search is the way to locate it. Based on familiar query functions, the user can use single words or strings. We all know this from online search engines. The need is the same here: to be able to intelligently locate the right data asset based on an input string.

When the user does not know exactly what data assets he or she is looking for we want to be able to supply other ways of discovering data. In a data catalog the user can browse existing data sources and locate the needed data based on tags or groups. The catalog also allows previews as well as additional meta-data about the data source, such as descriptions, data dictionaries and experts to contact. This is good if the user does not know exactly what data asset is needed.
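The difference between searching and browsing can be sketched against a tiny in-memory catalog. The entries, tags and contact names below are invented for illustration and do not refer to actual City data sets or tooling.

```python
# Hypothetical catalog entries; names, tags and contacts are made up.
CATALOG = [
    {"name": "street_tree_census", "tags": {"parks", "environment"},
     "description": "Location and species of street trees", "contact": "parks-data"},
    {"name": "restaurant_inspections", "tags": {"health", "business"},
     "description": "Results of restaurant inspections", "contact": "health-data"},
]

def search(text: str) -> list:
    """Search: match an input string against names and descriptions."""
    needle = text.lower()
    return [e["name"] for e in CATALOG
            if needle in e["name"].lower() or needle in e["description"].lower()]

def browse(tag: str) -> list:
    """Browse: list every asset carrying a tag, for users without a precise target."""
    return [e["name"] for e in CATALOG if tag in e["tags"]]

print(search("inspection"))
print(browse("parks"))
```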

In some cases a user group knows exactly what subset of data is needed. The data may not all reside in the same place or format. By introducing a virtual layer between the user and the data sources, it is possible to create durable semantic layers that remain even when data sources are switched. It is also possible to create specific views of the same data source tailored to a particular audience. This way the view of the data will cater to the needs of individual user groups rather than being a catch-all, lowest-common-denominator version. This is particularly convenient since access to sensitive data is granted on a per-case basis. Data virtualization will make it possible for users to discover only the data they are legally permitted to view.
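A toy version of such a virtual layer, assuming two sources that name their fields differently and two user groups with different entitlements. All source names, fields and groups are hypothetical.

```python
# Two hypothetical physical sources with different field names.
SOURCE_A = [{"case_no": 1, "boro": "Bronx", "ssn": "xxx-xx-1234"}]
SOURCE_B = [{"id": 2, "borough": "Queens", "ssn": "xxx-xx-5678"}]

def unified_rows():
    """Semantic layer: expose one stable shape regardless of the source."""
    for r in SOURCE_A:
        yield {"case_id": r["case_no"], "borough": r["boro"], "ssn": r["ssn"]}
    for r in SOURCE_B:
        yield {"case_id": r["id"], "borough": r["borough"], "ssn": r["ssn"]}

# Assumed per-group entitlements: which columns each audience may see.
VIEWS = {"analyst": ["case_id", "borough"],
         "caseworker": ["case_id", "borough", "ssn"]}

def view_for(group: str) -> list:
    columns = VIEWS[group]
    return [{c: row[c] for c in columns} for row in unified_rows()]

print(view_for("analyst"))
```

If SOURCE_A were replaced by another system, only unified_rows would change; the views and their consumers would stay the same.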

Access

Now that we are in control of the quality of data and who can use it, we also need to think about how we can let users consume the data. Across the city there are very different needs for consuming data.

Access by applications is granted through an API and supplies a standardized way for programmatic access by external and internal IT solutions. The API controls ad hoc data access and also supplies documentation that allows developers to interact with the data through a developer portal. Typically the data elements are smaller and involve a dialogue between the solution and the API.

When files need to be moved securely between different points without any transformation, a managed file transfer solution is used. This is also typically accessed by applications, but a portal also allows humans to upload or download files. This is to be distinguished from document sharing sites like SharePoint, WorkDocs, Box and Google Docs, where the purpose is for human end users to share files with other humans and typically cooperate on authoring them.

An end user will sometimes need to query a data source in order to extract a subset of the data. Query allows this form of ad hoc access to underlying structured or semi-structured data sources. This is typically done through SQL. An extension of this is natural language queries, through which the user can interrogate a data source through questions and answers. With the advent of conversational interfaces like Alexa, Siri and Cortana, this is something we expect to develop further.

A stream is a continuous sequence of data that applications can use. Applications subscribe to a stream and receive its data in real time. This is used when time and latency are of the essence. The receiving system will need to parse and process the stream by itself.

In contrast, events are already processed and are essentially messages from systems that function as triggers, indicating that something has happened or should happen. Other systems can subscribe to events and implement adequate responses to them. Like streams they are real time, but unlike streams they are not continuous. They also resemble APIs in that the messages are usually smaller, but differ in that they implement a push pattern.
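The push pattern can be shown with a minimal in-process publish/subscribe sketch. The event name, the handlers and the helper functions are hypothetical; a real deployment would use a message broker rather than a Python dictionary.

```python
from collections import defaultdict

subscribers = defaultdict(list)

def subscribe(event_type: str, handler) -> None:
    subscribers[event_type].append(handler)

def publish(event_type: str, event: dict) -> None:
    """Push the event to every subscriber; nobody has to poll for it."""
    for handler in subscribers[event_type]:
        handler(event)

dispatched = []
subscribe("pothole.reported", lambda e: dispatched.append(f"crew sent to {e['street']}"))
subscribe("pothole.reported", lambda e: dispatched.append(f"logged {e['id']}"))

publish("pothole.reported", {"id": 42, "street": "Broadway"})
print(dispatched)
```

With an API the consumer would have to ask whether anything had happened; here the publisher tells every interested system the moment it does.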

Implementing the refinery

Naturally some of this has already been built, since processing data is not something new. What we try to do with the Data as a Service program is to modernize existing implementations of the above-mentioned capabilities and plan for how to implement the missing ones. This involves a jigsaw puzzle of projects, stakeholders and possibilities. Like most other places, we are not working from a green field, and there is no multimillion-dollar budget for creating all these interesting new solutions. Rather, we have to continuously come up with ways to reach the target incrementally. This is what I have previously described as pragmatic idealism. What is important for us, as I suspect it will be for others, is to have a bold and comprehensive vision for where we want to go. That way we can hold up every project or idea against this target and evaluate how we can continuously progress closer to our goal. As our team’s motto goes: “Enterprise Architecture – One solution at a time.”

Information System Modernization – The Ship of Theseus?

The other day I was listening to a podcast by Malcolm Gladwell. It was about golf clubs (which he hates). Living next to two golf courses and frequently running next to them, this was something I could relate to. The issue he had was that they did not pay proper tax. This is due to a California rule that property tax is frozen at pre-1978 levels unless more than 50% of ownership has changed.

The country clubs own the golf courses and the members own the country club. Naturally, more than 50% of the members have been replaced since then. However, according to the tax authorities this does not mean that 50% of the ownership has changed. The reason is that the gradual change of the membership means that the identity of the owning body has not changed. This is to some people a peculiar philosophical stance, but it is not without precedent. It is known from the ancient Greek writer Plutarch’s philosophical paradox, the Ship of Theseus, here quoted from Wikipedia:

“The ship wherein Theseus and the youth of Athens returned from Crete had thirty oars, and was preserved by the Athenians down even to the time of Demetrius Phalereus, for they took away the old planks as they decayed, putting in new and stronger timber in their places, insomuch that this ship became a standing example among the philosophers, for the logical question of things that grow; one side holding that the ship remained the same, and the other contending that it was not the same.”

For Gladwell it was not clear that the gradual replacement of members in a country club constituted no change in ownership. Be that as it may, the story made me think about information system modernization, which is typically a huge part of many enterprise and government IT project portfolios. Information systems are like Theseus’ ship: you want to keep it floating, but you also want to maintain it and make it better. The question is just: is information system modernization Theseus’ ship?

The Modernization effort

Usually a board, CEO, CIO, commissioner or other body with responsibility for legacy systems realizes that it is time to do something about them. Maybe the last person who knows about the system is already long overdue for retirement, the operational efficiency has significantly declined, costs have grown, or the market demands requirements that cannot easily be implemented in the existing legacy system. Whatever the reason, a decision to modernize the system is made: retire the old and replace it with the new.

Now this is where it gets tricky because what exactly should the new be? Do we want a car or a faster horse? For many the task turns into building a faster horse by default. Because we know what the system should do, right? It just has to do it a bit faster or do a little bit better. The problem is that we are sometimes building Theseus a new rowboat with carbon fiber planks where we could instead have gotten a speedboat with an outdoor kitchen and a bar.

When embarking on a legacy modernization project, there are a few things I believe we should observe. As an example, I will use a recent project of ours to modernize the architecture of a central integration solution at New York City Department of IT and Telecommunication. This legacy system is itself a modernization of an earlier mainframe-based system (yes, things turn legacy very fast these days).

Some of the things to be conscious of in order not to end up in the trap of Theseus’ ship when modernizing systems are the following.

Same or Better and Cheaper

A modernized legacy system has to fulfill three general criteria: it should do the same as today or more, with the same or better quality, at a cheaper price. It is that simple. When I say that it should do the same as today, I would like to qualify that: if the system today sends sales reports to matrix printers and fax machines around the country, we probably don’t need that even if it is a function today. The point is that all important functions and processes that are supported today should also be supported.

When we talk about quality we mean the traditional suite of non-functional requirements: Security, Maintainability, Resilience, Supportability etc. Quite often it is exactly the non-functional requirements that need to be improved, for example maintainability or supportability.

At a cheaper price is pretty straightforward. It is not always possible, such as when you are replacing a system that was custom coded with a modern COTS or SaaS solution. Nevertheless, I think it is an important and realistic ambition, because most legacy technology that used to be state of the art has now become a commodity due to open source and general competition. An example is message queueing software. That used to be offered at a premium by legacy vendors, but due to open source products like ActiveMQ and RabbitMQ, as well as cost-efficient cloud offerings, it has become orders of magnitude cheaper.

Should the system even be doing this in the new version?

Often there is legacy functionality that has become naturally obsolete. One example I found illustrates this. The integration solution we had is based on an adapter platform that takes data from a source endpoint, formats it and puts it on a queue. At the center a message broker routes it to queues that are read by other adapter platforms. They then format and write the messages to the target endpoint. This is a fine pattern, but if you want to move a file, it is not necessarily the most efficient way since the file has to be parsed into multiple bits to be put on a queue and then assembled again on the other side. This is a process that can easily go wrong if one message fails or is out of order. Consequently multiple checks and operational procedures need to be in place. Rather than having the future solution do this, one could look to see if other existing solutions are more appropriate, such as a managed file transfer solution. Similarly when the system merely wraps web calls, an API management solution may be more appropriate.

Why does the system do it in this way?

Was this due to technological or other constraints when it was built? When modernizing it can pay off to look at each characteristic of the legacy system and understand why it is implemented in that way rather than just copying it. For example, our integration solution puts everything on a queue. Why is that? It may be because we want guaranteed delivery.

This is a fair answer but also a clue to how we can make it better, because what better way is there to make sure that you don’t lose a message than to just store it in an archive for good as soon as you get it? In a queue the message is deleted once it has been read. This presumably has to do with message queueing’s origin on the mainframe, where memory was a scarce resource.

Memory is not scarce any more, so rather than use a queue, let’s just store the message, publish an event on a topic and let subscribers to the topic process it at their convenience. This way the integration can also be rerun even if a downstream process fails, such as the target adapters writing to a DB. If this were a queue-based integration the message would be lost because it would have been deleted off the queue. With this architecture any process can at any time access the message again. Now a message is truly never lost.
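To make the store-then-publish idea concrete, here is a minimal sketch in Python. It is not how any particular product does it: the names `MessageStore`, `Topic`, `ingest` and `target_adapter` are made up for illustration, and in-memory structures stand in for a real archive, a real topic and a real target database.

```python
import itertools


class MessageStore:
    """Append-only archive: reading a message never deletes it."""

    def __init__(self):
        self._messages = {}
        self._ids = itertools.count(1)

    def save(self, payload):
        message_id = next(self._ids)
        self._messages[message_id] = payload
        return message_id

    def load(self, message_id):
        return self._messages[message_id]


class Topic:
    """Fans each event out to every subscriber callback."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        for callback in self._subscribers:
            callback(event)


def ingest(store, topic, payload):
    """Persist the payload first, then announce it on the topic."""
    message_id = store.save(payload)
    topic.publish({"message_id": message_id})
    return message_id


store = MessageStore()
topic = Topic()
written = []  # stands in for the target database
failed = []   # events whose processing failed


def target_adapter(event, db_available):
    if not db_available:
        failed.append(event)
        return
    written.append(store.load(event["message_id"]))


# First attempt: the downstream database is unavailable.
topic.subscribe(lambda event: target_adapter(event, db_available=False))
ingest(store, topic, "order-42")

# Rerun: the message is still in the store, so it can be processed again.
for event in failed:
    target_adapter(event, db_available=True)
```

The point of the sketch is only the ordering: the payload is saved before anything is announced, so a failed subscriber can always go back to the store.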

What else can the system do going forward?

Keep an eye out for opportunities that present themselves with rethinking the architecture and the possibilities of modern technologies. To continue on our example with the message store, we can now use the message archive for analytical solutions by transforming the messages subsequently from the archive into a Data Warehouse or Datamart. This is also known as an ELT process.

Basically we have turned our legacy queue based architecture into a modern ELT analytics type architecture on the side. What’s more, we can even query the data in the message store with SQL. One way is to make it accessible as a Hive table. Imagine what that would take in the legacy world: for every single queue we would have had to build an ETL process and put it into a new schema that would have to be created in advance.
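As a rough illustration of the load-first, transform-later idea, here is a small Python sketch. It uses the standard library's `sqlite3` as a stand-in for the message store and the reporting layer, not Hive, and the table names, field names and sample messages are all invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw messages go into the archive untouched.
conn.execute("CREATE TABLE message_store (message_id TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO message_store VALUES (?, ?)",
    [
        ("m-1", json.dumps({"store": "Bronx", "amount": 120})),
        ("m-2", json.dumps({"store": "Queens", "amount": 80})),
        ("m-3", json.dumps({"store": "Bronx", "amount": 30})),
    ],
)

# Transform: done afterwards, from the archive, into a reporting table.
conn.execute("CREATE TABLE sales (message_id TEXT, store TEXT, amount INTEGER)")
archived = conn.execute("SELECT message_id, payload FROM message_store").fetchall()
for message_id, payload in archived:
    record = json.loads(payload)
    conn.execute(
        "INSERT INTO sales VALUES (?, ?, ?)",
        (message_id, record["store"], record["amount"]),
    )

# Query: plain SQL over what used to be messages on a queue.
totals = conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
```

Because the raw payloads stay in `message_store`, a new reporting table can be derived later without touching the integration itself.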

Being open minded and having a view to adjacent or related use cases is important to spot these opportunities. This may take a bit of work around the institutional silos, if such exist, though. That is just another type of constraint, a non-technical constraint, which is often tacitly built into the system.

 

Remember that we wanted the modernized system to be “Same or better and cheaper”. Now we can still get all of the functional benefits of a queue, just better, since we can always find a message again. On top of that we have offered new useful functionality in an analytics solution that is sort of a by-product of our new architecture. Deploying it in the cloud allows us to have better resilience, performance, monitoring and even security. Add to that the cost, which is guaranteed to be significantly less than what we were paying for our legacy vendor’s proprietary message queueing system.

A Citywide Mesh Network – Science Fiction or Future Fact?

I recently finished Neal Stephenson’s excellent “Seveneves”. The plot is that the moon blows up due to an unknown force. Initially people marvel at the now fragmented moon, but due to the intelligent analysis of one of the protagonists it becomes clear that these fragments will keep fragmenting and eventually rain down on earth. The lunar debris turns into comets that start making the earth a less than pleasant and very hot place to live. In order to survive, the human race decides to build a space station composed of a number of individual pods (designed by the architects!). This design is chosen in order to have the opportunity to evade incoming debris like a shoal of fish evades a shark.

Naturally there is no Internet in space but the natural drive towards having a social network (called spacebook) forces the always inventive human race to find another way to implement the internet. The resulting solution is a mesh network.

The principle of a mesh network:

“is a local network topology in which the infrastructure nodes (i.e. bridges, switches and other infrastructure devices) connect directly, dynamically and non-hierarchically to as many other nodes as possible and cooperate with one another to efficiently route data from/to clients”

The good thing about mesh networks is that every node can serve as a router and even if one or a few nodes fail (as they might in a space orbit filled with lunar debris) the network would still work. Contrast this with a network topology where one or even a few pods had central routers, like our present day Internet, which is based on the hierarchical Domain Name System where traffic depends on a few top-level DNS servers. If these were all taken out the whole network would not work. With a wireless mesh network, the network would continue to work as long as there are nodes that can reach each other. But enough of the science fiction, let’s get back to the real world.

The City Wide Mesh Network

New York City, where I work, has had its own share of calamities. Not quite the scale of the moon blowing up, but September 11, 2001, was still a significant disaster. The effect was that the cell network broke down due to overload. This greatly reduced first responders’ ability to communicate. In order for this not to happen again, NYC built its own wireless network: we call this NYCWiN. For years this network has served the City well, but the cost of maintaining a dedicated citywide wifi network is high compared to the price and quality of modern commercial cell networks.

However, the cellular network is also patchy in some parts of the city, as most New Yorkers have noticed. It is also expensive if we want to supply each IoT device in the City with its own cellular subscription. Typically a cellular connection will have a lot more bandwidth than most devices will ever use anyway. So, might it be possible to rethink the whole network structure and gain some additional benefits in the process? What if we created a citywide mesh network instead? It could function in the following way:

A number of routers would be set up around the city. Each would be close enough to reach at least one other router. When one router fails there are others nearby to take over the network traffic. These routers would form the fabric of the citywide mesh network.
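The rerouting behaviour can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the topology and router names are invented, one node labelled `gateway` stands for a router with an Internet connection, and a plain breadth-first search stands in for whatever routing protocol a real mesh would use.

```python
from collections import deque


def find_route(links, source, target, failed=frozenset()):
    """Breadth-first search for a shortest hop path that avoids failed routers."""
    if source in failed or target in failed:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in links.get(node, ()):
            if neighbor not in visited and neighbor not in failed:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no healthy path: this part of the mesh is an island


# Hypothetical topology: each router lists the routers it can reach.
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "gateway"],
    "gateway": ["D"],
}

normal = find_route(links, "A", "gateway")                  # goes via B
rerouted = find_route(links, "A", "gateway", failed={"B"})  # falls back to C
isolated = find_route(links, "A", "gateway", failed={"D"})  # D is a single point of failure
```

The last case shows why each router should be able to reach more than one neighbor: in this made-up layout, losing `D` cuts `A` off from the gateway.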

Some of these primary routers would be connected to Internet routers either through cables or cellular connections. These special routers would serve as gateways to the Internet. In this way the network would effectively be connected to the Internet and we would have a mesh Internet. This is actually not something new; in fact it already exists! It has been implemented by a private group called NYC Mesh. They have created their own proprietary routers for this, but wouldn’t it be cool if the City scaled a similar solution for use by all New Yorkers and visitors? Free of charge, like the LinkNYC stands. Oh, and could they not maybe be the Internet gateways we thought of above? Think about it: what if wifi was just pervasive in the air of the City for everyone to tap into?

Better than LTE

The beauty of this is that this network may even be better than the cellular network, since it can more easily be extended to parts of the city that have patchy coverage from cell towers. We would just have to set up routers in those areas and make sure there was a line of connection to nodes in the existing network or an Internet gateway. It would even be possible to extend the network indoors, even to the subway.

With thousands of IoT devices coming online in the coming years, costs will increase significantly for Smart City solutions. Today it is not cheap to have a device connected through a cellular carrier to the Internet. Since it is essentially a cell phone connection, it typically also costs about the same. This may make economic sense compared to the alternatives for the number of devices connected today. But scaling towards millions of devices, this approach is untenable in the long run. The City Wide Mesh Network could be a scalable low cost alternative for all of the City’s IoT devices to connect to the Internet.

Building and maintaining the network

It is quite an effort to implement this network and maintain it, but there is also a way to get around that. Today it is possible for commercial carriers to put up cellular antennas on City property if permission is granted. What if we made all permissions contingent on setting up a number of mesh routers for the citywide mesh network? Then, for every time a cellular or other antenna was set up, the citywide mesh network would be strengthened.

It could simply be made the obligation of the carriers that are granted use of City property for commercial uses that they maintain their part of a free citywide mesh network. The good thing about a mesh network is that there is no central control, and making it operational would just entail following some standards and adding and replacing network nodes. The City would have to decide on the standards to put in place: what equipment, what protocols etc. Not an easy task perhaps, but also not impossible.

In order to maintain the health and operation of the network, monitoring would have to be in place. We could see in real time which nodes were failing and replace them. It would also be possible to elastically provision nodes when traffic patterns and utilization make it necessary.

World Wide Standard

Now here is where it could get interesting, because the issue today in mesh networking, as in most other areas of IoT, is that there are no common standards. Vendors have their own proprietary standards and no interest in making them compatible. History has shown us that the only way to impose standards on any industry is through governmental mandate. New York could of course not mandate a standard, but what if the City forced all vendors who wanted to sell to the New York City Wide Mesh Network to comply with a given standard? The industry would have to develop their products to this common standard. Since New York has the size to create a critical mass, this could possibly be the start of a new mesh network standard.

New York works together with a lot of other cities that often take inspiration from us in issues of technology. An example is open data, which originated in New York, but has now spread to virtually every city of notable size. The same could be the case for the City Wide Mesh Network design and the standards used. That way, cities could have a blueprint for bringing pervasive low cost wifi to all citizens and visitors.

Fiction or Fact?

If a similar catastrophe to 9/11 were to ever happen again, then the mesh network would adapt and through healthy nodes still be able to send data around, possibly slower, but it would not fail. Only the particular nodes that were hit would be out, but the integrity of the network would be intact. It is, of course, possible that islands without connectivity would appear but that is to be expected. As long as the integrity of the network is unaffected it is ok.

It is actually possible to create a robust, low cost, citywide network that would be developed and maintained by third parties, with better coverage than cell phones, all the while helping the world by forcing the industry to implement standards that would improve interoperability for IoT devices. This is not necessarily science fiction: everything is within the realm of possibility.

The Data Deluge, Birds and the Beginning of Memory

One of my heroes is the avant garde artist Laurie Anderson. She is probably best known for the unlikely hit “O Superman” in the eighties and being married to Lou Reed, but I think she is an artist of comparable or even greater magnitude. On one of her later albums is a typical Laurie Anderson song called “The Beginning of Memory”. Being a data guy this naturally piqued my interest. It was sort of a win-win scenario. The song is an account of a myth from an Ancient Greek play by Aristophanes: “The Birds”. Here are the lyrics to the song:

There’s a story in an ancient play about birds called The Birds
And it’s a short story from before the world began
From a time when there was no earth, no land
Only air and birds everywhere

But the thing was there was no place to land
Because there was no land
So they just circled around and around
Because this was before the world began

And the sound was deafening. Songbirds were everywhere
Billions and billions and billions of birds

And one of these birds was a lark and one day her father died
And this was a really big problem because what should they do with the body?
There was no place to put the body because there was no earth

And finally the lark had a solution
She decided to bury her father in the back of her own head
And this was the beginning of memory
Because before this no one could remember a thing
They were just constantly flying in circles
Constantly flying in huge circles

While myths are believed to be literal truth by very few people, they usually point to some more abstract and deeper truth. It is rarely clear exactly how, or what it means. But I think I see the deeper point here that may actually teach us something valuable. Bear with me for a second.

The Data Deluge and The Beginning of Memory

The feeling I got from the song was eerily similar to the feeling I get from working with the Internet of Things. Our phones constantly track our movements; our cars record data on the engine and performance. Sensors that monitor us every minute of our lives are silently invading our world. When we go through the streets of Manhattan we are monitored by the NYPD’s system of surveillance cameras, Alexa is listening in on our conversations and Nest thermostats sense when we are home.

This is what is frequently referred to as the Internet of Things. The analogy to the story about the birds is that until now we have just been flying about in circles with no real sense of direction or persistence to our movement. What is often overlooked is that the fact that we can now measure the movement and status of things only amplifies the cacophony of the deafening sound of billions and billions of birds, sorry, devices.

This is where the birth of memory comes in. Because not until the beginning of memory do we gain firm ground under our feet. It is only with memory that we provide some persistence to our throngs of devices and their song. We capture signals and persist them in one form of memory or another.

The majority of interest in IoT is currently dedicated to exactly this process: how do we capture the data? What protocols do we use? Is MQTT better or does AMQP provide a better mechanism? What is the velocity and volume of the data? Do we capture it as a stream or as micro batches?

We also spend a great deal of time figuring out whether it is better to store in HDFS, MongoDB or HBase. Should we use Azure SQL Data Warehouse or Redshift or something else? We read studies about performance benchmarks and guidelines for making these choices (I do at least).

These are all worthwhile and interesting problems that also capture a large part of my time, but they also completely miss the point! If we refer back to the ancient myth, the lark did not want to remember and persist everything. She merely wanted to persist the death of her father; she only wanted to persist something because it was something that mattered!

What Actually Matters?

And this is where we go wrong. We are just persisting the same incessant bird song, frequently without pausing to think about what actually matters. We should heed the advice of the ancient myth and reflect on what is important to persist. I know this is against most received wisdom in BI and Big Data, where the mantra has been “persist as much as possible, you never know when you are going to need it”.

But actually the tides are turning on that view due to a number of new limiting factors such as storage, processing and connectivity. Granted, storage is still getting cheaper and cheaper and network bandwidth more and more ample. Even processing is getting cheaper. However, if you look closely at the fine print of the cloud vendors, services that process data and move data are not all that cheap. And you do need to move the data and process it in order to do anything with it. Amazon will allow you to store anything at next to no cost in S3, but if you want to process it with Glue or query with Athena it is not so cheap.

Another emerging constraining factor is connectivity. Many devices today still connect to the Internet through the cellular network. Now, cellular networks are operated by carriers that pay good money for the frequencies used. This money is passed on to the users. On average a device is not different from a cell phone, so naturally you have to pay something close to the price of a cell phone connection, around $30 to $40. I do get the enthusiasm around billions of devices, but if the majority of these are connecting to the internet through the cellular radio spectrum, then the price is also billions of dollars.

Suddenly, the bird song is not so pleasant to most ears and our ornithological enthusiasm is significantly curbed. These trends are sufficient to warrant us starting to think about persisting only what actually matters. That can be a lot if you really have a feasible use case for storing, for example, all your engine data (which you might). It could also be that the 120 data points per second from your connected toothbrush turn out not to matter that much.

And I haven’t even started to touch on how you would ever find sense in all the data that you persisted to memory. Most solutions do not employ adequate metadata management or data catalogs or other solutions that would tell anyone what a piece of data actually “means”. If we don’t know or have any way of knowing what a piece of data means there is absolutely no reason to store it. If you have a data feed with 20 variables but you don’t know what they are, how is it ever going to help you?

Store what matters

This can actually be turned into a rule of thumb about data storage in general: The data should be stored only to the extent that someone feels it matters enough to describe what it actually is. If no one can be bothered to pin down a description of this variable and no one can be bothered to store that description anywhere it is because it doesn’t matter.
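The rule of thumb is simple enough to express as code. The sketch below is one possible reading of it, not a recipe: the function name `filter_described`, the idea of a catalog as a plain dictionary and all of the field names are assumptions made up for the example.

```python
def filter_described(reading, catalog):
    """Keep only the fields that someone has bothered to describe."""
    return {
        name: value
        for name, value in reading.items()
        if catalog.get(name, "").strip()
    }


# Hypothetical data catalog: field name -> human-written description.
catalog = {
    "engine_temp_c": "Engine coolant temperature in degrees Celsius",
    "rpm": "Engine revolutions per minute",
    "var_17": "",  # listed, but nobody knows what it is
}

# Hypothetical incoming reading with two undescribed variables.
reading = {"engine_temp_c": 91.5, "rpm": 2400, "var_17": 0.3, "var_18": 7}

to_store = filter_described(reading, catalog)
```

Here `var_17` and `var_18` are dropped: one has an empty description, the other is not in the catalog at all. If either turns out to matter, describing it is all it takes to start storing it.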

 

 

 

 

 

https://en.wikipedia.org/wiki/The_Birds_(play)

 

Pragmatic Idealism in Enterprise Architecture

Being an enterprise architect I am not insensitive to the skyward gazes that project managers or developers make when being “assigned” an architect. The architect is frequently perceived as living in an ivory tower of abstraction in perfect disjunction from the real world. At best he is a distraction, at worst a liability.

The architect frequently lives in a completely idealized world and he is tasked with implementing these ideals. However often this fails precisely because the ideals rarely conform to the reality. The architect fails to appreciate, what in military parlance is sometimes referred to as “the facts on the ground”. He is too often the desktop general.

Symptoms of an idealist regime are:

  • There is a guideline for that
  • Templates for any occasion
  • We have it documented in our Enterprise Architecture tool, any other questions?
  • More than 3% of the IT organization are Architects
  • CMMI level 5 is viewed as the minimum requirement for doing any kind of serious work

Now consider the architect’s counterparts: project managers, developers or sys admins that just want to get the job done in a predictable way. These guys live “the facts on the ground”. They know all the peculiarities of the environment or system being worked on.

Symptoms of a pragmatist regime are:

  • If something breaks we fix it and get back to our coffee break
  • Upgrade what is already in place when it runs out of support (urgency promotes action)
  • Enhance existing functionality, it already works
  • New technology is like the flu, it will pass, no need to get it
  • A big pot of Status Quo (not the band) with a dash of Not-invented-here

The pragmatist will never really fundamentally transform the situation because he always wanders from compromise to compromise. He is wandering from battle to battle. Now this will rarely win the war.

It seems that we are left between a rock and a hard place. One, the idealist, will never move anything but has the sense of direction. The other, the pragmatist, will move plenty but has no sense of direction so it will mainly be in circles. Let us turn our attention to a possible way out of this conundrum. The answer is a philosophical stance first attributed to John Dewey at the start of the previous century: Pragmatic Idealism. Well, duh. Was that obvious?

It is just as obvious as it is rare in my experience. Pragmatic Idealism is a term often used in international policy, but is enterprise architecture not often similar to just that? It posits that it is imperative to implement ideals of virtue (think perfect TOGAF governance and templates for any and all possible architectural artifacts), but also that it is wrong not to discard these ideals and compromise at times in the name of expediency.

What does this mean in practice? Here are a number of principles to help you live by the ideals of pragmatic idealism (if that makes sense).

 

Have ideals and communicate them frequently. If we become too pragmatic we lose the purpose of being an architect. We have to remember that the direction has to be set by us and we need everyone to know about it, even if it is not immediately clear how we will get there. We need to provide input on whether we should go all in on open source or whether Microsoft is a preferred vendor. Here one caveat is that we have to be very sure about the ideal, because once we have started to communicate it there is no way back. You will lose all credibility as a visionary if you stand up one day and say open source is the way forward and the next you sign an enterprise license agreement with Oracle.

This means that you have to bring a very good knowledge of where your organization is and where it wants to go. Without a solid understanding of both, you are better off playing it safe and going with the flow. That said it should quickly be possible to pick up one or two key ideals.

Ideals are ideally expressed as architecture principles. I often use TOGAF’s formula of Name, Statement, Rationale and Implications:

  • Name – Should be easy to remember and represent the essence of the rule
  • Statement – Should clearly and precisely state the rule. It should also be non trivial (“don’t be evil” does not pass the test)
  • Rationale – Provides a reason for the rule and highlights the benefits of it
  • Implications – Spells out the real world consequences of this

The first thing you should do then is to flesh out these ideals and create a process through which you can create buy in to them. Chances are that the organization already has some that you can work from, but make sure that they also align with what you feel they should be going forward.

Oh, and also, don’t have too many ideals, that is, don’t have too many principles. We are shooting for something around the “magical number 7 plus or minus 2” as the title of George Miller’s groundbreaking article had it. In this article Miller demonstrated that the number of different items of information optimal for being remembered was 7 plus or minus 2. While later research has shown that it is probably even lower, this is still a good rule of thumb. Ideally you would want to be able to remember them yourself but, more importantly, you want everyone else to remember your principles as well.
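If you like to keep principles somewhere more structured than a slide, the four-part formula and the "not too many" rule can be captured in a few lines of Python. This is just a sketch: the `Principle` class, the `check_principles` helper and the sample principle are invented for illustration, and the cap of 9 is simply the upper end of 7 plus or minus 2.

```python
from dataclasses import dataclass, field

MAX_PRINCIPLES = 9  # upper end of "7 plus or minus 2"


@dataclass(frozen=True)
class Principle:
    name: str
    statement: str
    rationale: str
    implications: list = field(default_factory=list)


def check_principles(principles):
    """Return a list of problems; an empty list means the set looks usable."""
    problems = []
    if len(principles) > MAX_PRINCIPLES:
        problems.append(f"{len(principles)} principles is more than {MAX_PRINCIPLES}")
    for p in principles:
        label = p.name or "<unnamed>"
        for part in ("name", "statement", "rationale"):
            if not getattr(p, part).strip():
                problems.append(f"{label}: missing {part}")
        if not p.implications:
            problems.append(f"{label}: no implications spelled out")
    return problems


# Hypothetical example principle.
cloud_first = Principle(
    name="Cloud first",
    statement="New solutions are deployed in the cloud unless there is a documented reason not to.",
    rationale="Lower operating cost and better resilience than local hosting.",
    implications=["Projects that stay on premises must document why."],
)

problems = check_principles([cloud_first])
```

A check like this will not tell you whether a principle is any good, only whether someone has filled in all four parts and kept the list short enough to remember.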

 

Approach every problem with the minimum amount of energy and structure necessary

Say what? Now is he advertising laziness or what? Not quite, there is actually hard science behind this. We know from the second law of thermodynamics that disorder is the only thing in the universe that comes for free and automatically. Conversely, order requires energy. Any person or organization only has a limited amount of energy. What this means is that the net effect of your architecture endeavors will be maximized with the minimum amount of order necessary. Consequently, the more thoughtfully you can use that energy the more effective you will be.

In practical terms this means that you should not develop 25-item templates for 9 different types of meeting minutes if you are the sole architect in a 9 man start up. You are clearly spending too much energy. It may be your ideal to have a template for every purpose but maybe it can wait until the purpose actually arises. Similarly you should not do all your architectural documentation in your code if you are building an application with 100 million lines of code, even if your ideal is Lightweight Architecture Decision Records as advocated by Thoughtworks.

Every problem is different. The architectural skill you have to develop is to find out how important it is. The more important a problem is the more structure and energy it deserves. This is why documentation is higher in regulated industries like pharma and banking; it is simply a necessity to stay in business.

There are different ways to gauge importance. First of all, if something is recurring frequently, chances are that it is important. At least from the perspective of efficiency it is worthwhile to bring structure to frequently recurring events. This is why many people took the time to structure an email signature with their name and phone number. That way they do not have to write it every time someone needs it.

Secondly, important stuff is tied to the business model. If you are in banking, data management, access control and auditing are important. In this case you might want to bring as much structure and predictability to them as possible.

 

Make every compromise count. You have to make sure that the ideals you are following are known and that every compromise you make is registered as such by the people on the ground. If no one knows the direction, then we are just back to basic pragmatism where everything is just another step in a random direction. You have to make sure that every compromise somehow leads to a larger goal.

Say you want to move to a cloud first strategy and a given project has reservations about the cloud and wants to implement the solution on a local VM. Don’t just say ok, even if you think it is ok for this project. Make sure that you make clear what the advantages are and agree on non-trivial reasons why this particular project does not have to go to Azure or AWS. Sometimes a compromise can also be used as leverage for other architectural decisions, since people know you are there to implement ideals. This can even work doubly to your advantage in that you are seen to be pragmatic and possible to work with, and they will feel like they owe you, or at least be on friendly terms. But beware, because it may just as well be perceived as weakness if there isn’t a good reason for the compromise.

 

The path forward

The world is divided into idealists in ivory towers watching and shouting and pragmatists scurrying about in their trenches like rats in mazes, but only the pragmatic idealists can effect real change for the better. If we lean toward one we should try to be aware of the merits of the other. I have given a couple of principles I have found helpful: have ideals and communicate them, approach every problem with the minimum amount of energy and structure, and make every compromise count. There are many more ways to effect change, but it all starts with having a pragmatic idealist spirit.

 

Architecture – Turning Fiction into Fact

I am an admirer of my compatriot Bjarke Ingels who is a real architect. His buildings always stretch the boundaries of the possible. For example, can you create an idyllic ski slope in a flat country like Denmark and put it in the center of a city with more than a million inhabitants? Sure, just put it on top of an old power plant that needs re-building, and oh, maybe you could have the power plant’s chimney puff smoke rings? As Ingels puts it: “Architecture is the fiction of the real world”, and this is what he did.

But buildings rarely exist in isolation; they are usually part of a city. Ingels continues: “The city is never complete. It has a beginning but no end. It’s a work in progress always waiting for new scenes to be added and new characters to move in”. While Ingels is talking about real world brick and mortar buildings and other constructions, there is no reason why this would not also apply to IT architecture.

This quote also applies to any modern enterprise. There will always be an IT landscape and it is always a work in progress. You will never finish. The only thing you can do is to manage the change in a more or less efficient way. When we create the IT architectures of the future we are in essence turning the fiction of user stories and personas into new scenes and characters of this ever evolving city. Our architectures will be evaluated on how real characters inhabit the structures we create. Will our designs turn out like the Chinese ghost town of Ordos or the smooth coordination of the Tokyo subway? Just like the buildings and towns we create will only be successful if they become liveable for the people they are meant for, the IT systems we build will have to fulfill the needs of the users and the surrounding systems.

The Frontier of Imagination

Typically, we will ask people what they want and document this as requirements, user stories or use cases. This is all well, but if Ingels had gone out and asked the people of Copenhagen what they wanted we would have gotten more of the same building blocks and villas that are already prevalent. There would be no power plant with a ski slope simply because people would never have thought about that. If Steve Jobs and Henry Ford had just settled for what people wanted, we would still be speaking into Nokia phones, hacking away at clunky black computers and riding in horse drawn carriages. We cannot expect our users, customers or managers to have the imagination. This is something we as architects have to supply.

The real frontier for IT architecture is imagination. We need to be able to imagine all the things that the requirements and user stories don’t tell; we need to be bold and create stuff that no one ever asked for. This is difficult for multiple reasons. First of all, an architect is usually measured on how well the solution he or she designs solves the requirements put in front of him or her. That means there is little incentive to do anything more. Second, it is often difficult to gauge what would be needed in the future. Third, there is a tendency towards best practice and existing patterns, which does not further innovative solutions.

An Example

However, these obstacles to imagination can be managed. As Ingels has shown, it is sometimes possible to cover all the basic requirements, in a cost effective way that does the same or better than traditional solutions. The same is the case for IT architecture.

Let’s look at an example: as part of the continuous evolution of the IT landscape, architects are often asked to re-architect legacy solutions. Some legacy solutions build on costly message queueing software. These can be upgraded and replaced easily with similar solutions. You can also usually show some cost reduction and performance improvement by shopping around, but at the end of the day you would still just have the same basic functionality and would have missed out on an opportunity for improvement.

Let’s take a step back and consider what this architecture does and what it could do. Basically it moves data between endpoints in a secure way without ever losing messages. Now we have to ask ourselves: can we do this in a better way? One obvious way to never lose a message would be to just store it permanently. So, when a message comes in that would usually be written to a queue, we will now store it on a persistent medium instead and keep it for as long as it makes sense. We can impose a retention schedule that moves the data between different types of storage, that is from hot to cold storage, and deletes messages if we ever get tired of having them around. The first step is therefore to catch all data and just store it.
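To make the first step concrete, here is a minimal sketch of such a store with a retention schedule. It keeps everything in memory, and the class name, the two tiers and the retention periods are assumptions made up for illustration; a real store would sit on durable storage.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class MessageStore:
    """In-memory stand-in for a persistent message store (hypothetical)."""
    hot_seconds: float = 3600.0      # assumed time a message stays in hot storage
    cold_seconds: float = 86400.0    # assumed age after which a message is deleted
    hot: Dict[str, Tuple[float, bytes]] = field(default_factory=dict)
    cold: Dict[str, Tuple[float, bytes]] = field(default_factory=dict)

    def put(self, payload: bytes, now: Optional[float] = None) -> str:
        """Store the payload in the hot tier and return its message ID."""
        message_id = str(uuid.uuid4())
        self.hot[message_id] = (time.time() if now is None else now, payload)
        return message_id

    def get(self, message_id: str) -> bytes:
        """Return the payload from whichever tier holds it (KeyError if deleted)."""
        tier = self.hot if message_id in self.hot else self.cold
        return tier[message_id][1]

    def apply_retention(self, now: Optional[float] = None) -> None:
        """Move aged messages from hot to cold, then delete expired ones."""
        now = time.time() if now is None else now
        for message_id, (stored_at, _) in list(self.hot.items()):
            if now - stored_at > self.hot_seconds:
                self.cold[message_id] = self.hot.pop(message_id)
        for message_id, (stored_at, _) in list(self.cold.items()):
            if now - stored_at > self.cold_seconds:
                del self.cold[message_id]
```

The `now` parameter only exists so that the schedule can be exercised without waiting; in normal use `apply_retention()` would be called periodically with the real clock.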

Once the data is stored we still need to get it to the target endpoints. Instead of sending messages on a queue we will just send an event containing a pointer to where we stored the message. This will inevitably be a smaller and more uniform message than the traditional message in a message queue. When messages are small and uniform it is a lot easier to optimize everything from speed to size.
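As a rough illustration, such a pointer event could look like the sketch below. The field names and the `store://` location scheme are assumptions, not part of any particular product.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MessageEvent:
    """Uniform notification that points at a stored message (hypothetical schema)."""
    message_id: str   # key of the payload in the message store
    topic: str        # subscription the message belongs to
    location: str     # where the payload can be fetched from
    stored_at: str    # ISO 8601 timestamp of when the payload was stored


def to_wire(event: MessageEvent) -> bytes:
    """Serialise the event as compact JSON; its size does not depend on the payload."""
    return json.dumps(asdict(event), separators=(",", ":")).encode("utf-8")


# Example values only: the event stays this small however large the payload is.
event = MessageEvent("a1", "orders", "store://messages/a1", "2024-01-01T00:00:00Z")
wire = to_wire(event)
```

Whether the payload is ten bytes or ten megabytes, the event carries the same four short fields, which is what makes it easy to optimize.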

This move also opens up different ways to consume the event. Now the target endpoints are not forced to have a queue client from the queue software vendor. They could just as well choose to have a REST interface, ODBC or Kafka. Our event generator just has to be able to connect to these different types of interfaces. That means it has to be able to call a web service, write to a table through ODBC, publish to Kafka or deliver to any other type of endpoint that is relevant.

The target endpoints now have much more freedom in how they receive notifications and how they handle them. The center just has to have a number of different channel adapters for the different types of endpoints. These should be simple and easily configurable since the message is always uniform in its format and size.
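The channel adapter idea can be sketched as a dispatcher that hands the same uniform event to whatever adapters are subscribed to a topic. Everything here is hypothetical: the adapters are stand-ins that write to a list or call an injected function, so no particular REST, ODBC or Kafka client is assumed.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Event = Dict[str, str]
ChannelAdapter = Callable[[Event], None]


class EventDispatcher:
    """Fans a uniform event out to every adapter subscribed to its topic."""

    def __init__(self) -> None:
        self._adapters: DefaultDict[str, List[ChannelAdapter]] = defaultdict(list)

    def subscribe(self, topic: str, adapter: ChannelAdapter) -> None:
        self._adapters[topic].append(adapter)

    def publish(self, event: Event) -> int:
        """Deliver the event and return the number of adapters it reached."""
        adapters = self._adapters.get(event["topic"], [])
        for adapter in adapters:
            adapter(event)
        return len(adapters)


def table_adapter(rows: List[Tuple[str, str]]) -> ChannelAdapter:
    """Stand-in for a table-writing adapter: appends one row per event."""
    def deliver(event: Event) -> None:
        rows.append((event["message_id"], event["location"]))
    return deliver


def http_adapter(url: str, post: Callable[[str, Event], None]) -> ChannelAdapter:
    """Stand-in for a web service adapter: `post` is injected by the caller."""
    def deliver(event: Event) -> None:
        post(url, event)
    return deliver
```

Because every event has the same shape, each adapter is only a few lines, and adding a new endpoint type means writing one more small function.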

Target endpoints now have to retrieve the payload from the message store based on the metadata in the event. Easy: we just put an API in front of our message store for them to call with the message ID. This API could be a web service, an ODBC connector or just a crude URL that points to the payload.
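A minimal sketch of the web service variant, using only the Python standard library, might look as follows. The `/messages/<id>` path and the in-memory `MESSAGES` dictionary are assumptions standing in for a real message store.

```python
from wsgiref.util import setup_testing_defaults

# Stand-in for the message store: message ID -> payload (example data).
MESSAGES = {"42": b'{"order": 1}'}


def message_api(environ, start_response):
    """WSGI app: GET /messages/<id> returns the stored payload, else 404."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) == 2 and parts[0] == "messages" and parts[1] in MESSAGES:
        body = MESSAGES[parts[1]]
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"unknown message id"]


def fetch(path):
    """Call the app in-process, without a network socket, for demonstration."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(message_api(environ, start_response))
    return captured["status"], body
```

To expose it over HTTP, the same `message_api` callable can be passed to `wsgiref.simple_server.make_server`; the in-process `fetch` helper is only there to show the behaviour.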

This approach lets us build a solution that does the same as a traditional queue: take a message from a source endpoint and ensure guaranteed delivery to a target endpoint. Only now we are able to do it with lower latency and more cost efficiently, and we have made sure archiving is a built-in feature. If something fails, the process can be retried any number of times, since the message is never dropped from the store and there is an API that is always open.
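The retry behaviour can be sketched in a few lines. This is an illustration under assumptions: `send` is any hypothetical delivery function that raises `ConnectionError` when the target endpoint is unavailable, and the attempt limit and backoff are arbitrary.

```python
import time
from typing import Callable, Dict

Event = Dict[str, str]


def deliver_with_retry(
    event: Event,
    send: Callable[[Event], None],
    max_attempts: int = 5,
    backoff_seconds: float = 0.0,
) -> int:
    """Call `send(event)` until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(event)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up, the payload is still safely stored
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
    raise ValueError("max_attempts must be at least 1")
```

Giving up is cheap here because only the small event is lost; the payload stays put and delivery can be attempted again later.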

This architecture supports the basics but also opens up a number of new possibilities. Everything that has passed through our messaging system is now available through an API. The target endpoints can retrieve any subset of messages on the subscriptions they follow at any point in time, as opposed to the traditional approach where something was on the queue and once it was read it was gone.
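Retrieving a subset after the fact could be as simple as filtering the stored messages by subscription and time window. The record layout and the `replay` function below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class StoredMessage:
    message_id: str
    topic: str
    stored_at: float   # seconds since some epoch, as recorded by the store
    payload: bytes


def replay(
    log: Iterable[StoredMessage],
    topics: Set[str],
    start: float,
    end: float,
) -> List[StoredMessage]:
    """Return messages on the given topics with start <= stored_at < end, oldest first."""
    selected = [m for m in log if m.topic in topics and start <= m.stored_at < end]
    return sorted(selected, key=lambda m: m.stored_at)
```

Reading a message this way does not consume it, so two endpoints, or the same endpoint twice, can ask for the same window and get the same answer.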

We can even create a search API for messages in our message store. If we need to find some particular message we can now let the target endpoints do that automatically. This can be expanded further: we might, for example, create a Hive table on the message store, so that all the data that went through our pipeline can be accessed through SQL/HiveQL. In the traditional world we would have to set up a separate solution to aggregate messages to a different store, then ETL them into a Data Warehouse and create models exposed through BI tools for end users to gain access.
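To give a feel for what SQL access to the message store buys us, the sketch below uses Python's built-in `sqlite3` as a stand-in for a Hive table; it is not Hive, and the table layout and sample rows are made up. The queries themselves are plain SQL of the kind HiveQL also supports.

```python
import sqlite3

# In-memory database standing in for a table defined on top of the message store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "message_id TEXT PRIMARY KEY, topic TEXT, stored_at TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        ("a1", "orders", "2024-01-01", "widget x2"),
        ("a2", "orders", "2024-01-02", "gadget x1"),
        ("b1", "invoices", "2024-01-02", "invoice for widget"),
    ],
)


def messages_per_topic(connection):
    """Aggregate directly on the stored messages, no separate ETL step."""
    return connection.execute(
        "SELECT topic, COUNT(*) FROM messages GROUP BY topic ORDER BY topic"
    ).fetchall()


def search(connection, term):
    """Find the IDs of messages whose payload contains the search term."""
    cursor = connection.execute(
        "SELECT message_id FROM messages WHERE payload LIKE ? ORDER BY message_id",
        (f"%{term}%",),
    )
    return [row[0] for row in cursor]
```

The same two queries cover both the search API and the reporting use case, which is the point: the data is already where the questions can be asked.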

Turning fiction into fact

This solution turns a simple queuing solution into a cheaper, faster version of queueing that suddenly is also an API and a Data Mart. This shows that there is no reason to be limited by the mental frameworks of the legacy technologies that we are replacing. We need to think about the possibilities available to us today and imagine how we would solve the problems that legacy technologies solved, given our current technology and the wider needs beyond this particular use case. We can, in fact, turn fiction into fact. Sometimes we just have to be bold and let go of best practice and received wisdom.

 

 

 

How to come up with a product that is truly unique

How do you come up with a product idea that the whole world is not already selling? This is an interesting question that I think every entrepreneur asks him or herself regularly. I don’t have the answer for it, but I can tell you something about how to end up with the answer.

Ban TechCrunch 
The first step is to stop reading start-up media. Any start-up media! That’s over, period. These just promote groupthink and turn your attention to products and services that everybody is already doing. This is why the world is flooded with instant messaging apps, photo apps and to-do lists.

Think of it as an entrepreneur’s information detox. You need to get it out of your system. If you absolutely need to read something, read something that nobody else reads. I can recommend Kafka’s short stories, Tomas Tranströmer’s poems or Mike Tyson’s biography.

If you have special knowledge…
Do you know something that most other people don’t? Have you worked in a niche? If you have, then think hard about how to leverage that knowledge for a product or service. Is there some problem that is frequent in this special area you know, preferably a problem that someone would pay to get rid of? If that is the case, you have your first lead.

If for example you work in a cinema you may have noticed that it is a problem to clean chairs quickly enough between showings if somebody spilled something. Maybe the solution is a special coating for the chairs, maybe a cover that can be changed.

A good example of a company that did this is Zendesk. Zendesk grew out of the founders’ experience working with customer support systems, which they found to be too complex and difficult to implement and use.

If you have no special knowledge…
If you don’t have any specialised knowledge, which is often the case if you are fresh out of school or have spent most of your youth playing FIFA, there are several options. Think about stuff that you absolutely wouldn’t like to work with. Stuff that would be really boring, disgusting or socially awkward. It should be something you would lie about if you were asked about it on a first date.

Think along the lines of condoms for dogs, reading stories for senior citizens, avoiding sewage blockage or code review. Now come up with a product/service that would make this thing easier.

“But why would I do something I don’t want to do?” you may ask. The thing is that this is usually a good indicator of what other people think as well and that is where you have the opportunity.

One of my favourites in this area is the company The Specialists, which employs people with autism to do tasks that others find tedious, like testing. What is incredibly boring or difficult for other people is something they like to do. Another example is Coloplast, which makes products for continence care. Essentially they just make plastic bags, but for a special purpose.

Go data-driven
Another option is to find some way to pick up on a demand that is currently not well served. It could be selling niche stuff on Amazon, which can be amazingly lucrative (see this thread on Quora). There are even tools for discovering such opportunities, like Jungle Scout. But there are also more general SEO tools, such as Moz, that can give you the same effect.

Get out into the world…
Now that you have some vague directions, you have to go out into the world to find out how to build a business model around this. This takes research about the users and customers, but also about competitors and suppliers. Strategyzer’s Business Model Canvas is a good shorthand for figuring out what to think about and where to go.

Lean start up, MVP etc…
I’m not going to go into more detail about this here. A quick search will flood you with quality material on how to build a product from an initial idea and turn it into a success.

Building a Product Strategy for a Backend Product

When you read about product management you quickly learn how important it is to engage with your customers, be agile and run experiments. But when your product is a back-end system with no end users, only other applications, and it is considered key infrastructure that others depend on to work in a predictable way, it is not so easy to be agile and do A/B tests, lean start-up style experiments and user testing.
This is a classic problem and one very often ignored in product management literature, which always seems to be about products with users that you can sit down and talk to in order to learn what to do. There are, however, a few things you can do if you are the product manager of a back-end product and need to build a product strategy.


Align Strategy

It is necessary to sit down and look at all the consumers of your product. They are essentially your customers. That means identifying all other products that depend on or will depend on your product. Once this is done, find out what the strategy is for each of those products. Unfortunately, their product managers don’t always have a strategy. In that case you need to look at other artefacts like road maps, visions, even marketing material. It is also a good idea to talk to them to understand where they are moving. Here you actually don’t need to concern yourself with the end users.

Doing this may uncover some contradictory demands. One product may want you to focus on microservices, another on batch deliveries, while a third wants a message-based architecture. Some may prefer REST/JSON type services, others SOAP/XML and others just FTP/CSV in a scheduled batch. Welcome to the world of Agile development, where teams decide inside their own bubble what would be most agile for them.

Unfortunately it is your problem to reconcile these differences with the different consumers. In order to do this you need stakeholder management.

Manage Stakeholders

It is necessary for you to chart the different stakeholders, weigh their importance and do a typical stakeholder analysis where you find out what their interests are and how you should communicate with them. Unfortunately, most product managers leave it at that and forget the art of stakeholder management. In the best case they will fill out a stakeholder analysis and store it on their hard drive, never to be opened again. But stakeholder management is more like politics. Watch Game of Thrones or House of Cards for inspiration.
You have to understand the different factions and their power base. Understand the different people and their culture. You have to lobby for ideas, be the diplomat and explain the positions of other stakeholders. Look at key persons’ social network profiles in order to find out what type they are, where they live and what they do in their spare time. Understand their concerns, apply pressure when needed and yield when it is necessary. Remember, politics is all about compromise. But you can only do that once you have a plan.

Draft a plan

All the input you have gathered from the above points now has to be integrated with your own knowledge about the product. What are the possibilities, the technical limitations, the technical debt and so on? Given your knowledge of the status of your product, the possibilities and the available resources, you have to plan for how it should change. Draft a plan under a few headlines. Focus for example on capabilities you would like to develop, data you want to capture or ways of working with consuming products. Settle on only a few key goals, but have suggestions for more.

Reiterate

Now, start over again, because product strategy, like any strategy, takes time, and you need to form a coalition behind it if it is to succeed. You are not finished until you have that coalition behind you. Not until then will you have a proper product strategy.