Big Data From a Product Perspective – Different Views

The hype surrounding big data is currently reaching a climax. We clearly have more and more data, and valuable insights are hidden in it. The picture changes, however, when we look at big data in terms of the products actually on offer.

From a product perspective, I think the situation is more mixed. As a product category, big data is not yet mature enough to warrant the huge valuations it currently commands, though it could get there if a few things fall into place. But first, let us look at how different groups view big data.

Different views on big data
Inside the big data community, the focus is on technologies such as Hadoop, Hive and NoSQL databases. It also falls on the companies supporting these technologies and on a plethora of other products in the big data ecosystem, which are more or less obscure (to the uninitiated, of course). The community does not see big data as closely related to business intelligence, even though big data solves very much the same problem.

The media portrays big data as something akin to the invention of the wheel: something that will have a profound effect on human civilisation and the way we live for millennia to come.

Investors treat big data investments like the quest for the holy grail, which might explain some of the silliness. Hortonworks has raised $248 million, Cloudera $1.2 billion, DataStax $189 million, Elasticsearch $104 million, Couchbase $106 million, and so on. None of these companies owns a proprietary core product. Instead they support open source products, and their business model is to build closed source tools that help customers run the open source software better.

The CEOs who invest in big data really just want a big pile of money. They are not interested in the curious patterns you can find, like the correlation between search terms containing the word "coconut" and the migratory patterns of the African swallow. They see big data as a new way to make more money, and they want to get there immediately.

The CIO is usually completely sidelined in decisions involving big data. This may be because the CIO is increasingly becoming the custodian of legacy technologies. It is also because the need for big data often comes from isolated infrastructure projects or from business development.

Developers view big data as a way to build models of the real world in intricate detail, like the Matrix. Soon, they believe, big data technology will let us model the entire universe and predict what will happen.

End users, meanwhile, face alarming complexity. You need semi-programming skills just to run simple queries. You also need to be adept at navigating applications with hundreds of functions, much like a sysadmin. Usability often suffers in open source development because the community wants to take the product in different directions. Furthermore, the developers are also the users. They already know the product, so there is no real pressure to make it easy to use for the uninitiated.

So what is it really? In the end, big data may very well turn out to be just like the Segway. I am not saying that it will only be used by mall cops and tourists. Rather, it might end up serving very limited segments and industries with very specific needs.

Enter the genius – the five specialisations of the big data employee
The problem today is that to get any value out of big data, you need to be a virtual genius. You need to master at least five areas that are usually separate specialisations:

First, you need to be a developer. You might not need to code an actual application if you are only using big data for analytical purposes. One way or another, though, you need to be able to write code to extract the information you need.
Second, you need to be an infrastructure architect and sysadmin. You have to set up a large number of servers and networks, and you have to understand the multitude of infrastructure elements involved.
Third, you need to be a database administrator. You have to set up and maintain databases, as well as ETL processes, sharding and the like (though you do not have to worry about database schemas).
Fourth, you need to be a data scientist, since you need to know a fair amount about machine learning algorithms to extract patterns from the data.
Fifth, you need to be a business analyst. If big data is to make sense from a business perspective, you have to understand the business model, the revenue streams, the cost structure and so on. You also need to know a fair amount about the customers, such as which parameters to segment them by and what their pain points are.

Naturally, all of this does not have to reside in one person. In principle the skills can be spread across several employees. You will quickly find, however, that you must hire a complete team just to get started, and even specialists who master only one or two of these areas are hard to find. On top of this, the team has to work in very tight integration, because big data is more integrated than other technologies.

Unfortunately, the problems are not over even if you succeed with this. Most organisations already have established procedures that split work along the lines mentioned above. They have application developers, operations, DBAs, analysts and business developers. Each department has its own governance and procedures describing handoffs to other departments. Now you are asking the organisation to circumvent all of these established procedures.

So big data products still have a long way to go before they are ready for the mass market and the really big bucks.