I was going through TechCrunch’s coverage of Jeff Jonas, Chief Scientist of the IBM Entity Analytics group, and his concept of Big Data. I think this is a wonderful way to look at how we make sense of the overabundant flow of information around us.
Addressing the question of what data is, Jeff says organizations are only as smart as what they know. And what they know comes from data – structured or unstructured – which in turn forms their perceptions. The smartest any enterprise can be is the net sum of its perceptions, he says (one could argue it should be connections, not data). He counts observations as data and talks about the insight that becomes possible when these observations are linked and analyzed.
This piece was interesting. He says almost all data starts out as unstructured; it takes humans to place it into a structure. That may seem obvious, but it is worth a second look: does the way we structure data actually create the problems we have today? It is genuinely hard to piece together conversations on Twitter, for example. Twitter structured its data in a particular manner, echoing what it assumed was a monologic mode of interaction with a network. I compose a status message and tweet it. I retweet a message. I add a hashtag to organize it. I make a list to follow. I reply to a message. All of these are essentially one-way communications, like email.
He also talks about why more data makes us more ignorant. Data is growing ever faster as computing power, connectedness, and bandwidth continue to rise exponentially, far exceeding enterprises’ ability to process it – and the gap is widening. The impact is all the more severe because data is segmented into silos, making it nearly impossible to connect multiple sources and draw the valid inferences that would increase efficiency and adaptability. One important thing he highlighted is data amnesia (it happens to me regularly) – a bank he was already doing business with called him six times to get him to sign up; as an organization, it did not know he was already a customer. With the explosion in data, the ability to analyze and act on analyses is shrinking: there are fewer and fewer humans, in comparison, who can do this.
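The bank’s data amnesia is essentially a failure to link records across silos – the core of the entity-analytics problem Jonas works on. A minimal sketch of the idea (the silo names, fields, and exact-match rule here are my own illustrative assumptions, not a description of his system):

```python
# Link records across two silos so marketing doesn't call someone
# who is already a customer. Fields and matching rule are illustrative.

def key(record):
    """Normalize the fields we match on (a real system would use
    fuzzy matching, addresses, phone numbers, and so on)."""
    return (record["name"].strip().lower(), record["email"].strip().lower())

customers = [
    {"name": "Jeff Jonas", "email": "jeff@example.com"},
]
marketing_leads = [
    {"name": "JEFF JONAS ", "email": "jeff@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
]

known = {key(c) for c in customers}
fresh_leads = [lead for lead in marketing_leads if key(lead) not in known]

print(fresh_leads)  # only Ada remains; Jeff is recognized as a customer
```

Even this toy version shows the point: the information to avoid six redundant calls already existed, it was just never connected.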
On being asked whether Google isn’t already the king in that space, he said he thinks Google is a giant pixel sorter (every searchable object is a pixel…). In the sorting, it misses things such as local context. Maybe we need services layered on top of search. And that is the thing I really like – I call it reversing the search – pushing context to identify relevant information rather than the other way around.
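One way to picture “reversing the search”: instead of the user pulling results with a query, the system uses what it knows about the user’s context to push items whose attributes overlap it. A rough sketch – the field names and the overlap threshold are purely my assumptions:

```python
# Push items that share enough attributes with the user's context,
# rather than waiting for an explicit query. Illustrative only.

def push_relevant(context, items, threshold=2):
    """Return items sharing at least `threshold` attribute values with context."""
    ctx = set(context.items())
    return [item for item in items if len(ctx & set(item.items())) >= threshold]

context = {"city": "Las Vegas", "interest": "data", "lang": "en"}
items = [
    {"city": "Las Vegas", "interest": "data", "title": "Local data meetup"},
    {"city": "Boston", "interest": "data", "title": "Remote conference"},
]

print(push_relevant(context, items))  # only the local meetup matches enough context
```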
What is most interesting is how he talks about Big Data – technologists are building systems to analyze individual puzzle pieces (or pixels). But what would you do with a single puzzle piece? The real question is how to fit that piece into the puzzle. The last few pieces of a puzzle are easier to place than the first few. This is context accumulation – context is hugely important to accumulate in systems that analyze data, because it is what makes better predictions possible.
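The puzzle metaphor can be sketched in a few lines: each new observation is matched against everything accumulated so far, so later observations (the last puzzle pieces) are easier to place. This is my own toy illustration of the idea, not Jonas’s actual algorithm:

```python
# Context accumulation: link each new observation to an entity if it
# shares any attribute with it; the entity's context grows as it goes.

def accumulate(entities, observation):
    """Attach an observation to an existing entity on any shared
    attribute value; otherwise start a new entity."""
    obs = set(observation.items())
    for entity in entities:
        if entity & obs:      # any overlapping attribute links them
            entity |= obs     # the entity's accumulated context grows
            return entities
    entities.append(obs)
    return entities

entities = []
observations = [
    {"email": "jj@example.com", "name": "J. Jonas"},
    {"phone": "555-0100", "name": "J. Jonas"},   # links via the name
    {"phone": "555-0100", "city": "Las Vegas"},  # links via the phone
]
for obs in observations:
    entities = accumulate(entities, obs)

print(len(entities))  # 1: all three observations resolve to one entity
```

Note that the third observation shares nothing with the first; it only fits because the second observation had already enlarged the context – exactly why the last pieces are the easy ones.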
On privacy and freedom: while all this data collection and the resulting predictions may offer better experiences, awareness of what the customer gives up in the privacy choices she makes needs to be raised significantly.
This is not the semantic web or linked data. It is not so much about linking data or describing data (or relationships) as about trying to view data in context.