05 Mar 2018

Dark data – the big data you didn’t know to worry about, but should

For anyone still trying to wrap their minds around the volumes of Big Data sitting unattended in their systems, data that might help improve offers and personalization, there is a new thing to lose sleep over: Dark Data.

This substratum of information about interactions relevant to your brand can yield important insights into emerging markets, customer preferences, and interest in new products and services. But because it exists in a gap between stored data and active search, it can be lost in a matter of moments, and all of its revenue-building potential disappears with it.

For those who hold to the notion that data is the new oil, mining for Dark Data is the new fracking. Barcelona-based technology company Datumize has come up with this analogy to explain what is at stake, estimating that as much as 65% of useful competitive information is trapped or lost in “Dark Data” pockets.

Tnooz spoke to Datumize to better understand what all this means and how some travel companies are already busy fracking.

Javier Uriagereka, sales director of Datumize, says that the source of Dark Data is the billions of data exchange transactions that happen online every second.

Systems are mostly set up to capture structured data, and in some cases a certain amount of unstructured data, but capturing and analyzing all of these exchanges requires different technology. Datumize found that systems otherwise used by national security agencies, which gather a big-picture view of direct and indirect exchanges of structured and unstructured data, can be useful for corporations as well, and can be deployed relatively cheaply.

As Uriagereka says:

“The question is that if we are storing millions of terabytes, at the end we are not ensuring whether this data is worthy of being in storage or not. There is a lot of information that we are generating in the organization that we are not capturing it or processing it because it is very difficult. Maybe that’s because of closed protocols, or because of the risk of going inside our mission critical systems. We are not able to modify those systems because of having any stop in our day-to-day work.

“We found a technology that is basically the same technology that spy agencies and cybersecurity companies are using ... and we are using it to monitor all of the network. We are not spying on anyone. We are storing all of the traffic happening in a network and capturing all data flowing inside the networks, wifi networks or traditional networks. All of this data is traveling between and within the systems.”

It’s called Dark Data merely because it is not highlighted in the company. It exists unseen. These systems capture the information in a way that eliminates the risks to operational systems, then analyze and format it so that it is easy to digest and its implications for the business are clear. There’s a lot of it to play with, Uriagereka explains.

“Gartner said that over 65% of the data in the company is dark. IBM says 80% generated in the company is dark, so we are losing that information.

“It is in the systems, but we are not able to capture it because we are losing it, or because it is transitory information. In the digital world, millions and billions of transactions are only there for half a second and that information is not stored anywhere ... we just lose it.

“Anytime you go on a metasearch engine and you look for a flight from Copenhagen to Madrid, for example, you go into Kayak and put in the origin and destination and you see, in a couple of seconds, that Kayak is looking at two hundred airline companies and presenting five pages of different results. This information is not stored anywhere. You lose it when you go out of Kayak. But it’s very interesting information because the company can understand what customers are demanding, what people are looking for, where and when they want to travel.”

Practical Dark Data applications

Datumize has put its Dark Data Observer and Kosmos systems to work for undisclosed clients in hospitality and in the airline industry. Their case studies reveal how useful recovering Dark Data can be to travel industry enterprises.

The hotel group used Datumize Observer to track guest activities and movements, for a more accurate view of how guests use facility services like breakfast and wifi, to determine peak times for backlog at check-in and to understand how guests use the hotel app. It is using the information to better allocate staff, improve ancillary offers, determine necessary improvements to wifi infrastructure in different properties, and optimize its mobile application.

The study attributes a client satisfaction increase of 2.3% and a RevPAR (revenue per available room) increase of 3.3% to this system.

The airline used the Datumize Kosmos system to get a holistic view of flight search through the web, GDS, meta-search and OTAs. By gathering all that lost search information, the airline was able to identify air services and routes that were desired but unavailable, better gauge fare competitiveness and demand for ancillaries like hotel and car rentals, and get a view of daily airport traffic to quickly measure on-time performance.

The Datumize system found that 29% of searches were for routes the airline operated but on which it offered no flights within the schedules customers requested, and 24% were for routes that the airline did not operate at the time. As a result, the airline was able to evaluate its flight schedules to better meet customer needs and improve bookings, and to identify high-demand destinations for potential new routes. The study attributes a 4% impact on revenue to capturing this dark data.
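To make the two categories concrete, here is a minimal sketch of how captured searches could be sorted against a schedule. It is not Datumize's method, which the case study does not describe; the schedule, the sample searches, and the names `SCHEDULE`, `classify_search`, and `demand_shares` are all hypothetical.

```python
from collections import Counter

# Hypothetical schedule: route -> dates on which the airline flies it.
SCHEDULE = {
    ("CPH", "MAD"): {"2018-03-05", "2018-03-07"},
    ("BCN", "LHR"): {"2018-03-05", "2018-03-06", "2018-03-07"},
}


def classify_search(origin, destination, date, schedule=SCHEDULE):
    """Label one search as 'served', 'no_flight_on_date', or 'route_not_operated'."""
    dates = schedule.get((origin, destination))
    if dates is None:
        return "route_not_operated"
    if date not in dates:
        return "no_flight_on_date"
    return "served"


def demand_shares(searches, schedule=SCHEDULE):
    """Return each category's share of all searches, as a fraction."""
    counts = Counter(
        classify_search(origin, destination, date, schedule)
        for origin, destination, date in searches
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Made-up sample of captured searches (origin, destination, date).
searches = [
    ("CPH", "MAD", "2018-03-05"),  # served
    ("CPH", "MAD", "2018-03-06"),  # route operated, but not on that day
    ("BCN", "LHR", "2018-03-06"),  # served
    ("CPH", "LIS", "2018-03-05"),  # route not operated at all
]
shares = demand_shares(searches)
```

In the airline's case, the shares corresponding to `no_flight_on_date` and `route_not_operated` would be the 29% and 24% figures reported above.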

Technical losses

Technical failures built into operating systems can also cause Dark Data seepage. Uriagereka tells us:

“We’re working with a tour operator receiving about one million requests per hour and because of integration issues they are losing 64% of the requests. Two out of three requests have a technical issue. That doesn’t mean that you don’t have offers, it means that customers never receive them. Imagine that you are losing 64% of the requests, that’s a lot of money.

“Business metrics are very important, but also technical metrics. You might ask, how are companies not realizing that they are losing 65% of their requests, but that’s because they receive over one million requests per hour, and that’s a small operator. A big operator, like TUI or Globalia or Thomas Cook, receives twelve to fifteen million, even twenty million, requests per hour.

“It’s absolutely impossible to process this information manually, and also very difficult to process this information automatically. We are able to capture, in real time, those millions of requests and understand the dialogue (the request and also the response that we offer) and analyze it from a technical and a business point of view.”
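As a rough illustration of the technical metric Uriagereka describes, the sketch below pairs captured requests with their responses and counts a request as lost when no successful response exists for it. The record layout (`request_id`, `ok`) and the function name `loss_rate` are assumptions made for this example, not part of any Datumize product.

```python
def loss_rate(request_ids, responses):
    """Fraction of requests that never received a successful response."""
    if not request_ids:
        return 0.0
    answered = {r["request_id"] for r in responses if r["ok"]}
    lost = [rid for rid in request_ids if rid not in answered]
    return len(lost) / len(request_ids)


# Made-up capture: five requests, three responses.
request_ids = ["r1", "r2", "r3", "r4", "r5"]
responses = [
    {"request_id": "r1", "ok": True},
    {"request_id": "r2", "ok": False},  # a response came back, but with a technical error
    {"request_id": "r4", "ok": True},
    # r3 and r5 received no response at all
]
rate = loss_rate(request_ids, responses)  # 3 of 5 requests lost

# Scale of the tour operator example: 64% of about one million requests per hour.
lost_per_hour = round(0.64 * 1_000_000)
```

At the figures quoted above, that works out to roughly 640,000 customer requests per hour that never result in an offer.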