Staff Writer
Columbus CEO

c.2013 New York Times News Service

SAN FRANCISCO — David Soloff is recruiting an army of “hyperdata” collectors.

The company he co-founded, Premise, created a smartphone application now used by 700 people in 25 developing countries. Using guidance from Soloff and his co-workers, these people, mostly college students and housewives, photograph food and goods in public markets.

By analyzing the photographed prices and the placement of everyday items like piles of tomatoes and bottles of shampoo, and matching that with other data, Premise is building a real-time inflation index to sell to companies and Wall Street traders, who are hungry for insightful data.

“Within five years, I’d like to have 3,000 or 4,000 people doing this,” said Soloff, who is also Premise’s chief executive. “It’s a useful global inflation monitor, a way of looking at food security, or a way a manufacturer can judge what kind of shelf space he is getting.”

Collecting data from all sorts of odd places and analyzing it much faster than was possible even a couple of years ago has become one of the hottest areas of the technology industry. The idea is simple: With all that processing power and a little creativity, researchers should be able to find novel patterns and relationships among different kinds of information.

For the last few years, insiders have been calling this sort of analysis Big Data. Now Big Data is evolving, becoming more “hyper” and including all sorts of sources. Startups like Premise and ClearStory Data, as well as larger companies like General Electric, are getting into the act.

A picture of a pile of tomatoes in Asia may not lead anyone to a great conclusion other than how tasty those tomatoes may or may not look. But connect pictures of food piles around the world to weather forecasts and rainfall totals and you have meaningful information that people like stockbrokers or buyers for grocery chains could use.

And the faster that happens, the better, so people can make smart — and quick — decisions.

“Hyperdata comes to you on the spot, and you can analyze it and act on it on the spot,” said Bernt Wahl, an industry fellow at the Center for Entrepreneurship and Technology at the University of California, Berkeley. “It will be in regular business soon, with everyone predicting and acting the way Amazon instantaneously changes its prices around.”

Standard statistics might project next summer’s ice cream sales. The aim of people working on newer Big Data systems is to collect seemingly unconnected information like today’s heat and cloud cover, and a hometown team’s victory over the weekend, compare that with past weather and sports outcomes, and figure out how much mint chip ice cream mothers would buy today.
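The sort of pattern-matching described above can be sketched in a few lines. This is a toy illustration, not any company's actual model: the features, weights and sales figures are invented, and a real system would learn its weights from far richer data.

```python
# Toy sketch: estimate today's ice cream sales from past days that
# "looked like" today (temperature, cloud cover, home-team win).
# All numbers and feature weights are made up for illustration.

past_days = [
    # (temp_f, cloud_cover_pct, team_won, pints_sold)
    (92, 10, True, 340),
    (88, 30, True, 310),
    (75, 80, False, 140),
    (70, 90, False, 120),
    (85, 20, False, 260),
]

def distance(day, temp, clouds, won):
    t, c, w, _ = day
    # Hand-picked feature weights; a real model would learn these.
    return abs(t - temp) + abs(c - clouds) / 2 + (0 if w == won else 20)

def predict_sales(temp, clouds, won, k=2):
    # Average sales over the k most similar past days (nearest neighbors).
    nearest = sorted(past_days, key=lambda d: distance(d, temp, clouds, won))[:k]
    return sum(d[3] for d in nearest) / k

print(predict_sales(90, 15, True))  # hot, clear day after a win -> 325.0
```

The design choice here, averaging over the most similar past days rather than fitting a single formula, is one simple way to let "seemingly unconnected" features influence a forecast.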

At least, that is the hope, and there are early signs it could work. Premise claims to have spotted broad national inflation in India months ahead of the government by looking at onion prices in a couple of markets.

The photographers working for Premise are recruited by country managers, and they receive 8 to 10 cents a picture. Premise also gathers time and location information from the phones, plus a few notes on things like whether the market was crowded. The real insight comes from knowing how to mix it all together, quickly.

Price data from the photos are blended with prices Premise receives from 30,000 websites. The company then builds national inflation indexes and price maps for markets in places like Kolkata, India; Shanghai; and Rio de Janeiro.
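A fixed-basket price index of the kind described above can be sketched briefly. This is not Premise's actual methodology; the basket items, quantities and prices below are invented for illustration.

```python
# Minimal sketch of a fixed-basket (Laspeyres-style) price index:
# price the same basket of goods in a base period and today, and
# report today's cost as a percentage of the base cost.

basket = {"tomatoes_kg": 3.0, "shampoo_bottle": 1.0, "rice_kg": 5.0}

base_prices = {"tomatoes_kg": 2.00, "shampoo_bottle": 4.50, "rice_kg": 1.20}
today_prices = {"tomatoes_kg": 2.30, "shampoo_bottle": 4.50, "rice_kg": 1.32}

def basket_cost(prices):
    return sum(qty * prices[item] for item, qty in basket.items())

def price_index(prices, base=base_prices):
    # 100 means no change versus the base period.
    return 100 * basket_cost(prices) / basket_cost(base)

print(round(price_index(today_prices), 1))  # 109.1, i.e. ~9% inflation
```

Repeating this daily for each city's basket, with prices drawn from photos and scraped websites, yields the kind of per-market inflation series the article describes.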

Premise’s subscribers include Wall Street hedge funds and Procter & Gamble, a company known for using lots of data. None of them would comment for this article. Subscriptions to the service range from $1,500 to more than $15,000 a month, though there is also a version that offers free data to schools and nonprofits.

The new Big Data connections are also benefiting from the increasing amount of public information that is available. According to research from the McKinsey Global Institute, 40 national governments now offer data on matters like population and land use. The U.S. government alone has 90,000 sets of open data.

“There is over $3 trillion of potential benefit from open government economic data, from things like price transparency, competition and benchmarking,” said Michael Chui, one of the authors of the McKinsey report. “Sometimes you have to be careful of the quality, but it is valuable.”

That government data can be matched with readings from sensors on smartphones, jet engines and even bicycle stations, all uploading data from across the physical world to the supercomputers of cloud computing systems.

Until a few years ago, much government and private data could not be collected particularly fast or well. It was expensive to get and hard to load into computers. As sensor prices have dropped, however, and things like Wi-Fi have enabled connectivity, that has changed.

In the world of computer hardware, in-memory computing, an advance that lets data be processed in a machine's main memory rather than repeatedly fetched from slower disk storage, has increased computing speeds immensely. That has made real-time data crunching possible.


General Electric, for example, which has more than 200 sensors in a single jet engine, has worked with Accenture to build a business analyzing aircraft performance the moment the jet lands. GE also has software that looks at data collected from 100 places on a turbine every second, and combines it with power demand, weather forecasts and labor costs to plot maintenance schedules.
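The scheduling trade-off GE is described as making can be sketched as a simple cost minimization. Everything here is hypothetical: the days, demand figures, risk scores and dollar amounts are invented, and a real planner would weigh many more factors.

```python
# Hedged sketch: pick the cheapest day to take a turbine offline for
# maintenance, combining forecast power demand, weather risk and labor
# cost into one number per candidate day. All figures are invented.

days = [
    # (day, forecast_demand_mw, weather_risk, labor_cost_usd)
    ("Mon", 900, 0.1, 8000),
    ("Tue", 700, 0.3, 8000),
    ("Wed", 500, 0.2, 9500),
    ("Thu", 650, 0.1, 8000),
]

LOST_REVENUE_PER_MW = 10   # hypothetical cost of unmet demand
WEATHER_PENALTY = 20000    # hypothetical expected cost of a weather delay

def outage_cost(demand, risk, labor):
    return demand * LOST_REVENUE_PER_MW + risk * WEATHER_PENALTY + labor

def best_day():
    return min(days, key=lambda d: outage_cost(d[1], d[2], d[3]))[0]

print(best_day())  # Thu: modest demand, low weather risk, normal labor
```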

IBM also recently announced commercial deployment of software that learns and predicts the behavior of large, complex systems to improve performance as events unfold.

One customer, an Illinois telecommunications company called Consolidated Communications, uses the software to oversee 80,000 elements of its network, like connectivity speeds and television performance, for each of its 500,000 clients. IBM also announced new products it said would improve data analysis and make it easier for customers to work with different kinds of data.


Traditional data analysis was built on looking at regular information, like payroll stubs, that could be loaded into the regular rows and columns of a spreadsheet. With the explosion of the Web, however, companies like Google, Facebook and Yahoo were faced with unprecedented volumes of “unstructured” data, like how people cruised the Web or comments they made to their friends.

New hardware and software have also been created that sharply cut the time it takes to analyze this information, fetching it as fast as an iPhone fetches a song.

This month, the creators of Spark, open-source software that can run some analyses as much as 100 times faster than existing systems, received $14 million to start a company that would offer a commercial version of that software.

ClearStory Data, a Palo Alto, Calif., startup, has introduced a product that can look at data on the fly from various sources. With ClearStory, data on movie ticket sales, for example, might be mixed with information on weather, even tweets, and presented as a shifting bar chart or a map, depending on what the customer is trying to figure out. There is even a “data you might like” feature, which suggests new sources of information to try.
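Mixing sources "on the fly" mostly comes down to joining them on a shared key, here a date. This sketch is not ClearStory's product; the dataset shapes and field names are hypothetical.

```python
# Sketch: blend daily movie ticket sales with weather by joining on
# date, then ask a question of the combined data. Data is invented.

ticket_sales = [
    {"date": "2013-11-01", "tickets": 1200},
    {"date": "2013-11-02", "tickets": 2500},
    {"date": "2013-11-03", "tickets": 900},
]

weather = [
    {"date": "2013-11-01", "condition": "rain"},
    {"date": "2013-11-02", "condition": "rain"},
    {"date": "2013-11-03", "condition": "sunny"},
]

def join_on_date(sales, wx):
    # Index weather by date, then attach each day's condition to sales.
    by_date = {row["date"]: row["condition"] for row in wx}
    return [{**row, "condition": by_date.get(row["date"], "unknown")}
            for row in sales]

blended = join_on_date(ticket_sales, weather)
rainy = [r["tickets"] for r in blended if r["condition"] == "rain"]
rainy_avg = sum(rainy) / len(rainy)
print(rainy_avg)  # average tickets sold on rainy days: 1850.0
```

The joined rows are what a tool would then render as a bar chart or map, depending on the question being asked.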

One trick, said Sharmila Shahani-Mulligan, ClearStory’s co-founder and chief executive, was developing a way to quickly and accurately find all of the data sources available. Another was figuring out how to present data on, say, typical weather in a community, in a way that was useful.

“That way,” Shahani-Mulligan said, “a coffee shop can tell if customers will drink Red Bull or hot chocolate.”