Tim’s recent post quotes Marissa Mayer at Google acknowledging that data is critical to Google’s ability to provide more and more contextually relevant web services. But the most striking revelation may be that Google is providing 411 services in order to capture phonemes that would enable it to do a better job of searching video (presumably by using voice recognition on the audio track). If this is true, it means that they are providing a fairly complex and expensive service simply to capture a data element that would help them provide another service. Looked at through this lens, Google is one huge data magnet: every service they provide collects massive amounts of data.
Tim O’Reilly has been saying for several years that data is the Intel Inside of web services. I am not sure the analogy is completely accurate; the microprocessor plays a different role in a personal computer than data does in a web service. But it was a catchy line, and it made the central point that data is the heart of a web service.
I do not mean to imply that there is anything wrong with this strategy or with Google’s motives. It is only by collecting this data that they can provide many of the services they offer. Most of the data they collect is not that personal. Collecting the inflection in my voice when I make a 411 call (my phonemes) does me no harm; I had no plans for those phonemes anyway, and I am happy to contribute them. Many of Google’s other uses of data are equally harmless. I am fine with Google using my click-through patterns to increase the relevance of future search results or to adjust the presentation of paid search text ads. But there is something here that I think we need to pay attention to.
Data has a peculiar quality: in economic terms, it has increasing marginal utility. Anyone who took Econ 101 knows that most physical goods have diminishing marginal utility. When it is raining, my first umbrella keeps me dry; a second may be handy if the first blows out, but a third is unlikely to be used. This holds for shirts, steaks, houses, and almost anything else you can think of, except data.
Data has the opposite characteristic. Each incremental data point adds value to the ones you already have. This is easy to see in the context of an advertising network. If the ad network knows that a user is female, it can show more relevant ads. If it also knows her age, it can do even better, and data about her location, household income, and recently visited web sites each add value to the existing data points, making it possible to show increasingly relevant ads. All of Google’s services benefit from additional data, albeit in different ways.
So what does all this mean for the market for web services? It means that we all need to think about the degree to which Google’s enormous data asset will allow it to dominate this important sector.
We have, for example, been paying a lot of attention to services that help users discover new information and filter the information they are already trying to consume. The young companies we have looked at in this space all approach the problem differently, but they all depend on amassing data about users’ reactions to information and services in order to better anticipate what a user might be interested in seeing. When Google announced their recommendations feature for Google Reader, we had a flurry of discussions about how it would affect the opportunity to provide discovery services. The feature itself was not that impressive or immediately useful. But just as Microsoft’s entry into a PC software market (often with an initially inferior product) changed the prospects for startups in that market, Google’s addition of recommendations to Google Reader is a shot across the bow of anyone in the filtering or discovery business. The source of the threat is a data differential. Google has so much more data at their fingertips that even if a startup does a much better job of leveraging data to deliver recommendations, Google could potentially offer the end user a better value proposition with an inferior algorithm powered by more data, sourced from a broader range of services.
I have to admit that I do not yet know how dominant Google could become in web services, or whether their dominance would dampen innovation and hurt consumers. But my bias as a venture capitalist is to believe that innovation thrives in small businesses and is often muted in large organizations. So I think it is time we all began to think about how to promote innovation in a world dominated by Google’s massive data store. After all, open source and the shift to a web-based application architecture reduced Microsoft’s influence and enabled a new round of innovation on the web.
Google understands the leverage of data. In the one area where they do not have the largest data asset, social networking, they have launched the OpenSocial initiative to try to make that data accessible to themselves and to others. It will be interesting to see how OpenSocial plays out, but I am not convinced that it will alter the balance of power in the social networking space. Open source by itself did not have a huge impact on Microsoft. It was open source in combination with a platform shift from the PC to the web that opened up innovation on the web, and Microsoft still dominates the PC platform. We need an open data movement, but that may not be enough; we may also need a platform shift. The web seems so much like an end state that it is hard to imagine what that platform shift might look like or when it might happen. I am not going to predict the nature or the timing of this shift, but I will point out one thing. The data that drives the most valuable web services is contributed by users as they interact with those services. The shift that unlocks another era of innovation will occur when users understand their role in this ecosystem and have the tools at hand to direct what is now an unconscious contribution in a way that ensures continued innovation on their behalf.