Open data in Luxembourg, strategy and best practices (2012) chapter 4

From Wicri Luxembourg (en)

Estimating the impact of open data raises several problems. First, the economic consequences must be distinguished from the political and societal impacts, which are less easily quantified. Most analysts recognize the economic leverage of open data, as in Gartner's "Gartner Says Big Data Makes Organizations Smarter, But Open Data Makes Them Richer", but estimates of the size of this impact vary widely. The question of economic impact could in principle be reduced to a balance of costs and revenues. Even so, there is no consensus on the perspective to adopt, the method of calculation, or the inputs to take into account. Because it predates open data, public sector information (PSI) provides enlightening elements of comparison backed by long experience, since the issues are discussed in very similar terms. The comparison is even essential: it played a decisive role in launching the European regulation of PSI, following the finding of the PIRA study in 2000 that, with investments in PSI twice as high as those of the European Union, the United States generated revenue forty times higher.

Analysis framework

The general model used here is taken from an Australian report proposing a paradigm for calculating the benefit that takes into account the point of view of both data producers and data consumers. To ensure a sustainable economic model, the equation must not be limited to the producer: the degree of reuse is directly related to the cost of accessing the data. A model may seem balanced for producers, but if it is not suitable for consumers, it will eventually fail. Moreover, this model counts the savings made by the different stakeholders as benefits, a choice shared by most analyses of the benefits of open data.


The case of PSI also shows how difficult it is to find an analytical framework accepted by enough stakeholders to allow debate under homogeneous conditions: "How are the costs determined? Do you include the lights? The air conditioning? The price, then, is always a political decision, and arbitrary. According to economists, there is no way to price information in an objective manner."

Cost | Definition | Stakeholder involved | Consequences of open data
"Collection/creation" | | Provider | Weak
"Data assurance (quality, privacy…)" | | Provider | Both. +: crowdsourcing; -: loss of income
"Curation" | | Provider | Weak
"Dissemination" | | Provider | Increased costs
"Transaction" | Part of dissemination costs | Provider/user | Cost reduction
"Permission" | | User | Cost reduction
"Access" | | User | Cost reduction
"Use" | | User | Cost reduction
"Re-use" | | User | Cost reduction


Elements on real costs

Platform | Country | Scope | Scale | Cost assessment
data.gov | USA | General | National | Around $10 million / year
data.gov.uk | UK | General | National | 2010-2011: £1.2 million; 2011-2012: £2 million per year
data.gouv.fr | France | General | National | €5 million / year
data.nantes.fr | France | General | Local | €100,000 (cost of the portal)
portalu.de | Germany | Environment | National | €750,000 / year

There is no comprehensive study of costs, nor any detailed analysis of the proportions of the different types of costs, but there is fragmentary evidence. In the United Kingdom, the platform cost more than £1 million in 2010-2011 and £2 million in 2011-2012. Costs also depend on the purpose and implementation model of open data: the data.gov.uk platform costs around £2 million per year, while the ODI is funded at £10 million for its first year. In each ministerial department of the United Kingdom, the cost is estimated at between £53,000 and £500,000. There is also a tension between economic costs and social benefits. For example, the security map does not provide tangible economic returns and costs £150,000 per year, but it is one of the most used datasets, because citizens ask for this kind of information. It is necessary to distinguish clearly between economic investment, whose return is expected to be quantifiable, and social investment, especially when trying to meet the demands and needs of citizens; this distinction has made the assessment of the British open data platform harder.

Pricing models

By definition, open data is free. However, it is worth revisiting what has been written about PSI to understand that, among the various options, open data is increasingly chosen as a solution and recognised as the model that generates the greatest economic impact. Debates about the ideological orientation of open data have several consequences for the definition of pricing models. If one considers that taxpayers already fund data production, the duty of the government is to return to them the fruits of this spending. This argument has been widely relayed by the Guardian, which launched a well-known call for public data to be made freely available: "Give us back our crown jewels". Other arguments, insisting on the independence of data producers from governments and on the need to maintain data quality, favour a for-profit model.

The question of the economic model, in effect the question of whether access to data should be paid for, has led to several studies. The issue has been most thoroughly studied and theorized by Pollock [2008], even though he advocates the marginal cost option. His findings have been widely reused by other studies. According to him, economic models should take into account three types of costs: production (collection) costs, the costs of processing "upstream" or "raw" data into "downstream" data, and dissemination costs. The choice between raw (upstream) and downstream data is important because the technical community is almost entirely interested only in raw data for creating digital services.

[Insert modified table]

This view reflects more general considerations on the role of the State in the production of data. Proponents of a liberal approach believe that this role should be limited to the production and supply of raw data, leaving the production of services to the private sector. The boundary between the two is set by the level of the marginal cost: the higher it is, the more the activity should be left to the private sector. For example, and in contrast with current practice, the administration should continue to carry out topographical surveys but leave map production to private companies. Exceptions to this principle are cases in which the State would be more efficient, not only in terms of cost but also in terms of privacy, public safety and consumer protection.

If the State conducts a competitive activity (downstream data) based on public data without releasing to the market the raw data underlying that activity, it may bar firms from a competitive market, which directly contradicts the doctrine of essential goods. This would result in a monopoly position. A particular risk is that data producers try to pass the cost of producing refined data on to all consumers by raising the price of raw data excessively.

Pollock's study sets out to show the limits of models based on average or monopoly costs, distinguishing four major problems raised by these economic models: the credibility of the producer regarding the sustainability of its pricing, the incentives to produce and reuse, distortions of competition, and information asymmetry. Where a payment model is chosen, experience suggests that revenue barely covers the costs of the payment infrastructure. The average cost model raises insoluble problems of information asymmetry, since a producer cannot observe the exact point on the demand curve that would allow it to balance its budget. The marginal cost model, by contrast, is not subject to this limitation. It is mainly with respect to distortions of competition that the average and monopoly cost models carry a risk of inefficiency. These risks arise from the dominant position of public actors in the production of public data, but also because these sub-optimal models fail to send incentive signals to economic agents who might be interested in public data. The marginal pricing model, on the other hand, aligns the price of the license for data reuse exactly with the marginal cost of opening and dissemination. In terms of Pareto efficiency, this is the optimal solution for society. For Pollock, the marginal cost model is therefore essential and must be generalized. The only question that remains open is under what circumstances the average cost model should prevail over the marginal cost model, or in which cases the marginal cost is high enough to justify a modest fee rather than free access.
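The contrast between the two pricing rules can be illustrated with a small numerical sketch. It is not taken from Pollock's study: the linear demand curve q(p) = a - b*p, the parameter values and the function names are all assumptions chosen for illustration. The point it shows is that the marginal cost price needs only the producer's own cost, while the break-even (average cost) price cannot be computed without the demand parameters, which the producer does not observe in practice.

```python
import math

def marginal_cost_price(marginal_cost: float) -> float:
    """Marginal cost pricing: the price equals the marginal cost."""
    return marginal_cost

def average_cost_price(fixed_cost: float, marginal_cost: float,
                       a: float, b: float) -> float:
    """Lowest break-even price under an ASSUMED linear demand q(p) = a - b*p.

    Break-even means (p - c) * (a - b*p) = F, a quadratic in p.
    The result depends on a and b, i.e. on knowledge of demand.
    """
    c, F = marginal_cost, fixed_cost
    disc = (a - b * c) ** 2 - 4 * b * F
    if disc < 0:
        raise ValueError("no price covers the fixed cost under this demand")
    return ((a + b * c) - math.sqrt(disc)) / (2 * b)

# Hypothetical figures: near-zero marginal cost of digital dissemination.
p_mc = marginal_cost_price(0.0)
p_ac = average_cost_price(fixed_cost=900.0, marginal_cost=0.0, a=100.0, b=1.0)
print(p_mc, p_ac)
```

With these hypothetical numbers the break-even price is well above the marginal cost price, and demand falls accordingly; changing `a` or `b` changes the break-even price, which is the information problem described above.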

More generally, funding is not the only thing at stake: Pollock ultimately relates its role to good governance. The relationships between the various institutions must be designed to avoid imbalances that could lead, for example, to over- or under-investment.

The costs of open data can also be estimated from the costs of platforms that operate on similar principles. Putting a geographic data portal online requires an investment of around €200,000, and average operating costs are €50,000 per year. For IGN, which meets the requirements of the INSPIRE Directive, investments amounted to €6 million, and the platform requires 22 agents, generating an annual operating cost of €2 million.
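These figures can be combined into a cumulative cost over a given period. The sketch below is only illustrative: the five-year horizon and the function name are assumptions, and it ignores discounting and any change in operating costs over time.

```python
def cumulative_cost(investment_eur: int, annual_operating_eur: int, years: int) -> int:
    """Initial investment plus operating costs over a number of years."""
    return investment_eur + annual_operating_eur * years

YEARS = 5  # assumed horizon, not given in the text

typical_portal = cumulative_cost(200_000, 50_000, YEARS)
ign_platform = cumulative_cost(6_000_000, 2_000_000, YEARS)
print(typical_portal, ign_platform)
```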

Regarding PSI, Frederika Welle shows how the Netherlands, by lowering the cost of buying and reusing geographic data, though not down to the marginal cost, managed to obtain higher revenue. The original price of a million had discouraged reusers and produced only one sale, whereas the move to €200,000 led to twenty sales, thus generating higher revenue. This shows that seeking to extract the largest possible revenue from data can be counterproductive in some cases.
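The arithmetic behind this example can be sketched as follows. This is a minimal illustration, assuming that both prices are in euros and that revenue is simply the unit price multiplied by the number of sales; the function name is hypothetical.

```python
def revenue(price_eur: int, sales: int) -> int:
    """Gross revenue: unit price times number of sales."""
    return price_eur * sales

# Figures quoted in the text (currency of the original price assumed to be EUR).
original = revenue(1_000_000, 1)  # one sale at the original price
reduced = revenue(200_000, 20)    # twenty sales at the lower price

print(original, reduced, reduced / original)
```

Under these assumptions the lower price yields four times the revenue of the original one.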

It should also be noted that the proposals to reform the European Directive on PSI are based on provision at marginal cost. This means that eventually, open data or not, any institution in Europe that provides data will have to provide it free of charge or at a price tending to zero. In the field of PSI, either because producers anticipated European developments or because they were convinced by the arguments for marginal cost, the Deloitte study notes that, of 21 organizations studied, 16 were already aligned with the marginal cost model.

The value of open data

It is necessary to evaluate the whole information market in order to understand the potential importance of public data within it. The information market in France is estimated at €3.7 billion, with up to 60% derived from public data. In Germany, the market for geo-information increased from €1 billion in 2000 to €1.6 billion in 2006. At European level, Vickery estimates the market for geographic information at about €18 billion in 2009. However, many of these estimates are only projections of national results, extrapolated across European countries or elsewhere by applying a percentage or multiplier.


The evaluation of open data is all the more complex because it bears on a cycle (a value chain) that produces effects in several stages. The Guide pratique de l’ouverture des données publiques territoriales proposes three layers:

  • The creation of products and services, commercial or not
  • The use of these services and the resulting production of wealth and quality of life.
  • The return for the government, which may be easy to measure (e.g. growth of tax revenue) or less so (e.g. increased quality of life).


Reuse

What Tim Berners-Lee said about linked data can be generalized to the whole sector: future uses are difficult to predict, and there is an element of serendipity in this approach. For a company, obtaining data for free can relieve the financial constraints of data acquisition and enable a new centre of gravity in its business model, for example by lowering prices to be more competitive.

If it concerns enough stakeholders in sufficiently varied domains, open data can change the economic landscape: "Pricing policy has direct effect on the efficiency, cost reduction and even innovation." One question is whether approaches such as open data and open innovation are compatible, beyond the similarity of the terms. Open innovation has mostly been mentioned in connection with open source. It shares with that movement the idea of multiplying relationships by sharing documents and data, going beyond (but not cancelling) the limitations imposed by intellectual property. This notion of sharing is essential, even if the dominant sense now given to the concept is much broader, since it covers any innovative contribution that does not come from the internal processes of an organization. Whatever the definition, open data is, through the characteristics considered in other parts of this study, the mode of data dissemination that best suits the needs of open innovation, in particular its dimension of "collective", "distributed" or "cumulative" innovation. In the literature reviewed, open innovation is most often mentioned in connection with open data in approaches devoted to smart cities. More generally, however, open innovation concerns every domain of reuse: innovation in territorial action is no longer the preserve of public bodies alone, since citizens and reusers can "implement ideas by themselves". However, the lack of traceability of data prevents a precise understanding of the nature and extent of reuse, and therefore of a large part of the value created. Hence mobile applications, although they represent only a small part of the possibilities of open data, currently attract most of the attention of open data owners.

According to a study led by Deloitte, the global market for mobile applications, which are an important vector of open data, should represent 35 billion in 2015, including €8.4 billion in Europe alone. The study attempts to estimate the percentage of applications that use PSI or open data. The situation depends heavily on the type of application: from 100% for weather applications to 12% for motion applications and 8% for financial data.


[Insert modified table]

Moreover, the economic model for developers raises sustainability issues at the local level. In Rennes Métropole, an agglomeration of 400,000 people, the number of downloads has been estimated at approximately 5,000 per application, which is not enough to attract developers for whom open data would be the core source of income. Advertising revenue is limited, whereas the initial investment required amounts to thousands of euros. The model is therefore especially suited to applications created by volunteer developers, but this raises problems of long-term sustainability, since it depends on these developers continuing to maintain what they have made. The analysis of the available applications is limited by the features of the platforms: the App Store (for iOS) does not give the number of downloads, while Google Play (for Android) gives an indication, and at least one application exceeds this threshold, with between 10,000 and 50,000 downloads. For economies of scale to permit a profitable model, other local communities also have to open equivalent datasets.

Estimating the value of open data

There are several approaches to measuring the value of PSI and estimating the potential benefits of access to and reuse of data. They include top-down or bottom-up econometric modelling, extrapolations based on surveys of PSI producers and users scaled to national or regional markets, and estimates based on agency costs and consumers’ willingness to pay. This variety of methods explains the large differences between estimates.

There are two ways to calculate the value of open data: a positive approach, which involves interviewing stakeholders about the value generated from open data and adding up these figures, and a negative approach, based instead on the consequences of not opening data.

The studies below are the most significant attempts to estimate the value of open data and PSI. Vickery presents a state of the art of these attempts.

Study | Author(s) | Date | Scale | Geographic coverage | Data type | Benefits (per year)
Commercial exploitation of Europe’s public sector information | PIRA | 2000 | European | European Union | General | €68 billion
MEPSIR | Dekkers et al. | 2006 | European | European Union | General | €27 billion
Review of recent studies on PSI re-use and related market developments | Vickery | 2011 | European | European Union | General | €40 billion at least; direct and indirect impacts: €140 billion; savings: €2 billion
The Value of Geospatial Information to Local Public Service Delivery in England and Wales | ConsultingWhere; ACIL Tasman | 2010 | Local | United Kingdom | Spatial | £320 million; £600 million foreseen in 2015
Inquiry into Improving Access to Victorian Public Sector Information and Data | Victorian Parliament | 2009 | Local | Victoria State (Australia) | General | Seen as positive but not really quantified
The Socio-Economic Impact of the Spatial Data Infrastructure of Catalonia | Garcia Almirall et al. | 2008 | Local | Catalonia (Spain) | Spatial | Savings: €2.6 million / year; wider effects identified but not quantified
The commercial use of public information (CUPI) | DotEcon | 2006 | National | United Kingdom | General | £590 million; £1 billion expected if barriers were removed
The economic value of the Dutch geo-information sector | Castelein et al. | 2010 | National | Netherlands | Spatial (whole sector) | €1.4 billion

These studies yield some key findings. First, geographic data predominate. Another report, from the University of Victoria, estimates a total benefit of $5 for every dollar spent. The ratio is 1 to 13 for geographical data, confirming the high economic value of these data. There are still significant differences between the studies, including within the same country. While the study "The commercial use of public information", made for the Office of Fair Trading in 2006, indicates a potential market of £1 billion, the British government now estimates that public information represented a market of £16 billion in 2011. At the European scale, the European Commission adopts the high estimate of €140 billion indicated by Vickery. At the local level, few figures on benefits are available, partly because the local economy is completely embedded in other geographical levels, which tends to drown out the local effects of open data. Some communities give some elements about their return on investment (ROI), but ROI is difficult to compute. For the first open data contest, Apps for Democracy, the value of the 47 applications is estimated at $2 million, while the competition cost 50,000 euros. The organizers could accordingly announce an ROI of 4,000%. But this is a very partial approach, which gives at most the ROI of the contest and not that of the open data platform.
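The announced contest figure can be checked with a short calculation. The sketch below assumes, for illustration only, that the value and the cost are expressed in the same currency (the text quotes one in dollars and the other in euros); the function names are hypothetical. It shows that 4,000% corresponds to the ratio of value to cost, whereas a net return, which subtracts the cost first, is slightly lower.

```python
def value_to_cost_pct(value: float, cost: float) -> float:
    """Value created as a percentage of the cost."""
    return 100 * value / cost

def net_roi_pct(value: float, cost: float) -> float:
    """Net return on investment: (value - cost) / cost, as a percentage."""
    return 100 * (value - cost) / cost

# Figures quoted in the text; a common currency is ASSUMED.
value = 2_000_000
cost = 50_000

print(value_to_cost_pct(value, cost))  # 4000.0
print(net_roi_pct(value, cost))        # 3900.0
```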

Finally, there is the idea that not opening data is also a cost. This negative approach, which counts forgone benefits or forgone cost savings as a cost, has several aspects. For public institutions, it includes the productivity gains and cost savings that open data could make possible. For example, in terms of efficiency gains in existing operations, improving the accessibility of the information needed for obligatory environmental impact assessments could potentially reduce EU27 costs by 20%, or around €2 billion per year. The study most representative of this approach is The commercial use of public information, which assesses the loss of income at £520 million per year, at a time when the market value of PSI was estimated at £590 million, with the following distribution:

  • Unduly high pricing: £20 million
  • Distortion of competition (actions of PSI holders which restrict the access of reusers to information): £140 million
  • Failure to exploit PSI (benefits are limited when not all raw data is available for reuse): £360 million
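The distribution above can be checked with a short calculation. The sketch below uses only the figures quoted in the text, in millions of pounds; the variable names are illustrative.

```python
# Estimated annual losses, in millions of pounds, as listed above.
losses_gbp_million = {
    "Unduly high pricing": 20,
    "Distortion of competition": 140,
    "Failure to exploit PSI": 360,
}
market_value_gbp_million = 590  # estimated market value of PSI in the study

total_loss = sum(losses_gbp_million.values())
shares = {name: amount / total_loss for name, amount in losses_gbp_million.items()}

print(total_loss)  # 520
for name, share in shares.items():
    print(f"{name}: {share:.1%} of the total loss")
print(f"Loss relative to market value: {total_loss / market_value_gbp_million:.1%}")
```

The three components add up to the £520 million quoted, and the failure to exploit PSI accounts for the largest share of the loss.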

There are also cases where data are collected and reused illegally. The costs for the institution or public enterprise concerned are numerous: costs incurred in any judicial proceedings to stop unauthorized reuse, and damage to its public image, because the reuser is often an individual or a small structure, so that legal action amounts to opposing innovative activity. To this must be added the costs of opening initiatives carried out in a hurry, without taking the time to optimize the opening process or to look for suitable partners. For example, the announcement by a group of activists that they would open Deutsche Bahn data on their own initiative undoubtedly precipitated the announcement of the partnership between the German transport company and Google to open its datasets in the GTFS format.