I was invited to make a statement to a “European Commission Public Consultation” on Open Research Data which took place on July 2 at the European Commission in Brussels. All in all, an interesting venue (whose clear weakness was the 5-minute statement format) that gathered a good sampling of voices. I was told that this consultation is to be understood within the Commission’s Horizon 2020 initiative. In the words of the Commission itself:
Horizon 2020 is the financial instrument implementing the Innovation Union, a Europe 2020 flagship initiative aimed at securing Europe’s global competitiveness. Running from 2014 to 2020 with an €80 billion budget, the EU’s new programme for research and innovation is part of the drive to create new growth and jobs in Europe.
I made a few points, the most important of which are:
- Expecting, forcing, or mandating a “culture of sharing” among researchers (while not a bad thing in and of itself) is a naive and largely inappropriate and useless instrument for “opening data”. It is unfortunately a popular illusion.
- Data is our insurance against the pollution of our fact space with noise coming from “stylized facts”, “popularized science” and “politicians’ needs for scientific policy making”. In that sense the only useful definition of what open data means (and it doesn’t necessarily mean free) is that research whose data is not available (i.e. open) to research peers is simply scientific hearsay.
- One should make sure to not overburden researchers or we are in danger of chasing the last remaining talent away from research. This is an underpaid profession to begin with. The more we dump on the researcher the more we empty the researcher pool of talent. Ending up with open but useless data is definitely not what we want.
- We need a new class of professionals, namely domain aware data workers. They are part data, part IT and part subject experts, much like paramedics and paralegals. Considering the large amounts of data available and the even larger amounts we need, we are missing a whole class of experts capable of catering to the data users, i.e. the researchers.
- Large monolithic central archives are not the way to go as they are vulnerable to adverse shocks (Library of Alexandria), lack domain expertise and are dysfunctional. It is a safer bet to establish distributed, domain aware, small but federated repositories capable of withstanding shocks and of documenting and teaching their data for re-use.
- Fostering research can only be done by fostering researchers. The Commission needs to hire and promote researcher advocates within its ranks.
- Social sciences are underfunded, in contrast to large physics projects like CERN (every particle collision burns several dozen surveys). In such projects it probably makes sense to have large data repositories. In Social Science we are missing data. Some exists and is not open (administrative data); some does not exist (economic measurements are in bad shape and do not use advances in technology).
Besides that, I heard two things which are typical of this type of meeting and which should be eliminated from the way we discuss data:
- It is often the default assumption that researchers are charlatans. Some of the people in the meeting went as far as to say that the data is the “real science” and the researcher just provides the prose to bury the often poor facts in. These generalizations are dangerous, and the Commission should make sure to signal that and put money into showing it understands this. If not, bad days are ahead for science. Data is often delivered by sensors, and research is way more than orthogonal tables with numbers in them. The stars of open research data are not the data librarians but the researchers. If we forget this we are not serving science.
- Many of the people who run repositories are willing to put the survival of their gig above the need for a viable solution. To them open means that they are the ones locking the data up in “their archive” and promising us that they will keep it open. Here too the Commission is better off cutting its losses from fruitless efforts (if necessary) and trying something different.