Thanks to Michelle Willmers for co-authoring this blog.
The Research on Open Educational Resources for Development (ROER4D) project recently received the Open Education Consortium Open Resources, Tools and Practices Award in the category ‘Open Data’. The project was deeply honoured by this recognition, which provided an opportunity to reflect on ROER4D’s data sharing and publication journey. From the elevation of curation to a core project objective early in the project lifespan, to the formalisation of a Curation and Dissemination (C&D) team comprising a Publishing and Curation Manager and a Project Curator, the ROER4D project has continually sought to professionalise its approach to data management and publishing. In the course of this process, the C&D team has gained a number of insights that we believe could be valuable to other projects in their data management activities.
In August 2015, the C&D team launched the ROER4D Open Data Initiative, which aimed to support sub-project researchers in making their data openly available through a co-ordinated, strategic open data publishing process. We published our first open dataset as part of a collaboration with Factors influencing Open Educational Practices and OER in the Global South.
We’ve had a number of insights over the course of implementing the Open Data Initiative that we’ve shared at webinars with colleagues at UCT and beyond. There is a growing expectation on the part of funders and institutional management that researchers incorporate research data management (RDM) as part of research design and funding proposal processes. Postgraduate students, in particular, are under significant pressure in this regard, and at UCT, RDM is now included in a new memorandum of understanding between students and supervisors. Engaging with the concept and technicalities of RDM is therefore becoming an integral part of professional research practice.
We would argue that the process of preparing one’s data for publication has benefits for all researchers, even if it does not result in an openly-licensed, published dataset. The data publication process contains within it the components of a measured, strategic approach to all aspects of RDM, from ethics and consent to metadata description and long-term preservation. Within this context, the comprehensive planning and rigorous curatorial work needed to publish data openly supports and elevates good research practice. The understandable concerns that researchers have about sharing their data can be harnessed to improve the quality and rigour of their data management and collection processes.
Data publishing vs data sharing
Over time, the C&D team has found it valuable to differentiate between sharing and publication, with “sharing” referring to internal collaborative activity within the project network and “publishing” referring to the processing, packaging and communication of content to outside audiences through the production of formal outputs. We believe that this differentiation, particularly the framing of open data in terms of a publishing process, has strengthened the Open Data Initiative by drawing on familiar discourses of academic publishing and its concomitant quality assurance mechanisms. The cleaning of data, the creation of comprehensive metadata and its association with contextual material to make those data understandable, as well as working with external, professional partners (such as DataFirst), mirror traditional journal or book publishing processes.
In positioning our Open Data Initiative as a publishing process, we do not mean to disparage the concept of sharing. Sharing is key to all aspects of open research, including open data, but it also forms part of normal academic practice in terms of collaboration with colleagues, students and other external parties. The concept of publication builds on sharing by emphasising the organisation and presentation of material for an external audience, which requires a level of considered action and thinking that sharing does not always demand.
That said, any publication process also entails a cost component in the form of professional expertise, whether this is provided internally in the project structure or by external experts. This additional expense needs to be factored in at an early stage when considering data publishing as part of formal project activity.
Open data enhances rigour
A second insight we’ve gained is that open data publication has a strong role to play as a rigour- or quality-enhancing mechanism. In the ROER4D context, the intense scrutiny that datasets have undergone prior to and during the publication process has led to improved verification and validation of their internal coherence and consistency. The combination of external critique by the C&D team and internal reflection by the contributing researchers, both facilitated by the publication process, has improved the quality of the datasets. In some cases this scrutiny has fed back into improving the quality of the analysis and how the findings are presented.
Well begun is half done
In a recent DMP tool. More important though, as borne out by the ROER4D experience, is the elevation of data management and curation more generally to a core project objective rather than a post-project archiving process.
In summary, ROER4D’s experience has been that professionalised data management supports core aspects of the research endeavour and facilitates the development of skills such as curation and copyright management that are becoming increasingly important in all research endeavours. Open data publication, in particular, stresses the development of these skills and of a rigorous and holistic approach to data management, and it subjects data to heightened scrutiny. Rather than treating this scrutiny as a disincentive to engage, we see it in a positive light. It is a means to refine and improve data quality and general research practice, and it fosters an appreciation for the value of open practices and the sharing of open content amongst researchers.