University of Tübingen, Central Data Administration (until 02/2020)
Research Questions and General Approach
CAMPOS generates large amounts of data of various kinds and relies on data interoperability among its projects and with the wider research community. The challenge for the INF project lies in providing easy access to all data associated with CAMPOS, using existing data formats where appropriate and creating new research data formats where necessary. The final goal is an accepted data management environment for CAMPOS that can also be transferred to similar interdisciplinary environmental-science projects.
To achieve this, the INF project
- supplies a central platform for common data storage, management and backup of all projects within CAMPOS and, more importantly,
- provides additional services for data interoperability and exchange, for common data analysis tasks and for managing project workflows, thus increasing overall ease of data usage.
Key objectives are:
- Access to data for all researchers by supplying a central storage platform with integrated user and data access management.
- Data management services that rely on defined data and metadata formats and a common naming convention.
- Data analysis services that use the centralized data management infrastructure.
- Data interoperability within CAMPOS and with external researchers and the wider research community, interfacing with other existing information infrastructures outside CAMPOS.
- Sustainable storage of all relevant data generated within CAMPOS to guarantee the availability of the data for data publication, the reuse of data, and the verification of scientific results.
- Full access to data, and map-based visualization of existing data.
All tools and services provided by the INF project aim to fulfil the FAIR principles for modern digital research data management and to support researchers at all stages of the data life cycle.
FAIR Data Principles
The data management services and tools developed by INF aim to make data in CAMPOS convenient to search and retrieve via metadata (findable) under a sophisticated user rights management (accessible). Long-term archiving and publication functionalities for research data cover important aspects of data reusability.
Data interoperability depends heavily on, among other things, support for data exchange formats and compliance with accepted technical standards. Here, INF actively seeks cooperation with organizations outside CAMPOS.
Research Data Lifecycle
The complete data life cycle, from data acquisition through the various stages of computation, data analysis and visualization to long-term archiving and publication, must be covered to achieve the quality standards and reproducibility expected of modern research. The central platform for services and data storage developed by INF is necessary for appropriate management of all life cycle tasks and aspects, and benefits all participating CAMPOS researchers.
Schematic illustration of the research data life cycle
To meet the challenges posed by the diverse spectrum of disciplines involved in the CAMPOS project and the multitude of existing workflows, a general data management framework was built, consisting of three functional environments:
- the CAMPOS Internal Area forming the private working environment for researchers for all data-related tasks and issues,
- the research data archive FDAT of the University of Tübingen to preserve and publish data for long-term storage and use, and
- the CAMPOS Public Web Portal to provide public access to selected data (not yet implemented).
INF developed an automated procedure to build archival packages for long-term storage and publication in the research data archive FDAT.
At present, data packages from several sub-projects in CAMPOS have been successfully archived and published long-term in FDAT. See e.g.: hdl.handle.net/10900.1/afc6eac2-6521-4e59-a5c9-b1d1b36ad598.
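The idea of an automated packaging step can be illustrated with a minimal sketch. This is not the procedure used by INF or the FDAT ingest format; the function names, the `manifest.json` layout and the `data/` prefix are assumptions made purely for illustration. The sketch bundles a directory of data files together with a metadata record and per-file checksums into a single ZIP file, so that the integrity of the package can be verified later.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_archival_package(data_dir: Path, metadata: dict, package_path: Path) -> dict:
    """Bundle all files under data_dir plus a manifest into one ZIP file.

    Hypothetical layout: data files go under 'data/', and 'manifest.json'
    holds the metadata record and a checksum for every file.
    package_path should lie outside data_dir.
    """
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    manifest = {
        "metadata": metadata,
        "files": [
            {
                "path": p.relative_to(data_dir).as_posix(),
                "sha256": sha256_of(p),
                "bytes": p.stat().st_size,
            }
            for p in files
        ],
    }
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for p in files:
            archive.write(p, arcname="data/" + p.relative_to(data_dir).as_posix())
        archive.writestr("manifest.json", json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

A call such as `build_archival_package(Path("my_dataset"), {"title": "Example"}, Path("package.zip"))` would write the package and return the manifest; the directory name and metadata keys here are placeholders.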
A three-level structure of organizational entities has been established to efficiently coordinate the development and implementation of data management within the CRC:
- Executive Board: top-level decisions (strategy, prioritization, licensing, etc.)
- Project Data Managers (PDMs): organization of work in individual projects
- Data Teams: management of specific types of data, incl. metadata definitions
A data management approach was implemented from the researchers’ perspective, describing a sophisticated workflow from the generation and preparation of data and metadata all the way to long-term preservation and publication.
A hierarchical and flexible metadata schema was developed to avoid redundancies and inconsistencies in metadata across the heterogeneous research disciplines in CAMPOS. This concept is efficient because metadata can be defined per data type on the basis of extensible metadata templates. The effort required for data annotation depends on whether a metadata template for the given type of data is already available in the library of templates; when one is, reusing it saves the researcher considerable time.
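How extensible, data type-specific templates avoid redundancy can be shown with a small sketch. It is only an illustration under stated assumptions: the class `MetadataTemplate`, the template names and all field names are hypothetical and are not taken from the CAMPOS schema. A specialized template declares only its own fields and inherits the rest from its parent, so shared fields are defined once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataTemplate:
    """A template listing required metadata fields; may extend a parent template."""
    name: str
    required: tuple[str, ...] = ()
    parent: Optional["MetadataTemplate"] = None

    def all_required(self) -> list[str]:
        """Required fields of this template, inherited fields first, without duplicates."""
        inherited = self.parent.all_required() if self.parent else []
        return inherited + [f for f in self.required if f not in inherited]

    def missing_fields(self, record: dict) -> list[str]:
        """Required fields that a metadata record does not yet provide."""
        return [f for f in self.all_required() if f not in record]


# Hypothetical template library: a general base and one data type-specific extension.
base = MetadataTemplate("base", ("title", "creator", "project"))
timeseries = MetadataTemplate("timeseries", ("station_id", "variable", "unit"), parent=base)

record = {"title": "Example series", "creator": "N.N.", "project": "P1", "station_id": "S01"}
print(timeseries.missing_fields(record))  # ['variable', 'unit']
```

Under this design, annotating a new data set of an existing type only requires filling in the fields the template reports as missing, while a new data type needs one additional template that extends an existing one.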
The Data Cockpit, a key component of the CAMPOS internal working area for researchers, provides access to data and metadata for project-internal use. It serves as a central hub for data upload and for map-based visualization of measuring stations and similar items. Via the Data Cockpit, project researchers can easily search, maintain and edit the metadata sets for the respective data records.
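The combination of metadata search and map-based display can be sketched as follows. This is not the Data Cockpit's implementation or API; the record type `StationRecord`, the functions `search` and `to_geojson`, and the example stations with their coordinates are invented for illustration. The sketch filters metadata records by a text match and an optional bounding box, then converts the hits to a GeoJSON FeatureCollection, a format that common web map libraries can display.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StationRecord:
    """Hypothetical metadata record for one measuring station."""
    station_id: str
    title: str
    lat: float
    lon: float


def search(records, text: str = "", bbox: Optional[tuple] = None) -> list:
    """Filter records by case-insensitive title match and optional bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat) in decimal degrees.
    """
    needle = text.lower()
    hits = []
    for rec in records:
        if needle and needle not in rec.title.lower():
            continue
        if bbox is not None:
            min_lon, min_lat, max_lon, max_lat = bbox
            if not (min_lon <= rec.lon <= max_lon and min_lat <= rec.lat <= max_lat):
                continue
        hits.append(rec)
    return hits


def to_geojson(records) -> dict:
    """Convert records to a GeoJSON FeatureCollection (coordinates are [lon, lat])."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [rec.lon, rec.lat]},
                "properties": {"station_id": rec.station_id, "title": rec.title},
            }
            for rec in records
        ],
    }


# Invented example stations; the coordinates are placeholders.
stations = [
    StationRecord("S01", "Groundwater well A", 48.52, 9.05),
    StationRecord("S02", "River gauge B", 48.60, 8.90),
    StationRecord("S03", "Groundwater well C", 49.10, 9.40),
]
hits = search(stations, text="groundwater", bbox=(8.8, 48.4, 9.2, 48.7))
print([rec.station_id for rec in hits])  # ['S01']
```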
With the assistance of S2, INF developed a data management plan (DMP). The DMP advises researchers on how to manage and document data appropriately so that data can be analyzed in an integrated way across individual research projects and disciplines. It clarifies all data responsibilities to ensure successful cooperation in the CAMPOS project and serves as a guideline for data storage, as a reference, and as support for researchers in all data tasks.