Data access, processing and metadata: CESSDA tools and services
The Tools and Services Working Group organised the CESSDA Expert Seminar 2019 (CES2019) on 26-27 September 2019, hosted by the UK Data Service in Colchester.
The first day of CES2019 focussed primarily on the following topics: access conditions to data, metadata sharing and various CESSDA products, while the second day focussed on data processing tools.
Access conditions to data
Currently, the archives within CESSDA all have different data access conditions. Two institutions presented theirs. GESIS applies several conditions depending on factors such as the depositor agreement, confidentiality and sensitivity. The UK Data Archive has four colour-coded categories, based on factors such as the depositor agreement and how sensitive the data is, that is, the level of personal data contained in the dataset.
Access conditions vary from country to country, and the general conclusion was that the CESSDA community needs to agree on common access conditions and make the rules clear to users. The CESSDA Data Access Policy states in Principle 11 that “Access conditions to data shall, by 2022, be fully interoperable.”
Metadata sharing
Metadata maintenance is the key to good data. CESSDA has now reached a stage where, as a community, we are joining forces on the tools and services we provide to our users. The first step was to make sure that the metadata from the service providers were standardised and mapped to the CMM (CESSDA Metadata Model). The CMM makes sharing possible despite differences in data documentation across service providers, because it links their metadata to one another. The second step was to establish the CESSDA Metadata Office in 2019, which deals with metadata issues in CESSDA and is responsible for updating the CMM when necessary.
The CESSDA Data Catalogue is the result of good metadata work by all the participating archives. It was conceived as the main interaction point between researchers and CESSDA service providers (SPs). It harvests the metadata from the CESSDA SPs and thus shows the user where the data is located.
A lot of work is being done “behind the scenes” to maintain and update the metadata collections, as Jon Johnson from CLOSER also pointed out.
Four CESSDA products
Four CESSDA products were presented, followed by a discussion of their future development and prospects.
The CESSDA Data Catalogue was launched last year and now contains over 30,000 datasets from a number of countries. Some countries are still in the process of joining, so the number of datasets will continue to increase. The CESSDA Data Catalogue is also part of the EOSC marketplace.
The CESSDA Controlled Vocabularies Service (CVS), launched in autumn this year, ensures that controlled vocabularies are used consistently across archives; users can view and download the definitions.
ELSST, the European Language Social Science Thesaurus, is an online thesaurus showing the definitions of, and relationships between, terms and keywords in the social sciences across several languages.
An example from ELSST: Relations between concepts
Copyright: UK Data Service
The European Question Bank (EQB) is soon to be launched and will allow users to search question texts from all the participating archives across languages, countries and time. This means that a researcher who is designing or using a questionnaire can find different formulations of questions on the same topic. For example, if you are studying the welfare state, it could be interesting to find out whether researchers across Europe have formulated their survey questions in a similar way to you. The EQB will thus allow users to go into further detail with the data than before.
Data processing
A broad range of tools was presented, including user-orientated tools such as QAMyData from the UK Data Archive. QAMyData, which is soon to be released, checks the selected data and provides a software-based workflow, replacing the time-intensive manual work otherwise needed to check data. The Danish National Archive showed how it has combined the long-term preservation of governmental records and research data.
The Swedish National Data Service (SND) is developing new tools in response to new demands from its designated community. SND now receives data from multiple communities, not only from the social sciences, and has put effort into linking the community-specific metadata to the DDI standard used by the CESSDA community.
The UK Data Archive demonstrated how to work with big data by scaling up for smart meter energy data. Big data on energy use can be analysed online and combined with open data from other public sectors. Speaker Darren Bell demonstrated the tool used to analyse big data online. He also pointed out that, for big data to be useful, it must be possible both to link it with other data and to analyse it.
Why come to a CESSDA Expert Seminar
CESSDA Expert Seminars are a great opportunity for CESSDA service providers to share experiences about tools and services in all phases of development.
This year's programme contained presentations by representatives of seven different countries. Many service providers were present to hear more about the tools and services provided either by CESSDA or by the individual archives. One of the main benefits of being in a consortium is that it gives the archives an arena in which to share and discuss different topics.