
BORIS Portal

Bern Open Repository and Information System


SNSF Spark Project 'Dynamic Data Ingestion' for server-side data harmonisation: Creating a database with 200k students and scholars 1200-1800: Method, concept and practical implementation

BORIS DOI
10.48350/198516
Date of Publication
June 4, 2021
Publication Type
Conference Paper
Division/Institute

Historisches Institut...

Author
Gubler, Kaspar
Historisches Institut - Mittelalterliche Geschichte
Subject(s)

900 - History::940 - ...

Language
English
Uncontrolled Keywords
nodegoat, data modeling, data analysis, data visualisation, digital humanities
Description
The linking of research data has been a dominant topic for years, especially in digital history, and Linked Open Data (LOD) is the buzzword at conferences and in research projects. The greatest challenge, however, is not collecting such data from the internet but harmonising it, because research databases are usually structured differently. It is therefore not surprising that, despite many initiatives, no research project in digital history has yet managed to harmonise data across several structural levels of the databases involved. This means, for example, not only linking persons across databases by their names, but going deeper into the data structure to harmonise, say, a person's geographical origin or the attributes of their education. Yet that is the aim: to answer scientific questions through structural data harmonisation. This is where our SPARK project comes in. The third and final phase of the project (Episode 3) was completed in January 2021.

What are the core results of this project? In essence, a software module (the DDI module, for 'dynamic data ingestion') and a method: research data is collected from different source databases and ingested on a central server by the module according to the spider principle, creating a new metadatabase. The collected data is harmonised in this newly built database as far as possible during ingestion itself, by mapping the database fields of the source databases onto corresponding fields of the metadatabase. If such a mapping is impossible or only partially possible, because the fields of the source database and the metadatabase are too dissimilar, an algorithm can be applied in a second step, once the data is stored on the central server, to bring uniformity to the data through reconciliation. In addition, the data can be automatically reclassified in order to standardise it.

These measures prepare the data for analysis and ultimately for publication, both of which can be done in the virtual research environment nodegoat.
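The two-step harmonisation described above can be illustrated with a minimal sketch. All field names, records, and the variant table below are invented for illustration; the actual DDI module operates against live source databases inside nodegoat, not against in-memory lists.

```python
# Illustrative sketch of the two harmonisation steps: field mapping at
# ingestion, then value reconciliation on the central server.
# All data and names here are hypothetical.

# Records as exported by two source databases with differently named fields.
SOURCE_A = [{"name": "Petrus de Berno", "origo": "Berna"}]
SOURCE_B = [{"person": "Johannes Faber", "birthplace": "Basel"}]

# Step 1: mapping -- source field names onto metadatabase field names.
FIELD_MAPS = {
    "source_a": {"name": "person_name", "origo": "place_of_origin"},
    "source_b": {"person": "person_name", "birthplace": "place_of_origin"},
}

# Step 2: reconciliation -- bring uniformity to values that field mapping
# alone cannot harmonise (e.g. Latin vs. modern place names).
PLACE_VARIANTS = {"berna": "Bern", "basilea": "Basel"}

def ingest(records, field_map):
    """Map each source record into the metadatabase schema,
    dropping fields that have no counterpart there."""
    return [
        {field_map[k]: v for k, v in record.items() if k in field_map}
        for record in records
    ]

def reconcile(record):
    """Normalise place names after ingestion."""
    place = record.get("place_of_origin", "")
    record["place_of_origin"] = PLACE_VARIANTS.get(place.lower(), place)
    return record

metadatabase = [
    reconcile(r)
    for source, records in (("source_a", SOURCE_A), ("source_b", SOURCE_B))
    for r in ingest(records, FIELD_MAPS[source])
]
print(metadatabase)
```

The key design point the abstract makes is the ordering: as much harmonisation as possible happens at ingestion (cheap, declarative field maps), and only the remainder is handled afterwards by reconciliation over the already-centralised data.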
Related URL
https://histdata.hypotheses.org/nodegoat-day-2021
Handle
https://boris-portal.unibe.ch/handle/20.500.12422/178748
File(s)
File: Nodegoat-Day-2021-Programm.pdf
File Type: text
Format: Adobe PDF
Size: 972.1 KB
License: https://www.ub.unibe.ch/services/open_science/boris_publications/index_eng.html#collapse_pane631832
Content: supplemental