Frequently asked questions
Why does the same data sheet appear more than once?

You may notice that some data sheets appear more than once. This is part of our standard digitization procedure: each sheet is digitized at least twice to ensure quality and accuracy. If the two digitizations disagree, a third round is conducted. The number of times a data sheet is reprocessed and the handling of discrepancies during post-processing are determined by our client (the data owner).
As a user, you do not need to worry about this. We take every precaution to prevent the same sheet from being shown to the same user twice. Occasional overlap can still occur, but it is rare.
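The two-round rule with a third round on disagreement can be illustrated with a small Python sketch. It is not the platform's actual post-processing, which is defined by the data owner. The function name reconcile and the rule that a third entry must match one of the first two are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def reconcile(entries: Sequence[str]) -> Optional[str]:
    """Return the agreed value, or None if a further round is needed.

    Hypothetical rule (an assumption, not the platform's real logic):
    two matching entries are accepted; if they differ, a third entry
    decides, provided it matches one of the first two.
    """
    if len(entries) < 2:
        return None  # every sheet needs at least two digitizations
    first, second = entries[0], entries[1]
    if first == second:
        return first
    if len(entries) >= 3 and entries[2] in (first, second):
        return entries[2]
    return None  # still unresolved


print(reconcile(["12.4", "12.4"]))          # both rounds agree
print(reconcile(["12.4", "12.9"]))          # disagreement, third round needed
print(reconcile(["12.4", "12.9", "12.9"]))  # third round breaks the tie
```

A value of None means the sheet would be queued for another round or left to the data owner's discrepancy handling.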

How is flawed user input handled?

Handling flawed user input is a critical aspect of our process. Each data sheet undergoes multiple rounds of digitization by different users to identify and correct potential data entry errors. It is important to recognize that achieving perfection may not be feasible, depending on the dataset's complexity. Therefore, our clients, the data owners, establish an acceptable accuracy threshold. Our post-processing procedures operate within these specified parameters to address and resolve any discrepancies in user input.
We prioritize the breadth of data availability over immaculate precision. Even if the data contains some errors, a dataset spanning 70 years is generally more valuable than a nearly perfect dataset covering only 10 years.
In summary, our approach to handling flawed user input balances data integrity against the value of comprehensive historical coverage.

What is citizen science?

Citizen science is a collaborative approach to research that involves members of the public in scientific investigation. Whether by collecting data, analyzing results, or making new discoveries, volunteers (or "citizen scientists") contribute to real scientific studies. Participation can take place in various ways, such as online platforms, local or global events, or research projects. The aim of citizen science is to broaden participation in scientific research and create a more democratic approach to scientific inquiry.

Why can't these tasks be solved with machine learning techniques?

The core difficulty resides in the diverse nature of the data records. Though optical character recognition (OCR) technology has advanced significantly, allowing potential digitization of tabular data, the main hurdle is extraction. Due to the changing formats of tables, we can't define a standard structure. Moreover, the essential annotations made by previous data collectors further complicate an automatic approach to this task.
The difficulty with hydrographs stems from their often poor paper quality, which makes them hard to decipher. Moreover, training a model requires data that has already been digitized. Manual work is therefore unavoidable.
We understand the potential of machine learning and are actively working to incorporate it. Our future aim for this platform is to focus on two key areas. The first is the transcription of hard-to-read data and of annotations made by measuring station operators, which are crucial for understanding the data. The second is the development of data sets suitable for training neural networks.

Why does the data start in November and not January?

A hydrological, or water, year is a specific 12-month period that hydrologists use to study and report on water systems. In our case it starts on November 1st and ends on October 31st of the following year, though the exact dates can vary based on regional climate.
This period is intentionally selected to include a complete cycle of water-related events, encompassing precipitation, evaporation, and runoff. Utilizing a hydrological year helps in capturing all the key phases crucial for understanding water dynamics. By standardizing the evaluation period, scientists can effectively analyze and manage the collected water data, which includes rainfall patterns, river flows, and soil and groundwater storage. This concept supports informed decisions in water resource management.
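As a rough illustration, the mapping from a calendar date to its hydrological year can be sketched in Python. The function names are hypothetical. Labelling the year by the calendar year in which it ends is an assumption; other conventions exist, and the start month differs between regions.

```python
from datetime import date
from typing import Tuple


def hydrological_year(d: date, start_month: int = 11) -> int:
    """Return the hydrological year that contains the date d.

    Assumption: the year is labelled by the calendar year in which it
    ends, so 1 November 2023 already belongs to hydrological year 2024.
    """
    return d.year + 1 if d.month >= start_month else d.year


def hydrological_year_bounds(label: int) -> Tuple[date, date]:
    """First and last day of a November-to-October hydrological year."""
    return date(label - 1, 11, 1), date(label, 10, 31)


print(hydrological_year(date(2023, 10, 31)))  # last day of one year
print(hydrological_year(date(2023, 11, 1)))   # first day of the next
print(hydrological_year_bounds(2024))
```

Grouping measurements by this label instead of by calendar year keeps each winter season together in a single evaluation period.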

Why is it called digitization instead of digitalization?

The two terms are often used interchangeably, but they refer to different things. "Digitalization" describes a transformative process that relies on the application of digital technologies. "Digitization," by contrast, refers to the process of converting analog data into a digital format. Because our website focuses on the latter, we use the term "digitization."

Is your question not listed here?

We are always available to assist you and are more than happy to address any further questions you may have. Please feel free to contact us with any additional queries or topics that require clarification.

Mail: hydriv-digitization@tu-braunschweig.de