Why does the same data sheet appear more than once?
You may notice that some data sheets are processed more than once.
This is part of our standard digitization procedure: to ensure high
quality and accuracy, each sheet undergoes at least two rounds of
digitization. If the two digitizations disagree, a third round is
conducted. How many times a data sheet is reprocessed, and how
discrepancies are resolved during post-processing, is determined by
our client (the data owner).
As a user, you do not need to worry about this. We take every
precaution to prevent the same data sheet from being processed more
than once by the same user. Although we strive to keep such overlap
to a minimum, it may occasionally happen, but these cases are rare.
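The reconciliation rule described above can be sketched in Python. This is a minimal illustration, not our production code. It assumes that each field is compared as a plain string after trimming whitespace, that a value is accepted once two digitizations agree, and that `MAX_ROUNDS = 3` is a hypothetical cap standing in for whatever the data owner specifies. The names `agreed_value` and `next_step` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

MIN_ROUNDS = 2  # every sheet is digitized at least twice
MAX_ROUNDS = 3  # hypothetical cap; in practice the data owner decides


def agreed_value(entries: Sequence[str]) -> Optional[str]:
    """Return a value entered by at least two users, or None if there is none."""
    # Assumption: entries are compared as strings after trimming whitespace.
    counts = Counter(entry.strip() for entry in entries)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count >= 2 else None


def next_step(entries: Sequence[str]) -> str:
    """Decide what happens to a field given the digitizations collected so far."""
    if len(entries) < MIN_ROUNDS:
        return "digitize again"
    if agreed_value(entries) is not None:
        return "accept"
    if len(entries) < MAX_ROUNDS:
        return "digitize again"  # the first two rounds disagree: run a third
    return "flag for post-processing"  # no agreement after the final round


print(next_step(["12.5", "12.8"]))          # digitize again
print(next_step(["12.5", "12.8", "12.5"]))  # accept
```

Values that remain unresolved are handed to post-processing rather than discarded, which mirrors the idea that the data owner decides how remaining discrepancies are treated.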
How is flawed user input handled?
Handling flawed user input is a critical part of our process. Each
data sheet is digitized several times by different users so that
potential data entry errors can be identified and corrected.
Depending on the complexity of the dataset, perfect accuracy may not
be achievable. Our clients, the data owners, therefore set an
acceptable accuracy threshold, and our post-processing procedures
resolve discrepancies in user input within these specified
parameters.
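As a rough illustration of checking a sheet against an accuracy threshold, the sketch below uses the share of fields on which two independent digitizations agree as a proxy for accuracy. Both this proxy and the hypothetical function names `agreement_rate` and `meets_threshold` are assumptions made for illustration; the actual metric and threshold are defined by the data owner.

```python
from typing import Sequence


def agreement_rate(first: Sequence[str], second: Sequence[str]) -> float:
    """Fraction of fields on which two independent digitizations match."""
    if len(first) != len(second):
        raise ValueError("both digitizations must cover the same fields")
    if not first:
        raise ValueError("no fields to compare")
    # Assumption: fields match if they are equal after trimming whitespace.
    matches = sum(a.strip() == b.strip() for a, b in zip(first, second))
    return matches / len(first)


def meets_threshold(first: Sequence[str], second: Sequence[str], threshold: float) -> bool:
    """Hypothetical check of a sheet against a client-defined threshold (0.0 to 1.0)."""
    return agreement_rate(first, second) >= threshold


sheet_a = ["12.5", "13.0", "13.4", "14.1"]
sheet_b = ["12.5", "13.0", "13.4", "14.7"]
print(agreement_rate(sheet_a, sheet_b))  # 0.75
```

Agreement between two users is only an indirect measure: two users can make the same mistake, so a real pipeline would likely combine it with other checks.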
We prioritize the breadth of available data over perfect precision.
Even if it contains some errors, a dataset spanning 70 years is
generally more valuable than a nearly perfect dataset covering only
10 years.
In summary, our approach to handling flawed user input balances
data integrity against the value of comprehensive historical data.
What is citizen science?
Citizen science is a collaborative approach to research that involves members of the public in scientific investigation. Whether by collecting data, analyzing results, or making new discoveries, volunteers ("citizen scientists") contribute to real scientific studies. Participation can take many forms, such as online platforms, local or global events, or research projects. The aim of citizen science is to broaden participation in scientific research and make scientific inquiry more democratic.
Why can't these tasks be solved with machine learning techniques?
The core difficulty lies in the diversity of the data records.
Although optical character recognition (OCR) has advanced
significantly and can in principle digitize tabular data, the main
hurdle is extracting the data from the tables. Because table formats
vary, we cannot define a standard structure. In addition, the
annotations made by previous data collectors, which are essential,
further complicate an automated approach.
Hydrographs pose a different problem: they are often printed on
poor-quality paper, which makes them hard to decipher. Moreover,
training a model requires data that has already been digitized.
Manual work is therefore unavoidable.
We recognize the potential of machine learning and are actively
working to incorporate it. Our future aim for this platform focuses
on two key areas. The first is deciphering hard-to-read data and the
annotations made by measuring station operators, which are crucial
for understanding the data. The second is building datasets suitable
for training neural networks.
Why does the data start in November and not January?
A hydrological, or water, year is a specific 12-month period that
hydrologists use to study and report on water systems. In our case
it starts on November 1st and ends on October 31st of the following
year, although the exact dates can vary with regional climate. This
is why the data starts in November rather than January. The period
is chosen to include a complete cycle of water-related events,
encompassing precipitation, evaporation, and runoff. Using a
hydrological year captures all the phases that are crucial for
understanding water dynamics. By standardizing the evaluation period,
scientists can more effectively analyze and manage the collected
water data, which includes rainfall patterns, river flows, and soil
and groundwater storage. This in turn supports informed decisions in
water resource management.
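To assign a dated record to its hydrological year, a small helper like the one below could be used. It assumes the November 1st start described above and, as an assumed naming convention, labels each hydrological year by the calendar year in which it ends (so 1 November 2023 falls in hydrological year 2024). The function names are hypothetical.

```python
from datetime import date, timedelta

START_MONTH = 11  # hydrological year starts on November 1st


def hydrological_year(d: date) -> int:
    """Hydrological year of a date, labelled by the calendar year it ends in (assumed convention)."""
    return d.year + 1 if d.month >= START_MONTH else d.year


def hydrological_year_bounds(year: int) -> tuple[date, date]:
    """First and last day of a hydrological year under the same convention."""
    start = date(year - 1, START_MONTH, 1)
    end = date(year, START_MONTH, 1) - timedelta(days=1)  # October 31st
    return start, end


print(hydrological_year(date(2023, 11, 1)))   # 2024
print(hydrological_year(date(2024, 10, 31)))  # 2024
print(hydrological_year_bounds(2024))
```

Computing the end date as the day before the next start avoids hard-coding the length of October.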
Why is it called digitization instead of digitalization?
The two terms are often used interchangeably, but they refer to different things. "Digitalization" describes a broader transformation of processes through the application of digital technologies. "Digitization", by contrast, refers to converting analog data into a digital format. Because our website focuses on the latter, we use the term "digitization".
Is your question not listed here?
We are always happy to help with any further questions. Please feel free to contact us about any other topics that need clarification.