Data Validation
Heliconia Demographic Survey Data
We use the R package pointblank to review and validate the plot-level descriptors (HDP_plots.csv) and clean demographic data set (heliconia_survey_clean.csv) in preparation for archiving in Dryad and publication in Bruna et al. (2023). The report below includes:
- the different validation tests that were conducted,
- the date of the most recent test,
- each test’s criteria for ‘pass’, ‘warn’ and ‘stop’,
- the number of ‘units’ (i.e., rows or columns) assessed in each test,
- how many of these units passed or failed, and
- a button for downloading a .csv file of the records flagged by a particular validation test.
Note that flagged records are not necessarily errors. For instance, the validation procedure for ‘plant size - height’ flags all plants >2 m tall. Heliconia plants can exceed this threshold; the test is simply designed to identify any such individuals for review. In contrast, the data set should not have any duplicated rows. A failure of that test indicates an error that can be corrected by downloading the .csv file, reviewing the duplicated rows, and uploading the necessary corrections.
Last run: 2023-09-20
Dataset Structure: Data types
Tests to determine if columns are correctly coded as integer, character, etc.
Test criteria: Strict (‘stop’ if any rows fail).
Pointblank Validation: Data Validation (tibble)
Thresholds: WARN 1, STOP 0.02, NOTIFY —

STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT
---|---|---|---|---|---|---|---|---|---|---|---
1: Height is measured to nearest cm |  | — |  | ✓ | 57K | 57K (1.00) | 0 (0.00) | — | ○ | — | —
2: Shoots is integer |  | — |  | ✓ | 57K | 57K (1.00) | 0 (0.00) | — | ○ | — | —
3: Number of inflorescences is integer |  | — |  | ✓ | 2K | 2K (1.00) | 0 (0.00) | — | ○ | — | —

Started 2023-09-20 13:42:48 UTC, finished 2023-09-20 13:42:48 UTC (duration < 1 s)
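As a rough illustration of what the data-type tests above verify, the following Python sketch checks that values are whole numbers. It is not the pointblank code behind this report: the helper names and example rows are hypothetical, only the column names `ht`, `shts`, and `infl` come from the report, and letting missing values pass is an assumption.

```python
def is_whole_number(value):
    """True for ints and for floats with no fractional part.

    Missing values (None) pass; this is an assumption of the sketch.
    """
    if value is None:
        return True
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and value.is_integer()


def non_integer_rows(rows, column):
    """Return the rows whose value in `column` is not a whole number."""
    return [row for row in rows if not is_whole_number(row.get(column))]


# Hypothetical example rows (not taken from the data set).
type_check_rows = [
    {"ht": 35, "shts": 2, "infl": None},
    {"ht": 12.5, "shts": 1, "infl": None},  # height not recorded to the nearest cm
    {"ht": 140, "shts": 4, "infl": 1},
]

flagged_heights = non_integer_rows(type_check_rows, "ht")
print(len(flagged_heights))  # 1
```

Under the strict criteria described above, a single flagged row would be enough to trigger ‘stop’.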
Dataset Structure: Plot & Subplot IDs
Test for any nonexistent values of plot_id (e.g., ‘FF-10’, ‘CF-23’) or subplot (e.g., ‘H23’, ‘A11’).
Test criteria: Strict (‘stop’ if any rows fail).
Pointblank Validation: Data Validation (tibble)
Thresholds: WARN 1, STOP 0.02, NOTIFY —

STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT
---|---|---|---|---|---|---|---|---|---|---|---
1: col_vals_in_set() |  |  |  | ✓ | 66K | 66K (1.00) | 0 (0.00) | — | ○ | — | —
2: col_vals_in_set() |  |  |  | ✓ | 66K | 66K (1.00) | 0 (0.00) | — | ○ | — | —

Started 2023-09-20 13:42:49 UTC, finished 2023-09-20 13:42:49 UTC (duration < 1 s)
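A set-membership test of this kind can be sketched in a few lines of Python. This is only an illustration, not the report's pointblank code: the function name is hypothetical, and the allowed IDs below are placeholders rather than real plot or subplot codes. In practice the allowed set would presumably be built from the plot-level descriptors.

```python
def rows_not_in_set(rows, column, allowed):
    """Return the rows whose value in `column` is not one of the allowed values."""
    allowed = set(allowed)
    return [row for row in rows if row[column] not in allowed]


# Placeholder IDs for illustration only (not real plot or subplot codes).
allowed_plots = {"P1", "P2"}
allowed_subplots = {"A1", "A2", "B1"}

id_check_rows = [
    {"plot_id": "P1", "subplot": "A1"},
    {"plot_id": "P9", "subplot": "A2"},  # unknown plot
    {"plot_id": "P2", "subplot": "Z7"},  # unknown subplot
]

bad_plots = rows_not_in_set(id_check_rows, "plot_id", allowed_plots)
bad_subplots = rows_not_in_set(id_check_rows, "subplot", allowed_subplots)
print(len(bad_plots), len(bad_subplots))  # 1 1
```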
Dataset Structure: Duplicated or Missing Values
Tests for duplicated rows, missing plant_id numbers, or duplicate plant_id numbers (the duplicate test is done for every survey year).
Test criteria: Strict (‘stop’ if any rows fail).
Pointblank Validation: Data Validation (tibble)
Thresholds: WARN 1, STOP 0.02, NOTIFY —

STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT
---|---|---|---|---|---|---|---|---|---|---|---
1: duplicated rows | — | — |  | ✓ | 66K | 66K (1.00) | 0 (0.00) | — | ○ | — | —
2: col_vals_not_null() |  | — |  | ✓ | 66K | 66K (1.00) | 0 (0.00) | — | ○ | — | —
3: Check for duplicate plant ID numbers |  | — |  | ✓ | 9K | 9K (1.00) | 0 (0.00) | — | ○ | — | —
4: Check for duplicate tag numbers in a plot |  | — |  | ✓ | 64 | 0 (0.00) | 64 (1.00) | — | ● | — | 

Started 2023-09-20 13:42:50 UTC, finished 2023-09-20 13:42:54 UTC (duration 4.0 s)
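The three kinds of checks described above (duplicated rows, missing IDs, and IDs repeated within a survey year) can be sketched as follows. This is a minimal Python illustration, not the code used to produce the report: the function names, the `year` column name, and the example records are assumptions; only `plant_id` and `shts` are taken from the report.

```python
from collections import Counter


def duplicated_rows(rows):
    """Return one copy of every row that appears more than once."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return [dict(key) for key, n in counts.items() if n > 1]


def rows_missing_id(rows, id_column="plant_id"):
    """Return rows with no plant ID (None or empty string)."""
    return [row for row in rows if row.get(id_column) in (None, "")]


def duplicate_ids_by_year(rows, id_column="plant_id", year_column="year"):
    """Return (year, plant_id) pairs that occur more than once."""
    counts = Counter(
        (row[year_column], row[id_column])
        for row in rows
        if row.get(id_column) not in (None, "")
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical records; the years and values are made up.
dup_check_rows = [
    {"plant_id": 1, "year": 1998, "shts": 2},
    {"plant_id": 1, "year": 1999, "shts": 3},
    {"plant_id": 2, "year": 1998, "shts": 1},
    {"plant_id": 2, "year": 1998, "shts": 1},  # exact duplicate of the row above
    {"plant_id": None, "year": 1999, "shts": 1},  # missing ID
]

print(duplicate_ids_by_year(dup_check_rows))  # [(1998, 2)]
```

Counting IDs per (year, ID) pair rather than over the whole table reflects that the same plant legitimately appears once in every survey year.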
Plant Characteristics: Size & Flowering
Tests to determine how many values of plant size (shts, ht) or inflorescence number (infl) are outside the range of most values.
Test criteria: ‘warn’ if \(\geq\) 1 row fails conditions, ‘stop’ if \(\geq\) 2% of rows fail conditions.
Pointblank Validation: Data Validation (tibble)
Thresholds: WARN 1, STOP 0.02, NOTIFY —

STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT
---|---|---|---|---|---|---|---|---|---|---|---
1: shoots between 0 and 20 |  |  |  | ✓ | 66K | 66K (0.99) | 8 (0.01) | ● | ○ | — | 
2: height between 0 and 200cm |  |  |  | ✓ | 66K | 66K (0.99) | 2 (0.01) | ● | ○ | — | 
3: inflorescences between 0 and 3 |  |  |  | ✓ | 66K | 66K (0.99) | 15 (0.01) | ● | ○ | — | 

Started 2023-09-20 13:42:55 UTC, finished 2023-09-20 13:42:55 UTC (duration < 1 s)
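To make the ‘warn’ and ‘stop’ criteria concrete, here is a small Python sketch of a range check with those two thresholds (at least 1 failing row for ‘warn’, at least 2% of rows for ‘stop’). It illustrates the logic only and is not pointblank's implementation; the function name is hypothetical, the example heights are made up, and skipping missing values is an assumption. The 0 to 200 cm bounds are the ones listed for height in the table above.

```python
def range_check(values, low, high, warn_count=1, stop_fraction=0.02):
    """Count values outside [low, high] and compare against warn/stop thresholds."""
    checked = [v for v in values if v is not None]  # assumption: skip missing values
    n_fail = sum(1 for v in checked if not (low <= v <= high))
    fraction = n_fail / len(checked) if checked else 0.0
    return {
        "units": len(checked),
        "failed": n_fail,
        "warn": n_fail >= warn_count,
        "stop": bool(checked) and fraction >= stop_fraction,
    }


# Made-up heights in cm: one plant in 100 exceeds 200 cm.
heights_one_tall = [50] * 99 + [250]
# Made-up heights in cm: three plants in 100 exceed 200 cm.
heights_three_tall = [50] * 97 + [250] * 3

print(range_check(heights_one_tall, 0, 200))
print(range_check(heights_three_tall, 0, 200))
```

With one tall plant in 100, the check warns but does not stop; with three in 100 (3%), both thresholds are crossed.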
Plant Characteristics: Growth
Tests for unusual changes in plant size (both height and shoot number) from \(Year_{t}\) to \(Year_{t+1}\).
Test criteria: ‘warn’ if \(\geq\) 1 row fails conditions, ‘stop’ if \(\geq\) 2% of rows fail conditions.
Pointblank Validation: Check growth & regression (tibble)
Thresholds: WARN 1, STOP 0.02, NOTIFY —

STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT
---|---|---|---|---|---|---|---|---|---|---|---
1: \|% change in height\| < 200% |  |  |  | ✓ | 66K | 66K (0.99) | 420 (0.01) | ● | ○ | — | 
2: \|∆ height\| < 100cm |  |  |  | ✓ | 66K | 66K (0.99) | 11 (0.01) | — | ● | — | 
3: \|∆ shoot number\| < 5 |  |  |  | ✓ | 66K | 66K (0.99) | 201 (0.01) | — | ● | — | 

Started 2023-09-20 13:42:56 UTC, finished 2023-09-20 13:42:56 UTC (duration < 1 s)
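The three growth conditions in the table above can be written out explicitly. The Python sketch below is illustrative only: the function and flag names are hypothetical, and skipping the percentage check when the starting height is zero is an assumption, since the report does not say how that case is handled.

```python
def growth_flags(ht_t, ht_t1, shts_t, shts_t1):
    """Return the names of the growth conditions that a plant fails.

    Conditions (from the report's table): |% change in height| < 200%,
    |change in height| < 100 cm, |change in shoot number| < 5.
    """
    flags = []
    delta_ht = ht_t1 - ht_t
    # Assumption: the percentage change is undefined, and skipped, when ht_t is 0.
    if ht_t > 0 and abs(delta_ht) / ht_t * 100 >= 200:
        flags.append("pct_height")
    if abs(delta_ht) >= 100:
        flags.append("delta_height")
    if abs(shts_t1 - shts_t) >= 5:
        flags.append("delta_shoots")
    return flags


# Made-up measurements: (height in year t, height in year t+1, shoots t, shoots t+1)
print(growth_flags(20, 70, 1, 2))    # ['pct_height']
print(growth_flags(100, 120, 3, 9))  # ['delta_shoots']
print(growth_flags(40, 55, 2, 3))    # []
```

Small plants are the ones most likely to fail the percentage condition without failing the absolute one, which is presumably why both are tested.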
Seedlings: Initial size
Tests for seedlings whose size at initial marking was unusually large. Conducted for both height and shoot number.
Test criteria: ‘warn’ if \(\geq\) 1 row fails conditions, ‘stop’ if \(\geq\) 2% of rows fail conditions.
Seedlings: Data Entry Errors
Check whether the size of a seedling (1) was accidentally entered in the “inflorescences” column during data entry, which would code a new seedling as being reproductive.
Test criteria: Strict (‘stop’ if any rows fail).
Pointblank Validation: Check for ‘reproductive’ seedlings (tibble)
Thresholds: WARN —, STOP 1, NOTIFY —

STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT
---|---|---|---|---|---|---|---|---|---|---|---
1: infl < 1 |  |  |  | ✓ | 3K | 3K (1.00) | 0 (0.00) | — | ○ | — | —

Started 2023-09-20 13:42:58 UTC, finished 2023-09-20 13:42:58 UTC (duration < 1 s)
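The `infl < 1` condition above amounts to a simple filter on seedling records. The following Python sketch shows the idea under stated assumptions: the function name and the `is_new_seedling` flag are hypothetical (the report does not say how seedlings are identified), and a missing `infl` value is treated as zero.

```python
def reproductive_seedlings(rows):
    """Return newly marked seedlings recorded with one or more inflorescences."""
    return [
        row
        for row in rows
        # Assumption: a missing inflorescence count means zero inflorescences.
        if row["is_new_seedling"] and (row.get("infl") or 0) >= 1
    ]


# Made-up records for illustration.
seedling_rows = [
    {"plant_id": 10, "is_new_seedling": True, "shts": 1, "infl": None},
    {"plant_id": 11, "is_new_seedling": True, "shts": None, "infl": 1},  # likely entry error
    {"plant_id": 12, "is_new_seedling": False, "shts": 5, "infl": 2},    # adult, not checked
]

suspect = reproductive_seedlings(seedling_rows)
print([row["plant_id"] for row in suspect])  # [11]
```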
Zombie plants
Zombie plants are those that were recorded as ‘Dead’ in a survey but for which there is a measurement in a subsequent year (indicative of the plant losing all above-ground parts and then new shoots emerging prior to the next survey). This validation generates a .csv of any plants meeting this condition (labeled as ‘zombie’) for review and correction.
Pointblank Validation: Check for zombies (tibble)
Thresholds: WARN 1, STOP 0.02, NOTIFY —

STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT
---|---|---|---|---|---|---|---|---|---|---|---
1: Check for Zombies |  |  |  | ✓ | 0 | 0 (NA) | 0 (NA) | — | ○ | — | —

Started 2023-09-20 13:43:01 UTC, finished 2023-09-20 13:43:01 UTC (duration < 1 s)
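The zombie definition above (a ‘Dead’ record followed by a measurement in a later year) can be sketched as a single pass over each plant's records in year order. This Python example is an illustration rather than the report's actual procedure: the function name, the `year` and `status` columns, and the `"dead"` code are assumptions; `plant_id`, `ht`, and `shts` are the report's column names.

```python
def find_zombies(records):
    """Return IDs of plants with a size measurement after the year they were marked dead."""
    first_dead_year = {}
    zombies = set()
    for rec in sorted(records, key=lambda r: (r["plant_id"], r["year"])):
        pid = rec["plant_id"]
        measured = rec.get("ht") is not None or rec.get("shts") is not None
        if pid in first_dead_year and rec["year"] > first_dead_year[pid] and measured:
            zombies.add(pid)
        if rec.get("status") == "dead" and pid not in first_dead_year:
            first_dead_year[pid] = rec["year"]
    return sorted(zombies)


# Made-up survey records.
survey_records = [
    {"plant_id": 1, "year": 1999, "status": "alive", "ht": 30, "shts": 2},
    {"plant_id": 1, "year": 2000, "status": "dead", "ht": None, "shts": None},
    {"plant_id": 1, "year": 2001, "status": "alive", "ht": 12, "shts": 1},  # resprouted
    {"plant_id": 2, "year": 1999, "status": "alive", "ht": 45, "shts": 3},
    {"plant_id": 2, "year": 2000, "status": "dead", "ht": None, "shts": None},
]

print(find_zombies(survey_records))  # [1]
```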
Plant Mortality: Plant size
Tests for plants with 6 or more shoots dying from one year to the next. Note: these are not errors; they are plants whose size in the year before being recorded as ‘dead’ in a survey was in the top 2% of dying plants.
Test criteria: ‘warn’ if \(\geq\) 1 row fails conditions, ‘stop’ if \(\geq\) 2% of rows fail conditions.
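One way to picture this test is as a lookup of each dead plant's shoot count in the preceding survey. The Python sketch below assumes annual surveys and uses hypothetical names (`year`, `status`, the `"dead"` code, and the function itself); only `plant_id`, `shts`, and the 6-shoot threshold come from the report.

```python
def large_plants_that_died(records, min_shoots=6):
    """Return IDs of plants marked dead whose previous-year shoot count was >= min_shoots."""
    by_plant_year = {(r["plant_id"], r["year"]): r for r in records}
    flagged = []
    for rec in records:
        if rec.get("status") != "dead":
            continue
        # Assumption: surveys are annual, so the prior record is from year - 1.
        previous = by_plant_year.get((rec["plant_id"], rec["year"] - 1))
        if previous is not None and (previous.get("shts") or 0) >= min_shoots:
            flagged.append(rec["plant_id"])
    return sorted(flagged)


# Made-up records for illustration.
mortality_records = [
    {"plant_id": 1, "year": 2004, "status": "alive", "shts": 7},
    {"plant_id": 1, "year": 2005, "status": "dead", "shts": None},  # large plant died
    {"plant_id": 2, "year": 2004, "status": "alive", "shts": 2},
    {"plant_id": 2, "year": 2005, "status": "dead", "shts": None},  # small plant died
]

print(large_plants_that_died(mortality_records))  # [1]
```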