18 Reproducible Data Correction with R
18.1 Download the data we will be using in class
- Open the messy data file demo_data.csv by following this link. The data will open as a tab in your web browser in .csv format.
- Save the file to the data_raw folder by going to ‘File’ on the menu bar of your web browser and selecting ‘Save page as’ from the drop-down menu.
18.2 Data Cleaning: Practice
Review the .csv file. What things do you see that need to be corrected?
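One way to start the review in R is to load the file and let a few base R functions point at likely problems. The sketch below uses a small made-up sample (`messy_csv`, hypothetical) in place of demo_data.csv, whose actual columns are not shown here. With the real file you would read it from data_raw instead, as in the commented line.

```r
# Hypothetical stand-in for demo_data.csv; the real file's columns will differ.
# With the real file, use instead:
# raw <- read.csv(file.path("data_raw", "demo_data.csv"),
#                 stringsAsFactors = FALSE, check.names = FALSE)
messy_csv <- 'Site ID,2019 count,species,weight
A01,12,Oak,"1,5"
A02,0,Maple,2.3
A02,0,Maple,2.3
A03,7, oak ,
A04,,mapel,0'

# check.names = FALSE keeps the original headings so their problems stay visible
raw <- read.csv(text = messy_csv, stringsAsFactors = FALSE,
                check.names = FALSE)

str(raw)                          # "weight" is chr because of the "1,5" entry
names(raw)                        # headings with spaces or a leading digit
sum(duplicated(raw))              # number of exact duplicate rows
colSums(is.na(raw) | raw == "")   # NA or blank cells per column
table(raw$species)                # spelling and capitalization variants
unique(raw$weight)                # non-numeric text hiding in a number column
```

Each of these calls answers one question from the checklist that follows, so the output can be turned directly into your list of corrections.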
Make a list of what you think needs to be corrected and the steps needed to identify and implement each correction. Things to look out for include:
- Numeric values stored as character data types
- Factors stored as characters
- Duplicate rows
- Spelling mistakes
- Inconsistent formatting (e.g., codes, capitalization)
- Extra white space (e.g., leading or trailing spaces in values)
- Missing data
- Zeros recorded where values should be missing (NA)
- Special characters (e.g., commas used as decimal separators in numeric values)
- Column headings with spaces between words or that start with numerals
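Most of the problems listed above map onto a handful of base R functions. The sketch below applies them, in one reasonable order, to a small hypothetical data frame (`raw`) that mimics these problems. The column names, the spelling fix ('mapel' to 'maple') and the rule that a weight of 0 means "not recorded" are assumptions for illustration, not properties of demo_data.csv.

```r
# Hypothetical messy data frame illustrating the problems listed above
raw <- data.frame(
  `Site ID`    = c("A01", "A02", "A02", "a03 ", "A04"),
  `2019 count` = c("12", "0", "0", "7", ""),
  species      = c("Oak", "Maple", "Maple", " oak ", "mapel"),
  weight       = c("1,5", "2.3", "2.3", "", "0"),
  check.names = FALSE, stringsAsFactors = FALSE
)
clean <- raw

# Column headings: lower case, underscores for spaces,
# and an "x" prefix for names that start with a digit
names(clean) <- tolower(gsub(" ", "_", names(clean), fixed = TRUE))
starts_digit <- grepl("^[0-9]", names(clean))
names(clean)[starts_digit] <- paste0("x", names(clean)[starts_digit])

# White space: trim leading/trailing spaces in every character column
is_chr <- vapply(clean, is.character, logical(1))
clean[is_chr] <- lapply(clean[is_chr], trimws)

# Inconsistent codes and capitalization
clean$site_id <- toupper(clean$site_id)
clean$species <- tolower(clean$species)

# Spelling mistakes: lookup of misspelling -> correct value (assumed fix)
fixes <- c(mapel = "maple")
hits <- clean$species %in% names(fixes)
clean$species[hits] <- unname(fixes[clean$species[hits]])

# Numbers stored as text: blanks become NA, decimal commas become points
# (assumes commas are never thousands separators in this data)
to_number <- function(x) {
  x[which(x == "")] <- NA
  as.numeric(gsub(",", ".", x, fixed = TRUE))
}
clean$x2019_count <- to_number(clean$x2019_count)
clean$weight      <- to_number(clean$weight)

# Zeros standing in for missing values (assumed: a weight of 0 is impossible)
clean$weight[which(clean$weight == 0)] <- NA

# Categorical text stored as character becomes a factor
clean$species <- factor(clean$species)

# Duplicate rows: drop them after standardizing, so near-duplicates also match
clean <- clean[!duplicated(clean), ]
rownames(clean) <- NULL

str(clean)
```

Removing duplicates last is deliberate: rows that differ only in case or stray spaces are counted as distinct by `duplicated()` until the text has been standardized. Note that a zero count is kept here; only columns where zero is impossible should have zeros converted to NA.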
Write R code to implement the changes you have identified. Save this code as las_practice.r and submit it via the Canvas website.
Grading Rubric:
- Assignment completed with data validation correctly programmed with useful error messages: 35
- Most data validation properly programmed; some require instructor follow-up: 25
- Many of the validation parameters need corrections; error messages not useful: 15
- Incorrect data can be entered in all categories; instructor follow-up required: 10
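The rubric rewards validation with useful error messages. A minimal base R sketch of that idea follows. It assumes cleaned columns named site_id, species and weight and uses hypothetical rules (IDs of the form A01, a fixed species list, positive weights); `validate_demo()` is an illustrative name, not part of any package. It collects every failed check and reports them together with `stop()`, so each message names the problem and the rows involved.

```r
# validate_demo() and its rules are hypothetical; adapt them to your data.
validate_demo <- function(df,
                          allowed_species = c("maple", "oak"),
                          id_pattern = "^[A-Z][0-9]{2}$") {
  # Stop immediately if columns the later checks rely on are absent
  required <- c("site_id", "species", "weight")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }

  problems <- character(0)  # collect all failures, report them together

  dup_rows <- which(duplicated(df))
  if (length(dup_rows) > 0) {
    problems <- c(problems, paste0("duplicate row(s): ",
                                   paste(dup_rows, collapse = ", ")))
  }

  bad_ids <- which(!grepl(id_pattern, df$site_id))
  if (length(bad_ids) > 0) {
    problems <- c(problems, paste0("site_id not in the form 'A01' at row(s): ",
                                   paste(bad_ids, collapse = ", ")))
  }

  bad_species <- which(!(as.character(df$species) %in% allowed_species))
  if (length(bad_species) > 0) {
    problems <- c(problems, paste0("species not in (",
                                   paste(allowed_species, collapse = ", "),
                                   ") at row(s): ",
                                   paste(bad_species, collapse = ", ")))
  }

  if (!is.numeric(df$weight)) {
    problems <- c(problems, paste0("weight must be numeric, not ",
                                   class(df$weight)[1]))
  } else {
    bad_wt <- which(!is.na(df$weight) & df$weight <= 0)
    if (length(bad_wt) > 0) {
      problems <- c(problems, paste0("weight must be > 0 at row(s): ",
                                     paste(bad_wt, collapse = ", ")))
    }
  }

  if (length(problems) > 0) {
    stop("Data validation failed:\n- ",
         paste(problems, collapse = "\n- "), call. = FALSE)
  }
  invisible(TRUE)
}

# Valid data passes silently
good <- data.frame(site_id = c("A01", "A02"),
                   species = c("oak", "maple"),
                   weight  = c(1.5, 2.3))
validate_demo(good)

# Invalid data fails with one message listing every problem found
bad <- data.frame(site_id = c("A01", "a02"),
                  species = c("oak", "mapel"),
                  weight  = c(1.5, -2))
msg <- tryCatch(validate_demo(bad), error = function(e) conditionMessage(e))
cat(msg, "\n")
```

Reporting all failures at once, rather than stopping at the first, means a single run tells the person fixing the data everything that needs attention.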