6 QA/QC: Reducing Error Entry, Finding and Correcting Errors
6.1 Reducing errors when entering data
Using ‘Data Validation’ Rules
Setting up Data Validation in Microsoft Excel: [link to Microsoft post] and a tutorial video.
Setting up Data Validation in Google Sheets: [link to post by Google] and a tutorial video
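To make the idea concrete, here is a minimal Python sketch of what a data validation rule does: each typed value is checked against a list of allowed values or a numeric range before it is accepted. The column names (`plot_id`, `stem_count`) and the limits are illustrative assumptions, not settings taken from the Excel or Google Sheets tutorials linked above.

```python
# Hypothetical rules for a field data sheet. The column names and limits are
# assumptions made for illustration only.
RULES = {
    # "List" rule: the value must be one of the allowed plot codes.
    "plot_id": lambda v: v in {"A", "B", "C"},
    # "Whole number" rule: digits only, between 0 and 500 inclusive.
    "stem_count": lambda v: v.isdecimal() and 0 <= int(v) <= 500,
}


def validate_entry(column, value):
    """Return True if the typed value passes the rule for its column."""
    rule = RULES.get(column)
    if rule is None:
        return True  # columns without a rule accept anything
    return rule(value.strip())


def find_invalid(rows):
    """Return (row_number, column, value) for every entry that breaks a rule."""
    problems = []
    for i, row in enumerate(rows, start=1):
        for column, value in row.items():
            if not validate_entry(column, value):
                problems.append((i, column, value))
    return problems


rows = [
    {"plot_id": "A", "stem_count": "12"},
    {"plot_id": "D", "stem_count": "7"},     # plot D is not on the list
    {"plot_id": "B", "stem_count": "5000"},  # outside the allowed range
]
for problem in find_invalid(rows):
    print(problem)
```

A spreadsheet applies the same kind of check as each cell is typed and rejects or flags the entry immediately, which is usually preferable to finding the problem later.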
Data entry with ‘Text-to-Speech’
MS Excel (& other MS programs)
Converting text to speech in Excel for Microsoft 365, Excel 2010-2021 [link].
Microsoft for Mac: Hear selected text read aloud in Excel, Word, PowerPoint, and Outlook [link]. See also ‘System Preferences->Accessibility’.
Video Tutorials
- Tutorial Video 1. This will allow you to select rows or columns of Excel and have them read back to you.
- Tutorial Video 2. It is not as thorough, but the menu is a bit easier to see.
- Tutorial Video 3. Because why not a third one?
- If you are typing up written observations or field notes, the NY Times Wirecutter team thinks the Dictation tool in MS Word is the best way to transcribe your written notes. It requires an internet connection to use, but according to their review “it handles medical terms and less-common words correctly more than most competing tools. It supports 34 languages, includes commands for punctuation, emoji, and other commonly used special characters, and allows you to format and navigate text with spoken commands.”
Text-to-Speech in Google Docs
- If you prefer working in Google Docs you can do the same thing. This article will show you how. You can also watch this tutorial video.
Better still: ‘speak-on-enter’ to confirm data values as entered
- Speak-on-enter [tutorial:]
Speech-to-Text for keyboardless data entry
Google Sheets: Typing with your voice
MS Word: How to dictate documents
Overview of Microsoft Accessibility Tools
- ‘Narrator’ function link
6.2 R Packages for Data QA/QC
janitor: simple functions for examining and cleaning dirty data. It was built with beginning and intermediate R users in mind and is optimized for user-friendliness. Advanced R users can already do everything covered here, but with janitor they can do it faster. (see this mini-tutorial)
- clean column names
- remove empty rows and columns
- remove duplicated rows
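For readers who want to see what these three steps amount to, the sketch below walks through them in plain Python on a small table. It is not janitor itself (janitor is an R package); the function names `clean_name`, `is_blank`, and `clean_table` are hypothetical, the example data are made up, and every row is assumed to have the same number of cells as the header.

```python
import re


def clean_name(name):
    """Lower-case a header and replace runs of other characters with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def is_blank(value):
    """Treat None, empty strings, and whitespace-only cells as blank."""
    return value is None or str(value).strip() == ""


def clean_table(header, rows):
    """Tidy the names, drop empty rows and columns, then drop duplicate rows."""
    header = [clean_name(h) for h in header]
    # Drop rows in which every cell is blank.
    rows = [r for r in rows if not all(is_blank(v) for v in r)]
    # Keep only columns that have at least one non-blank cell.
    keep = [i for i in range(len(header))
            if any(not is_blank(r[i]) for r in rows)]
    header = [header[i] for i in keep]
    rows = [[r[i] for i in keep] for r in rows]
    # Drop exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return header, unique


header = ["Plot ID", "Stem Count (n)", "Notes "]
rows = [
    ["A", "12", ""],
    ["", "", ""],     # empty row
    ["A", "12", ""],  # duplicate of the first row
    ["B", "7", ""],
]
print(clean_table(header, rows))
```

Dropping duplicates silently is a design choice made here for brevity. With real data it is usually safer to list the duplicated rows and inspect them before removing anything.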
cleanr: a small R package for cleaning and checking data columns in a fast and easy way.
unheadr: used to wrangle spreadsheets with embedded subheaders or values wrapped across several rows (highlighting, merged cells, etc.).
For more advanced R users:
Richard Iannone’s pointblank: R package for Data Validation and Organization of Metadata [link]
validate: designed to test data against a reusable set of data validation rules, investigate, summarize, and visualize data validation results, among other things.
Data Curator is a simple desktop data editor to help describe, validate and share usable open data.
Kim, A. Y., Herrmann, V., Barreto, R., Calkins, B., Gonzalez-Akre, E., Johnson, D. J., Jordan, J. A., Magee, L., McGregor, I. R., Montero, N., Novak, K., Rogers, T., Shue, J., & Anderson-Teixeira, K. J. (2022). Implementing GitHub Actions continuous integration to reduce error rates in ecological data collection. Methods in Ecology and Evolution, 13, 2572–2585. https://doi.org/10.1111/2041-210X.13982
6.3 Readings & Sources
Campbell, J. L. et al. 2013. Quantity is nothing without quality: automated QA/QC for streaming environmental sensor data. BioScience, 63(7): 574-585. link
For more advanced users of R/GitHub: Kim, A. Y. et al. 2022. Implementing GitHub Actions continuous integration to reduce error rates in ecological data collection. Methods in Ecology and Evolution, 13, 2572–2585. https://doi.org/10.1111/2041-210X.13982
Barchard, K. A., & Pace, L. A. (2011). Preventing human error: The impact of data entry methods on data accuracy and statistical results. Computers in Human Behavior, 27(5), 1834-1839. link
Atkinson, I. (2012). Accuracy of data transfer: double data entry and estimating levels of error. Journal of Clinical Nursing, 21(19pt20), 2730-2735. link
Goldberg, S. I., Niemierko, A., & Turchin, A. (2008). Analysis of data errors in clinical research databases. AMIA Annual Symposium proceedings. AMIA Symposium, 2008, 242–246. link
DataONE Education Module: Data Quality Control and Assurance. DataONE. Retrieved Nov 12, 2012, from http://www.dataone.org/sites/all/documents/L05_DataQualityControlAssurance.pptx