22  Useful resources for Data Collection & Management

Modified

February 20, 2026

22.1 R Programming

Essential Resources

  1. R for Data Science book: Hadley Wickham and coauthors wrote this book on using the tidyverse, and the online version is FREE. It is a phenomenal resource on using R to import, tidy, and visualize data.

  2. Posit Cheat Sheets: help with commands for using the different tidyverse packages, RStudio shortcuts and tricks, help with R commands, and more. You definitely want the ones for Data Import, Work with Strings, Factors, Data Transformation, and Base R.

  3. RStudio Keyboard Shortcuts: a list of the keyboard shortcuts for Mac, Windows, and Linux.

  4. Where and how to ask for help

Tutorials

  1. Paul van der Laken’s List of books, tutorials, and other resources on topics ranging from data manipulation to data validation to data visualization.

  2. R Essential Training: Wrangling and Visualizing Data (requires a UF email address to access LinkedIn Learning).

  3. Software Carpentry: Using RStudio for Project Organization & Management

  4. Swirl: learn R programming interactively, at your own pace, and in the R console.

  5. R Bootcamp by Ted Laderas and Jessica Minnier. Learn R in your browser.

  6. How to clean messy data in R

  7. The Ultimate Guide to Data Cleaning is written for Python users, but the principles apply regardless of language.

22.2 Specific Data Cleaning and Management Problems

Dates & Times

  1. Handling dates and times in R
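As a small taste of what date handling involves, here is a minimal sketch using only base R (no extra packages). The dates and variable names are made up for illustration and are not taken from the linked tutorial.

```r
# Parse an ISO 8601 string; as.Date() understands "YYYY-MM-DD" by default
start_date <- as.Date("2026-02-20")

# Non-ISO input needs an explicit format string
end_date <- as.Date("03/15/2026", format = "%m/%d/%Y")

# Date arithmetic: elapsed time in days, converted to a plain number
days_between <- as.numeric(difftime(end_date, start_date, units = "days"))

# Write the date back out in a consistent ISO format
end_iso <- format(end_date, "%Y-%m-%d")

print(days_between)
print(end_iso)
```

Storing dates as Date objects rather than character strings is what makes the subtraction and reformatting steps possible.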

Text & Text Mining

  1. Text Mining: tidytext package

Qualtrics

  1. Working with Qualtrics survey data: qualtRics package

Text Extraction

  1. Optical Character Recognition (OCR): extract text from images: tesseract package

  2. Extract text & metadata from pdf files: pdftools package

Images & Image Processing

  1. Image processing: the magick package

22.3 Advanced R Packages for Data Management

  1. DataCurator package: ‘a simple desktop data editor to help describe, validate and share usable open data’.

  2. RegExr: online tool to learn, build, & test Regular Expressions (RegEx / RegExp)

  3. janitor (cleanup of column names, etc.)

  4. ROpenSci: tools for accessing, manipulating, and visualizing open data
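To give a sense of the kind of cleanup that regular expressions and name-cleaning tools handle, here is a rough base R sketch that standardizes messy column names. It is only an illustration under assumed inputs (the `raw_names` vector is hypothetical); it is not how janitor itself is implemented, and janitor handles many more edge cases.

```r
# Hypothetical messy column names, e.g. from a spreadsheet export
raw_names <- c("Site ID", "Date.Collected", " Mean Temp (C) ")

# Trim surrounding whitespace and lower-case everything
clean_names <- tolower(trimws(raw_names))

# Replace each run of non-alphanumeric characters with a single underscore
clean_names <- gsub("[^a-z0-9]+", "_", clean_names)

# Drop any underscores left at the start or end
clean_names <- gsub("^_+|_+$", "", clean_names)

print(clean_names)
```

A pattern like `[^a-z0-9]+` is easy to try out in an online tester such as RegExr before using it in a script.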

22.4 Data Visualization

  1. Data Visualization: a practical introduction by Kieran Healy is my favorite introductory (yet super-comprehensive) book on data visualization with R. If you scroll down to the bottom of the page you can download the datasets and code used to make the figures in the book, which makes life much easier.

22.5 Slides & Presentations

  1. Make slide presentations with R

22.6 Documents & Reports

  1. knitr overview: reproducible documents with R

22.7 Discipline-specific R Resources

History

  1. historydata package: Sample data sets for historians learning R. They include population, institutional, religious, military, and prosopographical data suitable for mapping, quantitative analysis, and network analysis.

  2. The Programming Historian Website: wide range of topics, from text analysis to OpenRefine

Psychology

  1. ‘Programming for Psychologists: Data Creation and Analysis’ by Matthew J. C. Crump

22.8 Data Archives

  1. Qualitative Data Repository: dedicated archive for storing and sharing digital data (and accompanying documentation) generated or collected through qualitative and multi-method research in the social sciences and related disciplines.

  2. Data Dryad: open data publishing platform and a community committed to the open availability and routine re-use of all research data.

  3. ICPSR: data access, curation, and analytical methods for social science.