15  Useful resources for Data Collection & Management

Modified

March 13, 2026

15.1 R Programming

Essential Resources

  1. R for Data Science: a book by Hadley Wickham and co-authors on using the tidyverse; the online version is FREE. This is a phenomenal resource on using R to import, tidy, and visualize data.

  2. Posit Cheat Sheets: quick references for the different tidyverse packages, RStudio shortcuts and tricks, R commands, and more. You definitely want the ones for Data Import, Work with Strings, Factors, Data Transformation, and Base R.

  3. RStudio Keyboard Shortcuts: a list of the keyboard shortcuts for Mac, Windows, and Linux.

  4. Where and How to ask for help:

Tutorials

  1. Paul van der Laken’s list of books, tutorials, and other resources on topics ranging from data manipulation to data validation to data visualization.

  2. R Essential Training: Wrangling and Visualizing Data (requires a UF email address to access LinkedIn Learning).

  3. Software Carpentry: Using RStudio for Project Organization & Management

  4. Swirl: learn R programming interactively, at your own pace, and in the R console.

  5. R Bootcamp by Ted Laderas and Jessica Minnier. Learn R in your browser.

  6. How to clean messy data in R

  7. The Ultimate Guide to Data Cleaning is written for Python users, but the principles apply regardless of language.

15.2 Specific Data Cleaning and Management Problems

Dates & Times

  1. Handling dates and times in R

Text & Text Mining

  1. Text Mining: tidytext package

Qualtrics

  1. Working with Qualtrics survey data: qualtRics package

Text Extraction

  1. Optical Character Recognition (OCR), extracting text from images: tesseract package

  2. Extract text & metadata from PDF files: pdftools package

Images & Image Processing

  1. Image processing: the magick package

15.3 Advanced R Packages for Data Management

  1. DataCurator package: ‘a simple desktop data editor to help describe, validate and share usable open data’.

  2. RegExr: online tool to learn, build, & test Regular Expressions (RegEx / RegExp)

  3. janitor package: cleanup of column names, etc.

  4. ROpenSci: tools for accessing, manipulating, and visualizing open data

15.4 Data Visualization

  1. Data Visualization: a practical introduction by Kieran Healy is my favorite introductory (yet super-comprehensive) book on data visualization with R. If you scroll down to the bottom of the page you can download the datasets and code used to make the figures in the book, which makes life much easier.

15.5 Slides & Presentations

  1. Make slide presentations with R

15.6 Documents & Reports

  1. knitr overview: reproducible documents with R

15.7 Discipline-specific R Resources

History

  1. historydata package: Sample data sets for historians learning R. They include population, institutional, religious, military, and prosopographical data suitable for mapping, quantitative analysis, and network analysis.

  2. The Programming Historian Website: wide range of topics, from text analysis to OpenRefine

Psychology

  1. ‘Programming for Psychologists: Data Creation and Analysis’ by Matthew J. C. Crump

15.8 Data Archives

  1. Qualitative Data Repository: dedicated archive for storing and sharing digital data (and accompanying documentation) generated or collected through qualitative and multi-method research in the social sciences and related disciplines.

  2. Data Dryad: open data publishing platform and a community committed to the open availability and routine re-use of all research data.

  3. ICPSR: data access, curation, and analytical methods for social science.

15.9 Disposal & Destruction of Records

  1. UF IT: Securely Destroying Electronic and Paper Records

  2. UF Procurement Services: Approved vendors for document destruction services

  3. UF Approved Methods for Electronic Media Disposal

15.10 Non-R software & tools

Linguistics

  1. lameta: tool for organizing collections of files and metadata made in the course of documenting language, music, and other cultural expressions.

  2. FieldWorks Language Explorer: dictionary creation software. A comprehensive tool for creating a dictionary for a language by collecting texts, words, and cultural information. It is widely used for documenting and preserving low-resource languages and cultures.

15.10.1 Text analysis

  1. AntConc (https://www.laurenceanthony.net/software.html): a freeware corpus analysis toolkit for concordancing and text analysis (see link for related tools). A tutorial from the Programming Historian is also available.