22 Useful resources for Data Collection & Management
22.1 R Programming
Essential Resources
R for Data Science book: Hadley Wickham wrote a book on using the tidyverse and the online version is FREE. This is a phenomenal resource on using R to import, tidy, and visualize data.
Posit Cheat Sheets: help with commands for using the different tidyverse packages, RStudio shortcuts and tricks, help with R commands, and more. You definitely want the ones for Data Import, Work with Strings, Factors, Data Transformation, and Base R.
RStudio Keyboard Shortcuts: a list of the keyboard shortcuts for Mac, Windows, and Linux.
Where and How to ask for help:
- Hadley Wickham’s advice on how to write a good reproducible example for getting help with R
- How to post good questions on Stack Overflow
- The UF R-users listserv is very user friendly and a great place to post requests for help.
Tutorials
Paul van der Laken’s list of books, tutorials, and other resources on topics ranging from data manipulation to data validation to data visualization.
R Essential Training: Wrangling and Visualizing Data (requires a UF email address to access LinkedIn Learning).
Software Carpentry: Using RStudio for Project Organization & Management
Swirl: learn R programming interactively, at your own pace, and in the R console.
R Bootcamp by Ted Laderas and Jessica Minnier. Learn R in your browser.
The Ultimate Guide to Data Cleaning is written for Python users, but the principles apply regardless of language.
22.2 Specific Data Cleaning and Management Problems
Dates & Times
Text & Text Mining
- Text Mining: tidytext package
Qualtrics
- Working with Qualtrics survey data: qualtRics package
Text Extraction
Images & Image Processing
- Image processing: the magick package
22.3 Advanced R Packages for Data Management
DataCurator package: ‘a simple desktop data editor to help describe, validate and share usable open data’.
RegExr: online tool to learn, build, & test Regular Expressions (RegEx / RegExp)
janitor (cleanup of column names, etc.)
ROpenSci: tools for accessing, manipulating, and visualizing open data
22.4 Data Visualization
- Data Visualization: a practical introduction by Kieran Healy is my favorite introductory (yet super-comprehensive) book on data visualization with R. If you scroll down to the bottom of the page you can download the datasets and code used to make the figures in the book, which makes life much easier.
22.5 Slides & Presentations
22.6 Documents & Reports
knitr overview: reproducible documents with R
22.7 Discipline-specific R Resources
History
historydata package: Sample data sets for historians learning R. They include population, institutional, religious, military, and prosopographical data suitable for mapping, quantitative analysis, and network analysis.
The Programming Historian Website: wide range of topics, from text analysis to OpenRefine
Psychology
- ‘Programming for Psychologists: Data Creation and Analysis’ by Matthew J. C. Crump
22.8 Data Archives
Qualitative Data Repository: dedicated archive for storing and sharing digital data (and accompanying documentation) generated or collected through qualitative and multi-method research in the social sciences and related disciplines.
Data Dryad: open data publishing platform and a community committed to the open availability and routine re-use of all research data.
ICPSR: data access, curation, and analytical methods for social science.