R is a wonderful tool for data analysis. You will almost certainly need to become proficient in it for your project.
R is great for cleaning data because it lets you keep track of the specific steps you used (and ideally write scripts to automate or replicate the process). It also has very powerful statistical capabilities and a wide range of graphing capabilities. R’s strength comes from being free, open-source software: developers are constantly releasing new packages (which you can download and install) that let R do new and interesting things. It has limitations. It is not a database management system, even though it can do basic joins (with the merge function). Also, since all data are loaded into working memory, there are limits to the size of the data files it can work with (though there are some workarounds), so it’s not the ideal tool for “big data”. Finally, its capabilities for interactive graphics are recent and limited. Within these boundaries, though, it’s amazing, and free!
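To make the point about basic joins concrete, here is a minimal sketch using base R’s merge function. The two tables and their column names (id, name, score) are made up for illustration.

```r
# Two small, made-up tables that share a key column called "id".
people <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Caro"))
scores <- data.frame(id = c(2, 3, 4), score = c(88, 92, 75))

# Inner join (the default): keep only the ids present in both tables.
inner <- merge(people, scores, by = "id")

# Left join: keep every row of `people`; ids with no score get NA.
left <- merge(people, scores, by = "id", all.x = TRUE)

print(inner)
print(left)
```

Setting all.y = TRUE or all = TRUE instead gives a right or full join. For anything much more involved than this, a real database is probably the better tool.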
Because it’s such a powerful, flexible tool, the learning curve can be steep. Here are some things to get you started.
Install R: there are versions for different operating systems.
Install RStudio: this is a nice editing environment that lets you keep track of scripts, plot windows, the objects in your environment, and so forth. There is no reason not to use RStudio if you’re using R. Launching RStudio starts an R session automatically, so you don’t need to start both.
Start learning: swirl is a package that turns the R console into an interactive learning environment. This site has instructions on installing it and loading “courses”.
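As a rough sketch of what getting swirl running might look like, assuming you install it from CRAN: the helper name ensure_package is made up for this example, and the interactive part only runs when you are at a live console (for instance inside R Studio).

```r
# Hypothetical helper: install a package from CRAN only if it is missing,
# then report (invisibly) whether it can now be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# swirl takes over the console, so only start it in an interactive session.
if (interactive()) {
  ensure_package("swirl")
  library(swirl)
  swirl()  # follow the prompts to pick a course
}
```

The same install-then-library pattern applies to any other package you add later.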
For an alternative approach, you might want to check out the video lectures in the Coursera class “R Programming” (offered monthly); it walks you through how to install packages and write scripts in R. (This class is part of the Data Science specialization on Coursera; all of the classes are worth taking.)
Use reference sheets and tutorials: The RStudio homepage has a number of handy reference sheets. Here are a few of them:
- data wrangling with dplyr and tidyr
- data visualization with ggplot2
- integrated analysis and reporting with R Markdown
A great walkthrough of using the reshape2 package to change the layout of data is here.
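To show what “changing the layout” means, here is a small wide-to-long sketch. It deliberately uses base R’s reshape function as a stand-in so that it runs without installing anything; in reshape2 the melt function does the same wide-to-long job. The data and column names are made up.

```r
# Wide layout: one row per id, one column per year (made-up data).
wide <- data.frame(
  id = c("a", "b"),
  score_2019 = c(10, 20),
  score_2020 = c(11, 21)
)

# Long layout: one row per id-year combination.
long <- reshape(
  wide,
  direction = "long",
  idvar     = "id",
  varying   = c("score_2019", "score_2020"),  # columns to stack
  v.names   = "score",                        # name of the stacked column
  timevar   = "year",                         # new column identifying the source
  times     = c(2019, 2020)
)
rownames(long) <- NULL  # drop the auto-generated row names

print(long)
```

The long layout is usually the one that plotting and grouping functions expect, which is why this kind of reshaping comes up so often.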
Find answers to specific questions: Googling the thing you’re interested in will turn up lots of results. You can also browse the archives at R Bloggers or Stack Overflow.
Get inspired: This blog has some nice examples of data visualization in R. This blog has nice examples of data visualization in general. This blog has examples of what NOT to do with data visualization.
Here are some websites which may help someone get started with R:
(1) Getting data into R:
https://vimeo.com/130548869
(2) Tidying Data:
https://www.rstudio.com/resources/webinars/data-wrangling-with-r-and-rstudio/
(3) Using the dplyr package for faster data manipulation:
http://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/