A drawback of R compared to Stata and, within R, of data.table compared to the tidyverse, is that documentation is scattered. You end up doing a lot of googling and spending time on Stack Overflow. The problem is that many R users now think the tidyverse, and dplyr in particular, is all there is. I started with base R and it is still what I use for many things. But when I began to switch my data “wrangling” from Stata to R, I started with dplyr, only to learn (from Julian Hinz) that data.table is much faster. After initial struggles I realized that the syntax of data.table fit much better with my intuitions about the right way to do things. I am in the minority on this topic but not alone. In an effort to convince more people to take advantage of the powerful syntax and awesome speed of data.table, I am posting this set of resources.
PS. This now goes to 11! Here’s a great set of lecture slides from Grant McDermott. If you don’t follow him already, you should.
These items are not ordered for beginners. I started with number 8 (which, as the title suggests, was probably not optimal!). But 4 and then 3 might be ideal starting points. According to my notes, 9 (and its sequels) is also a good starting place.
1. https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly This is not actually a good starting point, but if you’re already a dplyr user it can help you decide how interested you might be in data.table.
2. https://atrebas.github.io/post/2019-03-03-datatable-dplyr/ A comprehensive correspondence between the dplyr and data.table ways of doing things.
3. https://github.com/chuvanan/rdatatable-cookbook A set of chapters showing how to do common tasks in data.table.
4. https://github.com/Rdatatable/data.table/wiki//talks/useR2019_Arun.pdf Great slides introducing data.table’s three key components. They also illustrate .SD with patterns, a very powerful method.
5. Frequently Asked Questions: http://datatable.r-forge.r-project.org/datatable-faq.pdf
6. The cheat sheet to print out and pin to the wall somewhere nearby: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/datatable_Cheat_Sheet_R.pdf
7. Many posts have shown how much faster data.table is than the alternatives. This one compares it with dplyr, base R, and Python Pandas: https://github.com/szilard/benchm-dplyr-dt And here is some code showing how they did each task in data.table compared to dplyr: https://github.com/szilard/benchm-dplyr-dt/blob/master/bm.Rmd
8. http://brooksandrew.github.io/simpleblog/articles/advanced-data-table/ Advanced tips and tricks with data.table.
9. https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reference-semantics.html Explains “modifying by reference”, which is one of the trickiest and most important topics to understand. Here you will see why “DT <- copy(DT)” will be an essential line in your functions to avoid unwanted side effects. More discussion here: https://stackoverflow.com/questions/13756178/writings-functions-procedures-for-data-table-objects
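
To make the point about modifying by reference concrete, here is a minimal sketch of my own (it is not taken from the vignette). The function names add_ratio_in_place and add_ratio_safe and the columns x, y, and ratio are made up for illustration. The first function uses := directly on its argument, so the caller's table changes too; the second starts with DT <- copy(DT), so the caller's table is left alone.

```r
library(data.table)

# Hypothetical helper: no copy, so := alters the caller's table as a side effect.
add_ratio_in_place <- function(DT) {
  DT[, ratio := x / y]   # := adds the column by reference
  invisible(DT)
}

# Hypothetical helper: copy first, so the change stays inside the function.
add_ratio_safe <- function(DT) {
  DT <- copy(DT)         # work on a private copy
  DT[, ratio := x / y]
  DT
}

orig <- data.table(x = c(1, 2, 3), y = c(2, 4, 6))

res <- add_ratio_safe(orig)
changed_by_safe <- "ratio" %in% names(orig)       # FALSE: orig is untouched

add_ratio_in_place(orig)
changed_in_place <- "ratio" %in% names(orig)      # TRUE: orig gained the column
```

The copy costs memory and time on large tables, so whether to copy inside a function or modify in place on purpose is a design choice; the important thing is to make it deliberately.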