A grammar of data manipulation

It seems that Hadley Wickham, the author of the spectacular ggplot2 library for R, is not content with revolutionizing the world of computational data analysis just once. He keeps doing it. This spring, he released dplyr, a package that proposes a grammar of data manipulation. I predict that dplyr will become as important for large-scale data analysis and manipulation as ggplot2 has become for visualization. If you like ggplot2, you will love dplyr.


Double Jeopardy

Lately, I keep inviting people to review a paper only to have them decline, giving as their reason that they have already reviewed the paper for a different journal* and that reviewing it again would put the authors into a situation of double jeopardy. This got me thinking. Should reviewers really decline for that reason? As a reviewer, I've always thought the opposite. For a paper I have reviewed already, if the authors have made a reasonable effort to address my comments and have now chosen a more adequate journal, I can keep my review short and recommend acceptance. Thus, I'm actually preventing a situation of double jeopardy. I keep the authors from facing yet another reviewer with new opinions and requests. So, which is right? Should reviewers recuse themselves if they are asked to review again for a different journal, or should they instead leap at the opportunity and give the authors a break? I'd be interested in your thoughts.

*Where the paper was rejected, presumably.

Keep your data tidy, Part II

My previous post on tidy data didn’t at all touch on rule 3, “Each type of observational unit forms a table.” The example I gave had only one observational unit, the weekly temperature measurements. Frequently, however, we have data corresponding to multiple observational units. In this case, it is important that we store them in separate tables, and that we know how to combine these tables for useful analyses.
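To make the idea concrete, here is a minimal sketch in plain Python. The station and measurement tables, their column names, and the `join_on` helper are all hypothetical and invented for illustration; they are not the data from the post. Each observational unit gets its own table, and the two are combined on a shared key only when an analysis needs both.

```python
# Hypothetical data: two observational units, each stored in its own table.
# One row per weather station.
stations = [
    {"station_id": 1, "city": "Austin"},
    {"station_id": 2, "city": "Boston"},
]

# One row per weekly temperature measurement.
measurements = [
    {"station_id": 1, "week": 1, "temp_c": 18.5},
    {"station_id": 1, "week": 2, "temp_c": 20.0},
    {"station_id": 2, "week": 1, "temp_c": 4.0},
]


def join_on(left, right, key):
    """Inner join: attach the matching `right` row to each `left` row."""
    lookup = {row[key]: row for row in right}
    return [{**lookup[row[key]], **row} for row in left if row[key] in lookup]


# Combine the two tables only when the analysis needs both.
combined = join_on(measurements, stations, "station_id")
for row in combined:
    print(row["city"], row["week"], row["temp_c"])
```

Keeping the city name out of the measurement table means it is stored once per station rather than once per measurement, so a correction only has to be made in one place.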


Surviving the pre-tenure years at an R1 university

A few days ago, Pröf-like Substance asked for posts with suggestions on how to survive the pre-tenure years. I went over my blogging history and realized that I hadn’t really written anything on this topic yet. Most of my advice to date is targeted at more junior scientists. So here is my attempt at giving some suggestions on how to make the most out of your years on the tenure track.


Eat more gluten? Maybe not.

A recent article in Time magazine argues that “gluten free” is a fad and should die. While the author makes a few good points, overall I think he misses the mark. I agree with the author that when it comes to products where the main ingredient is wheat (in particular bread, pasta, pizza base, cereals, cookies, cakes), gluten-free replacements usually aren’t that healthy. These replacement products are frequently made of rapidly digestible carbohydrates and tend to be nutrient poor. However, what the author fails to mention is that the products being replaced are also made of rapidly digestible carbohydrates and are nutrient poor. There’s really not that much of a difference between gluten-free bread made of tapioca flour and millet and regular gluten-containing bread made of wheat flour. Most people would be better off avoiding both.


Share your preliminary work with other people, even if you think it’s crap

It’s quite common for me to have students tell me “the analysis didn’t work out” or “the figure looks bad” or “I don’t have any useful results.” And it’s also quite common for the students to be wrong. Sometimes, students have amazing results but are all disappointed because the results aren’t what they had expected. These students fail to see the data for what they are. More commonly, the students may be right in that the data aren’t that great, but usually I can see something in the data that the student didn’t. In either case, it is important that we look at the data together, because jointly we will see more than either of us individually would have seen.


How to develop a research question, Part II

After my last post discussing how to develop a research question, Sergey Kryazhimskiy asked me to write about how to find the rare good research idea among the many mediocre ones. The truth is that I don’t really know how to do this. If you do, please tell me. I’m sure I could strengthen my research program by picking better problems. Nevertheless, despite my ignorance, I’ve had a reasonably successful career to date. And it was probably not entirely due to sheer luck. So this should give you hope. Even if you don’t know how to pick good problems, you may succeed in science nonetheless. Just work on the problems that seem important to you and hope for the best.


How to develop a research question

One of the most daunting prospects for a fresh graduate student is having to develop a solid research question [1]. In my experience, many graduate students feel like they don’t even know where to start. The literature can seem overwhelming, everything has already been done by somebody, and in any case it’s impossible to really know all the literature there is. Making matters worse, almost every cohort or lab inevitably has one or two students who just seem to be fountains of good ideas, who constantly come up with new research ideas they want to pursue. As a result, students who are less inventive or less imaginative can feel like they’re not cut out for a career in research, and that they’re never going to have the necessary ideas to sustain a research program.
