  • Fundamentals of Data Visualization

    I’m very excited to announce my latest project, a book on data visualization. The working title is “Fundamentals of Data Visualization”. The book will be published with O’Reilly, and a preview is available online. The entire book is written in R Markdown, and the figures are made with ggplot2. The source for the book is available on GitHub.

  • Move over Strunk and White: My all-time favorite books on writing

    “Read Strunk and White” is the near-universal recommendation given to any student who needs to improve their writing. I have only two explanations for this frequent recommendation: (1) people have never actually read Strunk and White or don’t remember anything about it [1], or (2) people have never read any book on writing other than Strunk and White and hence have nothing better to recommend. To provide an alternative, here I would like to present my all-time favorite books on writing, covering three distinct topics: storytelling, copy editing, and writing productively. My recommendations are made from the perspective of the scientist as a writer. Nevertheless, the books I recommend are written for a broad audience and will likely be useful for anybody who writes non-fiction documents.

  • Springer's abusive licensing demands

    Today I posted a tweetstorm about publishing an article with F1000Research that was originally commissioned by Springer.

    I received several requests to turn this into a blog post, so here we go. The blog post consists mostly of the text of the tweets, with some minor edits and clarifications.

  • Goodbye Joyplots

    Anybody who has been paying any attention to the data visualization scene knows that the summer of 2017 was the summer of joyplots. This type of visualization went viral, probably fueled in no small part by the R package ggjoy that I wrote in July. However, I think it’s time to retire both the name “joyplot” and the ggjoy package, and as of today the ggjoy package is officially deprecated. A replacement package, ggridges, is in place and provides essentially the same functionality.

  • Do you have to publish papers to obtain a PhD?

    It is common for friction to arise between graduate students and their supervisors (PIs) over how many and what kind of papers the students need to publish before graduating. While on occasion the students’ complaint is that their PI keeps them from publishing [1], the much more common scenario is one where the PI wants the student to complete x papers in y journals while the student just wants to graduate and move on. When these conflicts come to a head, students usually start to inquire what the minimum requirements are before graduation.

  • How to reject a rejection

    For a junior scientist, it can be a major blow when their manuscript is rejected. They have poured many months to years of their time into this project, have submitted the paper where they think it belongs, and the editor puts an end to their aspirations by rejecting the submission. However, more experienced scientists, in particular those with editorial roles at major journals, know very well that many a rejection is not final. Often, a rejection is only the first step in an ongoing negotiation with the journal, one that frequently ends with the eventual publication of the article. To level the playing field between the junior and the more senior scientists, here I’ll reveal this secret to the world: How to reject a rejection.

  • Reading and combining many tidy data files in R

    Everybody who is familiar with the R libraries for processing tidy data, such as dplyr and ggplot2, knows how powerful they are and how much one can get done with just a few lines of R code. However, everybody who has used them has probably also spent more time bringing data into the appropriate tidy format than writing analysis or plotting code. In particular, one scenario that arises all the time is that even when the data files are in tidy format, the entire dataset may be spread out over many individual files, and loading them all and combining them into one large table can be cumbersome. Here, I want to demonstrate some neat tricks, using the relatively new package purrr and some recent additions to the package tidyr, that make loading and combining many data files a piece of cake.

    The code shown here depends on the following R packages:

  • The one time I failed to parasitize an established clinical researcher

    As regular readers of this blog probably know, I’m the paragon of a research parasite. I’m a computational biologist, and all I ever do is publish my own analyses of other people’s data. Except that one time, a few years back, when a senior clinical researcher stopped me in my tracks. Thanks to his careful and guarded stewardship of his data, I have been saved from drawing incorrect conclusions from his data and from publicly embarrassing myself by claiming his analysis is complete nonsense.

  • When will that paper be ready?

    No matter how experienced you are as a writer or how many papers you have written, you’ll likely never fully overcome this obstacle: Writing papers takes time. A lot of it. Getting a paper submission-ready always takes longer than one would want, and it frequently takes longer than even the worst-case projection. As writers, we need to understand what causes these delays, so that we can mitigate them (where possible) and simply be prepared for what lies ahead. Here I present three observations I’ve made over the years that explain why most papers take so long to complete.

  • Don’t use the passive voice?

    I came across a talk by Steven Pinker on “Linguistics, Style and Writing in the 21st Century.” The talk is excellent and covers several important areas of writing advice. One of them is the topic of active and passive voice. I was pleased to see Pinker give the same advice I have been teaching for a while: The adage “don’t use the passive voice” is nonsensical. Clearly, passive voice cannot be categorically the wrong choice. If it were, then why should it even exist in the English language? There must be a valid use for this grammatical construct. Pinker provides one, and I agree with him. Here I’ll present this perspective in my own words.

  • Hiding journal names from your publication list stinks

    Michael Eisen recently announced his new website, which features a new publication list that doesn’t mention journal names anywhere. This idea was quickly picked up by others. I spoke out against it, since my immediate gut response was that something was wrong with it.

    At the time, however, I couldn’t quite formulate what I thought the key issue was. I have now given this more thought, and I’ve found various reasons why I think it’s a bad idea to hide journal names. But I’ve also realized that most of these arguments don’t even matter. As I’ll argue here, hiding journal names from the publication list is directly at odds with the principles of openness and egalitarianism that people like Michael Eisen so strongly promote. Therefore, to put it bluntly, I think this practice stinks.

  • Formatting figure captions and tables

    Every scientist should know how to properly write and format figure captions and tables, yet this topic is rarely taught properly. We just hope that students and postdocs pick up this skill by osmosis. However, in my experience, this doesn’t necessarily happen or it may take a long time. To speed up this process, I gave a brief lecture on this topic in my graduate class today. Here are the slides I used. I hope you will find them useful.
