  • Goodbye Joyplots

    Anybody who has been paying any attention to the data visualization scene knows that the summer of 2017 was the summer of joyplots. This type of visualization went viral, probably fueled in no small part by the R package ggjoy that I wrote in July. However, I think it’s time to retire both the name “joyplot” and the ggjoy package, and as of today the ggjoy package is officially deprecated. A replacement package, ggridges, is in place and provides essentially the same functionality.

  • Do you have to publish papers to obtain a PhD?

    It is common for friction to arise between graduate students and their supervisors (PIs) over how many and what kind of papers the students need to publish before graduating. While on occasion the students’ complaint is that their PI keeps them from publishing [1], the much more common scenario is one where the PI wants the student to complete x papers in y journals while the student just wants to graduate and move on. When these conflicts come to a head, students usually start to inquire what the minimum requirements are before graduation.

  • How to reject a rejection

    For a junior scientist, it can be a major blow when their manuscript is rejected. They have poured many months to years of their time into this project, have submitted the paper where they think it belongs, and the editor puts an end to their aspirations by rejecting the submission. However, more experienced scientists, in particular those with editorial roles at major journals, know very well that many a rejection is not final. Often, a rejection is only the first step in an ongoing negotiation with the journal, one that frequently ends with the eventual publication of the article. To level the playing field between the junior and the more senior scientists, here I’ll reveal this secret to the world: How to reject a rejection.

  • Reading and combining many tidy data files in R

    Everybody who is familiar with the R packages for processing tidy data, such as dplyr and ggplot2, knows how powerful they are and how much one can get done with just a few lines of R code. However, everybody who has used them has probably also spent more time bringing data into the appropriate tidy format than writing analysis or plotting code. In particular, one scenario that arises all the time is that even if data files are in tidy format, the entire dataset may be spread out over many individual files, and loading them all and combining them into one large table can be cumbersome. Here, I want to demonstrate some neat tricks, using the relatively new package purrr and some recent additions to the package tidyr, that make loading and combining many data files a piece of cake.

    The code shown here depends on the following R packages:

  • The one time I failed to parasitize an established clinical researcher

    As regular readers of this blog probably know, I’m the paragon of a research parasite. I’m a computational biologist, and all I ever do is publish my own analyses of other people’s data. Except that one time, a few years back, when a senior clinical researcher stopped me in my tracks. Thanks to his careful and guarded stewardship of his data, I have been saved from drawing incorrect conclusions from his data and from publicly embarrassing myself by claiming his analysis is complete nonsense.

  • When will that paper be ready?

    No matter how experienced you are as a writer or how many papers you have written, you’ll likely never fully overcome this obstacle: Writing papers takes time. A lot of it. Getting a paper submission-ready always takes longer than one would want, and it frequently takes longer than even the worst-case projection made beforehand. As writers, we need to understand what causes these delays, so that we can mitigate them (where possible) and simply be prepared for what lies ahead. Here I present three observations I’ve made over the years that explain why most papers take so long until they’re finally completed.

  • Don’t use the passive voice?

    I came across a talk by Steven Pinker on “Linguistics, Style and Writing in the 21st Century.” The talk is excellent and covers several important areas of writing advice. One of them is the topic of active and passive voice. I was pleased to see Pinker give the same advice I have been teaching for a while: The adage “don’t use the passive voice” is nonsensical. Clearly, passive voice cannot be categorically the wrong choice. If it were, then why should it even exist in the English language? There must be a valid use for this grammatical construct. Pinker provides one, and I agree with him. Here I’ll present this perspective in my own words.

  • Hiding journal names from your publication list stinks

    Michael Eisen recently announced his new website, which features a new publication list that doesn’t mention journal names anywhere:

    This idea was quickly picked up by others, e.g.:

    I spoke out against this idea, since I immediately had the gut-feeling response that something was wrong with it:

    However, at the time, I couldn’t quite formulate what I thought the key issue was. I have now given this more thought, and I’ve found various reasons why I think it’s a bad idea to hide journal names. However, I’ve also realized that most of these arguments don’t even matter. As I’ll argue here, hiding journal names from the publication list is directly at odds with the principles of openness and egalitarianism that people like Michael Eisen so strongly promote. Therefore, to put it bluntly, I think this practice stinks.

  • Formatting figure captions and tables

    Every scientist should know how to properly write and format figure captions and tables, yet this topic is rarely taught properly. We just hope that students and postdocs pick up this skill by osmosis. However, in my experience, this doesn’t necessarily happen or it may take a long time. To speed up this process, I gave a brief lecture on this topic in my graduate class today. Here are the slides I used. I hope you will find them useful.

  • The Google Scholar preprint bug redux

    Regular readers of my blog will know that I often complain about Google Scholar’s handling of preprints, see e.g. here or here. Well, this week, I had the opportunity to raise my concerns with Anurag Acharya, the co-founder of Google Scholar. His initial response and the subsequent discussion have clarified several things. We now know:

    1. The bug exists
    2. The Scholar team is aware of it
    3. They don’t know how to fix it
    4. They don’t think it’s a particularly pressing problem
    5. For any given paper, the problem will go away eventually, after several months or more

  • How to not mess up your bibliographies with Bibtex

    Bibtex is the reference manager for Latex. I have used it for 20 years, I have written over 100 papers with it, and I think it works really well. I have also rarely met anybody who could use it without messing up their bibliography in some way. Bibtex is an archaic program, written 30 years ago by a graduate student and never substantively changed or updated since. It uses an awkward database format for storing bibliographic entries and an atrocious, poorly documented programming language for describing how bibliographic entries should be formatted. In fact, the most complete description of Bibtex’s inner workings is aptly called Tame the BeaST. (This document is well worth the read for anybody using Bibtex with some regularity.) To help ordinary mortals succeed with Bibtex, I’m providing here a set of best practices and useful guidelines that help you steer clear of its worst pitfalls.

  • Avoiding the official style

    Nobody turns into a good writer overnight. Writing well takes a lot of dedicated practice, as well as mastery of many different topics, including grammar, punctuation, word choice, and document organization. However, if I had to name one single skill that likely makes the biggest difference in a person’s ability to write well, I would point to recognizing and avoiding the official style. This style, so named by professor of rhetoric Richard Lanham [1], makes heavy use of passive voice, prepositional phrases, and complex, wordy expressions with little content. And it permeates the scientific literature.

