Categories
original research

Appendix to JAWWS: An Incrementally Rewritten Paragraph

Yesterday, I published a post describing an idea to improve scientific style by rewriting papers as part of a new science journal. I originally wanted to conclude the post with a demonstration of how the rewriting could be done, but I didn’t want to add too much length. Here it is as an appendix.

We start with a paragraph taken more or less at random from a biology paper titled “Shedding light on the ‘dark side’ of phylogenetic comparative methods“, published by Cooper et al. in 2016. Then, in five steps, we’ll incrementally improve it — at least according to my preferences! Let me know if it fits your own idea of good scientific writing as well.

1. Original

Most models of trait evolution are based on the Brownian motion model (Cavalli-Sforza & Edwards 1967; Felsenstein 1973). The Ornstein–Uhlenbeck (OU) model can be thought of as a modification of the Brownian model with an additional parameter that measures the strength of return towards a theoretical optimum shared across a clade or subset of species (Hansen 1997; Butler & King 2004). OU models have become increasingly popular as they tend to fit the data better than Brownian motion models, and have attractive biological interpretations (Cooper et al. 2016b). For example, fit to an OU model has been seen as evidence of evolutionary constraints, stabilising selection, niche conservatism and selective regimes (Wiens et al. 2010; Beaulieu et al. 2012; Christin et al. 2013; Mahler et al. 2013). However, the OU model has several well-known caveats (see Ives & Garland 2010; Boettiger, Coop & Ralph 2012; Hansen & Bartoszek 2012; Ho & Ané 2013, 2014). For example, it is frequently incorrectly favoured over simpler models when using likelihood ratio tests, particularly for small data sets that are commonly used in these analyses (the median number of taxa used for OU studies is 58; Cooper et al. 2016b). Additionally, very small amounts of error in data sets can result in an OU model being favoured over Brownian motion simply because OU can accommodate more variance towards the tips of the phylogeny, rather than due to any interesting biological process (Boettiger, Coop & Ralph 2012; Pennell et al. 2015). Finally, the literature describing the OU model is clear that a simple explanation of clade-wide stabilising selection is unlikely to account for data fitting an OU model (e.g. Hansen 1997; Hansen & Orzack 2005), but users of the model often state that this is the case. Unfortunately, these limitations are rarely taken into account in empirical studies.

Okay, first things first: let’s banish all those horrendous inline citations to footnotes.

2. With footnotes

Most models of trait evolution are based on the Brownian motion model.1Cavalli-Sforza & Edwards 1967; Felsenstein 1973 The Ornstein–Uhlenbeck (OU) model can be thought of as a modification of the Brownian model with an additional parameter that measures the strength of return towards a theoretical optimum shared across a clade or subset of species.2Hansen 1997; Butler & King 2004 OU models have become increasingly popular as they tend to fit the data better than Brownian motion models, and have attractive biological interpretations.3Cooper et al. 2016b For example, fit to an OU model has been seen as evidence of evolutionary constraints, stabilising selection, niche conservatism and selective regimes.4Wiens et al. 2010; Beaulieu et al. 2012; Christin et al. 2013; Mahler et al. 2013 However, the OU model has several well-known caveats.5see Ives & Garland 2010; Boettiger, Coop & Ralph 2012; Hansen & Bartoszek 2012; Ho & Ané 2013, 2014 For example, it is frequently incorrectly favoured over simpler models when using likelihood ratio tests, particularly for small data sets that are commonly used in these analyses.6the median number of taxa used for OU studies is 58; Cooper et al. 2016b Additionally, very small amounts of error in data sets can result in an OU model being favoured over Brownian motion simply because OU can accommodate more variance towards the tips of the phylogeny, rather than due to any interesting biological process.7Boettiger, Coop & Ralph 2012; Pennell et al. 2015 Finally, the literature describing the OU model is clear that a simple explanation of clade-wide stabilising selection is unlikely to account for data fitting an OU model,8e.g. Hansen 1997; Hansen & Orzack 2005 but users of the model often state that this is the case. Unfortunately, these limitations are rarely taken into account in empirical studies.

Much better.

Does this need to be a single paragraph? No, it doesn’t. Let’s not go overboard with cutting it up, but I think a three-fold division makes sense.

3. Multiple paragraphs

Most models of trait evolution are based on the Brownian motion model.9Cavalli-Sforza & Edwards 1967; Felsenstein 1973

The Ornstein–Uhlenbeck (OU) model can be thought of as a modification of the Brownian model with an additional parameter that measures the strength of return towards a theoretical optimum shared across a clade or subset of species.10Hansen 1997; Butler & King 2004 OU models have become increasingly popular as they tend to fit the data better than Brownian motion models, and have attractive biological interpretations.11Cooper et al. 2016b For example, fit to an OU model has been seen as evidence of evolutionary constraints, stabilising selection, niche conservatism and selective regimes.12Wiens et al. 2010; Beaulieu et al. 2012; Christin et al. 2013; Mahler et al. 2013

However, the OU model has several well-known caveats.13see Ives & Garland 2010; Boettiger, Coop & Ralph 2012; Hansen & Bartoszek 2012; Ho & Ané 2013, 2014 For example, it is frequently incorrectly favoured over simpler models when using likelihood ratio tests, particularly for small data sets that are commonly used in these analyses.14the median number of taxa used for OU studies is 58; Cooper et al. 2016b Additionally, very small amounts of error in data sets can result in an OU model being favoured over Brownian motion simply because OU can accommodate more variance towards the tips of the phylogeny, rather than due to any interesting biological process.15Boettiger, Coop & Ralph 2012; Pennell et al. 2015 Finally, the literature describing the OU model is clear that a simple explanation of clade-wide stabilising selection is unlikely to account for data fitting an OU model,16e.g. Hansen 1997; Hansen & Orzack 2005 but users of the model often state that this is the case. Unfortunately, these limitations are rarely taken into account in empirical studies.

We haven’t rewritten anything yet — the changes so far are really low-hanging fruit! Let’s see if we can improve the text more with some rephrasing. This is trickier, because there’s a risk I change the original meaning, but it’s not impossible.

4. Some rephrasing

Most models of trait evolution are based on the Brownian motion model, in which traits evolve randomly and accrue variance over time.17Cavalli-Sforza & Edwards 1967; Felsenstein 1973

What if we add a parameter to measure how much the trait motion returns to a theoretical optimum for a given clade or set of species? Then we get a family of models called Ornstein-Uhlenbeck,18Hansen 1997; Butler & King 2004 first developed as a way to describe friction in the Brownian motion of a particle. These models have become increasingly popular, both because they tend to fit the data better than simple Brownian motion, and because they have attractive biological interpretations.19Cooper et al. 2016b For example, fit to an Ornstein-Uhlenbeck model has been seen as evidence of evolutionary constraints, stabilising selection, niche conservatism and selective regimes.20Wiens et al. 2010; Beaulieu et al. 2012; Christin et al. 2013; Mahler et al. 2013

However, Ornstein-Uhlenbeck models have several well-known caveats.21see Ives & Garland 2010; Boettiger, Coop & Ralph 2012; Hansen & Bartoszek 2012; Ho & Ané 2013, 2014 For example, they are frequently — and incorrectly — favoured over simpler Brownian models. This occurs with likelihood ratio tests, particularly for the small data sets that are commonly used in these analyses.22the median number of taxa used for Ornstein-Uhlenbeck studies is 58; Cooper et al. 2016b It also happens when there is error in the data set, even very small amounts of error, simply because Ornstein-Uhlenbeck models accommodate more variance towards the tips of the phylogeny — therefore suggesting an interesting biological process where there is none.23Boettiger, Coop & Ralph 2012; Pennell et al. 2015 Additionally, users of Ornstein-Uhlenbeck models often state that clade-wide stabilising selection accounts for data fitting the model, even though the literature describing the model warns that such a simple explanation is unlikely.24e.g. Hansen 1997; Hansen & Orzack 2005 Unfortunately, these limitations are rarely taken into account in empirical studies.

What did I do here? First, I completely got rid of the “OU” acronym. Acronyms may look like they simplify the writing, but in fact they often ask more cognitive resources from the reader, who has to constantly remember that OU means Ornstein-Uhlenbeck.

Then I rephrased several sentences to make them flow better, at least according to my taste.

I also added a short explanation of what Brownian and Ornstein-Uhlenbeck models are. That might not be necessary, but it’s always good to make life easier for the reader. Even if you defined the terms earlier in the paper, repetition is useful to avoid asking the reader an effort to remember. And even if everyone reading your paper is expected to know what Brownian motion is, there’ll be some student somewhere thanking you for reminding them.25I considered doing this with the “evolutionary constraints, stabilising selection, niche conservatism and selective regimes” enumeration too, but these are mere examples, less critical to the main idea of the section. Adding definitions would make the sentence quite long and detract from the main flow. Also I don’t know what the definitions are and don’t feel like researching lol.

This is already pretty good, and still close enough to the original. What if I try to go further?

5. More rephrasing

Most models of trait evolution are based on the Brownian motion model.26Cavalli-Sforza & Edwards 1967; Felsenstein 1973 Brownian motion was originally used to describe the random movement of a particle through space. In the context of trait evolution, it assumes that a trait (say, beak size in some group of bird species) changes randomly, with some species evolving a larger beak, some a smaller one, and so on. Brownian motion implies that variance in beak size, across the group of species, increases over time.

This is a very simple model. What if we refined it by adding a parameter? Suppose there is a theoretical optimal beak size for this group of species. The new parameter measures how much the trait tends to return to this optimum. This gives us a type of model called Ornstein-Uhlenbeck,27Hansen 1997; Butler & King 2004 first developed as a way to add friction to the Brownian motion of a particle.

Ornstein-Uhlenbeck models have become increasingly popular in trait evolution, for two reasons.28Cooper et al. 2016b First, they tend to fit the data better than simple Brownian motion. Second, they have attractive biological interpretations. For example, fit to an Ornstein-Uhlenbeck model has been seen as evidence of a number of processes, including evolutionary constraints, stabilising selection, niche conservatism and selective regimes.29Wiens et al. 2010; Beaulieu et al. 2012; Christin et al. 2013; Mahler et al. 2013

Despite this, Ornstein-Uhlenbeck models are not perfect, and have several well-known caveats.30see Ives & Garland 2010; Boettiger, Coop & Ralph 2012; Hansen & Bartoszek 2012; Ho & Ané 2013, 2014 Sometimes you really should use a simpler model! It is common, but incorrect, to favour an Ornstein-Uhlenbeck model over a Brownian model after performing likelihood ratio tests, particularly for the small data sets that are often used in these analyses.31the median number of taxa used for Ornstein-Uhlenbeck studies is 58; Cooper et al. 2016b Then there is the issue of error in data sets. Even a very small amount of error can lead researchers to pick an Ornstein-Uhlenbeck model, simply because they accommodate more variance towards the tips of the phylogeny — therefore suggesting interesting biological processes where there is none.32Boettiger, Coop & Ralph 2012; Pennell et al. 2015

Additionally, users of Ornstein-Uhlenbeck models often state that the reason their data fits the model is clade-wide stabilising selection (for instance, selection for intermediate beak sizes, rather than extreme ones, across the group of birds). Yet the literature describing the model warns that such simple explanations are unlikely.33e.g. Hansen 1997; Hansen & Orzack 2005

Unfortunately, these limitations are rarely taken into account in empirical studies.

Okay, many things to notice here. First, I added an example, bird beak size. I’m not 100% sure I understand the topic well enough for my example to be particularly good, but I think it’s decent. I also added more explanation of what Brownian models are in trait evolution. Then I rephrased other sentences to make the tone less formal.

As a result, this version is longer than the previous ones. It seemed justified to cut it up into more paragraphs to accommodate the extra length. It’s plausible that the authors originally tried to include too much content in too few words, perhaps to satisfy a length constraint posed by the journal.

Let’s do one more round…

6. Rephrasing, extreme edition

Suppose you want to model the evolution of beak size in some fictional family of birds. There are 20 bird species in the family, all with different average beak sizes. You want to create a model of how their beaks changed over time, so you can reimagine the beak of the family’s ancestor and understand what happened exactly.

Most people who try to model the evolution of a biological trait use some sort of Brownian motion model.34Cavalli-Sforza & Edwards 1967; Felsenstein 1973 Brownian motion, originally, refers to the random movement of a particle in a liquid or gas. The mathematical analogy here is that beak size evolves randomly: it becomes very large in some species, very small in others, with various degrees of intermediate forms between the extremes. Therefore, across the 20 species, the variance in beak size increases over time.

Brownian motion is a very simple model. What if we add a parameter to get a slightly more complicated one? Let’s assume there’s a theoretical optimal beak size for our family of birds — maybe because the seeds they eat have a constant average diameter. The new parameter measures how much beak size tends to return to the optimum during its evolution. This gives us a type of model called Ornstein-Uhlenbeck,35Hansen 1997; Butler & King 2004 first developed as a way to add friction to the Brownian motion of a particle. We can imagine the “friction” to be the resistance against deviating from the optimum.

Ornstein-Uhlenbeck models have become increasingly popular, for two reasons.36Cooper et al. 2016b First, they often fit real-life data better than simple Brownian motion. Second, they are easy to interpret biologically. For example, maybe our birds don’t have as extreme beak sizes as we’d expect from a Brownian model, so it makes sense to assume there’s some force pulling the trait towards an intermediate optimum. That force might be an evolutionary constraint, stabilising selection (i.e. selection against extremes), niche conservatism (the tendency to keep ancestral traits), or selective regimes. Studies using Ornstein-Uhlenbeck models have been seen as evidence for each of these patterns.37Wiens et al. 2010; Beaulieu et al. 2012; Christin et al. 2013; Mahler et al. 2013

Of course, Ornstein-Uhlenbeck aren’t perfect, and in fact have several well-known caveats.38see Ives & Garland 2010; Boettiger, Coop & Ralph 2012; Hansen & Bartoszek 2012; Ho & Ané 2013, 2014 For example, simpler models are sometimes better. It’s common for researchers to incorrectly choose Ornstein-Uhlenbeck instead of Brownian motion when using likelihood ratio tests to compare models, a problem especially present due to the small data sets that are often used in these analyses.39the median number of taxa used for Ornstein-Uhlenbeck studies is 58; Cooper et al. 2016b Then there is the issue of error in data sets (e.g. when your beak size data isn’t fully accurate). Even a very small amount of error can lead researchers to pick an Ornstein-Uhlenbeck model, simply because it’s better at accommodating variance among closely related species at the tips of a phylogenetic tree. This can suggest interesting biological processes where there are none.40Boettiger, Coop & Ralph 2012; Pennell et al. 2015

One particular mistake that users of Ornstein-Uhlenbeck models often make is to assume that their data fits the model due to clade-wise stabilising selection (e.g. selection for intermediate beak sizes, rather than extreme ones, across the family of birds). Yet the literature warns against exactly that — according to the papers describing the models, such simple explanations are unlikely.41e.g. Hansen 1997; Hansen & Orzack 2005

Unfortunately, these limitations are rarely taken into account in empirical studies.

This is longer still than the previous version! At this point I’m convinced the original paragraph was artificially short. That is, it packed far more information than a text of its size normally should.

This is a common problem in science writing. Whenever you write something, there’s a tradeoff between brevity, clarity, amount of information, and complexity: you can only maximize three of them. Since science papers often deal with a lot of complex information, and have word limits, clarity often gets the short end of the stick.

Version 6 is a good example of sacrificing brevity to get more clarity. In this case it’s important to keep the amount of information constant, because I don’t want to change what the original authors were saying. It is possible that they were saying too many things. On the other hand, this is only one paragraph in a longer paper, so maybe it made sense to simply mention some ideas without developing them.

I tried a Version 7 in which I aimed for a shorter paragraph, on the scale of the original one, but I failed. To be able to keep all the information, I would have to sacrifice the extra explanations and the bird beak example, and we’d be back to square one. This suggests that both the original paragraph and my rewritten version are on different points on the tradeoff curve. The original is brief, information-rich, and complex dense; my version is information-rich, complex, and clear.. To get brief and clear would require taking some information out, which I can’t do as a rewriter.

It is my opinion that sacrificing clarity is the worst possible world, at least in most contexts. We could then rephrase my project as attempting to emphasize clarity above all else — after all, brevity, information richness and complexity serve no purpose if they fail to communicate what they want to.

Leave a Reply

Your email address will not be published. Required fields are marked *