How infographics can lose the plot

By this point it should’ve become apparent to most people who engage with infographics on a semi-regular basis that there are some rules about what they should or shouldn’t look like, and that your canvas isn’t actually infinite in terms of what you can create that will a) look good and b) make sense. But just when you think everyone’s going to create sane visualisations of data, there comes along one absolute trash-fire of an infographic to remind you that there are still people out there who can and will ruin your day. And when that someone is a media channel the size of News18, the issue at hand actually transforms from being a molehill to a mountain.

Because it’s News18, it’s no longer just about following good practices when making an infographic but also about moving the hundreds of thousands of people who will have seen the infographic (@CNNnews18 has 3.4 million followers) away from the idea that News18’s effort produced something legitimate. It’s like you and your squad are guiding a group of people quietly through a jungle at night, almost unseen, when an idiot decides he has to smoke a joint, lights his match, gives your position away to the enemy and you all get killed. To the wider world, you were all idiots – but only you will know that things would’ve been rosier if it hadn’t been for that junkie (and spare me your consternation about what a lousy analogy this is). Without further ado, the trash-fire:

Fonts and colours, not bad, but that’s it. Here’s what’s wrong:

  1. The contours of the chicken-leg and the leaf appear to have dictated the positioning of numbers and lines in the graphic, whereas it should’ve been the other way around
  2. The same length represented by 25% for Rajasthan also signifies 31% and 33% for Haryana and Punjab, respectively
  3. The states (in the graphic) from Bihar to Telangana all have less than 10% on the veg side – but the amount of leafy area would suggest these values are much higher than actual
  4. If anything, West Bengal and Telangana are the worst offenders: the breadth of leaf they have for their measly 1% is longer than that of Rajasthan’s 25%
  5. The numbers say that only 4/21 states have more vegetarians than non-vegetarians – but a glance would suggest that fraction’s closer to 13/21
  6. Also: wtf are these irregular shapes? Why not just pick regular rectangles and shade them accordingly?

In fact, across the board (of mistakes), it seems the designer may have forgotten or ignored just one guiding principle of all infographics: that they should give a clear and accurate impression of the truth as represented by the numbers. This often requires the designer to ensure that the axes are clearly visible, that representations of values through parameters like distance, area, volume, etc. are consistent and predictable throughout the graphic, that the representation of relative values is proportionate, that colours and/or stylisations don’t mislead the reader, etc.
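
To make the "consistent and proportionate" requirement concrete, here is a minimal sketch in Python of bars whose length depends only on the value they represent. The function name `bar` and the 100-character full width are assumptions made for illustration; the percentages are the ones quoted in the list above.

```python
def bar(percent, full_width=100):
    """Return a text bar whose length is strictly proportional to percent."""
    filled = round(full_width * percent / 100)
    return "#" * filled

# Percentages quoted in the list above; a given value always maps to the same length.
quoted = [("Rajasthan", 25), ("Haryana", 31), ("Punjab", 33), ("West Bengal", 1)]
for label, percent in quoted:
    print(f"{label:<12}{percent:>3}% {bar(percent)}")
```

With plain rectangles like these, 25% can never occupy the same length as 31% or 33%, and 1% can never look longer than 25%.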

These are the reasons why the ‘3D’ pie-chart offered by MS Powerpoint hasn’t found wider use. It offers nothing at all in addition to the normal ‘flat’ pie-chart but actually makes things worse by distorting how the values are displayed. Similarly, you take one look at this chicken-leaf thing and you take away… nothing. You need to look at it again, closer each time, toss the numbers around a bit to see if they make sense, etc. It’s really just an attention-whore of an infographic, to be used as bait with which to trawl Twitter for a flamewar around the Indian government’s recent attitude towards the consumption of meat, especially beef.

Also: “So what if it’s a little off the mark to get some attention? It’s done its job, right?” → if this is your question, then the answer is that if you don’t force designers – especially those working with journalists – to follow best practices when making an infographic, you’ll be setting a lower bar that will soon turn around and assault you with all kinds of charts and plots conceived to hide what the numbers are really saying and instead massage your preconceived biases while playing up ‘almost-right’ propaganda. Yes, infographics can quickly and effectively misguide, especially when you don’t have much time to spend scrutinising them. Hell, isn’t that why infographics were invented in the first place: to let you take one look at a visualisation and get a good idea of what’s going on? This is exactly why there’s a lot of damage done when you’re screwing with infographics.

So DON’T DO IT.

Some research misconduct trends by the numbers

A study published in eLIFE on August 14, 2014, looked at data pertaining to some papers published between 1992 and 2012 that the Office of Research Integrity had determined contained research misconduct. From the abstract:

Data relating to retracted manuscripts and authors found by the Office of Research Integrity (ORI) to have committed misconduct were reviewed from public databases. Attributable costs of retracted manuscripts, and publication output and funding of researchers found to have committed misconduct were determined. We found that papers retracted due to misconduct accounted for approximately $58 million in direct funding by the NIH between 1992 and 2012, less than 1% of the NIH budget over this period. Each of these articles accounted for a mean of $392,582 in direct costs (SD $423,256). Researchers experienced a median 91.8% decrease in publication output and large declines in funding after censure by the ORI.

While the number of retractions worldwide is on the rise – also because the numbers of papers being published and of journals are on the rise – the study addresses a subset of these papers and only those drawn up by researchers who received funding from the National Institutes of Health (NIH).

pubsfreq

Among them, there is no discernible trend in terms of impact factors and attributable losses. In the chart below, the size of each datapoint corresponds to the direct attributable loss and its color, to the impact factor of the journal that published the paper.

tabpublic 15-08-2014 100128

However, is the time to retraction dropping?

The maximum time to retraction has been on the decline since 1997. However, on average, the time to retraction is still fluctuating, influenced as it is by the number of papers retracted and the nature of misconduct.

trendTimeToRetr

No matter the time to retraction or the impact factors of the journals, most scientists experience a significant difference in funding before and after the ORI report comes through, as the chart below shows, sorted by quanta of funds. The right axis displays total funding pre-ORI and the left, total funding post-ORI.

prepostfund

As the study’s authors summarize in their abstract: “Researchers experienced a median 91.8% decrease in publication output and large declines in funding after censure by the ORI,” while total funding toward all implicated researchers went from $131 million to $74.5 million.

There could be some correlation between the type of misconduct and decline in funding, but there’s not enough data to determine that. Nonetheless, there are eight instances in 1992-2012 when the amount of funding increased after the ORI report, of which the lowest rise is seen for John Ho, who committed fraud, and the highest for Alan Landay, implicated for plagiarism, a ‘lesser’ charge.

incfund

From the paper:

The personal consequences for individuals found to have committed research misconduct are considerable. When a researcher is found by the ORI to have committed misconduct, the outcome typically involves a voluntary agreement in which the scientist agrees not to contract with the United States government for a period of time ranging from a few years to, in rare cases, a lifetime. Recent studies of faculty and postdoctoral fellows indicate that research productivity declines after censure by the ORI, sometimes to zero, but that many of those who commit misconduct are able to find new jobs within academia (Redman and Merz, 2008, 2013). Our study has found similar results. Censure by the ORI usually results in a severe decrease in productivity, in many cases causing a permanent cessation of publication. However the exceptions are instructive.

Retraction Watch reported the findings with especial focus on the cost of research misconduct. They spoke to Daniele Fanelli, one part of whose quote is notable – albeit no less than the rest.

The question of collateral damage, by which I mean the added costs caused by other research being misled, is controversial. It still has to be conclusively shown, in other words, that much research actually goes wasted directly because of fabricated findings. Waste is everywhere in science, but the role played by frauds in generating it is far from established and is likely to be minor.

References

Stern, A.M., Casadevall, A., Steen, R.G. and Fang, F.C., Financial costs and personal consequences of research misconduct resulting in retracted publications, eLIFE. August 14, 2014;3:e02956.

Replication studies, ceiling effects, and the psychology of science

On May 25, I found Erika Salomon’s tweet:

The story started when the journal Social Psychology decided to publish successful and failed replication attempts instead of conventional papers and their conclusions for a Replications Special Issue (Volume 45, Number 3 / 2014). It accepted proposals from scientists stating which studies they wanted to try to replicate, and registered the accepted ones. This way, the journal’s editors Brian Nosek and Daniel Lakens could ensure that a study was published no matter the outcome – successful or not.

All the replication studies were direct replication studies, which means they used the same experimental procedure and statistical methods to analyze the data. And before the replication attempt began, the original data, procedure and analysis methods were scrutinized, and the data was shared with the replicating group. Moreover, an author of the original paper was invited to review the respective proposals and have a say in whether the proposal could be accepted. So much is pre-study.

Finally, the replication studies were performed, and had their results published.


The consequences of failing to replicate a study

Now comes the problem: What if the second group failed to replicate the findings of the first group? There are different ways of looking at this from here on out. The first person such a negative outcome affects is the original study’s author, whose reputation is at stake. Given the gravity of the situation, is the original author allowed to ask for a replication of the replication?

Second, during the replication study itself (and given the eventual negative outcome), how much of a role is the original author allowed to play when performing the experiment, analyzing the results and interpreting them? This could swing both ways. If the original author is allowed to be fully involved during the analysis process, there will be a conflict of interest. If the original author is not allowed to participate in the analysis, the replicating group could get biased toward a negative outcome for various reasons.

Simone Schnall, a psychology researcher from Cambridge, writes on the SPSP blog (linked to in the tweet above) that, as an author of a paper whose results have been unsuccessfully replicated and reported in the Special Issue, she feels “like a criminal suspect who has no right to a defense and there is no way to win: The accusations that come with a ‘failed’ replication can do great damage to my reputation, but if I challenge the findings I come across as a ‘sore loser.’”

People on both sides of this issue recognize the importance of replication studies; there’s no debate there. But the presence of these issues calls into question how replication studies are designed, reviewed and published: they need a support structure just as firm as the one around original studies, or they all run the risk of becoming personalized. Forget who replicates the replicators; it could just as well become who bullies the bullies. And in the absence of such rules, replication studies are becoming actively disincentivized. Simone Schnall acceded to a request to replicate her study, but the fallout could set a bad example.

During her commentary, Schnall links to a short essay by Princeton University psychologist Daniel Kahneman titled ‘A New Etiquette for Replication’. In the piece, Kahneman writes, “… tension is inevitable when the replicator does not believe the original findings and intends to show that a reported effect does not exist. The relationship between replicator and author is then, at best, politely adversarial. The relationship is also radically asymmetric: the replicator is in the offense, the author plays defense.”

In this blog post by one of the replicators, the phrase “epic fail” is an example of how things could be personalized. Note: the author of the post has struck out the words and apologized.

In order to eliminate these issues, the replicators could be asked to keep things specific. Various stakeholders have suggested different ways to resolve this issue. For one, replicators should address the questions and answers raised in the original study instead of the author and her/his credentials. Another way is to regularly publish reports of replication results instead of devoting a special issue to it, and make them part of the scientific literature.

This is one concern that Schnall raises in her answers (in response to question #13): “I doubt anybody would have widely shared the news had the replication been considered ‘successful.’” So there’s a need to address a bias here: are journals likelier to publish replication studies that fail to replicate previous results? Erasing this bias requires publishers to actively incentivize replication studies.

A paper published in Perspectives on Psychological Science in 2012 paints a slightly different picture. It looks at the number of replication studies published in the field and pegs the replication rate at 1.07%. Despite the low rate, one of the paper’s conclusions was that among all published replication studies, most of them reported successful, not unsuccessful, replications. It also notes that since 2000, among all replication studies published, the fraction reporting successful outcomes stands at 69.4%, and that reporting unsuccessful outcomes at 11.8%.

chart_1
Sorry about the lousy resolution. Click on the chart for a better view.

At the same time, Nosek and Lakens concede in this editorial that, “In the present scientific culture, novel and positive results are considered more publishable than replications and negative results.”


The ceiling effect

Schnall does raise many questions about the replication, including alleging the presence of a ceiling effect. As she describes it (in response to question #8):

“Imagine two people are speaking into a microphone and you can clearly understand and distinguish their voices. Now you crank up the volume to the maximum. All you hear is this high-pitched sound (“eeeeee”) and you can no longer tell whether the two people are saying the same thing or something different. Thus, in the presence of such a ceiling effect it would seem that both speakers were saying the same thing, namely “eeeeee”.

The same thing applies to the ceiling effect in the replication studies. Once a majority of the participants are giving extreme scores, all differences between two conditions are abolished. Thus, a ceiling effect means that all predicted differences will be wiped out: It will look like there is no difference between the two people (or the two experimental conditions).”

She states this as an important reason to get the replicators’ results replicated.


My opinions

// Because Schnall thinks the presence of a ceiling effect is a reason to have the replicators’ results replicated, it implies that there could be a problem with the method used to evaluate the authors’ hypothesis. Both the original and the replication studies used the same method, and the emergence of an effect in one of them but not the other implies the “fault”, if that, could lie with the replicator – for improperly performing the experiment – or with the original author – for choosing an inadequate set-up to verify the hypothesis. Therefore, one thing that Schnall felt strongly about, the scrutiny of her methods, should also have been formally outlined, i.e. a replication study is not just about the replication of results but about the replication of methods as well.

// Because both papers have passed scrutiny and have been judged worthy of publication, it makes sense to treat them as individual studies in their own right instead of one being a follow up to the other (even though technically that’s what they are), and to consider both together instead of selecting one over the other – especially in terms of the method. This sort of debate gives room for Simone Schnall to publish an official commentary in response to the replication effort and make the process inclusive. In some sense, I think this is also the sort of debate that Ivan Oransky and Adam Marcus think scientific publishing should engender.

// Daniel Lakens explains in a comment on the SPSP blog that there was peer-review of the introduction, method, and analysis plan by the original authors and not an independent group of experts. This was termed “pre-data peer review”: a review of the methods and not the numbers. It is unclear to what extent this was sufficient because it’s only with a scrutiny of the numbers that any ceiling effect becomes apparent. While post-publication peer-review can check for this, it’s not formalized (at least in this case) and does little to mitigate Schnall’s situation.

// Schnall’s paper was peer-reviewed. The replicators’ paper was peer-reviewed by Schnall et al. Even if both passed the same level of scrutiny, they didn’t pass the same type of it. On this basis, there might be reason for Schnall to be involved with the replication study. Ideally, however, it would have been better if the replication was better formulated, with normal peer-review, in order to eliminate Schnall’s interference. Apart from the conflict of interest that could arise, a replication study needs to be fully independent to make it credible, just like the peer-review process is trusted to be credible because it is independent. So while it is commendable that Schnall shared all the details of her study, it should have been made possible for her participation to end there.

// While I’ve disagreed with Kahneman over the previous point, I do agree with point #3 in his essay that describes the new etiquette: “The replicator is not obliged to accept the author’s suggestions [about the replicators’ M.O.], but is required to provide a full description of the final plan. The reasons for rejecting any of the author’s suggestions must be explained in detail.” [Emphasis mine]

I’m still learning about this fascinating topic, so if I’ve made mistakes in interpretations, please point them out.


Featured image: shutterstock/(c)Sunny Forest

The case of the red-haired kids

This blog post first appeared, as written by me, on The Copernican science blog on December 30, 2012.

Seriously, shame on me for not noticing the release of a product named Correlate until December 2012. Correlate by Google was released in May last year and is a tool to see how two different search trends have panned out over a period of time. But instead of letting you pick out searches and compare them, Correlate saves a bit of time by letting you choose one trend and then automatically picks out trends similar to the one you’ve your eye on.

For instance, I used the “Draw” option and drew a straight, gently climbing line from September 19, 2004, to July 24, 2011 (both randomly selected). Next, I chose “India” as the source of search queries for this line to be compared with, and hit “Correlate”. Voila! Google threw up 10 search trends that varied over time just as my line had.

correlate_date

Since I’ve picked only India, the space from which the queries originate remains fixed, making this a temporal trend – a time-based one. If I’d fixed the time – like a particular day, something short enough to not produce strong variations – then it’d have been a spatial trend, something plottable on a map.

Now, there were a lot of numbers on the results page. The 10 trends displayed in fact were ranked according to a particular number “r” displayed against them. The highest ranked result, “free english songs”, had r = 0.7962. The lowest ranked result, “to 3gp converter”, had r = 0.7653.

correlations

And as I moused over the chart itself, I saw two numbers, one each against the two trends being tracked. For example, on March 1, 2009, the “Drawn Series” line had a number +0.701, and the “free english songs” line had a number -0.008, against it.

correlate_zoom

What do these numbers mean?

This is what I want to really discuss because they have strong implications on how lay people interpret data that appears in the context of some scientific text, like a published paper. Each of these numbers is associated with a particular behaviour of some trend at a specific point. So, instead of looking at it as numbers and shapes on a piece of paper, look at it for what it represents and you’ll see so many possibilities coming to life.

The numbers against the trends, +0.701 for “Drawn Series” (my line) and -0.008 for “free english songs” in March ‘09, are the deviations. The deviation is a lovely metric because it sort of presents the local picture in comparison to the global picture, and this perspective is made possible by the simple technique used to evaluate it.

Consider my line. Each of the points on the line has a certain value. Use this information to find their average value. Now, the deviation is how much a point’s value is away from the average value.

It’s like if 11 red-haired kids were made to stand in a line ordered according to the redness of their hair. If the “average” colour around was a perfect orange, then the kid with the “reddest” hair and the kid with the palest-red hair will be the most deviating. Kids with some semblance of orange in their hair-colour will be progressively less deviating until they’re past the perfect “orangeness”, and the kid with perfectly-orange hair will be completely non-deviating.

So, on March 1, 2009, “Drawn Series” was higher than its average value by 0.701 and “free english songs” was lower than its average value by 0.008. Now, if you’re wondering what the units are to measure these numbers: Deviations are dimensionless fractions – which means they’re just numbers whose highness or lowness are indications of intensity.

And what’re they fractions of? The value being measured along the trend being tracked.

Now, enter standard deviation. Remember how you found the average value of a point on my line? Well, the standard deviation is, loosely, the typical size of all the deviations (strictly, the square root of the average of their squares). It’s like saying the children fitting a particular demographic are, for instance, 25 per cent smarter on average than other normal kids: the standard deviation is 25 per cent and the individual deviations are similar percentages of the “smartness” being measured.
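
As a rough check on that description, here is a minimal Python sketch on a made-up list of values (an assumption for illustration, not data from Correlate). It computes the plain average of the absolute deviations alongside the standard deviation as statisticians define it, which is the square root of the average squared deviation. The two are close in spirit but not identical.

```python
from statistics import mean, pstdev

# Hypothetical sample, chosen only so the arithmetic is easy to follow.
values = [2, 4, 4, 4, 5, 5, 7, 9]

average = mean(values)
deviations = [v - average for v in values]       # signed distance from the average
mean_abs_dev = mean(abs(d) for d in deviations)  # plain average of the deviation sizes
std_dev = pstdev(values)                         # root of the mean squared deviation

print("average:", average)
print("mean absolute deviation:", mean_abs_dev)
print("standard deviation:", std_dev)
```

Because the deviations are squared before being averaged, the standard deviation gives the far-out points more say than the plain average does.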

So, right now, if you took the bigger picture, you’d see the chart, the standard deviation (the individual deviations if you chose to mouse-over), the average, and that number “r”. The average will indicate the characteristic behaviour of the trend – let’s call it “orange” – the standard deviation will indicate how far off on average a point’s behaviour will be deviating in comparison to “orange” – say, “barely orange”, “bloody”, etc. – and the individual deviations will show how “orange” each point really is.

At this point I must mention that I conveniently oversimplified the example of the red-haired kids to avoid a specific problem. This problem has been quite single-handedly responsible for the news-media wrongly interpreting results from the LHC/CERN on the Higgs search.

In the case of the kids, we assumed that, going down the line, each kid’s hair would get progressively darker. What I left out was how much darker the hair would get with each step.

Let’s look at two different scenarios.

Scenario 1: The hair gets darker by a fixed amount each step.

Let’s say the first kid’s got hair that’s 1 unit of orange, the fifth kid’s got 5 units, and the 11th kid’s got 11 units. This way, the average “amount of orange” in the lineup is going to be 6 units. The deviation on either side of kid #6 is going to increase/decrease in steps of 1. In fact, from the first to the last, it’s going to be 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, and 5. Straight down and then straight up.

blue_bars

Scenario 2: The hair gets darker slowly and then rapidly, also from 1 to 11 units.

In this case, the average is not going to be 6 units. Let’s say the “orangeness” this time is 1, 1.5, 2, 2.5, 3, 3.5, 4, 5.5, 7.5, 9.75, and 11 per kid, which brings the average to ~4.6591 units. In turn, the deviations are 3.6591, 3.1591, 2.6591, 2.1591, 1.6591, 1.1591, 0.6591, 0.8409, 2.8409, 5.0909, and 6.3409. In other words, slowly down and then quickly more up.

red_bars
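
The arithmetic in the two scenarios can be reproduced with a short Python sketch. The helper name `absolute_deviations` is an assumption for illustration; the numbers are the “orangeness” values given in the scenarios above.

```python
def absolute_deviations(values):
    """Return the average and each value's distance from that average."""
    average = sum(values) / len(values)
    return average, [abs(v - average) for v in values]

scenario_1 = list(range(1, 12))  # 1, 2, ..., 11 units of orange
scenario_2 = [1, 1.5, 2, 2.5, 3, 3.5, 4, 5.5, 7.5, 9.75, 11]

avg_1, devs_1 = absolute_deviations(scenario_1)
avg_2, devs_2 = absolute_deviations(scenario_2)

print("Scenario 1:", round(avg_1, 4), [round(d, 4) for d in devs_1])
print("Scenario 2:", round(avg_2, 4), [round(d, 4) for d in devs_2])
```

In the second list, seven of the eleven kids sit below the average, which is what drags it down from 6 to about 4.66.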

In the second scenario, we saw how the average got shifted to the left. This is because there were more less-orange kids than more-orange ones. What’s more important is that it didn’t matter if the kids on the right had more more-orange hair than before. That they were fewer in number shifted the weight of the argument away from them!

In much the same way, looking for the Higgs boson from a chart that shows different peaks (number of signature decay events) at different points (energy levels), with taller but fewer peaks to one side and shorter but many more peaks to the other, can be confusing. While more decays could’ve occurred at discrete energy levels, the Higgs boson is more likely (note: not definitely) to be found within the energy-level where decays occur more frequently (in the chart below, decays are seen to occur more frequently at 118-126 GeV/c² than at 128-138 GeV/c² or 110-117 GeV/c²).

incidence
Idea from Prof. Matt Strassler’s blog

If there’s a tall peak where a Higgs isn’t likely to occur, then that’s an outlier, a weirdo who doesn’t fit into the data. It’s probably called an outlier because its deviation from the average could be well outside the permissible deviation from the average.

This also means it’s necessary to pick the average from the right area to identify the right outliers. In the case of the Higgs, if its associated energy-level (mass) is calculated as being an average of all the energy levels at which a decay occurs, then freak occurrences and statistical noise are going to interfere with the calculation. But knowing that some masses of the particle have been eliminated, we can constrain the data to between two energy levels, and then go after the average.

So, when an uninformed journalist looks at the data, the taller peaks can catch the eye, even run away with the ball. But look out for the more closely occurring bunches – that’s where all the action is!

If you notice, you’ll also see that there are no events at some energy levels. This is where you should remember that uncertainty cuts both ways. When you’re looking at a peak and thinking “This can’t be it; there’s some frequency of decays to the bottom, too”, you’re acknowledging some uncertainty in your perspective. Why not acknowledge some uncertainty when you’re noticing absent data, too?

While there’s a peak at 126 GeV/c², the Higgs weighs between 124 and 125 GeV/c². We know this now, so when we look at the chart, we know we were right in having been uncertain about the mass of the Higgs being 126 GeV/c². Similarly, why not say “There’s no decays at 113 GeV/c², but let me be uncertain and say there could’ve been a decay there that’s escaped this measurement”?

Maybe this idea’s better illustrated with this chart.

incidence_valley

There’s a noticeable gap between 123 and 125 GeV/c². Just looking at this chart and you’re going to think that with peaks on either side of this valley, the Higgs isn’t going to be here… but that’s just where it is! So, make sure you address uncertainty when you’re determining presences as well as absences.

So, now, we’re finally ready to address “r”, the Pearson correlation coefficient. It’s got a formula, and I think you should see it. It’s pretty neat.

daum_equation_1356801915634

(TeX: r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}})

The equation says “Let’s see what your Pearson correlation, “r”, is by seeing how much all of your variations are deviant keeping in mind both your standard deviations.”

The numerator is what’s called the covariance, and the denominator is basically the product of the standard deviations. X-bar, which is X with a bar atop, is the average value of X – my line – and the same goes for Y-bar, corresponding to Y – “mobile games”. Individual points on the lines are denoted with the subscript “i”, so the points would be X1, X2, X3, …, and Y1, Y2, Y3, …; “n” in the formula is the size of the sample – the number of days over which we’re comparing the two trends.
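
For anyone who’d rather see the formula as steps, here is a minimal Python sketch that follows it term by term. The function name `pearson_r` and the three short series are assumptions made up for illustration; they are not Correlate’s data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Compute r from the formula: summed co-deviations over the product of root summed squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sqrt(sum((x - x_bar) ** 2 for x in xs)) * sqrt(sum((y - y_bar) ** 2 for y in ys))
    # A flat series has no deviations, so the denominator is zero and r is undefined.
    if denominator == 0:
        raise ValueError("r is undefined when a series does not vary")
    return numerator / denominator

drawn_line = [1, 2, 3, 4, 5]   # hypothetical straight, climbing line
rising = [2, 4, 6, 8, 10]      # climbs in step with the drawn line
falling = [10, 8, 6, 4, 2]     # descends as the drawn line climbs

r_rising = pearson_r(drawn_line, rising)
r_falling = pearson_r(drawn_line, falling)
print(round(r_rising, 4), round(r_falling, 4))
```

A series that climbs in lockstep with the drawn line lands at the top of the scale, and one that falls in lockstep lands at the bottom. Real search trends, like the ones Correlate returned, sit somewhere in between.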

The Pearson correlation coefficient is built on covariance, and not on deviation alone, because it normalises the two trends’ covariance. Simply put, covariance is a measure of how much the two trends vary together. Once normalised, its magnitude can have a minimum value of 0, which would mean one trend’s variation has no linear relationship with the other’s, and a maximum value of 1, which would mean one trend’s variation is inescapably tied with the variation of the other’s. Similarly, if the coefficient is positive, it means that if one trend climbs, the other would climb, too. If the coefficient is negative, then one trend’s climbing would mean the other’s descending (In the chart below, between Oct ’09 and Jan ’10, there’s a dip: even during the dive-down, the blue line is on an increasing note – here, the local covariance will be negative).

correlate_sample

Apart from being a conveniently defined number, covariance also records a trend’s linearity. In statistics, linearity is a notion that stands by its name: like a straight line, the rise or fall of a trend is uniform. If you divided up the line into thousands of tiny bits and called each one on the right the “cause” and the one on the left the “effect”, then you’d see that linearity means each effect for each cause is either an increase or a decrease by the same amount.

Just like that, if the covariance is a lower positive number, it means one trend’s growth is also the other trend’s growth, and in equal measure. If the covariance is a larger positive number, you’d have something like the butterfly effect: one trend moves up by an inch, the other shoots up by a mile. This you’ll notice is a break from linearity. So if you plotted the covariance at each point in a chart as a chart by itself, one look will tell you how the relationship between the two trends varies over time (or space).

The case of the red-haired kids

Seriously, shame on me for not noticing the release of a product named Correlate until December 2012. Correlate by Google was released in May last year and is a tool to see how two different search trends have panned out over a period of time. But instead of letting you pick out searches and compare them, Correlate saves a bit of time by letting you choose one trend and then automatically picks out trends similar to the one you’ve your eye on.

For instance, I used the “Draw” option and drew a straight, gently climbing line from September 19, 2004, to July 24, 2011 (both randomly selected). Next, I chose “India” as the source of search queries for this line to be compared with, and hit “Correlate”. Voila! Google threw up 10 search trends that varied over time just as my line had.

Since I’ve picked only India, the space from which the queries originate remains fixed, making this a temporal trend – a time-based one. If I’d fixed the time – like a particular day, something short enough to not produce strong variations – then it’d have been a spatial trend, something plottable on a map.

Now, there were a lot of numbers on the results page. The 10 trends displayed in fact were ranked according to a particular number “r” displayed against them. The highest ranked result, “free english songs”, had r = 0.7962. The lowest ranked result, “to 3gp converter”, had r = 0.7653.

And as I moused over the chart itself, I saw two numbers, one each against the two trends being tracked. For example, on March 1, 2009, the “Drawn Series” line had a number +0.701, and the “free english songs” line had a number -0.008, against it.

What do these numbers mean?

This is what I really want to discuss, because these numbers have strong implications for how lay people interpret data that appears in the context of some scientific text, like a published paper. Each of these numbers is associated with a particular behaviour of some trend at a specific point. So, instead of looking at them as numbers and shapes on a piece of paper, look at them for what they represent and you’ll see so many possibilities coming to life.

The numbers against the trends, +0.701 for “Drawn Series” (my line) and -0.008 for “free english songs” in March ‘09, are the deviations. The deviation is a lovely metric because it sort of presents the local picture in comparison to the global picture, and this perspective is made possible by the simple technique used to evaluate it.

Consider my line. Each of the points on the line has a certain value. Use this information to find their average value. Now, the deviation is how much a point’s value is away from the average value.
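As a rough sketch of that recipe, here is how the deviations could be worked out in Python. The five values are made up for illustration (they are not Google’s data), and the function name `deviations` is my own.

```python
from statistics import mean

def deviations(values):
    """Return how far each point sits from the average of all points."""
    average = mean(values)
    return [value - average for value in values]

# Hypothetical readings along a gently climbing line (not Google's data).
drawn_series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(deviations(drawn_series))
```

Points below the average come out negative and points above it come out positive, which is why the two trends on the chart can carry numbers of opposite sign on the same day.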

It’s like if 11 red-haired kids were made to stand in a line ordered according to the redness of their hair. If the “average” colour around was a perfect orange, then the kid with the “reddest” hair and the kid with the palest-red hair will be the most deviating. Kids with some semblance of orange in their hair-colour will be progressively less deviating until they’re past the perfect “orangeness”, and the kid with perfectly-orange hair will be completely non-deviating.

So, on March 1, 2009, “Drawn Series” was higher than its average value by 0.701 and “free english songs” was lower than its average value by 0.008. Now, if you’re wondering what the units are to measure these numbers: Deviations are dimensionless fractions – which means they’re just numbers whose highness or lowness are indications of intensity.

And what’re they fractions of? The value being measured along the trend being tracked.

Now, enter standard deviation. Remember how you found the average value of a point on my line? Well, the standard deviation is the typical size of all the deviations: square each deviation, average the squares and take the square root (a plain average won’t do because the positive and negative deviations cancel each other out). It’s like saying the children fitting a particular demographic are, for instance, 25 per cent smarter on average than other normal kids: the standard deviation is 25 per cent and the individual deviations are similar percentages of the “smartness” being measured.
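A small sketch may help here. Strictly speaking, the signed deviations always average out to zero, so the standard deviation is usually computed as a root-mean-square of the deviations. The sample numbers and the function name below are assumptions for illustration; `pstdev` is the population standard deviation from Python’s standard library, used only as a cross-check.

```python
from math import sqrt
from statistics import mean, pstdev

def standard_deviation(values):
    """Root-mean-square distance of the points from their average."""
    average = mean(values)
    squared = [(value - average) ** 2 for value in values]
    return sqrt(mean(squared))

# Made-up sample, chosen so the numbers come out round.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# The plain average of the signed deviations cancels out to zero...
signed_average = mean([value - mean(sample) for value in sample])

# ...whereas the root-mean-square version gives a usable "typical" deviation.
print(signed_average, standard_deviation(sample), pstdev(sample))
```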

So, right now, if you took the bigger picture, you’d see the chart, the standard deviation (the individual deviations if you chose to mouse-over), the average, and that number “r”. The average will indicate the characteristic behaviour of the trend – let’s call it “orange” – the standard deviation will indicate how far off on average a point’s behaviour will be deviating in comparison to “orange” – say, “barely orange”, “bloody”, etc. – and the individual deviations will show how “orange” each point really is.

At this point I must mention that I conveniently oversimplified the example of the red-haired kids to avoid a specific problem. This problem has been quite single-handedly responsible for the news-media wrongly interpreting results from the LHC/CERN on the Higgs search.

In the case of the kids, we assumed that, going down the line, each kid’s hair would get progressively darker. What I left out was how much darker the hair would get with each step.

Let’s look at two different scenarios.

Scenario 1: The hair gets darker by a fixed amount each step.

Let’s say the first kid’s got hair that’s 1 unit of orange, the fifth kid’s got 5 units, and the 11th kid’s got 11 units. This way, the average “amount of orange” in the lineup is going to be 6 units. The deviation on either side of kid #6 is going to increase/decrease in steps of 1. In fact, from the first to the last, it’s going to be 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, and 5. Straight down and then straight up.

Scenario 2: The hair gets darker slowly and then rapidly, also from 1 to 11 units.

In this case, the average is not going to be 6 units. Let’s say the “orangeness” this time is 1, 1.5, 2, 2.5, 3, 3.5, 4, 5.5, 7.5, 9.75, and 11 per kid, which brings the average to ~4.6591 units. In turn, the deviations are 3.6591, 3.1591, 2.6591, 2.1591, 1.6591, 1.1591, 0.6591, 0.8409, 2.8409, 5.0909, and 6.3409. In other words, slowly down and then quickly up.
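If you’d like to check the arithmetic for both scenarios, a short sketch like the following should do. The “orangeness” values are the ones used in the two scenarios; the helper name `absolute_deviations` is mine, and it reports the size of each deviation without its sign, as the lists in the text do.

```python
from statistics import mean

def absolute_deviations(values):
    """Return the average and each point's unsigned distance from it."""
    average = mean(values)
    return average, [round(abs(value - average), 4) for value in values]

# Scenario 1: hair gets darker by a fixed amount each step.
scenario_1 = [float(kid) for kid in range(1, 12)]
# Scenario 2: hair gets darker slowly and then rapidly.
scenario_2 = [1, 1.5, 2, 2.5, 3, 3.5, 4, 5.5, 7.5, 9.75, 11]

avg_1, dev_1 = absolute_deviations(scenario_1)
avg_2, dev_2 = absolute_deviations(scenario_2)

print(avg_1, dev_1)
print(round(avg_2, 4), dev_2)
```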

In the second scenario, we saw how the average got shifted to the left. This is because there were more less-orange kids than more-orange ones. What’s more important is that it didn’t matter how much more orange the hair of the kids on the right was. That they were fewer in number shifted the weight of the argument away from them!

In much the same way, looking for the Higgs boson from a chart that shows different peaks (number of signature decay events) at different points (energy levels), with taller but fewer peaks to one side and shorter but many more peaks to the other, can be confusing. While more decays could’ve occurred at discrete energy levels, the Higgs boson is more likely (note: not definitely) to be found within the energy-level where decays occur more frequently (in the chart below, decays are seen to occur more frequently at 118-126 GeV/c² than at 128-138 GeV/c² or 110-117 GeV/c²).

If there’s a tall peak where a Higgs isn’t likely to occur, then that’s an outlier, a weirdo who doesn’t fit into the data. It’s probably called an outlier because its deviation from the average could be well outside the permissible deviation from the average.

This also means it’s necessary to pick the average from the right area to identify the right outliers. In the case of the Higgs, if its associated energy-level (mass) is calculated as being an average of all the energy levels at which a decay occurs, then freak occurrences and statistical noise are going to interfere with the calculation. But knowing that some masses of the particle have been eliminated, we can constrain the data to between two energy levels, and then go after the average.
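Here’s a toy sketch of what constraining the data before averaging does. The energy bins and event counts are entirely made up (they are not LHC data), and `weighted_mean_energy` is a hypothetical helper that averages the energies weighted by how many events landed in each bin.

```python
def weighted_mean_energy(bins, low=None, high=None):
    """Average energy weighted by event count, optionally within [low, high]."""
    kept = [(energy, count) for energy, count in bins
            if (low is None or energy >= low) and (high is None or energy <= high)]
    total_events = sum(count for _, count in kept)
    if total_events == 0:
        raise ValueError("no events in the selected window")
    return sum(energy * count for energy, count in kept) / total_events

# Made-up (energy in GeV/c^2, number of events) pairs, with one tall stray peak.
made_up_bins = [(110, 2), (118, 5), (122, 8), (125, 9), (126, 7), (136, 12)]

everything = weighted_mean_energy(made_up_bins)
constrained = weighted_mean_energy(made_up_bins, low=118, high=126)
print(round(everything, 2), round(constrained, 2))
```

With these invented numbers, the stray peak at 136 drags the unconstrained average upward, while the windowed average stays inside the bunch of closely occurring bins.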

So, when an uninformed journalist looks at the data, the taller peaks can catch the eye, even run away with the ball. But look out for the more closely occurring bunches – that’s where all the action is!

If you notice, you’ll also see that there are no events at some energy levels. This is where you should remember that uncertainty cuts both ways. When you’re looking at a peak and thinking “This can’t be it; there’s some frequency of decays to the bottom, too”, you’re acknowledging some uncertainty in your perspective. Why not acknowledge some uncertainty when you’re noticing absent data, too?

While there’s a peak at 126 GeV/c², the Higgs weighs between 124 and 125 GeV/c². We know this now, so when we look at the chart, we know we were right in having been uncertain about the mass of the Higgs being 126 GeV/c². Similarly, why not say “There are no decays at 113 GeV/c², but let me be uncertain and say there could’ve been a decay there that’s escaped this measurement”?

Maybe this idea’s better illustrated with this chart.

– IDEA FROM Prof. Matt Strassler’s blog

There’s a noticeable gap between 123 and 125 GeV/c². Just look at this chart and you’re going to think that, with peaks on either side of this valley, the Higgs isn’t going to be here… but that’s just where it is! So, make sure you address uncertainty when you’re determining presences as well as absences.

So, now, we’re finally ready to address “r”, the Pearson correlation coefficient. It’s got a formula, and I think you should see it. It’s pretty neat.

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

The equation says, “Let’s see what your Pearson correlation, ‘r’, is by seeing how much your two trends’ deviations move together, keeping in mind both your standard deviations.”

The numerator is what’s called the covariance (short of a division by “n”), and the denominator is basically the product of the standard deviations (short of the same division, so the two cancel out). X-bar, which is X with a bar atop, is the average value of X – my line – and the same goes for Y-bar, corresponding to Y – “mobile games”. Individual points on the lines are denoted with the subscript “i”, so the points would be X1, X2, X3, …, and Y1, Y2, Y3, … And “n” in the formula is the size of the sample – the number of days over which we’re comparing the two trends.
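For the curious, the formula translates almost line for line into code. What follows is a minimal sketch, not how Google Correlate computes its numbers; the function name `pearson_r` and the sample lists are my own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: summed products of deviations over the product of the
    square-rooted sums of squared deviations, as in the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = (sqrt(sum((x - x_bar) ** 2 for x in xs))
                   * sqrt(sum((y - y_bar) ** 2 for y in ys)))
    if denominator == 0:
        raise ValueError("r is undefined when a series never varies")
    return numerator / denominator

# Hypothetical trends: one climbs in lockstep with the line, one falls.
line = [1, 2, 3, 4]
climbing = [2, 4, 6, 8]
falling = [8, 6, 4, 2]
print(pearson_r(line, climbing), pearson_r(line, falling))
```

A trend that rises in a fixed proportion to the line gives a value of about +1, and one that falls in a fixed proportion gives about -1.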

It’s called the Pearson correlation coefficient, and not the Pearson deviation coefficient, etc., because it normalises the two trends’ covariance. Simply put, covariance is a measure of how much the two trends vary together, and dividing it by the product of the two standard deviations confines “r” to between -1 and +1. A value of 0 would mean one trend’s variation has no linear relationship with the other’s, and a magnitude of 1 would mean one trend’s variation is inescapably tied with the variation of the other’s. The sign matters, too: if “r” is positive, it means that if one trend climbs, the other tends to climb, too. If it is negative, then one trend’s climbing would mean the other’s descending (in the chart below, between Oct ’09 and Jan ’10, there’s a dip: even during the dive-down, the blue line is on an increasing note – here, the local covariance will be negative).

Apart from being a conveniently defined number, “r” also records how linear the relationship between the two trends is. In statistics, linearity is a notion that stands by its name: like a straight line, the rise or fall of a trend is uniform. If you divided up the line into thousands of tiny bits and called each one on the right the “cause” and the one on the left the “effect”, then you’d see that linearity means each effect for each cause is either an increase or a decrease by the same amount.

Just like that, if “r” is close to +1, it means one trend’s growth is also the other trend’s growth, and in a fixed proportion: plot one trend against the other and the points fall along a straight line. If “r” is a smaller positive number, the two still tend to move in the same direction but not in step: one trend moves up by an inch, and the other might shoot up by a mile at one point and barely budge at another. This you’ll notice is a break from linearity. So if you computed “r” over a short window around each point and plotted it as a chart by itself, one look will tell you how the relationship between the two trends varies over time (or space).
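One way to chart the relationship point by point is to compute “r” over a short sliding window instead of over the whole series. The sketch below assumes a window of three points and uses two invented trends; `rolling_r` and `pearson_r` are hypothetical helper names, and a window in which either trend stays flat is reported as NaN because “r” is undefined there.

```python
from math import nan, sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length series; NaN if either one is flat."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = (sqrt(sum((x - x_bar) ** 2 for x in xs))
                   * sqrt(sum((y - y_bar) ** 2 for y in ys)))
    return numerator / denominator if denominator else nan

def rolling_r(xs, ys, window):
    """Pearson's r over each run of `window` consecutive points."""
    if len(xs) != len(ys) or not 2 <= window <= len(xs):
        raise ValueError("series must match and window must fit inside them")
    return [pearson_r(xs[i:i + window], ys[i:i + window])
            for i in range(len(xs) - window + 1)]

# Invented trends: the second climbs with the first, then turns around.
steady = [1, 2, 3, 4, 5, 6]
rise_and_fall = [1, 2, 3, 3, 2, 1]
local_r = rolling_r(steady, rise_and_fall, window=3)
print([round(value, 3) for value in local_r])
```

The values start near +1 while both trends climb and slide to -1 once the second trend turns around, which is the kind of local picture a single overall “r” hides.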

A cultured evolution?

Can perceptions arising out of cultural needs override evolutionary goals in the long run? For example, in India, the average marriage-age is in the late 20s now. Here, the (popular) tradition is to frown upon, and even ostracize, those who would engage in premarital sex. So, after 10,000 years, say, are Indians more likely to have the development of their sexual desires postponed to occur in their late 20s (if they are not exposed to any avenues of sexual expression)? This question arose as a consequence of a short discussion with some friends on an article that appeared in SciAm, about whether (heterosexual) men and women could stay “just friends”. To paraphrase the principal question in the context of the SciAm-featured “study”:

  1. Would you agree that the statistical implications of gender-sensitive studies will vary from region to region simply because the reasons on the basis of which such relationships can be established vary from one socio-political context to another?
  2. Assuming you have agreed to the first question: Would you contend that the underlying biological imperatives can, someday, be overridden altogether in favor of holding up cultural paradigms (or vice versa)?

Is such a thing even possible? (To be clear: I’m not looking for hypotheses and conjectures; if you can link me to papers that support your point of view, that’d be great.)

The weakening measurement

Unlike the special theory of relativity that the superluminal-neutrinos fiasco sought to defy, Heisenberg’s uncertainty principle offers very few, and equally iffy, measurement techniques by which it can be verified. While both Einstein’s and Heisenberg’s foundations are close to fundamental truths, the uncertainty principle has guided more than dictated the applications that involve its consequences. Essentially, a defiance of Heisenberg is one for the statisticians.

And I’m pessimistic. Let’s face it, who wouldn’t be?

Anyway, the parameters involved in the experiment were:

  1. The particles being measured
  2. Weak measurement
  3. The apparatus

The experimenters claim that a value of the photon’s original polarization, X, was obtained upon a weak measurement. Then, a “stronger” measurement was made, yielding a value A. However, according to Heisenberg’s principle, the observation should have changed the polarization from A to some fixed value A’.

Now, the conclusions they drew:

  1. Obtaining X did not change A: X = A
  2. A’ – A < Limits set by Heisenberg

The terms of the weak measurement are understood with the following formula in mind:
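Assuming this is the standard weak-value expression from the weak-measurement literature, with φ(1) the pre-selected state and φ(2) the post-selected state, it reads:

$$A_w = \frac{\langle \varphi_2 | \hat{A} | \varphi_1 \rangle}{\langle \varphi_2 | \varphi_1 \rangle}$$

The denominator is the overlap between the two states, so as they approach orthogonality it shrinks towards zero and the weak value grows.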

(The bra-ket, or Dirac, notation signifies the dot-product between two vectors or vector-states.)

Here, φ(1,2) denote the pre- and post-selected states, A-hat the observable system, and Aw the value of the weak-measurement. Thus, when the pre-selected state tends toward becoming orthogonal to the post-selected state, the value of the weak measurement increases, becoming large, or “strong”, enough to affect the being-measured value of A-hat.

In our case: Aw = A – X; φ(1) = A; φ(2) = A’.

As listed above, the sources of error are:

  1. φ(1,2)
  2. X

To prove that Heisenberg was miserly all along, Aw would have been increased until φ(1) • φ(2) equaled 0 (through multiple runs of the same experiment), and then φ(2) – φ(1), or A’ – A, measured and compared to the different corresponding values of X. After determining the strength of the weak measurement thus, A’ – X can be determined.

I am skeptical because X signifies the extent of coupling between the measuring device and the system being measured, and its standard deviation, in the case of this experiment, is dependent on the standard deviation of A’ – A, which is in turn dependent on X.