Friday, 29 November 2013

Binomial Probability

In 2011, Donald Smith from Victoria University asked me a question about computing gambling outcomes in Mathematica. I generated a CDF (Computable Document Format) document examining three cases.

Probability of correctly selecting 1 from m in n goes, with vertical lines at the expected average and at the break-even line when paying odds of p:1:
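The post's plots were built in Mathematica, but the underlying calculation is just the binomial PMF. A minimal Python sketch (the values m = 6 and n = 10 are illustrative assumptions, not taken from the original):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of correctly picking the single winner from m = 6 options
# exactly once in n = 10 goes (illustrative values):
m, n = 6, 10
p_one_win = binom_pmf(1, n, 1 / m)

# The expected average number of wins is n/m; a fair game at odds of
# (m - 1):1 would therefore break even on average.
expected_wins = n / m
```

The vertical lines in the plots correspond to `expected_wins` (the average) and to the stake level at which a payout of p:1 breaks even.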


For larger samples the abscissa shows percentages, which keeps the average and break-even lines in the same place for comparison:


Use RandomVariate to draw 10,000 samples from a BinomialDistribution, plotting the sampled and expected results on the same percentage abscissa:
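An equivalent sampling step can be sketched in Python with NumPy (the parameters n = 100 and p = 1/6 are my illustrative assumptions; the original used Mathematica's `RandomVariate[BinomialDistribution[n, p], 10^4]`):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, samples = 100, 1 / 6, 10_000  # illustrative parameters

# Draw 10k binomial variates, one per simulated run of n goes
draws = rng.binomial(n, p, size=samples)

# Convert counts of wins to a percentage abscissa, as in the plots above
percent = 100 * draws / n
mean_percent = percent.mean()  # should sit near 100 * p
```

A histogram of `percent` against the exact PMF (scaled to the same axis) then reproduces the sampled-versus-expected comparison the post describes.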

Friday, 15 November 2013

The problem with P-values

Valen Johnson has been getting a lot of press this year for his paper Revised standards for statistical evidence. It was highlighted in Nature as Weak statistical standards implicated in scientific irreproducibility, on the ABC as Stringent statistics make better science, and mentioned (twice) in The Australian: Pharmas 'concerned' at low evidentiary bar.

The paper is well-written, and the mathematical appendix clear and useful. However, my initial reading of the paper indicates that some useful prior work was not cited. In particular, I first became aware of the problem with P-values through the writings of Robert Matthews. His 1998 paper Facts versus Factions: The use and abuse of subjectivity in scientific research—which was published in 2000 in Rethinking Risk and the Precautionary Principle (pages 247-282)—cites Pocock and Spiegelhalter (1992), which is also cited in Bayesian Methods in Clinical Trials by Deborah Ashby (2005).

Facts versus Factions addresses some of the same topics as Johnson and also has a mathematical appendix, which I re-worked as a Mathematica Notebook here. There is also a Bayesian Credibility Analysis online calculator, and a nice overview of this topic in Matthews' Bayesian Critique of Statistics in Health: The Great Health Hoax.
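One widely cited calibration in this literature (the Sellke–Bayarri–Berger minimum Bayes factor bound, not derived in the post itself) makes the problem concrete: for p < 1/e, no Bayesian analysis can make the evidence against the null stronger than -e·p·ln(p). A short sketch:

```python
from math import e, log

def min_bayes_factor(p):
    """Sellke-Bayarri-Berger lower bound on the Bayes factor in favour
    of the null hypothesis, valid for p < 1/e: BF >= -e * p * ln(p)."""
    assert 0 < p < 1 / e
    return -e * p * log(p)

# A "significant" p = 0.05 still leaves odds against the null of at
# most roughly 2.5:1 -- far from the decisive evidence many assume.
bf_at_005 = min_bayes_factor(0.05)
```

This is essentially the gap between p-values and posterior evidence that Johnson's paper and Matthews' credibility analysis both target.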

Updates: Nature has just published a very readable news feature by Regina Nuzzo entitled P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume.

And my colleague Ron Monson has collected a number of groaning puns that are defining a new growth industry:
  • The problem with p values: how significant are they, really?
  • The Earth is round (p < .05)
  • Give p a chance: significance testing is misunderstood
  • Friends don’t let friends calculate p-values
  • A Dirty Dozen: Twelve P-Value Misconceptions
  • The road to NHST is paved with good intentions
  • On The Surprising Longevity Of Flogged Horses: Why There Is a Case for the Significance Test
  • Replicability in Psychological Science: A Crisis of Confidence?
  • To P or not to P: on the evidential nature of P-values and their place in scientific inference
And finally, if you worked at a particular UK university, this one might have come across your desk as a research proposal (sadly for the academy, I don't think it ever made it to thesis or publication stage):
  • Ménage à Trois Inference Style: Unifying Three Hypothesis Testing Doctrines
As much as one wants to defend the scientific method against pseudo-science, and as smug as modern medicine can be about its superiority over chiropractic and alternative medicine, it is sobering that so much medical research rests on p-values as the "gold standard".