The Higgs Boson and 5 sigma
In the summer of 2012, the ATLAS and CMS collaborations at CERN's Large Hadron Collider announced the discovery of the Higgs Boson, at a mass of about 126 GeV. Hailed by many as the greatest scientific achievement of the 21st century so far, the discovery was followed in 2013 by the Nobel Prize in Physics, awarded to Peter Higgs and François Englert. When describing the significance of this discovery, physicists often use the term "5 sigma". But what does this mean? And how can we be so sure about the existence of a particle with a mean lifetime of 1.6×10^(-22) seconds? The answer lies in statistics.
So what is the Higgs Boson?
The Standard Model of particle physics is the most complete theory science has of the fundamental particles and the forces between them. It does not include gravity, so it falls short of a grand unifying theory of all matter and physical phenomena, but it offers predictions for the decays and interactions of all the known fundamental particles.
In this theory, fundamental forces such as electromagnetism are "carried" by particles called bosons. First theorized in 1964, the Higgs Boson plays a different role: it is the particle associated with the Higgs field, which gives mass to the W and Z bosons and to other elementary particles. Until 2012 it was the last particle in the Standard Model that had yet to be confirmed empirically. Which brings us to the crux of the issue:
How do you empirically prove something that is incredibly rare and nearly impossible to detect?
The experimental setup of the LHC is staggering in size, and requires thousands of scientists working together to probe the fundamental nature of particles and their interactions. In simple terms, it is a gigantic atom smasher. Protons are accelerated in opposite directions around a 27 km ring until they collide at specific locations where enormous "detectors" are situated, ready to collect the debris left in the aftermath. Each collision is called an event. In the instants after it, the particles produced decay rapidly, interacting with each other and getting all jumbled up. Scientists have the task of piecing together which decays took place, tracing particle trajectories back as close as possible to the collision point. A Higgs Boson is produced about once in every 10 billion events, and its signal is often obscured by background: other processes that leave the same traces in the detector.
How Was it Discovered?
Statistical hypothesis testing is a way to determine whether the results of a study are meaningful. In a statistical experiment, at CERN or anywhere else, researchers start with a null hypothesis: the default assumption, accepted as true until the data say otherwise. The study must have a predetermined significance level α, which is the probability of the study rejecting the null hypothesis given that the null is true. Different fields choose different values of α; in many studies, α = 0.05 or 0.01 is considered sufficient. For data that go against the null hypothesis to be deemed statistically significant, their p-value (the probability, assuming the null hypothesis is true, of obtaining data at least as extreme as what was observed) must be less than α. Only when this condition is met can a researcher reject the null hypothesis and conclude that the data are unlikely to be due to chance alone.
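To make α concrete, here is a minimal Python simulation, assuming a textbook one-sided z-test on normally distributed data with a known spread (not the analysis used at CERN). It runs many experiments in which the null hypothesis is true by construction and counts how often the test wrongly rejects it; that fraction should come out close to α. The function names, sample size, and number of trials are hypothetical choices for illustration.

```python
import math
import random


def one_sided_p_value(sample, sigma=1.0):
    """One-sided z-test of H0: mean = 0 against H1: mean > 0, with known sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # Upper-tail probability of a standard normal beyond z.
    return 0.5 * math.erfc(z / math.sqrt(2))


def type_one_error_rate(alpha, trials=20_000, n=30, seed=1):
    """Fraction of simulated experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Data generated under the null hypothesis: true mean is exactly 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if one_sided_p_value(sample) < alpha:
            rejections += 1
    return rejections / trials


for alpha in (0.05, 0.01):
    rate = type_one_error_rate(alpha)
    print(f"alpha = {alpha}: rejected H0 in {rate:.2%} of null experiments")
```

The point of the simulation is that α is a long-run error rate chosen in advance: a smaller α means fewer false discoveries, at the cost of needing stronger evidence.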
"The significance of an excess is quantified by the probability (p0) that a background-only experiment is more signal-like than that observed."
In the case of the search for the Higgs, the null hypothesis was that no Higgs-like particle existed at 126 GeV. More precisely, it was that the background model, which describes the known physics without a Higgs, sufficiently explains what happens at that energy. When researchers at CERN counted the number of events recorded at different energies for a given decay channel in which the Higgs can appear, they observed an excess around 126 GeV, the mass of the Higgs. The significance level chosen for claiming a discovery, 5 sigma, corresponds to α ≈ 0.00003% (about 1 in 3,500,000): the observed p-value had to fall below this threshold. This is extraordinarily small, and with good reason.
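The conversion between "sigma" and a p-value can be sketched in a few lines of Python, assuming (as is conventional for discovery claims) a one-sided test on a standard normal distribution. `sigma_to_p` and `p_to_sigma` are hypothetical helper names; the standard library's `math.erfc` and `statistics.NormalDist` do the actual work.

```python
import math
from statistics import NormalDist


def sigma_to_p(z: float) -> float:
    """One-sided (upper-tail) p-value of a z-sigma fluctuation of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_to_sigma(p: float) -> float:
    """Number of standard deviations matching a one-sided p-value (inverse of sigma_to_p)."""
    return -NormalDist().inv_cdf(p)


alpha = sigma_to_p(5.0)
print(f"5 sigma -> alpha = {alpha:.3g} = {alpha * 100:.5f}% (about 1 in {1 / alpha:,.0f})")
print(f"alpha = {alpha:.3g} -> {p_to_sigma(alpha):.2f} sigma")
```

This gives α ≈ 2.87 × 10^-7, the 0.00003% (roughly 1 in 3.5 million) quoted above, and converting back recovers 5 sigma.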
The small bump on the plot above may seem unimportant, but it represents the culmination of more than 40 years of physics research. The smooth baseline is the background expected from the Standard Model without a Higgs, and the data points are the measured event counts. The deviation from the expected background is large enough to reject the null hypothesis that there is no Higgs Boson with a mass of 126 GeV.
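The logic of looking for an excess can be sketched as a single-bin counting experiment, in the spirit of the p0 definition quoted earlier: given how many events the background-only model predicts in a mass window, how likely is it to see at least as many as were actually observed? This is a deliberately simplified Python sketch, assuming Poisson-distributed counts and a perfectly known background; the real ATLAS and CMS analyses combine many bins and channels using likelihood-based test statistics and systematic uncertainties. The event counts and function names below are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_upper_tail(n_obs: int, background: float) -> float:
    """P(N >= n_obs) for N ~ Poisson(background): the chance that background
    alone produces at least as many events as were observed."""
    total = 0.0
    k = n_obs
    while True:
        # Probability of exactly k events, computed in log space to avoid overflow.
        term = math.exp(-background + k * math.log(background) - math.lgamma(k + 1))
        total += term
        # Past the peak of the distribution the terms only shrink; stop once negligible.
        if k > background and term <= 1e-16 * total:
            return total
        k += 1


def significance(p: float) -> float:
    """Convert a one-sided p-value into a number of standard deviations ("sigma")."""
    return -NormalDist().inv_cdf(p)


# Hypothetical numbers for one mass bin: 100 background events expected, 150 seen.
expected_background = 100.0
observed = 150
p0 = poisson_upper_tail(observed, expected_background)
print(f"p0 = {p0:.2e}, significance = {significance(p0):.1f} sigma")
```

The larger the excess over the expected background, the smaller p0 becomes and the more sigma it corresponds to; a discovery claim requires that number to reach 5.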
If you thought this finding was already very statistically rigorous, just wait: there's more! CERN runs two separate detectors, ATLAS and CMS, which perform the same kind of measurement but collect completely independent data, so that each experiment can check the validity of the other. Between ATLAS and CMS, tens of petabytes (a petabyte is 1,000 TB) of raw event data have been collected and cross-referenced. This independent confirmation, backed by a vast quantity of data, is what allows CERN to state conclusively that the Higgs exists.
When considering the p-value associated with this significance level, it can be difficult to visualize the distribution of the data being discussed. This blog post is particularly interesting, and discusses a mistake made by Nature when it originally reported the Higgs discovery. Nature stated that the significance level corresponds to a p-value of 0.00006% (1 in 1,750,000). That figure assumes a two-tailed test rather than the correct one-tailed one: only an excess of events counts as evidence for the Higgs, so only the upper tail of the distribution matters, and counting both tails doubles the probability. It just goes to show that even the pros get it wrong sometimes, and statistics is less straightforward than we think.
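The difference between the two figures can be checked with Python's standard library. This sketch assumes the standard normal model behind the "sigma" language; doubling the upper-tail probability is exactly what a two-tailed test does.

```python
from statistics import NormalDist

z = 5.0
standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed: only an upward fluctuation (an excess of events) counts as signal-like.
p_one_tailed = standard_normal.cdf(-z)
# Two-tailed: deviations in either direction count, which doubles the probability.
p_two_tailed = 2 * standard_normal.cdf(-z)

for label, p in (("one-tailed", p_one_tailed), ("two-tailed", p_two_tailed)):
    print(f"{label}: p = {p:.2e} = {p * 100:.5f}% (about 1 in {1 / p:,.0f})")
```

The one-tailed value comes out at about 1 in 3.5 million and the two-tailed value at about 1 in 1.7 million, matching the correct figure and the one Nature printed.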
I would like to credit David A. van Dyk for his insightful statistical review of the methods employed by scientists at CERN, which goes into far more depth and is some awesome further reading.
https://wwwf.imperial.ac.uk/~dvandyk/Research/14-reviews-higgs.pdf