Welcome to the home of the LHC Olympics 2020!
Despite an impressive and extensive effort by the LHC collaborations, there is currently no convincing evidence for new particles produced in high-energy collisions. At the same time, there has been a growing interest in machine learning techniques to enhance potential signals using all of the available information.
In the spirit of the first LHC Olympics (circa 2005-2006) [1st, 2nd, 3rd, 4th], we are organizing the 2020 LHC Olympics. Our goal is to ensure that the LHC search program is sufficiently well-rounded to capture “all” rare and complex signals. The final state for this Olympics is more focused (generic multijet events), but the observable phase space and potential BSM parameter space(s) are large: all hadrons in the event can be used for learning (be it “cuts”, supervised machine learning, or unsupervised machine learning). One class of BSM topologies captured by this challenge is illustrated in the following picture.
We provide two types of files (from this Zenodo link):
- “Monte Carlo Simulation Background”: This is a simulated sample that does not contain signal. Be warned that both the physics and the detector modeling in this simulation may not exactly reflect the “Data”.
- “Data”: These are the LHCO 2020 black boxes, which may contain new signal(s). We will release three black boxes during this challenge: the first was released on November 19 and the second on December 4.
Both the “Simulation” and “Data” have the following event selection: at least one anti-kT R = 1.0 jet with pseudorapidity |η| < 2.5 and transverse momentum pT > 1.2 TeV. For each event, we provide a list of all hadrons (pT, η, φ, pT, η, φ, …) zero-padded up to 700 hadrons.
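The event format and selection above can be made concrete in a few lines. The following is a minimal, dependency-free sketch, not the organizers' reading scripts: it assumes each event arrives as a flat list of 2100 floats (700 × (pT, η, φ)) with pT in GeV, treats hadrons as massless (no masses are provided), and re-applies the jet requirement with a naive O(N³) anti-kT implementation (four-vector recombination, distance in the rapidity-φ plane). All function names are hypothetical; for real work, FastJet is the standard and much faster choice.

```python
import math

MAX_HADRONS = 700  # hadron slots per event; unused slots are zero-padded


def parse_event(row):
    """Turn a flat [pT, eta, phi, pT, eta, phi, ...] row into (pT, eta, phi) tuples,
    dropping the zero-padded slots (pT == 0)."""
    n_slots = min(len(row) // 3, MAX_HADRONS)
    hadrons = []
    for k in range(n_slots):
        pt, eta, phi = row[3 * k : 3 * k + 3]
        if pt > 0:
            hadrons.append((pt, eta, phi))
    return hadrons


def to_p4(pt, eta, phi):
    """Massless four-vector [px, py, pz, E]."""
    return [pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta), pt * math.cosh(eta)]


def pt_of(p4):
    return math.hypot(p4[0], p4[1])


def rapidity(p4):
    px, py, pz, e = p4
    return 0.5 * math.log((e + pz) / (e - pz))


def delta_r2(a, b):
    """Squared distance in the rapidity-phi plane, with delta-phi wrapped to [-pi, pi)."""
    dy = rapidity(a) - rapidity(b)
    dphi = math.atan2(a[1], a[0]) - math.atan2(b[1], b[0])
    dphi = (dphi + math.pi) % (2 * math.pi) - math.pi
    return dy * dy + dphi * dphi


def anti_kt(hadrons, R=1.0):
    """Naive anti-kT clustering with four-vector (E-scheme) recombination.
    Returns jet four-vectors sorted by decreasing pT."""
    objs = [to_p4(*h) for h in hadrons]
    jets = []
    while objs:
        inv_pt2 = [pt_of(p) ** -2 for p in objs]
        # Smallest beam distance d_iB = 1 / pT_i^2 (i.e. the hardest object)
        best = min(range(len(objs)), key=lambda i: inv_pt2[i])
        d_min, pair = inv_pt2[best], None
        # Pairwise distances d_ij = min(1/pT_i^2, 1/pT_j^2) * dR_ij^2 / R^2
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                d_ij = min(inv_pt2[i], inv_pt2[j]) * delta_r2(objs[i], objs[j]) / R**2
                if d_ij < d_min:
                    d_min, pair = d_ij, (i, j)
        if pair is None:
            jets.append(objs.pop(best))  # beam distance is smallest: final jet
        else:
            i, j = pair  # i < j, so pop j first to keep index i valid
            merged = [a + b for a, b in zip(objs[i], objs[j])]
            objs.pop(j)
            objs.pop(i)
            objs.append(merged)
    return sorted(jets, key=pt_of, reverse=True)


def passes_selection(hadrons, R=1.0, pt_min=1200.0, abs_eta_max=2.5):
    """At least one anti-kT R = 1.0 jet with |eta| < 2.5 and pT > 1.2 TeV (GeV units assumed)."""
    for jet in anti_kt(hadrons, R):
        pt = pt_of(jet)
        eta = math.asinh(jet[2] / pt)  # pseudorapidity of the jet axis
        if pt > pt_min and abs(eta) < abs_eta_max:
            return True
    return False
```

For example, an event with two 700 GeV hadrons at η = 0 separated by Δφ = 0.1 clusters into a single jet of pT ≈ 1398 GeV and passes the selection. The naive loop is slow for busy events; it is only meant to make the selection definition concrete.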
What you should report:
- A p-value for the null hypothesis that the dataset contains no new particles.
- As complete a description of the new physics as possible, for example the masses and decay modes of all new particles (with uncertainties on those parameters).
- The number of signal events (with uncertainty) in the dataset, before any selection criteria.
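To illustrate what the first and last items involve, here is a minimal counting-experiment sketch. It assumes you have already defined a signal region (for instance a dijet-mass window) with an observed count `n_obs` and a background-only prediction `b > 0`; the function names, the simple √N error, and the `efficiency` parameter used to extrapolate back to "before any selection" are all hypothetical. A real analysis would also need background-shape uncertainties and a look-elsewhere correction.

```python
import math
from statistics import NormalDist


def poisson_p_value(n_obs, b):
    """One-sided p-value P(N >= n_obs) for N ~ Poisson(b), i.e. background only."""
    if n_obs <= 0:
        return 1.0

    def log_pmf(k):
        return -b + k * math.log(b) - math.lgamma(k + 1)

    if n_obs <= b:
        # No excess: 1 - CDF(n_obs - 1) is numerically well-behaved here.
        return max(1.0 - sum(math.exp(log_pmf(k)) for k in range(n_obs)), 0.0)
    # Excess: sum the upper tail directly so tiny p-values keep their precision.
    total, k = 0.0, n_obs
    while True:
        term = math.exp(log_pmf(k))
        total += term
        if term == 0.0 or term < 1e-16 * total:
            break
        k += 1
    return total


def significance(p):
    """Gaussian-equivalent one-sided significance Z = Phi^-1(1 - p)."""
    if p <= 0.0:
        return math.inf
    if p >= 1.0:
        return -math.inf
    return -NormalDist().inv_cdf(p)


def signal_yield(n_obs, b, b_err=0.0, efficiency=1.0):
    """Background-subtracted signal count, scaled to the full dataset by a
    (hypothetical) selection efficiency; the error combines sqrt(n_obs) and b_err."""
    s = (n_obs - b) / efficiency
    s_err = math.sqrt(n_obs + b_err**2) / efficiency
    return s, s_err
```

With 1300 events observed on an expected background of 1000, for instance, the p-value is far below 1e-15 (about 9σ), and the naive signal estimate is 300 ± 36 events in the signal region.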
Partial submissions covering only a subset of the categories are welcome! You can submit your findings via this Google form. Outcomes will be judged on the accuracy of the new physics characterization. To quantify accuracy, we will use, wherever applicable, the number of standard deviations between your answer and the right answer, |(your answer - right answer) / your uncertainty|.
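As a small sketch of the scoring rule as we read it, the snippet below computes the per-quantity distance in units of your own uncertainty. The dictionary layout, quantity names, and numbers are hypothetical, and how the pulls are weighted or combined across quantities is not specified here.

```python
def pull(your_answer, right_answer, your_uncertainty):
    """Distance from the right answer in units of your own uncertainty:
    |(your answer - right answer) / your uncertainty|."""
    if your_uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    return abs((your_answer - right_answer) / your_uncertainty)


def score_submission(submission, truth):
    """Pull for every quantity present in both dicts.

    submission: {name: (value, uncertainty)}, truth: {name: value}.
    Quantities you did not submit are skipped, mirroring partial submissions.
    """
    return {
        name: pull(value, truth[name], err)
        for name, (value, err) in submission.items()
        if name in truth
    }
```

For example, quoting a resonance mass of 3600 ± 200 GeV when the truth is 3500 GeV gives a pull of 0.5. Note that a smaller quoted uncertainty only helps if your central value is also close to the truth.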
For setting up, developing, and validating your methods, we provide background events and a benchmark signal model. You can download these from this page. To help you get started, we have also prepared simple Python scripts to read in the data and do some basic processing. The page describing the R&D phase of the challenge can be found here.
Please do not hesitate to ask questions: we will use the ML4Jets Slack channel to discuss technical questions related to this challenge. You are also encouraged to sign up for the mailing list lhc-olympics@cern.ch (via the e-groups.cern.ch interface), which is used for infrequent announcements and communications.
Good luck!
Gregor Kasieczka, Ben Nachman, and David Shih
Workshops
Winter Olympics
The deadline for the Winter Olympics (Black Box 1) challenge was Sunday, January 12, 2020 at 5pm Eastern US Time. Results were presented in a dedicated session at the ML4Jets2020 conference.
See the outcome of the Winter Olympics here.
Summer Olympics
Black boxes 2 and 3 were opened at an event originally scheduled to be hosted in Hamburg in July 2020. Because of the COVID-19 pandemic, the event was held virtually.
Publications
We strongly encourage you to publish original research methods developed with these datasets. We are currently compiling a community comparison / summary paper; please contact the organizers for details (everyone who participated in the Olympics has been invited to contribute). Below are papers that use the LHCO dataset. Please send us links to your papers if you have used this dataset! Many more preliminary studies can be found in the workshops listed above.
N.B. This list is no longer updated and is incomplete; please check the citations of the LHCO paper on INSPIRE for a complete list.
- Anomalous Jet Identification via Sequence Modeling, Alan Kahn, Julia Gonski, Ines Ochoa, Daniel Williams, and Gustaaf Brooijmans, hep-ph/2105.09274
- Comparing Weak- and Unsupervised Methods for Resonant Anomaly Detection, Jack H. Collins, Pablo Martin-Ramiro, Benjamin Nachman, David Shih, hep-ph/2104.02092
- Bump Hunting in Latent Space, B. Bortolato et al., hep-ph/2103.06595
- The LHC Olympics 2020: A Community Challenge for Anomaly Detection in High Energy Physics, G. Kasieczka et al., hep-ph/2101.08320
- QUAK: Quasi Anomalous Knowledge: Searching for new physics with embedded knowledge, Sang Eon Park, Dylan Rankin, Silviu-Marian Udrescu, Mikaeel Yunus, Philip Harris, hep-ph/2011.03550
- UCluster: Unsupervised clustering for collider physics, Vinicius Mikuni and Florencia Canelli, hep-ph/2010.07106
- Simulation-Assisted Decorrelation for Resonant Anomaly Detection, Kees Benkendorfer, Luc Le Pottier, and Benjamin Nachman, hep-ph/2009.02205
- Tag N’ Train: A Technique to Train Improved Classifiers on Unlabeled Data, Oz Amram and Cristina Mantilla Suarez, hep-ph/2002.12376
- Simulation Assisted Likelihood-free Anomaly Detection, Anders Andreassen, Benjamin Nachman, David Shih, hep-ph/2001.05001
- Anomaly Detection with Density Estimation, Benjamin Nachman, David Shih, hep-ph/2001.04990