LHC Olympics 2020

Welcome to the home of the LHC Olympics 2020!


Despite an impressive and extensive effort by the LHC collaborations, there is currently no convincing evidence for new particles produced in high-energy collisions.  At the same time, there has been a growing interest in machine learning techniques to enhance potential signals using all of the available information.  

In the spirit of the first LHC Olympics (circa 2005-2006) [1st, 2nd, 3rd, 4th], we are organizing the 2020 LHC Olympics.  Our goal is to ensure that the LHC search program is sufficiently well-rounded to capture “all” rare and complex signals.  The final state for this Olympics will be more focused (generic multijet events) but the observable phase space and potential BSM parameter space(s) are large: all hadrons in the event can be used for learning (be it “cuts”, supervised machine learning, or unsupervised machine learning). One class of BSM topology captured by this challenge is illustrated in the following picture.


We provide two types of files (from this Zenodo link):

Both the “Simulation” and “Data” have the following event selection: at least one anti-kT R = 1.0 jet with pseudorapidity |η| < 2.5 and transverse momentum pT > 1.2 TeV.   For each event, we provide a list of all hadrons (pT, η, φ, pT, η, φ, …) zero-padded up to 700 hadrons.
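As a rough illustration of the per-event format described above, the sketch below unpacks one zero-padded event into an array of (pT, η, φ) triplets. It assumes each event has already been loaded as a flat sequence of 3 × 700 numbers; the function name `unpack_event` and the toy values are hypothetical and not part of the provided scripts.

```python
import numpy as np

MAX_HADRONS = 700  # padding length stated in the text


def unpack_event(flat_event):
    """Reshape one flat, zero-padded event into an (n_hadrons, 3) array.

    Columns are (pT, eta, phi). Assumes the flat layout
    (pT, eta, phi, pT, eta, phi, ...) with 3 * MAX_HADRONS entries.
    """
    hadrons = np.asarray(flat_event, dtype=float).reshape(MAX_HADRONS, 3)
    # Padding rows are all zeros, so a real hadron is identified by pT > 0.
    return hadrons[hadrons[:, 0] > 0]


# Toy event with two hadrons (made-up values, for illustration only).
toy = np.zeros(3 * MAX_HADRONS)
toy[:6] = [500.0, 0.1, 1.2, 300.0, -0.4, -2.0]

hadrons = unpack_event(toy)
print(hadrons.shape)  # (2, 3)
```

Dropping the padding early keeps later steps (for example jet clustering) from treating the zero rows as real particles.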

What you should report:

  1. A p-value associated with the dataset having no new particles (null hypothesis).

  2. As complete a description of the new physics as possible. For example: the masses and decay modes of all new particles (and uncertainties on those parameters).

  3. How many signal events (+uncertainty) are in the dataset (before any selection criteria).

Partial submissions in only a subset of the categories are welcome! You can submit your findings at this Google form.  Outcomes will be judged based on the accuracy of the new physics characterization. To quantify accuracy, we will use the number of sigmas by which your answer deviates from the right answer, |(your answer - right answer) / your uncertainty|, wherever applicable.
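The accuracy measure above can be written as a short function. This is only a sketch of the formula as stated in the text, not the organizers' scoring code; the name `n_sigma` and the example numbers are made up for illustration.

```python
def n_sigma(your_answer, your_uncertainty, right_answer):
    """Return |(your answer - right answer) / your uncertainty|."""
    if your_uncertainty <= 0:
        # The measure is undefined without a positive quoted uncertainty.
        raise ValueError("uncertainty must be positive")
    return abs(your_answer - right_answer) / your_uncertainty


# Hypothetical example: a reported value of 3800 +/- 100 against a true value of 3500.
pull = n_sigma(3800.0, 100.0, 3500.0)
print(pull)  # 3.0
```

Because the quoted uncertainty sits in the denominator, an underestimated uncertainty inflates the number of sigmas even when the central value is close.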

For setting up, developing, and validating your methods, we provide background events and a benchmark signal model.  You can download these from this page.  To help get you started, we have also prepared simple python scripts to read in the data and do some basic processing.   The page describing the R&D phase of the challenge can be found here.

Please do not hesitate to ask questions: we will use the ML4Jets slack channel to discuss technical questions related to this challenge. You are also encouraged to sign up for the mailing list, which we use for infrequent announcements and communications.

Good luck!

Gregor Kasieczka, Ben Nachman, and David Shih


Winter Olympics

The deadline for the Winter Olympics (Black Box 1) challenge was Sunday, January 12, 2020 at 5pm Eastern US Time. Results were presented in a dedicated session at the ML4Jets2020 conference.

See the outcome of the Winter Olympics here.

Summer Olympics

Black boxes 2 and 3 will be opened at an event originally scheduled to be hosted in Hamburg in July 2020. However, given the situation with COVID-19, this event will be virtual. We will announce the details soon.


We strongly encourage you to publish your original research methods using these datasets. We envision making a community comparison / summary paper at some point. Here are papers published with the LHCO dataset. Please send links to your papers if you have used this dataset! Many more preliminary studies can be found in workshops listed above.