Quark and Gluon Jets

A dataset of 2 million quark and gluon jets in total, generated with PYTHIA 8.226. To avoid downloading unnecessary samples, the dataset is split across twenty files of 100k jets each, and only the files needed for a given request are downloaded. These samples are used in 1810.05165. Splitting the full dataset into 1.6M/200k/200k train/validation/test sets is recommended for standardized comparisons.
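The recommended 1.6M/200k/200k split can be sketched as below. This is a minimal illustration, not part of the library: `split_train_val_test` is a hypothetical helper, and it assumes the split is taken as consecutive blocks without shuffling, which the text above does not specify. A small synthetic array with the same 8:1:1 proportions stands in for the real data.

```python
import numpy as np

def split_train_val_test(X, y, n_train, n_val, n_test):
    """Slice X and y into consecutive train/validation/test blocks (hypothetical helper)."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    total = n_train + n_val + n_test
    if total > len(X):
        raise ValueError("requested more samples than are available")
    train = slice(0, n_train)
    val = slice(n_train, n_train + n_val)
    test = slice(n_train + n_val, total)
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Synthetic stand-in: 20 "jets" of 3 particles with 4 features each,
# split 16/2/2 to mirror the 1.6M/200k/200k proportions.
X_demo = np.zeros((20, 3, 4))
y_demo = np.arange(20) % 2
(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_train_val_test(
    X_demo, y_demo, 16, 2, 2
)
```

With the full dataset loaded, the same call would use `n_train=1600000, n_val=200000, n_test=200000`.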

The dataset qg_jets consists of two components:

  • X : a three-dimensional numpy array of the jets with shape (num_data, max_num_particles, 4)
  • y : a numpy array of quark/gluon jet labels (quark=1 and gluon=0).

The jets are padded with zero-particles (rows of zeros) so that the data form a contiguous array. Each particle is given as (pt,y,phi,pid), where pt is the transverse momentum, y is the rapidity, phi is the azimuthal angle, and pid is the particle's PDG id.
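Because of the padding, per-jet quantities usually require masking out the zero-particles first. The sketch below assumes that a padding row is zero in all four entries; `real_particles` and `multiplicities` are hypothetical helper names, and the array is synthetic.

```python
import numpy as np

def real_particles(jet):
    """Return the rows of one padded jet that are not all-zero padding."""
    jet = np.asarray(jet)
    return jet[np.any(jet != 0, axis=1)]

def multiplicities(X):
    """Count non-padding particles per jet for an array of shape
    (num_data, max_num_particles, 4)."""
    return np.any(np.asarray(X) != 0, axis=2).sum(axis=1)

# Synthetic example: two jets padded to three particles each,
# with columns (pt, y, phi, pid).
X_demo = np.array([
    [[300.0, 0.1, 0.2, 211.0], [200.0, -0.1, 0.3, 22.0], [0.0, 0.0, 0.0, 0.0]],
    [[510.0, 0.0, 1.0, -211.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
])
counts = multiplicities(X_demo)
first_jet = real_particles(X_demo[0])
```

Testing the whole row rather than only pt is a conservative choice: a real particle always has a nonzero pid, so it cannot be mistaken for padding.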

The samples are $Z(\to\nu\bar\nu)+g$ and $Z(\to\nu\bar\nu)+(u,d,s)$ events generated with PYTHIA for $pp$ collisions at $\sqrt{s}=14$ TeV using the WeakBosonAndParton:qqbar2gmZg and WeakBosonAndParton:qg2gmZq processes, ignoring the photon contribution and requiring the $Z$ to decay invisibly to neutrinos. Hadronization and multiple parton interactions (i.e. underlying event) are turned on, and the default tunings and shower parameters are used. Final-state non-neutrino particles are clustered into $R=0.4$ anti-$k_T$ jets using FASTJET 3.3.0. Jets with transverse momentum $p_T\in[500,550]$ GeV and rapidity $|y|<2.0$ are kept. Each particle's $\phi$ value is guaranteed to lie within $\pi$ of the jet's $\phi$ (i.e. there are no $\phi$-periodicity issues). No detector simulation is performed.
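As a sanity check on the $p_T$ selection, a jet's transverse momentum can be recomputed from its constituents. The sketch below assumes the jet momentum is the four-vector sum of its particles (E-scheme recombination), which the description above does not state explicitly; `jet_pt_from_constituents` is a hypothetical helper. Zero-particles have pt = 0 and therefore contribute nothing to the sums.

```python
import numpy as np

def jet_pt_from_constituents(jet):
    """Jet transverse momentum from a (max_num_particles, 4) array of
    (pt, y, phi, pid) rows, assuming a plain vector sum of constituents."""
    jet = np.asarray(jet, dtype=float)
    pt, phi = jet[:, 0], jet[:, 2]
    px = np.sum(pt * np.cos(phi))  # x-component of the summed momentum
    py = np.sum(pt * np.sin(phi))  # y-component of the summed momentum
    return float(np.hypot(px, py))

# Synthetic jet chosen for easy arithmetic (a 3-4-5 triangle), plus one
# padding row. Real constituents of an R=0.4 jet are far more collimated.
jet_demo = np.array([
    [300.0, 0.0, 0.0, 211.0],
    [400.0, 0.0, np.pi / 2, -211.0],
    [0.0, 0.0, 0.0, 0.0],
])
pt_demo = jet_pt_from_constituents(jet_demo)
in_window = 500.0 - 1e-9 <= pt_demo <= 550.0
```

Because every particle's $\phi$ lies within $\pi$ of the jet, the angles can be used directly without wrapping.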


load

energyflow.datasets.qg_jets.load(num_data=100000, cache_dir=None)

Loads samples from the dataset (which in total is contained in twenty files). Any required file that has not been cached is downloaded automatically and then cached for later use. Basic checksums are performed.

Arguments

  • num_data : int
    • The number of events to return. A value of -1 means read in all events.
  • cache_dir : str
    • The directory in which to store and look for the files.

Returns

  • 3-d numpy.ndarray, 1-d numpy.ndarray
    • The X and y components of the dataset as specified above.
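
A usage sketch for the loader follows. `load_subset` simply forwards to the documented `load` call; it needs EnergyFlow installed and, on first use, network access. `files_needed` is a hypothetical helper that only restates the arithmetic implied above (twenty files of 100k jets each), under the assumption that files are consumed in order starting from the first. It is not the library's implementation.

```python
import math

JETS_PER_FILE = 100_000  # from the dataset description
NUM_FILES = 20           # from the dataset description

def files_needed(num_data):
    """Estimate how many of the twenty files a request touches (hypothetical)."""
    if num_data == -1:  # -1 means "all events"
        return NUM_FILES
    if not 0 < num_data <= JETS_PER_FILE * NUM_FILES:
        raise ValueError("num_data must be -1 or between 1 and 2,000,000")
    return math.ceil(num_data / JETS_PER_FILE)

def load_subset(num_data=100000, cache_dir=None):
    """Thin wrapper around energyflow.datasets.qg_jets.load."""
    # Imported lazily so files_needed can be used without EnergyFlow installed.
    from energyflow.datasets import qg_jets
    X, y = qg_jets.load(num_data=num_data, cache_dir=cache_dir)
    return X, y
```

For example, `load_subset(250000)` would be expected to return X with shape (250000, max_num_particles, 4) and y with shape (250000,), and under the assumption above it would touch three files.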

Quark and Gluon Nsubs

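A dataset consisting of 45 $N$-subjettiness observables for 100k quark and gluon jets generated with PYTHIA 8.230. Following 1704.08249, the observables are in the following order:

$\{\tau_1^{(\beta=0.5)},\tau_1^{(\beta=1.0)},\tau_1^{(\beta=2.0)},\tau_2^{(\beta=0.5)},\tau_2^{(\beta=1.0)},\tau_2^{(\beta=2.0)},\ldots,\tau_{15}^{(\beta=0.5)},\tau_{15}^{(\beta=1.0)},\tau_{15}^{(\beta=2.0)}\}$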

The dataset contains two members: 'X' which is a numpy array of the nsubs that has shape (100000,45) and 'y' which is a numpy array of quark/gluon labels (quark=1 and gluon=0).
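
Selecting a single observable from the 45 columns can be sketched as follows. The column layout used here is an assumption: $N$ runs from 1 to 15 and, for each $N$, the angular exponent $\beta$ takes the values 0.5, 1.0, 2.0 in that order, with $\beta$ varying fastest. `select_nsub` is a hypothetical helper and the array is synthetic.

```python
import numpy as np

N_VALUES = tuple(range(1, 16))  # assumed: N = 1, ..., 15
BETAS = (0.5, 1.0, 2.0)         # assumed: three beta values per N, in this order

def select_nsub(X, N, beta):
    """Return the column of X holding tau_N^(beta) under the assumed ordering."""
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[1] != len(N_VALUES) * len(BETAS):
        raise ValueError("expected an array of shape (num_data, 45)")
    col = N_VALUES.index(N) * len(BETAS) + BETAS.index(beta)
    return X[:, col]

# Synthetic stand-in in which each entry equals its flat index,
# so the selected column number can be read off directly.
X_demo = np.arange(90).reshape(2, 45)
tau1_half = select_nsub(X_demo, 1, 0.5)  # column 0
tau2_one = select_nsub(X_demo, 2, 1.0)   # column 4
tau15_two = select_nsub(X_demo, 15, 2.0) # column 44
```

Under the same assumption, `X.reshape(-1, 15, 3)` gives an equivalent view indexed as `[jet, N - 1, beta_index]`.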


load

energyflow.datasets.qg_nsubs.load(num_data=-1, cache_dir=None)

Loads the dataset. The first time this is called, it will automatically download the dataset. Future calls will attempt to use the cached dataset prior to redownloading.

Arguments

  • num_data : int
    • The number of events to return. A value of -1 means read in all events.
  • cache_dir : str
    • The directory in which to store and look for the file.

Returns

  • 2-d numpy.ndarray, 1-d numpy.ndarray
    • The X and y components of the dataset as specified above.
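
A quick check of what the loader returns might look like the sketch below. `summarize_nsubs` is a hypothetical helper that relies only on the shapes and label convention described above (X of shape (num_data, 45), y with quark=1 and gluon=0). `load_nsubs` forwards to the documented call and requires EnergyFlow plus network access on first use.

```python
import numpy as np

def summarize_nsubs(X, y):
    """Validate shapes and report basic statistics (hypothetical helper)."""
    X, y = np.asarray(X), np.asarray(y)
    if X.ndim != 2 or X.shape[1] != 45:
        raise ValueError("X should have shape (num_data, 45)")
    if y.shape != (X.shape[0],):
        raise ValueError("y should have shape (num_data,)")
    return {
        "num_jets": int(X.shape[0]),
        "num_observables": int(X.shape[1]),
        "quark_fraction": float(np.mean(y == 1)),  # quark=1, gluon=0
    }

def load_nsubs(num_data=-1, cache_dir=None):
    """Thin wrapper around energyflow.datasets.qg_nsubs.load."""
    # Imported lazily so summarize_nsubs works without EnergyFlow installed.
    from energyflow.datasets import qg_nsubs
    return qg_nsubs.load(num_data=num_data, cache_dir=cache_dir)

# Synthetic stand-in: four jets, three labeled quark and one labeled gluon.
summary = summarize_nsubs(np.zeros((4, 45)), np.array([1, 0, 1, 1]))
```

On the real data, `summarize_nsubs(*load_nsubs())` would be expected to report 100000 jets and 45 observables.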