### Quark and Gluon Jets

A dataset consisting of up to 2 million total quark and gluon jets generated with PYTHIA 8.226. To avoid downloading unnecessary samples, the dataset is contained in twenty files with 100k jets each, and only the required files are downloaded. These samples are used in 1810.05165. Splitting the data into 1.6M/200k/200k train/validation/test sets is recommended for standardized comparisons.
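The recommended 1.6M/200k/200k split is an 8:1:1 partition by count. The following is a minimal sketch of such a split, assuming the jets and their labels are held in two NumPy arrays of equal length (called `X` and `y` here). The helper name `split_fixed`, the `seed` parameter, and the decision to shuffle once before cutting are assumptions of this sketch, not something the dataset description prescribes. A small synthetic array stands in for the real data.

```python
import numpy as np

def split_fixed(X, y, n_val, n_test, seed=0):
    """Hypothetical helper: shuffle once, then cut into train/val/test by count."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    n_train = len(X) - n_val - n_test
    if n_train < 0:
        raise ValueError("n_val + n_test exceeds the number of samples")
    # A fixed seed keeps the partition reproducible between runs.
    perm = np.random.default_rng(seed).permutation(len(X))
    train, val, test = np.split(perm, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Synthetic stand-in with the same 8:1:1 proportions as 1.6M/200k/200k.
X = np.arange(20 * 3 * 4, dtype=float).reshape(20, 3, 4)
y = np.arange(20) % 2
(X_train, y_train), (X_val, y_val), (X_test, y_test) = split_fixed(
    X, y, n_val=2, n_test=2
)
print(X_train.shape, X_val.shape, X_test.shape)  # (16, 3, 4) (2, 3, 4) (2, 3, 4)
```

For the full dataset the corresponding call would use `n_val=200000` and `n_test=200000`, leaving 1.6M jets for training.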

The dataset qg_jets consists of two components:

• X : a three-dimensional numpy array of the jets with shape (num_data, max_num_particles, 4)
• y : a numpy array of quark/gluon jet labels (quark=1 and gluon=0).

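The jets are padded with zero-particles in order to make a contiguous array. The particles are given as (pt,y,phi,pid) values: transverse momentum, rapidity, azimuthal angle, and the particle's PDG id. Here y denotes the rapidity, not the label array y described above.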
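Because of the zero-padding, it is often useful to recover the true particle multiplicity of each jet. The sketch below does this on a small synthetic array with the same layout as `X`. It assumes that every real particle has strictly positive pt, so that a row with pt equal to zero is padding. The variable names are illustrative only.

```python
import numpy as np

# Synthetic stand-in for X: 2 jets, padded to 3 particles, features (pt, y, phi, pid).
X = np.array([
    [[100.0, 0.1, 1.0, 211.0], [50.0, -0.2, 1.1, 22.0], [0.0, 0.0, 0.0, 0.0]],
    [[200.0, 0.0, 2.0, -211.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
])

# Assumption: real particles have pt > 0, so pt == 0 marks a padding row.
mask = X[:, :, 0] > 0

# Number of real particles in each jet.
multiplicity = mask.sum(axis=1)

# Variable-length list of jets with the padding rows removed.
jets = [jet[m] for jet, m in zip(X, mask)]

print(multiplicity)  # [2 1]
```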

The samples are $Z(\to\nu\bar\nu)+g$ and $Z(\to\nu\bar\nu)+(u,d,s)$ events generated with PYTHIA for $pp$ collisions at $\sqrt{s}=14$ TeV using the WeakBosonAndParton:qqbar2gmZg and WeakBosonAndParton:qg2gmZq processes, ignoring the photon contribution and requiring the $Z$ to decay invisibly to neutrinos. Hadronization and multiple parton interactions (i.e. underlying event) are turned on and the default tunings and shower parameters are used. Final state non-neutrino particles are clustered into $R=0.4$ anti-$k_T$ jets using FASTJET 3.3.0. Jets with transverse momentum $p_T\in[500,550]$ GeV and rapidity $|y|<2.0$ are kept. Particles are ensured to have $\phi$ values within $\pi$ of the jet (i.e. no $\phi$-periodicity issues). No detector simulation is performed.
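To illustrate what "within $\pi$ of the jet" means for the azimuthal angle, the sketch below shifts particle $\phi$ values by multiples of $2\pi$ so that they lie within $\pi$ of a reference angle. It shows the general technique only and is not the code used to produce the dataset. The function name `wrap_phi` and the example values are assumptions.

```python
import numpy as np

def wrap_phi(phi, phi_ref):
    """Shift phi by multiples of 2*pi so that it lies within pi of phi_ref."""
    # Map the difference into the interval [-pi, pi).
    dphi = (np.asarray(phi) - phi_ref + np.pi) % (2 * np.pi) - np.pi
    return phi_ref + dphi

# A jet axis close to 2*pi with one particle just past the wrap-around point.
phi_jet = 6.25
phi_particles = np.array([0.1, 6.2])
phi_wrapped = wrap_phi(phi_particles, phi_jet)
```

After wrapping, the first particle sits at $0.1 + 2\pi$ rather than $0.1$, so differences such as `phi_wrapped - phi_jet` can be taken directly without further periodicity handling.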

energyflow.datasets.qg_jets.load(num_data=100000, pad=True, cache_dir=None)


Loads samples from the dataset, which in total is contained in twenty files. Any required file that has not yet been cached is downloaded automatically and then cached for later use. Basic checksums are performed.

Arguments

• num_data : int
• The number of events to return. A value of -1 means read in all events.
• pad : bool
• Whether to pad the events with zeros to make them the same length.
• cache_dir : str
• The directory in which to store and look for the files.

Returns

• 3-d numpy.ndarray, 1-d numpy.ndarray
• The X and y components of the dataset as specified above.
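Since the dataset is stored as twenty files of 100k jets each and only the required files are downloaded, the number of files touched by a request follows from simple arithmetic. The helper below is a hypothetical illustration of that arithmetic and is not part of the library. It assumes that a request for `num_data` jets reads whole files in order from the start of the dataset.

```python
import math

JETS_PER_FILE = 100_000  # from the dataset description
NUM_FILES = 20           # from the dataset description

def files_needed(num_data):
    """Hypothetical helper: how many 100k-jet files a request for num_data jets spans."""
    if num_data == -1:
        # -1 means "read in all events", so every file is needed.
        return NUM_FILES
    if not 0 <= num_data <= JETS_PER_FILE * NUM_FILES:
        raise ValueError("num_data must be -1 or between 0 and 2,000,000")
    return math.ceil(num_data / JETS_PER_FILE)

print(files_needed(100_000))  # 1
print(files_needed(250_000))  # 3
print(files_needed(-1))       # 20
```

Under that assumption, a call such as `energyflow.datasets.qg_jets.load(num_data=250000)` would need three files, while the default `num_data=100000` would need one.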

### Quark and Gluon Nsubs

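A dataset consisting of 45 $N$-subjettiness observables for 100k quark and gluon jets generated with PYTHIA 8.230. Following 1704.08249, the observables are in the following order:

$$\{\tau_1^{(\beta=0.5)},\tau_1^{(\beta=1.0)},\tau_1^{(\beta=2.0)},\tau_2^{(\beta=0.5)},\tau_2^{(\beta=1.0)},\tau_2^{(\beta=2.0)},\ldots,\tau_{15}^{(\beta=0.5)},\tau_{15}^{(\beta=1.0)},\tau_{15}^{(\beta=2.0)}\}.$$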

The dataset contains two members: 'X' which is a numpy array of the nsubs that has shape (100000,45) and 'y' which is a numpy array of quark/gluon labels (quark=1 and gluon=0).

energyflow.datasets.qg_nsubs.load(num_data=-1, cache_dir=None)


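Arguments

• num_data : int
• The number of events to return. A value of -1 means read in all events.
• cache_dir : str
• The directory in which to store and look for the file.

Returns

• 2-d numpy.ndarray, 1-d numpy.ndarray
• The X and y components of the dataset as specified above.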
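To pick out a particular observable from the 45 columns of `X`, a column index is needed. The sketch below assumes the ordering of 1704.08249, in which $N$ runs from 1 to 15 and, for each $N$, the angular exponent $\beta$ takes the values 0.5, 1.0 and 2.0 in that order. The helper `nsub_column` is hypothetical and not part of the library. Check the ordering against the data before relying on it.

```python
BETAS = (0.5, 1.0, 2.0)  # assumed beta order within each N
MAX_N = 15               # 15 values of N times 3 betas gives 45 columns

def nsub_column(N, beta):
    """Hypothetical helper: column of tau_N^(beta) under the assumed ordering."""
    if not 1 <= N <= MAX_N:
        raise ValueError("N must be between 1 and 15")
    # BETAS.index raises ValueError for a beta that is not in the dataset.
    return len(BETAS) * (N - 1) + BETAS.index(beta)

print(nsub_column(1, 0.5))   # 0
print(nsub_column(2, 1.0))   # 4
print(nsub_column(15, 2.0))  # 44
```

With `X` of shape (100000, 45), the values of $\tau_2^{(\beta=1.0)}$ for all jets would then be `X[:, nsub_column(2, 1.0)]`.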