On this page, you can download our multilabel dataset generator. It is written entirely in MATLAB®, and its main technical details are described in the following technical report:

Multilabel classification is a very active area of research. However, experimental studies can draw on only a small collection of benchmark datasets. Moreover, the available data frequently fail to capture the essential features of multilabel classification tasks. In this paper we present a generator of synthetic datasets implemented using a genetic algorithm. The generated datasets can reproduce a wide variety of situations. The paper reports thorough experimentation devoted to assessing the features of these datasets.

O. Luaces, J. Díez, Juan José del Coz, José Barranquero, and Antonio Bahamonde (2012). “Synthetic Datasets for Sound Experimental Evaluation of Multilabel Classifiers”. Technical Report.

© ML-GROUP, Artificial Intelligence Center, University of Oviedo at Gijón, 2012