About me

I am a PhD student and a graduate research assistant in the College of Information and Computer Sciences at the University of Massachusetts Amherst under the supervision of Subhransu Maji in the Computer Vision Lab. Previously, I spent three years as a graduate research assistant under the supervision of Pablo Arbeláez in the Biomedical Computer Vision Group at the Universidad de los Andes. My interests are in Pattern Recognition in Computer Vision using Artificial Intelligence.


  • 12/15/2016 - Received the Best Project of the Faculty award at the Encuentro de Experiencias en Investigación en Ingeniería (EEII), Universidad de los Andes.
  • 01/18/2016 - Started an MS in Biomedical Engineering at Universidad de los Andes.


Lung Nodule Malignancy Prediction in Sequential CT Scans: Summary of ISBI 2018 Challenge
Yoganand Balagurunathan, Andrew Beers, Michael McNitt-Gray, Lubomir Hadjiiski, Sandy Napel, Dmitry Goldgof, Gustavo Perez, Pablo Arbelaez, et al.
IEEE Transactions on Medical Imaging, 2021.

Lung cancer is by far the leading cause of cancer death in the US. Recent studies have demonstrated the effectiveness of screening using low dose CT (LDCT) in reducing lung cancer related mortality. While lung nodules are detected with a high rate of sensitivity, this exam has a low specificity rate and it is still difficult to separate benign and malignant lesions. The ISBI 2018 Lung Nodule Malignancy Prediction Challenge, developed by a team from the Quantitative Imaging Network of the National Cancer Institute, focused on the prediction of lung nodule malignancy from two sequential LDCT screening exams using automated (non-manual) algorithms. We curated a cohort of 100 subjects who participated in the National Lung Screening Trial and had established pathological diagnoses. Data from 30 subjects were randomly selected for training and the remaining data were used for testing. Participants were evaluated based on the area under the receiver operating characteristic curve (AUC) of nodule-wise malignancy scores generated by their algorithms on the test set. The challenge had 17 participants, with 11 teams submitting reports with method descriptions, as mandated by the challenge rules. Participants used quantitative methods, resulting in reported test AUCs ranging from 0.698 to 0.913. The top five contestants used deep learning approaches, reporting AUCs between 0.87 and 0.91. The teams' predictors did not differ significantly from each other or from a volume change estimate (p=.05 with Bonferroni-Holm's correction).

IEEE | BibTex

StarcNet: Machine Learning for Star Cluster Classification
Gustavo Perez, Matteo Messa, Daniela Calzetti, Subhransu Maji, Dooseok Jung, Angela Adamo, Mattia Sirressi
The Astrophysical Journal (ApJ), 2021.

We present a machine learning (ML) pipeline to identify star clusters in the multi-color images of nearby galaxies, from observations obtained with the Hubble Space Telescope as part of the Treasury Project LEGUS (Legacy ExtraGalactic Ultraviolet Survey). StarcNet (STAR Cluster classification NETwork) is a multi-scale convolutional neural network (CNN) which achieves an accuracy of 68.6% (4 classes)/86.0% (2 classes: cluster/non-cluster) for star cluster classification in the images of the LEGUS galaxies, nearly matching human expert performance. We test the performance of StarcNet by applying the pre-trained CNN model to galaxies not included in the training set, finding accuracies similar to the reference one. We test the effect of StarcNet predictions on the inferred cluster properties by comparing multi-color luminosity functions and mass-age plots from catalogs produced by StarcNet and by human labeling; distributions in luminosity, color, and physical characteristics of star clusters are similar for the human and ML classified samples. There are two advantages to the ML approach: (1) reproducibility of the classifications: the ML algorithm's biases are fixed and can be measured for subsequent analysis; and (2) speed of classification: the algorithm requires minutes for tasks that humans require weeks to months to perform. By achieving comparable accuracy to human classifiers, StarcNet will enable extending classifications to a larger number of candidate samples than currently available, thus significantly increasing the statistics for cluster studies.

ApJ | project page | preprint | BibTex

Automated lung cancer diagnosis using three-dimensional convolutional neural networks
Gustavo Perez, Pablo Arbelaez
Medical & Biological Engineering & Computing, 2020.

Lung cancer is the deadliest cancer worldwide. It has been shown that early detection using low-dose computed tomography (LDCT) scans can reduce deaths caused by this disease. We present a general framework for the detection of lung cancer in chest LDCT images. Our method consists of a nodule detector trained on the LIDC-IDRI dataset followed by a cancer predictor trained on the Kaggle DSB 2017 dataset and evaluated on the IEEE International Symposium on Biomedical Imaging (ISBI) 2018 Lung Nodule Malignancy Prediction test set. Our candidate extraction approach effectively produces accurate candidates with a recall of 99.6%. In addition, our false positive reduction stage successfully classifies the candidates and increases precision by a factor of 2000. Our cancer predictor obtained a ROC AUC of 0.913 and ranked first in the ISBI 2018 Lung Nodule Malignancy Prediction challenge.

pdf | project page | springer | BibTex

Finding Four-Leaf Clovers: A Benchmark for Fine-Grained Object Localization
Laura Bravo*, Alejandro Pardo*, Gustavo Perez*, Pablo Arbelaez
The Sixth Workshop on Fine-Grained Visual Categorization (FGVC6), CVPR 2019.

We present the Four-Leaf Clover (FLC) dataset, a new experimental framework for studying fine-grained object localization problems. We built the FLC dataset with the contribution of trained hobbyists, who were assigned the task of spotting four-leaf clovers over a fixed geographical area during two clover seasons, one season for the train set and another for the test set. We then annotated each object instance for the tasks of object detection, semantic segmentation, instance segmentation, object parsing and semantic boundary detection. Our dataset is composed of more than 100,000 images, containing 2,151 carefully annotated clover instances of four, five or six leaves. The FLC dataset is extremely challenging and well suited to fine-grained object localization problems due to its small inter-class variance and its very large intra-class variation. We perform extensive experiments with state-of-the-art methods in order to establish strong baselines for each of the tasks.

pdf | poster | project page | dataset | BibTex

Automated Detection of Lung Nodules with Three-dimensional Convolutional Neural Networks
Gustavo Perez, Pablo Arbelaez (oral)
13th International Conference on Medical Information Processing and Analysis, 2017.

Lung cancer is the cancer type with the highest mortality rate worldwide. It has been shown that early detection with computed tomography (CT) scans can reduce deaths caused by this disease. Manual detection of cancer nodules is costly and time-consuming. We present a general framework for the detection of nodules in lung CT images. Our method consists of pre-processing a patient's CT with filtering and extracting the lungs from the entire volume using a previously calculated mask for each patient. From the extracted lungs, we perform a candidate generation stage using morphological operations, followed by the training of a three-dimensional convolutional neural network for feature representation and classification of extracted candidates for false positive reduction. We perform experiments on the publicly available LIDC-IDRI dataset. Our candidate extraction approach effectively produces precise candidates with a recall of 99.6%. In addition, the false positive reduction stage successfully classifies candidates and increases precision by a factor of 7,000.

pdf | project page | spie | BibTex