
StarcNet: Machine Learning for Star Cluster Classification

Gustavo Perez, Matteo Messa, Daniela Calzetti, Subhransu Maji, Dooseok Jung, Angela Adamo, Mattia Sirressi

Github code repository  

We present a machine learning (ML) pipeline to identify star clusters in the multi-color images of nearby galaxies, from observations obtained with the Hubble Space Telescope as part of the Treasury Project LEGUS (Legacy ExtraGalactic Ultraviolet Survey). StarcNet (STAR Cluster classification NETwork) is a multi-scale convolutional neural network (CNN) which achieves an accuracy of 68.6% (4 classes)/86.0% (2 classes: cluster/non-cluster) for star cluster classification in the images of the LEGUS galaxies, nearly matching human expert performance. We test the performance of StarcNet by applying the pre-trained CNN model to galaxies not included in the training set, finding accuracies similar to the reference value. We test the effect of StarcNet predictions on the inferred cluster properties by comparing multi-color luminosity functions and mass-age plots from catalogs produced by StarcNet and by human labeling; distributions in luminosity, color, and physical characteristics of star clusters are similar for the human- and ML-classified samples. There are two advantages to the ML approach: (1) reproducibility of the classifications: the ML algorithm's biases are fixed and can be measured for subsequent analysis; and (2) speed of classification: the algorithm requires minutes for tasks that take humans weeks to months to perform. By achieving accuracy comparable to that of human classifiers, StarcNet will enable extending classifications to a larger number of candidate samples than currently available, thus significantly increasing the statistics for cluster studies.
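The two accuracy figures quoted above are related: merging classes can only turn some 4-class errors into correct binary answers, never the reverse. The following is a minimal sketch of that relationship, not the authors' evaluation code. It assumes a hypothetical grouping in which classes 1 and 2 count as "cluster" and classes 3 and 4 as "non-cluster" (the text above does not state the grouping), and the labels are invented for illustration.

```python
# Hypothetical grouping: which of the four classes count as "cluster".
# This mapping is an assumption for illustration, not taken from the text.
CLUSTER_CLASSES = {1, 2}


def accuracy(true_labels, predicted_labels):
    """Fraction of sources whose predicted label equals the reference label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def to_binary(labels, cluster_classes=CLUSTER_CLASSES):
    """Map 4-class labels to True (cluster) / False (non-cluster)."""
    return [label in cluster_classes for label in labels]


# Toy labels, invented for this example (not LEGUS data).
human = [1, 1, 2, 2, 3, 3, 4, 4]
model = [1, 2, 2, 1, 3, 4, 4, 4]

four_class_acc = accuracy(human, model)
binary_acc = accuracy(to_binary(human), to_binary(model))
print(f"4-class accuracy: {four_class_acc:.3f}")
print(f"binary accuracy:  {binary_acc:.3f}")
```

In this toy case every 4-class mistake stays within its group (1 versus 2, or 3 versus 4), so the binary accuracy is higher than the 4-class accuracy.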

Fig. 1 The StarcNet pipeline. Graphic sketch of the machine learning pipeline used in this work to classify cluster candidates in the LEGUS images. (Left): The Hubble Space Telescope images are processed by the LEGUS project through a custom pipeline to generate automatic catalogs of cluster candidates, which are part of the public LEGUS catalog release (Calzetti et al. 2015; Adamo et al. 2017); we apply StarcNet to the LEGUS catalogs and images. (Center–Left): The region surrounding each candidate is selected from the five-band images at three magnifications and is used as input to our multi-scale StarcNet. (Center–Right and Right): Each of the three pathways of the CNN consists of 7 convolutional layers; the pathways are later connected to produce a prediction for the candidate in one of four classes.
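The multi-scale input step in the caption can be illustrated with a small sketch. This is not the StarcNet implementation: the output size of 32 pixels, the magnification factors (1, 2, 4), the zero padding at image edges, and the nearest-neighbour upsampling are all assumptions chosen for the example, and the function names are hypothetical. The image is represented as plain nested lists indexed as `image[band][row][col]`.

```python
def crop_square(image, center_row, center_col, half_size):
    """Cut a (2*half_size) x (2*half_size) window from every band.

    Pixels that fall outside the image are filled with 0.0.
    """
    crops = []
    for band in image:
        n_rows, n_cols = len(band), len(band[0])
        window = []
        for r in range(center_row - half_size, center_row + half_size):
            row = []
            for c in range(center_col - half_size, center_col + half_size):
                inside = 0 <= r < n_rows and 0 <= c < n_cols
                row.append(band[r][c] if inside else 0.0)
            window.append(row)
        crops.append(window)
    return crops


def magnify(crop, factor):
    """Nearest-neighbour upsampling of each band by an integer factor."""
    return [
        [[row[c // factor] for c in range(len(row) * factor)]
         for row in band for _ in range(factor)]
        for band in crop
    ]


def multi_scale_inputs(image, center_row, center_col,
                       out_size=32, factors=(1, 2, 4)):
    """Return one out_size x out_size multi-band cutout per magnification.

    out_size must be divisible by 2 * factor for every factor.
    """
    inputs = []
    for factor in factors:
        half = out_size // (2 * factor)  # smaller region at higher zoom
        crop = crop_square(image, center_row, center_col, half)
        inputs.append(magnify(crop, factor))
    return inputs


# Synthetic 5-band, 64 x 64 image (invented values, not LEGUS data).
image = [[[float(b * 10000 + r * 64 + c) for c in range(64)]
          for r in range(64)] for b in range(5)]
inputs = multi_scale_inputs(image, center_row=30, center_col=40)
```

Each element of `inputs` would feed one pathway of a three-pathway network; all three share the same pixel dimensions but cover progressively smaller regions around the candidate.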

Results

Fig. 2 Performance of StarcNet on the test set. Confusion matrices are normalized over the classes in the test set of the LEGUS dataset (20% of the total sources, or about 3000 objects). The rows show the distribution of the human-classified sources, while the columns are the predictions of StarcNet. Values in parentheses are the unnormalized counts. (Left) Results for 4-class classification using raw bands as input. (Middle) Results for 2 classes (cluster/non-cluster classification). (Right) Precision-recall curves for each of the four classes, as well as for binary classification. The overall accuracy is 68.6% with 4 classes and 86.0% with binary classification.
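The layout described in the caption (rows for the human classes, columns for the predictions, normalized fractions with raw counts in parentheses) can be reproduced with a short sketch. The labels below are invented for illustration, and the helper names are hypothetical; this is not the code used to produce the figure.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Counts with rows = reference (human) class, columns = predicted class."""
    index = {cls: i for i, cls in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    return counts


def normalize_rows(counts):
    """Divide each row by its total so every non-empty row sums to 1."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([n / total if total else 0.0 for n in row])
    return normalized


def format_matrix(counts):
    """Render cells as 'fraction (count)', one text line per reference class."""
    fractions = normalize_rows(counts)
    return [
        "  ".join(f"{f:.2f} ({n})" for f, n in zip(frac_row, count_row))
        for frac_row, count_row in zip(fractions, counts)
    ]


# Toy labels, invented for this example (not LEGUS data).
human = [1, 1, 1, 1, 2, 2, 3, 3, 4, 4]
model = [1, 1, 1, 2, 2, 2, 3, 4, 4, 4]

counts = confusion_matrix(human, model, classes=[1, 2, 3, 4])
lines = format_matrix(counts)
# Overall accuracy is the sum of the diagonal over the number of sources.
overall_accuracy = sum(counts[i][i] for i in range(4)) / len(human)

for line in lines:
    print(line)
print(f"overall accuracy: {overall_accuracy:.3f}")
```

Row normalization makes the diagonal entries per-class recall, which is why a class with few sources can show a low diagonal value even when the overall accuracy is high.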



Publications

StarcNet: Machine Learning for Star Cluster Classification
Gustavo Perez, Matteo Messa, Daniela Calzetti, Subhransu Maji, Dooseok Jung, Angela Adamo, Mattia Sirressi
The Astrophysical Journal (ApJ), 2021.
ApJ · project page · preprint · BibTeX