Star Cluster Classification from Nearby Galaxies

The Hubble Space Telescope (HST), the recently launched James Webb Space Telescope (JWST), and many Earth-based observatories collect data that allow astronomers to answer fundamental questions about the Universe. In this work we focus on an ecosystem of AI tools for cataloging bright sources within galaxies, and use them to analyze young star clusters -- groups of stars held together by their mutual gravitational attraction. Their ages and masses, among other properties, provide insights into the process of star formation and the birth and evolution of galaxies. Significant domain expertise and resources are required to pick out star clusters from the tens of thousands of sources that may be extracted for each galaxy. To accelerate this step we propose: 1) a web-based annotation tool to label and visualize high-resolution astronomy data, encouraging efficient labeling and consensus building; and 2) techniques to reduce the annotation cost by leveraging recent advances in unsupervised representation learning on images.

Fig. 1 (a) Example of a young cluster inside an HII region. The cluster is shown in UV + optical bands (top), emission lines (middle), and low-resolution near-IR (bottom). (b) Region with a high HII concentration, used to sample sources for annotation. (c) Crops from our labeling tool showing SExtractor sources (yellow circle markers), sources also in the final LEGUS catalog for NGC 4449 (blue circle markers), and sources sampled for annotation (large white circle markers).


Fig. 4. The proposed tool for cataloging sources within galaxies. (Top) An AI model assists data labeling by using the few labels provided initially to guide further labeling. (Bottom) A web-based user interface to label sources. The UI allows annotators to zoom into different regions dynamically and can be customized to support different labels (e.g., center coordinates, bounding boxes, irregular polygons) and spectral measurements as layers. In this figure we show a customized version of the UI for IR images showing annotations overlaid on HST observations of galaxy NGC 4449.

Annotation tool demo

Cite this paper

GitHub code repository

Fig. 2 The StarcNet pipeline. Graphic sketch of the machine learning pipeline used in this work to classify cluster candidates in the LEGUS images. (Left): The Hubble Space Telescope images are processed by the LEGUS project through a custom pipeline to generate automatic catalogs of cluster candidates, which are part of the public LEGUS catalogs release (Calzetti et al. 2015; Adamo et al. 2017); we apply StarcNet to the LEGUS catalogs and images. (Center–Left): The region surrounding each candidate is selected from the five-band images at three magnifications and is used as input to our multi-scale StarcNet. (Center–Right and Right): Each of the three pathways of the CNN consists of 7 convolutional layers; the pathways are later connected to produce a prediction that assigns the candidate to one of four classes.
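
The multi-scale input described in the caption can be illustrated with a minimal sketch in plain Python. It assumes that "three magnifications" means cropping windows of increasing width around the candidate and resampling each one to a common size. The patch size, the scale factors, the nearest-neighbour resampling, the zero padding at image edges, and all function names are assumptions made for illustration, not details taken from StarcNet.

```python
def crop(image, cx, cy, half):
    """Return the (2*half) x (2*half) window around (cx, cy).

    Pixels that fall outside the image are filled with 0.0 (an assumption).
    """
    rows, cols = len(image), len(image[0])
    patch = []
    for y in range(cy - half, cy + half):
        row = []
        for x in range(cx - half, cx + half):
            inside = 0 <= y < rows and 0 <= x < cols
            row.append(image[y][x] if inside else 0.0)
        patch.append(row)
    return patch


def downsample(patch, factor):
    """Nearest-neighbour resampling: keep every `factor`-th pixel."""
    return [row[::factor] for row in patch[::factor]]


def multiscale_patches(bands, cx, cy, size=8, factors=(1, 2, 4)):
    """Build one stack of per-band patches for each scale factor.

    Each stack holds len(bands) patches of shape size x size, cut from a
    window of width size * factor centred on the candidate.
    """
    stacks = []
    for factor in factors:
        half = size * factor // 2
        stacks.append([downsample(crop(band, cx, cy, half), factor)
                       for band in bands])
    return stacks


# Toy data: five synthetic 64 x 64 "bands" (hypothetical, not LEGUS images).
bands = [[[float(y * 64 + x) for x in range(64)] for y in range(64)]
         for _ in range(5)]
patches = multiscale_patches(bands, cx=32, cy=32)
shapes = [(len(stack), len(stack[0]), len(stack[0][0])) for stack in patches]
print(shapes)  # [(5, 8, 8), (5, 8, 8), (5, 8, 8)]
```

Each of the three stacks would feed one pathway of a three-pathway network; the network itself is not sketched here.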

We developed StarcNet, a multiscale CNN, to classify stellar clusters in nearby galaxies by morphology. StarcNet aims to speed up visual cluster classification by orders of magnitude; this step is currently the single most important limitation to securing large catalogs for studies of these sources. Reliable and fast ways to classify star clusters will become even more critical with the advent of extremely large surveys, such as those that will be produced by the Vera C. Rubin Observatory and the Nancy Grace Roman Space Telescope.

NSF Award #1815267


Fig. 3 An HST false-color image (courtesy of ESA/Hubble) of M101, overlaid with the combined HST WFC3+ACS field of view, which is composed of five pointings extending from the SE to the NW of the galaxy’s disk. Our coverage allows us to probe distinctly different star-forming environments between the central regions and the outer disk. In total, 39,705 sources detected across these five fields satisfy the selection cut, with detections in at least two broadband filters; these sources are extracted and included in our initial photometric catalog.
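
As a rough illustration of a cut that requires detections in at least two broadband filters, the sketch below filters a toy source list. The record layout, the band names, the use of `None` to mark a non-detection, and the function names are all hypothetical; the actual catalog construction may apply further criteria that are not shown here.

```python
def count_detections(magnitudes):
    """Count the bands with a measured magnitude (None means non-detection)."""
    return sum(1 for mag in magnitudes.values() if mag is not None)


def apply_detection_cut(sources, min_bands=2):
    """Keep only the sources detected in at least `min_bands` filters."""
    return [src for src in sources
            if count_detections(src["mags"]) >= min_bands]


# Toy sources with made-up magnitudes (not values from the M101 catalog).
toy_sources = [
    {"id": 1, "mags": {"band_a": 22.1, "band_b": 21.8, "band_c": None}},
    {"id": 2, "mags": {"band_a": None, "band_b": 23.4, "band_c": None}},
    {"id": 3, "mags": {"band_a": 20.3, "band_b": 20.1, "band_c": 19.9}},
]
catalog = apply_detection_cut(toy_sources)
print([src["id"] for src in catalog])  # [1, 3]
```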

For the 2,351 sources with both a visual and an ML classification, StarcNet reproduces the human classifications with high accuracy (∼80%–90%) for binary classification (cluster versus non-cluster), which is equivalent to the level of agreement between human classifiers in LEGUS. In particular, StarcNet appears to recover several faint Class 1 objects that are misclassified as Class 4 in the human-generated catalog. Comparing the magnitude distributions of the catalogs produced by the two classification methods, we find that the vast majority of the Class 1 and 2 clusters added in the StarcNet catalog relative to the human-classified catalog are faint sources that were mostly missed because of the limitations of visual inspection.
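
The comparison between human and ML labels can be sketched as an agreement rate computed on the four classes and again after collapsing them to cluster versus non-cluster. The mapping used here (Classes 1 and 2 as clusters, Classes 3 and 4 as non-clusters) is an assumption, as are the function names; the label lists are toy values chosen for illustration, not results from the catalog.

```python
CLUSTER_CLASSES = {1, 2}  # assumption: Classes 1 and 2 count as clusters


def is_cluster(label):
    """Collapse a four-class label to cluster (True) or non-cluster (False)."""
    return label in CLUSTER_CLASSES


def agreement(human, model, binary=False):
    """Fraction of sources on which the two label lists agree."""
    if len(human) != len(model) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    if binary:
        human = [is_cluster(label) for label in human]
        model = [is_cluster(label) for label in model]
    matches = sum(1 for h, m in zip(human, model) if h == m)
    return matches / len(human)


# Toy labels for eight sources (illustrative only).
human_labels = [1, 2, 3, 4, 4, 1, 2, 3]
model_labels = [1, 1, 4, 4, 1, 1, 2, 3]

print(agreement(human_labels, model_labels))               # 0.625
print(agreement(human_labels, model_labels, binary=True))  # 0.875

# Sources labeled Class 4 by the human and Class 1 by the model.
reclassified = sum(1 for h, m in zip(human_labels, model_labels)
                   if h == 4 and m == 1)
print(reclassified)  # 1
```

Binary agreement is never lower than four-class agreement, because any pair of labels that matches exactly also matches after the collapse.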
