Exoplanet Hunting

The math behind the magic

Project Introduction

Our goal for this project is to develop a model that uses each star's luminosity data to classify stars by their likelihood of hosting exoplanets.

Detecting exoplanets: Doppler effect

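The Doppler effect is the change in the observed wavelength of light when its source moves toward or away from us. Motion away from us “stretches” the light to longer (redder) wavelengths, and motion toward us compresses it to shorter (bluer) wavelengths. An orbiting planet's gravity makes its star wobble slightly, so the star's light shifts back and forth in wavelength once per orbit.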
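For speeds far below the speed of light, the fractional wavelength shift is proportional to the source's velocity along our line of sight: Δλ/λ ≈ v/c. The minimal sketch below turns a measured wavelength into a radial velocity under that approximation. The function name and the example measurement of the hydrogen-alpha line (rest wavelength about 656.28 nm) are illustrative assumptions, not part of our project code.

```python
# Illustrative sketch (hypothetical helper, not our project code).
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact value in m/s


def radial_velocity(observed_nm: float, rest_nm: float) -> float:
    """Line-of-sight velocity in m/s from a Doppler-shifted wavelength.

    Non-relativistic approximation: v = c * (observed - rest) / rest,
    valid only when v is much smaller than c. Positive means the source is
    moving away (redshift); negative means it is approaching (blueshift).
    """
    return SPEED_OF_LIGHT_M_S * (observed_nm - rest_nm) / rest_nm


# Hypothetical measurement of the hydrogen-alpha line (rest wavelength ~656.28 nm).
print(f"{radial_velocity(656.2802, 656.28):.1f} m/s")
```

A shift of just 0.0002 nm corresponds to roughly 90 m/s, which is why planet-induced wobbles demand very precise spectra.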

Detecting exoplanets: Spectroscopy

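Spectroscopy is the process of splitting the light that reaches an instrument into its component wavelengths, recording how much light arrives at each wavelength (repeatedly, over a period of time), and using the resulting spectra to draw conclusions about the object that emitted the light. For exoplanet detection, this is how the Doppler shift of a star's spectral lines is measured.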
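Spectral lines are what make a Doppler shift measurable: if we know where an absorption line sits at rest, locating it in an observed spectrum gives the shift. The sketch below is a deliberately crude illustration on a synthetic spectrum. The Gaussian line shape, grid spacing, and helper names are assumptions; real radial-velocity pipelines use far more precise methods, such as cross-correlating the whole spectrum against a template.

```python
import math


def synthetic_spectrum(center_nm, depth=0.5, width_nm=0.05,
                       start_nm=656.0, stop_nm=656.6, step_nm=0.001):
    """Hypothetical test data: flat continuum (flux 1.0) with one Gaussian absorption line."""
    n = int(round((stop_nm - start_nm) / step_nm)) + 1
    wavelengths = [start_nm + k * step_nm for k in range(n)]
    fluxes = [1.0 - depth * math.exp(-0.5 * ((w - center_nm) / width_nm) ** 2)
              for w in wavelengths]
    return wavelengths, fluxes


def absorption_line_center(wavelengths, fluxes):
    """Wavelength of the lowest-flux sample (hypothetical helper).

    Crude estimator: assumes the segment holds a single line and is sampled
    finely enough that the deepest sample marks the line center.
    """
    i_min = min(range(len(fluxes)), key=fluxes.__getitem__)
    return wavelengths[i_min]


REST_NM = 656.28  # hydrogen-alpha rest wavelength, approximately
wavelengths, fluxes = synthetic_spectrum(center_nm=656.30)
observed = absorption_line_center(wavelengths, fluxes)
print(f"line found at {observed:.3f} nm, shift {observed - REST_NM:+.3f} nm")
```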

Transit Photometry

Transit photometry involves measuring and recording the luminosity of a star over a long period of time.

We then compare the luminosity readings of a star without any known exoplanets to those of another star we observe. When a planet passes in front of its star (a transit), it blocks a small fraction of the star's light, causing a dip in the readings that repeats once per orbit. If the two stars' readings differ significantly, we have evidence that the star we are observing is likely to have an exoplanet.
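How large is such a dip? If the star is treated as a uniformly bright disk, the fractional drop is about the ratio of the planet's and star's cross-sectional areas, (R_planet / R_star)². The sketch below computes that estimate and flags samples that fall below a light curve's median by more than a chosen fraction. Both helper names and the "below the median by a threshold" rule are assumptions for illustration, not the method our classifier uses.

```python
import statistics


def expected_transit_depth(planet_radius: float, star_radius: float) -> float:
    """Fractional brightness drop while the planet is fully in front of the star.

    Assumes a uniformly bright stellar disk (no limb darkening):
    depth ~ (R_planet / R_star) ** 2.
    """
    return (planet_radius / star_radius) ** 2


def find_dips(flux, threshold):
    """Indices whose flux is below the light curve's median by more than `threshold`.

    `threshold` is a fraction of the median (hypothetical detection rule).
    """
    baseline = statistics.median(flux)
    return [i for i, f in enumerate(flux) if (baseline - f) / baseline > threshold]


# A Jupiter-sized planet (~71,492 km radius) crossing a Sun-like star (~695,700 km):
print(f"expected depth: {expected_transit_depth(71_492, 695_700):.2%}")
# Hypothetical light curve with two transit-like dips:
print(find_dips([1.0, 1.0, 0.99, 1.0, 0.99, 1.0, 1.0], threshold=0.005))
```

A Jupiter-sized planet dims a Sun-like star by only about 1%, and an Earth-sized planet by far less, so real light curves need careful noise handling before any threshold is applied.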

Our Analysis

One way to compare luminosity data is to examine its periodicity: a planet crosses in front of its star once per orbit, so the dips it causes repeat at a fixed period. Given a stretch of the data, we can approximate that period with an algorithm that cuts the data into segments of a trial length and “loops” them back on top of each other, repeating this for many trial lengths to find the one at which the segments align best.
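This “looping” is usually called phase folding. The minimal sketch below follows the spirit of phase dispersion minimization: each trial period is scored by folding the data, grouping samples into phase bins, and summing the variance inside each bin, and the lowest score wins. The function names, the binned-variance score, the number of bins, and the list of trial periods are illustrative assumptions, not necessarily what our own analysis used.

```python
def folded_dispersion(times, flux, period, n_bins=10):
    """Fold the light curve at `period` and measure how badly the folds disagree.

    Each sample gets a phase in [0, 1) and goes into one of `n_bins` phase bins;
    the result is the total within-bin squared deviation. At the true period,
    repeated features land in the same bins, so the result is small.
    """
    bins = [[] for _ in range(n_bins)]
    for t, f in zip(times, flux):
        phase = (t % period) / period
        bins[min(int(phase * n_bins), n_bins - 1)].append(f)
    total = 0.0
    for b in bins:
        if len(b) > 1:
            mean = sum(b) / len(b)
            total += sum((f - mean) ** 2 for f in b)
    return total


def best_period(times, flux, trial_periods, n_bins=10):
    """Return the trial period whose folded light curve has the least dispersion."""
    return min(trial_periods, key=lambda p: folded_dispersion(times, flux, p, n_bins))


# Hypothetical light curve: one sample per time unit, a 2-sample dip every 20 units.
times = list(range(200))
flux = [0.99 if t % 20 < 2 else 1.0 for t in times]
print(best_period(times, flux, trial_periods=range(5, 41)))  # → 20
```

Multiples and fractions of the true period (10, 40) score worse here because folding at them mixes in-dip and out-of-dip samples in the same bin.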

How Models are Trained

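Under the hood, most of the AI models we used come down to calculus and linear algebra: predictions are computed with vector and matrix operations, and training adjusts the model's parameters using the derivatives (gradients) of a loss function that measures how wrong those predictions are.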
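To make that concrete, here is a minimal sketch of training a logistic regression (the same kind of model as in our Logistic Regression results, though not our actual code) by batch gradient descent in plain Python. The dot product is the linear algebra; the gradient of the log loss is the calculus. The learning rate, epoch count, helper names, and toy data are arbitrary assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    """Linear algebra: weighted sum of the features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic_regression(xs, ys, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss.

    Calculus: for p = sigmoid(w . x + b), the gradient of the log loss is
    (p - y) * x for the weights and (p - y) for the bias.
    """
    n, n_features = len(xs), len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(dot(w, x) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        # Step against the averaged gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Label 1 when the predicted probability reaches the threshold."""
    return int(sigmoid(dot(w, x) + b) >= threshold)


# Hypothetical 1-feature toy data: small values are class 0, large values class 1.
xs, ys = [[0.0], [1.0], [3.0], [4.0]], [0, 0, 1, 1]
w, b = train_logistic_regression(xs, ys)
print([predict(w, b, x) for x in xs])
```

Neural networks such as an MLP or CNN are trained on the same principle, with the gradients computed through many layers by backpropagation. KNN is the exception: it has no gradient-based training and simply stores the training data.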

Different AI Classifier Models: MLP & CNN

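We tried several types of neural network, including a multi-layer perceptron (MLP), and settled on a CNN (convolutional neural network): a feedforward network whose convolutional layers slide small learned filters along the input, so the same pattern can be recognized wherever it occurs in a light curve.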
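In a CNN the filter weights are learned during training. The sketch below instead hand-picks a small dip-shaped filter (an assumption for illustration) to show why sliding a filter along a light curve produces a strong response wherever a transit-like dip appears, regardless of when it happens. The helper name is hypothetical.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` along `signal`, taking a dot product at each position.

    This is the "valid" cross-correlation that deep-learning libraries call a
    1D convolution; the output has len(signal) - len(kernel) + 1 values.
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


# Hand-picked (hypothetical) dip detector: weights sum to zero, so flat
# stretches of the light curve give ~0 and a two-sample dip gives a peak.
dip_kernel = [0.5, -0.5, -0.5, 0.5]
light_curve = [1.0, 1.0, 1.0, 1.0, 0.9, 0.9, 1.0, 1.0, 1.0, 1.0]
response = conv1d_valid(light_curve, dip_kernel)
peak = max(range(len(response)), key=response.__getitem__)
print(f"strongest response at position {peak}")
```

Moving the dip earlier or later in the light curve moves the peak by the same amount, which is the property that makes convolutional layers a good fit for transit signals.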

Terminology

We had to learn a number of new terms used to describe ML training.

Application of models to data (KNN)

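These are the results of our exoplanet predictions using a k-nearest neighbors (KNN) classifier, shown as a confusion matrix. Ideally, every result falls in the 0-0 and 1-1 squares, where the predicted label matches the true label; anything outside them is a misclassification. Based on these results, this model is unsatisfactory.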
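A KNN classifier labels a new star by majority vote among the k most similar stars in the training data. The minimal sketch below pairs that vote with a 2×2 confusion matrix. It is not our actual setup: the value of k, the Euclidean distance, the helper names, and the toy points are assumptions, and the matrix here is indexed as [actual][predicted], which may differ from the layout of our figure.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def confusion_matrix(y_true, y_pred):
    """2x2 counts indexed as matrix[actual][predicted] for labels 0 and 1.

    The diagonal (0-0 and 1-1) holds correct predictions; off-diagonal cells are errors.
    """
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Hypothetical toy data: two well-separated clusters of 2-feature points.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
test_x, test_y = [[0.5, 0.5], [5.5, 5.5], [2.0, 2.0], [4.0, 4.0]], [0, 1, 1, 1]
preds = [knn_predict(train_x, train_y, x) for x in test_x]
print(confusion_matrix(test_y, preds))
```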

Application of models to data (Logistic Regression)

These are the results of our exoplanet predictions using a logistic regression classifier. Because it produces a large number of false positives (stars predicted to have exoplanets that do not), this model is also unsatisfactory.
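Logistic regression outputs a probability, and a star is labeled as having an exoplanet when that probability reaches a decision threshold (0.5 is the usual default). The sketch below, using hypothetical labels and probabilities and hypothetical helper names, shows how the number of false positives and the precision (the fraction of flagged stars that really have exoplanets) depend on that threshold.

```python
def predictions_at(probabilities, threshold):
    """Label a star 1 (exoplanet) when its predicted probability reaches the threshold."""
    return [int(p >= threshold) for p in probabilities]


def false_positives(y_true, y_pred):
    """Stars predicted to have an exoplanet (1) that actually do not (0)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)


def precision(y_true, y_pred):
    """Of the stars flagged as having exoplanets, the fraction that really do."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    flagged = tp + false_positives(y_true, y_pred)
    return tp / flagged if flagged else 0.0


# Hypothetical labels and model probabilities for six stars.
y_true = [0, 0, 0, 0, 1, 1]
probs = [0.10, 0.55, 0.60, 0.70, 0.65, 0.90]
for threshold in (0.5, 0.75):
    y_pred = predictions_at(probs, threshold)
    print(threshold, false_positives(y_true, y_pred), round(precision(y_true, y_pred), 2))
```

Raising the threshold from 0.5 to 0.75 removes all three false positives in this toy example but also misses the exoplanet star scored at 0.65, turning it into a false negative; a threshold change alone cannot fix a model whose scores do not separate the classes.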

Augmented Logistic Regression

These are the results of our exoplanet predictions using an augmented logistic regression classifier. This model “augments” the data to make its patterns more distinctive. At a glance, it performs significantly better on the training data; however, its results on the test data show that it is still unsatisfactory, and the gap between the two suggests it is overfitting the training data.

Multi-Layer Perceptron

These are the results of our exoplanet predictions using an MLP classifier trained for 20 epochs (full passes over the training data). It immediately performed significantly better than the previous models, with few false positives and false negatives, and reached an accuracy of 99.5%. However, there is still room for improvement on the test data, where it is comparatively poor at detecting stars with exoplanets.
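High accuracy alone can hide poor exoplanet detection when stars with exoplanets are rare in the data, because a model that labels nearly everything “no exoplanet” is still right most of the time. The arithmetic below uses a hypothetical test set (the counts are invented for illustration, not our results) to compare accuracy with recall, the fraction of actual exoplanet stars the model finds.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all stars labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """Fraction of actual exoplanet stars the model finds."""
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical test set: 1,000 stars, only 5 of which have exoplanets.
# The model finds just 1 of those 5 and raises no false alarms.
tp, fn, fp, tn = 1, 4, 0, 995
print(f"accuracy = {accuracy(tp, tn, fp, fn):.1%}")  # 99.6%
print(f"recall   = {recall(tp, fn):.1%}")            # 20.0%
```

This is why the false negatives in the confusion matrix matter more than the headline accuracy when judging an exoplanet classifier.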

CNN Model

These are the results of our exoplanet predictions using an optimized CNN classifier. It achieved the highest accuracy of all our models on both the training and test data (99.83% and 99.3%, respectively) and made only one false negative.

Credits to my team: Hugo, Felix, Michael and Luis