The Scientific Process in a Search for Lost Planets

New planet discoveries have been coming in fast these last few years. Yesterday, The Astronomical Journal published another such planet discovery in a project I led with assistance from Dr. Jon Jenkins of the NASA Ames Research Center and my advisor Professor Debra Fischer.

Science is celebrated for its results, but the process of obtaining those results is where all the uncelebrated real work happens. This post gives the nitty-gritty, behind-the-scenes details of just how science works, at least in this one case.


Nearly 3,500 exoplanets (planets beyond the solar system) have now been discovered, most of them in the last three years. This exoplanet revolution has been made possible by the Kepler mission, a space telescope launched in 2009. Kepler finds exoplanets by taking very precise brightness measurements of thousands of stars simultaneously for long periods of time. When a planet happens to cross in front of (i.e., transit) one of those stars, it is observed as a dip in the star’s brightness that typically lasts a few hours. The spacing between these dips tells you the planet’s orbital period, and the depth of the dip tells you the planet’s size relative to its star. In systems with multiple planets, a somewhat odd result is that few systems have planets with long periods (≳100 days), even after you account for the fact that short-period planets are easier to find.
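For readers who like to see the numbers, here is a minimal Python sketch of how a dip translates into a planet’s size and period. It is an illustration only, not the Kepler pipeline’s method: it assumes a dark planet crossing a uniformly bright star (so the fractional dip equals the square of the planet-to-star radius ratio) and ignores limb darkening. The function names and example values are hypothetical.

```python
import math

R_SUN_IN_EARTH_RADII = 109.2  # approximate radius of the Sun in Earth radii


def planet_radius_from_depth(depth, star_radius_rsun=1.0):
    """Planet radius in Earth radii from the fractional transit depth.

    Assumes depth = (R_planet / R_star) ** 2, i.e. a dark planet on a
    uniformly bright stellar disk. Limb darkening is ignored.
    """
    return math.sqrt(depth) * star_radius_rsun * R_SUN_IN_EARTH_RADII


def period_from_transit_times(mid_times_days):
    """Mean spacing between consecutive transit midpoints, in days."""
    gaps = [b - a for a, b in zip(mid_times_days, mid_times_days[1:])]
    return sum(gaps) / len(gaps)


# A 0.01% dip on a Sun-like star corresponds to a roughly Earth-sized planet.
print(planet_radius_from_depth(1e-4))
# Three made-up transit midpoints spaced 37.5 days apart.
print(period_from_transit_times([10.0, 47.5, 85.0]))
```

The tiny size of that dip (one part in ten thousand for an Earth-sized planet around a Sun-like star) is why the brightness measurements have to be so precise.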

Motivated by these apparently “lost” planets with long periods, my advisor Professor Debra Fischer came up with a hypothesis that the computer algorithm that initially identifies the vast majority of Kepler planets might contribute to why long-period planets are not found. In the Kepler pipeline, after a suspected planet is found, all data points associated with the suspected planet’s transits are removed. This leaves gaps in the data whenever the planet was transiting the star. The pipeline then searches the remaining data for another planet. If another suspected planet is found, this cycle repeats. Therefore, stars that have multiple planets could have lots of data removed, which could result in additional planets being missed. The original purpose of removing these data points was to avoid being overloaded with false positive signals caused by improperly fitting and subtracting out the planet transits.

Our plan was then to take all of the stars with three or more known planets (i.e., stars with a lot of data removed because they have many planet transits), fit each planetary transit, and then carefully subtract out the transit signals rather than completely removing the data associated with those transits. Subsequent searches for planets orbiting that star could then use the entire data set.
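The difference between removing data and subtracting a model can be shown with a toy example. The sketch below uses NumPy and a noise-free, box-shaped transit model, both of which are simplifying assumptions of mine rather than anything from the Kepler pipeline or our paper. The "hidden" planet and all of its parameters are made up, and the subtraction uses the exact model, whereas a real fit is imperfect (which is precisely the false-positive risk the pipeline was guarding against).

```python
import numpy as np


def box_transit(time, period, t0, duration, depth):
    """Box-shaped transit model: -depth while in transit, 0 otherwise."""
    phase = (time - t0 + 0.5 * period) % period - 0.5 * period
    return np.where(np.abs(phase) < 0.5 * duration, -depth, 0.0)


# Synthetic light curve sampled every 0.02 days for 90 days (no noise).
time = np.arange(0.0, 90.0, 0.02)
known = box_transit(time, period=5.0, t0=2.5, duration=0.3, depth=1e-3)
# Hypothetical long-period planet whose only transit overlaps a known transit.
hidden = box_transit(time, period=200.0, t0=42.5, duration=0.4, depth=4e-4)
flux = 1.0 + known + hidden

# Approach 1: remove every data point inside the known planet's transits.
in_known_transit = known < 0
masked_flux = flux[~in_known_transit]

# Approach 2: subtract the known planet's model and keep every data point.
residual_flux = flux - known

# Count the points that still show the hidden planet's dip in each case.
threshold = 1.0 - 2e-4
n_after_masking = int(np.sum(masked_flux < threshold))
n_after_subtracting = int(np.sum(residual_flux < threshold))
print(n_after_masking, n_after_subtracting)
```

With masking, only the few points at the edges of the hidden transit survive; with subtraction, the whole dip is still there to be found.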

The Scientific Process

The final paper was published March 28, 2017. I began this project on January 6, 2016. This one paper has taken nearly 15 months from start to finish. Of course, I wasn’t working on this project the entire time, far from it. Almost every researcher balances several projects at the same time, and this becomes truer and truer as you move up the ladder from undergraduate to graduate to post-doc to professor/research scientist.

Astronomy isn’t all sitting at a telescope like you might think. In my 4.5 years in the Yale Department of Astronomy, I’ve been at a telescope maybe 14 nights total. Instead, it’s all about reducing and analyzing the resulting data. Data is rarely born clean, and Kepler data is no exception. Many stars vary in brightness over timescales of minutes to days. This natural variability needed to be removed before we could search for planet transits. Removing it was a long and tedious process that took weeks to finish. Each of the 114 stars has about 15 quarters of data. Each quarter is 90 days long, and the variability removal had to be tailored to each quarter individually and then double-checked visually to see whether the code had done its job properly. That’s ~1,800 individual files with a total of about 7.5 million data points covering about 400 years of data.
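As a rough sanity check on those totals, the arithmetic can be reproduced in a few lines. The star count, quarter count, and quarter length come from the paragraph above; the 29.4-minute sampling interval is Kepler’s long-cadence rate and is an assumption I am adding here. Real quarters also contain gaps, so treat the results as back-of-the-envelope estimates that land near the rounded figures quoted above.

```python
N_STARS = 114             # from the text
QUARTERS_PER_STAR = 15    # "about 15" in the text, so results are approximate
DAYS_PER_QUARTER = 90     # from the text
CADENCE_MINUTES = 29.4    # assumption: Kepler long-cadence sampling interval

# One file per star per quarter.
n_files = N_STARS * QUARTERS_PER_STAR
# Total observing time summed over all stars.
total_years = n_files * DAYS_PER_QUARTER / 365.25
# Data points per quarter if there were no gaps, then summed over all files.
points_per_quarter = DAYS_PER_QUARTER * 24 * 60 / CADENCE_MINUTES
total_points = n_files * points_per_quarter

print(n_files)                       # 1710
print(round(total_years))            # 421
print(round(total_points / 1e6, 1))  # 7.5 (million)
```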

All of that effort was just to get to the point where we could actually start the real project: fitting and subtracting out the known planet signals rather than removing the associated data altogether. There were over 400 planets that needed to be fit. Each planet took anywhere from a few minutes to a couple of hours to fit, so although I could run a few planet fits at the same time, this was a drawn-out process that took several weeks. Throughout, I had to manually inspect many of the thousands of transits to confirm that they were properly fit, although I decided that checking every transit would be too time-consuming to be worthwhile.

Those first two steps took about four months of staring at a computer screen, but I now had a clean set of data that was ready to be searched for new planets. Then I hit a new problem. I didn’t have a code that could quickly and adequately search the data for new planets. Rather than reinvent the wheel, we brought in Dr. Jon Jenkins, one of the experts who helped design the Kepler pipeline to search for planets. He agreed to join the project. However, he was busy wrapping up Kepler-related projects, which effectively put the project on hold for a few months.

I used this delay to run a simulation to test whether our original hypothesis was even feasible in the first place. Obviously, this is something I would have liked to do before anything else, but when you’re overloaded with work, the things that you would like to do get ignored in favor of the things that you need to do. Thankfully, I already had a code from a previous project that could easily be jury-rigged to do this simulation (a frequent occurrence in science). Because I had never optimized this code for speed, the simulation would also take a couple of months to run, but it could run in the background with little monitoring. When it finally finished, it revealed what we had already assumed: the removal of data in the pipeline could result in planets being lost in data holes, although it would not happen often.

After the long pause in the project, during which time I had other work to keep me busy, Dr. Jenkins got back to me with the results. After we both took a few days to analyze them, we saw that we had come up empty. Months of work resulted in zero new credible planetary signals. This was incredibly disappointing, but this was also science. The only benefit in my mind was that it would be quick to wrap up the project. Astronomical journals are generally decent at accepting interesting null results, and we had basically just shown that a known potential flaw in the Kepler pipeline seemed to be insignificant.

As a last-ditch attempt to get a more interesting result, my advisor Professor Fischer suggested that we put the data on Planet Hunters, a citizen science program designed to let volunteers online find planets in Kepler data by visually searching through the data. I initially scoffed at the idea (in my head). After the negative result, I was ready to move on quickly, and showing the data on Planet Hunters would extend this project for a few more months. I didn’t want to wait. We compromised, and I decided to take on the extremely tedious task of manually searching through all 400 years of data myself. Since the data was already cleaned, it was surprisingly quick. I finished it in two days over winter break (although it was approximately 15 hours each day).

It was in these two days that I discovered the new exoplanet Kepler-150 f. When I first saw it, I literally stopped breathing for a few seconds. Then my scientific instincts kicked in, and I thought, “How can I disprove this?” Over the next few days, I did everything I could to try to disprove the idea that this was a planet, but it held up. After being pushed by Dr. Jenkins, I took another week to quantify our confidence in the result. I calculated our discovery to be at a confidence level of >99.998%.

This was one of the long-period planets we had been looking for. The computer algorithm had missed it because Kepler-150 f only transited twice, while the algorithm requires three transits to protect against finding too many false positives. We had discovered a new exoplanet, and it was thrilling.

Final Thoughts

The thrill of the discovery is an amazing feeling, but it came on the back of months and months of often very tedious work. For some scientists, these thrills outweigh the hard, tedious work that goes into making them. For others, they don’t. Undergraduate research can give you a taste of both, but research is secondary to your classes. In graduate school, this flips. Research comes first; classwork is second. Your research becomes the focus of your life. In choosing to go into a career in science, it’s very important to know that the vast majority of your time will not be spent making new discoveries. Instead, it’s a lot of difficult work occasionally punctuated by a cool new discovery. Whether the thrill outweighs the work is a personal decision that every student scientist has to make for themselves.

Illustration by Michael S. Helfenbein