Piloting a Data Science Challenge

By Sergio Marconi

To study biodiversity using data gathered from the sky, we need to know how to “read” this information. In short, although scaling up ecological patterns and processes is crucial to understanding the effects of environmental change on natural systems and human society, we are still doing a bad job of it.

[Figure: Task 1: Segmentation]

Collaborative data analysis challenges (a.k.a. competitions) have proven highly effective in other fields for quickly improving methods for converting image data to useful information. With this in mind, we are piloting a Data Science Challenge where multiple groups attempt to use the same remote sensing data from low-flying airplanes to infer the location and type of trees in forests. This will allow forests to be studied in detail at much larger scales than is currently possible.

[Figure: Task 2: Alignment]


There are three sets of tasks: 1) Segmentation: Identifying individual trees in remote sensing images; 2) Alignment: Aligning ground data with remote sensing data; and 3) Classification: Classifying trees into species.

[Figure: Task 3: Classification]

Teams (or individuals) can participate in all of them or just pick the tasks they are most interested in. Tasks 2 and 3 can be accomplished using just tabular data. Task 1 requires working directly with spatial data. Details of the different tasks and links to the data are available at the challenge website: https://www.ecodse.org.

We plan to write a general paper about the competition, the data, and the performance of the different methods used. Individual participants will be invited to write and publish associated short papers on the methods they used and the results they produced (so, why don’t you participate yourself?). A journal has already agreed to publish all of these related contributions together as a collection (pending review, of course).

The challenge is already open and the deadline for submissions is December 15th. Once you sign up on the website, you will receive an email with some additional details. If you have any questions, feel free to respond to that email or check out the FAQ to see if they have already been answered.

This challenge is sponsored by the National Institute of Standards and Technology as part of its Data Science Evaluation series and is also partially supported by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through grant GBMF4563. It uses data from the National Ecological Observatory Network in addition to data collected by the organizers. The challenge is being organized by the Data Science Research lab, the Weecology lab, and Stephanie Bohlman’s lab, all at the University of Florida.