Citation: Ha NT, Manley-Harris M, Pham TD, Hawes I. A Comparative Assessment of Ensemble-Based Machine Learning and Maximum Likelihood Methods for Mapping Seagrass Using Sentinel-2 Imagery in Tauranga Harbor, New Zealand. Remote Sensing [Internet]. MDPI AG; 2020 Jan 21;12(3):355. Available from: http://dx.doi.org/10.3390/rs12030355
What do remote sensing, machine learning, and statistics have in common? Enhancing the accuracy of seagrass monitoring, for one.
Seagrass meadows are highly productive blue carbon ecosystems found in shallow salty and brackish waters across the world, from the tropics to the Arctic Circle. Seagrasses belong to a group of plants known as monocotyledons, which includes lilies, grasses, and palms; like their relatives, seagrasses have roots, leaves, and veins, and produce seeds and flowers. Now if you weren’t already amazed, here is the even more amazing part: they evolved around 100 million years ago and yet remain the most understudied of ecosystems, with their total global extent largely unknown because they are so difficult to map (Hastings et al. 2020). Seagrass can form dense underwater meadows, many of which are large enough to be seen from space. Despite being one of the most productive ecosystems in the world, seagrasses do not receive enough research, funding, or attention.
Seagrasses have been acknowledged as one of the most productive ‘blue carbon’ ecosystems (blue carbon being the carbon captured and stored in coastal and marine ecosystems and their sediments), and they are in significant decline across most of the globe. One of the first steps towards conservation is to map and monitor existing seagrass meadows. A number of different methods are used for this task, including statistical modeling and geospatial analyses. But what about using the power of machine learning to convert satellite images into seagrass meadow maps? This technology has been used in other conservation efforts, but is still uncommon in seagrass work. In a recent study, the authors explore the possibility of bringing machine learning into the world of seagrass conservation.
Dr. Ha and team aimed to develop a novel approach for seagrass monitoring, using state-of-the-art machine learning along with Sentinel-2 imagery. Sentinel-2 is a constellation of two identical satellites in the same orbit, imaging land and coastal areas at high spatial resolution. Tauranga Harbor, New Zealand, served as the validation site, providing extensive ground-truthing data with which to compare methods. The team compared three ensemble machine learning methods, random forest (RF), rotation forest (RoF), and canonical correlation forest (CCF), with the more traditional maximum likelihood classifier (MLC) technique. Using a group of validation metrics, their results indicated that the machine learning techniques outperformed the MLC, and that RoF was the best performer.
Now, I have probably lost you at this point, so let’s break these terms apart to understand what Dr. Ha really did.
In recent years, machine learning (ML) has emerged as a novel approach for seagrass mapping and monitoring. Machine learning has the benefits of rapid learning, the ability to work with non-linear data, and a growing number of new, open source algorithms. In the field of seagrass mapping and monitoring, however, machine learning is still in its infancy. The statistical and machine learning applications tried so far have produced mixed results, but they support the exploration of new machine learning approaches, particularly for improving the mapping of low coverage seagrass.
Among the various machine learning algorithms, rotation forest (RoF) and canonical correlation forest (CCF) are now emerging as reliable techniques for land cover mapping, landslide mapping using multi-spectral or hyper-spectral imagery, and rapid building mapping using multi-source data. These algorithms are well known for detecting boundaries, so they potentially offer benefits in classifying low coverage seagrass through enhanced recognition of edge boundaries. The team’s goal in this study was therefore to compare three ensemble ML algorithms with the more traditional maximum likelihood classifier for mapping the aboveground distribution of seagrass communities at low and high coverage using satellite imagery data.
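To make the “maximum likelihood classifier” idea a little more concrete, here is a minimal sketch in Python. It is not the authors’ implementation. It assumes each class (dense seagrass, sparse seagrass, bare sediment) has pixel values that follow a Gaussian distribution in every band, treats the bands as independent (a simplification of the full-covariance version found in remote sensing software), and uses made-up two-band reflectance values purely for illustration. Each pixel is assigned to the class under which its values are most likely.

```python
import math

def fit_classes(pixels, labels):
    """Estimate a per-band mean and variance for every class."""
    grouped = {}
    for pixel, label in zip(pixels, labels):
        grouped.setdefault(label, []).append(pixel)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        bands = list(zip(*members))  # one tuple of values per band
        means = [sum(band) / n for band in bands]
        # Floor the variance so a constant band cannot cause division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in band) / n, 1e-9)
            for band, m in zip(bands, means)
        ]
        model[label] = (means, variances)
    return model

def log_likelihood(pixel, means, variances):
    """Gaussian log-likelihood, assuming independent bands (a simplification)."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        for x, m, var in zip(pixel, means, variances)
    )

def classify(model, pixel):
    """Pick the class under which the pixel is most likely."""
    return max(model, key=lambda label: log_likelihood(pixel, *model[label]))

# Hypothetical training pixels: (band A, band B) reflectance. Not real data.
train_pixels = [
    (0.030, 0.20), (0.040, 0.22), (0.035, 0.21),  # dense seagrass
    (0.060, 0.12), (0.070, 0.13), (0.065, 0.11),  # sparse seagrass
    (0.100, 0.05), (0.110, 0.06), (0.120, 0.04),  # bare sediment
]
train_labels = ["dense"] * 3 + ["sparse"] * 3 + ["bare"] * 3

model = fit_classes(train_pixels, train_labels)
print(classify(model, (0.036, 0.205)))
print(classify(model, (0.066, 0.125)))
print(classify(model, (0.105, 0.050)))
```

The ensemble methods in the study work differently: instead of fitting one distribution per class, they combine the votes of many decision trees, which is part of why they can cope with messier, non-linear class boundaries.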
The team’s target was Tauranga Harbor, New Zealand, as it offers a mosaic of dense, sparse, and zero seagrass coverage. In the paper, the team discussed how the performance of the selected models differed when detecting seagrass at two densities. Their hope was that their results would contribute alternative solutions for the mapping and monitoring of seagrass in various regions of the world, and assist in the conservation of this important blue carbon ecosystem.
So what does it all mean?
Well, first, in this study the machine learning methods outperformed the statistical method (MLC) on all evaluation metrics.
Until now, no literature had looked at the comparative performance of these machine learning classifier methods for seagrass mapping, along with a full radiometric correction (calibrating the image pixel values and correcting for error) of the image. Additionally, of the ensemble machine learning approaches that Dr. Ha’s team used, the RoF model performed better than both CCF and RF; this is a unique result because previous studies demonstrated superior performance of CCF. Of the methods tested here, only the RF technique had previously been applied to seagrass mapping using very high spatial resolution imagery. In that case, high precision (0.947) and recall (0.968) values were obtained when mapping Posidonia oceanica from digital airborne images, though no comparison to other methods was attempted. In another seagrass study, the overall accuracy only reached 82% using the RF algorithm applied to RapidEye imagery. Considering the size of the seagrass meadows and the mix of substrate in Tauranga Harbor, the scores in Dr. Ha’s team’s results were reliable for both dense and sparse seagrass mapping using medium spatial resolution Sentinel-2 data (10 m pixel size), which is a really interesting result!
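If scores like precision, recall, and overall accuracy feel abstract, they can all be read off a simple table of how ground-truth pixels were labeled by the classifier (a confusion matrix). The sketch below shows the arithmetic in Python. The counts are hypothetical and chosen only for illustration; they are not taken from the paper or from the studies mentioned above, and the function name is my own.

```python
def classification_scores(confusion, positive):
    """Compute scores from confusion[true_label][predicted_label] = pixel count."""
    labels = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[label].get(label, 0) for label in labels)

    # Pixels that truly are the class of interest and were labeled as such.
    tp = confusion[positive].get(positive, 0)
    # Pixels of the class of interest that the classifier missed.
    fn = sum(n for pred, n in confusion[positive].items() if pred != positive)
    # Pixels of other classes wrongly labeled as the class of interest.
    fp = sum(confusion[label].get(positive, 0) for label in labels if label != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "overall_accuracy": correct / total if total else 0.0,
        "precision": precision,
        "recall": recall,
    }

# Hypothetical counts of validation pixels (not real results).
confusion = {
    "seagrass": {"seagrass": 90, "other": 10},
    "other": {"seagrass": 5, "other": 95},
}

scores = classification_scores(confusion, positive="seagrass")
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Precision answers “of the pixels mapped as seagrass, how many really were seagrass?”, while recall answers “of the real seagrass pixels, how many did the map find?”. A method can look good on overall accuracy and still miss much of the sparse seagrass, which is why the study reports a group of metrics rather than a single number.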
Dr. Ha’s results attest to the reliable application of the RoF model for the mapping and monitoring of seagrass in shallow water using satellite imagery. Despite a lower accuracy for sparse than for dense seagrass meadow classification, the CCF model shows potential for the mapping of seagrass and merits further testing at various scales and in various case studies. As for MLC, this model is still an applicable candidate for dense seagrass meadows; however, it may not be applicable for the mapping of sparse to very sparse seagrass meadows.
With the development of computer vision and pattern recognition, deep learning approaches using algorithms such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs) for semantic segmentation of imagery, together with sub-pixel techniques, should be encouraged in future studies.
Krti is interested in the transmission dynamics of environmental diseases as they relate to climate and anthropogenic stressors. As a Fulbright Scholar, Krti conducted analyses on the responses of dengue fever to climatic stressors off the coast of the Bay of Bengal, in India. Currently, Krti works with Stanford University to understand the role of schistosomiasis in environmental reservoirs, and leads the pursuit of a computation-based analysis of eelgrass wasting disease dynamics. At Stanford, Krti serves as one of the few trans-disciplinary experts for planetary health topics, via machine learning and computer vision, data science, environmental policy, and science communication. As a STEM innovator and a first-generation woman of color, Krti is proud to be a writer for Oceanbites!