A Novel Framework of Detecting Convective Initiation Combining Automated Sampling, Machine Learning, and Repeated Model Tuning from Geostationary Satellite Data
published in Remote Sensing doi:10.3390/rs11121454
Abstract
This paper proposes a complete framework for a machine learning-based model that detects convective initiation (CI) from geostationary meteorological satellite data. The suggested framework consists of three main processes: (1) an automated sampling tool; (2) machine learning-based CI detection modelling; (3) repeated model tuning through validation. In this study, the automated sampling tool was able to track the CI objects iteratively, even without ancillary data such as an atmospheric motion vector (AMV). The collected samples were used to train the machine learning model for CI detection. Random forest (RF) was used to classify CI and non-CI. To enhance the advantages of the machine learning approach, we adopted model tuning to iteratively update the training dataset from each validation result by adding hits and misses to the CI samples, and false alarms and correct negatives to the non-CI samples. Using 12 interest fields from the Himawari-8 Advanced Himawari Imager (AHI) over the Korean Peninsula, this simple and intuitive tuning process increased the overall probability of detection (POD) from 0.79 to 0.82 and decreased the overall false alarm rate (FAR) from 0.46 to 0.37, with a lead time of around 40 min. Amongst the 12 interest fields, Tb (11.2) µm was identified as the most significant predictor in the RF model, followed by Tb (8.6–11.2) µm and Tb (6.2–7.3) µm. The effect of model tuning on the CI detection performance was also analyzed using spatiotemporal validation maps. By automatically collecting and updating the machine learning training dataset, the suggested framework is expected to facilitate maintenance of the CI detection model from an operational perspective.
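The POD and FAR values quoted in the abstract are contingency-table scores. The sketch below shows how such scores are commonly computed from counts of hits, misses, and false alarms. It assumes the standard forecast-verification definitions, POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms); the function names are hypothetical and the counts in the usage note are illustrative, not taken from the study.

```python
import math


def probability_of_detection(hits: int, misses: int) -> float:
    """POD: fraction of observed CI events that were detected.

    Assumes the common definition hits / (hits + misses).
    """
    observed = hits + misses
    return hits / observed if observed else math.nan


def false_alarm_rate(hits: int, false_alarms: int) -> float:
    """FAR: fraction of CI detections that were not followed by CI.

    Assumes the common definition false_alarms / (hits + false_alarms).
    """
    detected = hits + false_alarms
    return false_alarms / detected if detected else math.nan
```

For example, 82 hits with 18 misses would give a POD of 0.82, and 63 hits with 37 false alarms would give a FAR of 0.37. Both functions return NaN when the denominator is zero, so an empty validation scene does not raise an error.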
An overall flowchart of this study. The suggested framework is composed as follows: (a) The automated sampling tool; (b) machine learning-based CI modelling; (c) model tuning through validation. ML in (a,b) stands for machine learning.
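The update rule in step (c) can be sketched in a few lines. In this minimal illustration, each validation sample is assigned to one of the four contingency categories; hits and misses are appended to the CI samples, and false alarms and correct negatives to the non-CI samples, as the abstract describes. The function name, argument names, and list-based data layout are assumptions made for illustration, not the authors' implementation.

```python
def update_training_set(ci_samples, non_ci_samples, features, observed, predicted):
    """Append validation samples to the training pools by contingency category.

    features  : one feature vector per validation sample
    observed  : True where CI actually occurred
    predicted : True where the model flagged CI
    Returns the number of samples that fell into each category.
    """
    counts = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_negative": 0}
    for x, obs, pred in zip(features, observed, predicted):
        if obs and pred:
            category = "hit"
        elif obs:
            category = "miss"
        elif pred:
            category = "false_alarm"
        else:
            category = "correct_negative"
        counts[category] += 1
        # Hits and misses are observed CI; the other two categories are non-CI.
        (ci_samples if obs else non_ci_samples).append(x)
    return counts
```

After each validation case, the classifier would be retrained on the enlarged pools, and the cycle would be repeated for the next case.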
The process of the automated sampling tool. From right to left, backward cloud object tracking was conducted using the region growing method. Starting with seed points from areas where the radar constant altitude plan position indicator (CAPPI) echo was ≥35 dBZ, the seed points were updated iteratively. t0 is the time when rain starts to fall.
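The backward tracking described in this caption can be illustrated with a simple region-growing routine. The sketch below is a simplified stand-in under stated assumptions: the growth criterion (brightness temperature at or below a fixed threshold `tb_max`), the 4-connected neighbourhood, and all function and parameter names are hypothetical choices for illustration, not the criteria used by the authors' tool.

```python
from collections import deque


def grow_region(tb, seeds, tb_max):
    """Grow a 4-connected region from seed pixels on a 2-D Tb grid.

    A pixel joins the region if its brightness temperature is <= tb_max
    (an assumed criterion for this sketch).
    """
    rows, cols = len(tb), len(tb[0])
    region = {s for s in seeds if tb[s[0]][s[1]] <= tb_max}
    queue = deque(region)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region and tb[nr][nc] <= tb_max):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def track_backward(frames, radar_seeds, tb_max):
    """Track a cloud object backward in time from the t0 frame.

    frames      : list of 2-D Tb grids ordered oldest to newest (last is t0)
    radar_seeds : (row, col) pixels taken from the radar echo area at t0
    Returns the regions for the most recent frames, oldest first; tracking
    stops at the first earlier frame where no pixel qualifies.
    """
    regions = []
    seeds = set(radar_seeds)
    for tb in reversed(frames):
        region = grow_region(tb, seeds, tb_max)
        if not region:
            break
        regions.append(region)
        seeds = region  # the object found here seeds the previous frame
    regions.reverse()
    return regions
```

Each tracked region then serves as the seed set for the preceding image, which is the iterative seed update the caption refers to. Because the cloud object moves little between consecutive images, the overlap between frames is what allows tracking without motion vectors in this sketch.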
Time series of CI detection results (a–d) and radar CAPPI echoes (e,f) on 02 August 2017 (CI-C2) from 04:00 to 04:50 after model tuning. In (a–d), the predicted CI area is shown in red with Tb (11.2) µm as a background. The area over the target CI events is marked with a black circle in (e,f). The first radar echo of ≥35 dBZ over the target CI events occurred at 04:50 (f).