Introduction
The template I chose for this project is “Gather your own dataset”.
This project uses state-of-the-art image generation models, namely the latent diffusion models popularized by Stable Diffusion, to generate a synthetic dataset of aircraft runway images for use in vision-based landing tasks. This approach can greatly increase the availability of runway images for other research projects and, by sharing the model weights freely, allow other researchers to generate their own synthetic datasets with their own desired characteristics.
Motivation
There is currently growing interest in autonomous systems in the aerospace field. Commercial airplanes already rely on the autopilot during cruise, but takeoff and landing are still largely in the pilots’ hands. Machine learning models are a promising way to support vision-based landing, especially in the runway detection component.
However, contemporary machine learning techniques usually require tens or hundreds of thousands of examples, and there is a lack of open-source datasets of aerial images suitable for runway detection.
Synthetic image dataset generation can be a viable strategy to bridge this data-availability gap.
Literature Review
This section reviews prior work on building runway datasets: four well-known, publicly available runway image datasets, along with papers relevant to the methodology.
LARD – Landing Approach Runway Detection [1] is a dataset built from Google Earth images. It uses projective geometry to simulate the viewpoint of a camera in the airplane’s nose looking at the runway. Along with the public dataset, the authors provide code to generate new images using Google Earth Studio. Currently, the dataset has over 17,000 images.
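This viewpoint simulation can be illustrated with a simple pinhole-camera projection. The sketch below is not LARD’s actual code; the intrinsics and runway geometry are hypothetical values chosen for illustration.

```python
# Minimal pinhole-camera sketch of the viewpoint-simulation idea:
# project runway corners given in the camera frame onto the image plane.
import numpy as np

def project_points(points_cam: np.ndarray, fx: float, fy: float,
                   cx: float, cy: float) -> np.ndarray:
    """Project 3D points (N, 3) in the camera frame to pixel coordinates.

    Assumes a pinhole camera looking down +Z; fx, fy, cx, cy are
    hypothetical intrinsics (focal lengths and principal point in pixels).
    """
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.stack([u, v], axis=1)

# Hypothetical scenario: the four corners of a 3000 m x 45 m runway, seen
# from ~300 m altitude on a 3-degree glide slope, ~5.7 km from the threshold.
# Camera frame: x right, y down, z forward.
corners = np.array([
    [-22.5, 300.0, 5700.0],   # near-left
    [ 22.5, 300.0, 5700.0],   # near-right
    [-22.5, 300.0, 8700.0],   # far-left
    [ 22.5, 300.0, 8700.0],   # far-right
])
print(project_points(corners, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0))
```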
FS2020 Runway Dataset [2] is a public runway dataset, available on Kaggle, built from video footage extracted from Microsoft Flight Simulator. The researchers recorded videos of airplanes landing on 60 different runways across the world under different light conditions. The videos were then split into frames, each annotated with segmentation labels. The dataset contains 5,587 images.
BARS: A Benchmark for Airport Runway Segmentation [3] is a public dataset built using images from X-Plane, an FAA-certified flight simulator. The dataset contains 10,256 images, collected from different airports and weather conditions to simulate real-world scenarios.
Runway Landing Dataset (RLD) [4] is another dataset built using X-Plane. The researchers built RLD to train VALNet, a runway image segmentation network. They had access to BARS and other non-public datasets, and claim to have built RLD to “alleviate the deficiency of large-scale, high-quality, publicly available datasets for runway segmentation”. The dataset covers entire landing scenarios under different weather conditions and has 12,239 images.
LARD, the largest dataset, is also the most recent. Its images are easily reproducible, since all the code to generate new scenarios and images is publicly available. However, LARD’s synthetic images have no variation in weather conditions, and night views are simulated by simply reducing the lightness of the photos; both choices produce unrealistic images.
The datasets built with flight simulators (FS2020, BARS, RLD) have far better data diversity, as they can produce realistic images under different weather and light conditions. But because they depend on flight simulator games, they are not easily reproducible or extensible.
These trade-offs highlight the potential of the approach proposed in this paper. A runway image generation model can be easily reproducible and extensible while still producing images of diverse, realistic scenarios. It could also generate edge cases that are hard to replicate in flight simulators, as well as images of novel, unseen airports and runways.
The paper Runway Detection and Localization in Aerial Images Using Deep Learning [5] trains a two-stage pipeline: a first model detects whether there is a runway in the image, and a second model localizes it. Detection is done with a ResNet-50 model fine-tuned on a runway image dataset, reaching an accuracy of around 90%, although the evaluation dataset uses images with a top-down, satellite-like camera perspective, which is not realistic for the vision-based landing scenario explored in this paper.
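As a rough illustration of this kind of detection model (a minimal sketch, not the authors’ exact setup; the hyperparameters are placeholder choices), a torchvision ResNet-50 can be fine-tuned as a binary “runway / no runway” classifier:

```python
# Fine-tune an ImageNet-pretrained ResNet-50 for binary runway detection.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)  # replace the ImageNet head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step; `images` is a (B, 3, 224, 224) batch and
    `labels` is (B,) with 0 = no runway, 1 = runway."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```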
A hard problem in building any dataset is evaluating its quality. Ideally, one would train a runway detection and segmentation model on the new dataset and evaluate that model’s performance. Because of time constraints, this won’t be covered in this paper. However, training a simple model to detect whether there is a runway in an image can serve as a helpful proxy for dataset quality.
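Concretely, the classifier from the previous sketch could be trained on the synthetic dataset and then scored on held-out real images; higher real-image accuracy would suggest the synthetic data captures the relevant signal. In the sketch below, `model` and `real_loader` are hypothetical placeholders for that setup:

```python
# Score a trained classifier on a held-out set of real runway images.
import torch

@torch.no_grad()
def accuracy(model: torch.nn.Module, loader) -> float:
    """Fraction of correctly classified images in `loader`, which yields
    batches of (images, labels) with labels 0 = no runway, 1 = runway."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# e.g. with real_loader built from LARD or FS2020 images:
# print(f"real-image accuracy: {accuracy(model, real_loader):.3f}")
```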
References
[1] LARD – Landing Approach Runway Detection. https://arxiv.org/pdf/2304.09938
[2] An image-based runway detection method for fixed-wing aircraft based on deep neural network (FS2020 Runway Dataset). https://www.researchgate.net/publication/379082680_An_image-based_runway_detection_method_for_fixed-wing_aircraft_based_on_deep_neural_network
[3] BARS: A Benchmark for Airport Runway Segmentation. https://arxiv.org/abs/2210.12922
[4] Runway Landing Dataset (RLD) / VALNet. https://www.mdpi.com/2072-4292/16/12/2161
[5] Runway Detection and Localization in Aerial Images Using Deep Learning. https://ieeexplore.ieee.org/document/8945889
Stable Diffusion
High-Resolution Image Synthesis with Latent Diffusion Models: https://arxiv.org/abs/2112.10752
Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet): https://arxiv.org/abs/2302.05543
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis: https://arxiv.org/abs/2307.01952
Tutorial / library: https://huggingface.co/docs/diffusers/en/index
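As a starting point, a minimal text-to-image sketch with diffusers could look like the following; the model ID, prompt, and step count are illustrative assumptions, not a tuned setup for this project:

```python
# Generate one candidate synthetic runway image with a pretrained
# Stable Diffusion pipeline from the diffusers library.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # example base model
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

prompt = ("aerial view of an airport runway on final approach, "
          "seen from an aircraft nose camera, overcast weather")
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("synthetic_runway.png")
```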