Computational Data Analysis of Simulated FOQA Data
Georgia Tech: ISYE-6740 Fall 2023 Project
- Brian Popp
- Personal (preferred): bpopp@bpopp.net
- Georgia Tech: bpopp7@gatech.edu
About this Project
Modern airplanes are equipped with hardware that records thousands of data parameters multiple times per second (1-16 Hz). This data is transmitted to recording devices over a network of buses and sensors that measure things like:
- Indicated Airspeed (IAS): speed of the air moving over the surface of the airplane’s wing (knots or kts)
- Above Ground Level (AGL): the altitude of the plane above the ground (typically in feet)
Using this data, airlines can evaluate the performance and safety of their pilots and look for things that could indicate higher levels of risk. In the United States, these programs are designated by the FAA as Flight Operational Quality Assurance programs, or FOQA (1).
For this study, I’ve focused on the final approach of each flight, just prior to touchdown. This is typically the most demanding and potentially dangerous phase of flight (3). The following three charts show the altitude (AGL) and airspeed (IAS) of several flights just prior to landing:
With just these two parameters, we can infer a surprising amount about what happened during each descent. In aviation, airspeed is measured in knots (abbreviated kts) and varies greatly depending on the airplane, gross weight, and wind. For the Cessna 172 used in this study, approach speeds are expected to be between 60 and 80 kts. The first flight is very slow (45 kts), and its glidepath becomes erratic close to the ground. The second flight is extremely fast, maintaining speeds of over 120 kts throughout most of its descent. The third flight is nearly perfect, with a consistent speed and a constant rate of descent all the way to the ground.
Problem Statement
Traditional FOQA programs identify problem flights by using simple threshold alerts that look for specific, known issues. For example, an alert could be triggered if a pilot doesn’t lower the gear before descending below 1,000 ft. Like unit tests in software development, the effectiveness of these rulesets is heavily dependent on the domain knowledge of the developer and the exhaustiveness of the tests.
The objective of this study is to use modern analytical models to isolate abnormal flights, like those shown in the Introduction, without these explicitly defined rulesets.
The Data Problem
Airlines are generally not willing to release sensitive safety data into the public domain because doing so could violate the trust of their pilots and potentially expose the company to misguided scrutiny. There are some public datasets, but they are heavily de-identified and it’s difficult to ensure their validity (2).
To work around this issue, I’ve developed a plugin that exports flight recorder data from flights flown in the popular X-Plane 12 flight simulator. This data matches actual FOQA data as closely as possible, and the hope is that these methodologies would translate directly to commercial aviation.
Methodology
Introduction to Data
The X-Plane plugin captures a handful of important parameters for measuring an airplane’s movement and energy during its descent to the runway. The data is currently sampled 5 times per second (5 Hz), and each flight’s data is written to a comma-separated values (CSV) file in the user’s X-Plane directory. After touchdown, this flight data is sent to the GTFOQA website using an anonymous HTTP POST.
Data is checked for validity and some cleanup is done to ensure it meets the guidelines for the research. Once validated, it is imported into a SQLite database using the following schema:
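As an illustrative sketch of that schema, covering only the columns that appear in the sample below (the real gtf_flightdata table has 31 columns, and the database file name here is hypothetical):

```python
import sqlite3

# Illustrative sketch only: a minimal version of gtf_flightdata covering just
# the columns shown in the sample below. The real table has 31 columns.
conn = sqlite3.connect("gtfoqa.db")  # hypothetical file name
conn.executescript("""
CREATE TABLE IF NOT EXISTS gtf_flightdata (
    flight_num    INTEGER NOT NULL,  -- one id per recorded approach
    ftime         REAL,              -- sample time in seconds
    lat           REAL,              -- latitude (deg)
    long          REAL,              -- longitude (deg)
    agl           REAL,              -- altitude above ground level (ft)
    gspd          REAL,              -- ground speed (kts)
    ias           REAL,              -- indicated airspeed (kts)
    pitch         REAL,              -- pitch attitude (deg)
    acc_vert      REAL,              -- vertical acceleration
    dist_from_td  REAL               -- distance from touchdown (nm)
);
""")
```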
This is a sample of the recorded gtf_flightdata showing 1 second of flight for flight number 1701753615:
Current dataset dimensions: (56629, 31)
flight_num | ftime | lat | long | agl | gspd | ias | pitch | acc_vert | dist_from_td
---|---|---|---|---|---|---|---|---|---
1701753615 | 0.000000 | 35.092648 | -89.987892 | 883.082944 | 80.501259 | 78.302086 | -4.605147 | NaN | 2.868947
1701753615 | 0.218624 | 35.092571 | -89.987885 | 881.762357 | 80.658978 | 78.428658 | -4.736148 | NaN | 2.864331
1701753615 | 0.437248 | 35.092484 | -89.987869 | 880.205637 | 80.846345 | 78.580582 | -4.918189 | 0.954363 | 2.859111
1701753615 | 0.655872 | 35.092403 | -89.987854 | 878.691559 | 81.027549 | 78.728859 | -5.009783 | 0.939726 | 2.854252
1701753615 | 0.874496 | 35.092316 | -89.987846 | 877.008519 | 81.225185 | 78.897163 | -4.927802 | 0.961873 | 2.849037
76 total flights
The full dataset is composed of 76 recorded flights flown by me and one other pilot. Most flights were flown conventionally, but some were flown with varying degrees of instability. For each approach, I captured between 500 and 900 observations with over two dozen parameters measuring energy and general flight characteristics (pitch, heading, speed, etc.). This puts the current dataset at 56,629 rows x 31 columns.
At the conclusion of each flight, labels were assigned to identify general characteristics such as whether it was flown fast, slow, or relatively stable. These labels will be used in the evaluation phase to test the accuracy of our models.
Principal Component Analysis with KMeans Clustering
The objective of this unsupervised model is to use Principal Component Analysis (PCA) to reduce the dimensionality of time-series parameter data captured during an airplane’s approach, and then classify each flight based on its flight characteristics. This is a simpler version of the AD-Flight model described by Li in her PhD dissertation, Anomaly detection in airline routine operations using flight data recorder data (4).
Step 1: Data Preparation
Even though we’re capturing dozens of parameters, there are really only a few relevant to our goal of identifying flight instability. For this study, I chose:
- Above Ground Level (agl): the altitude of the airplane above the ground in feet
- Indicated Airspeed (ias): the speed of the aircraft in kts
- Angle of Attack (pitch): the angle of the wings relative to the direction of travel
- Vertical Descent Rate (v_alt): the rate, in feet per minute, at which the aircraft is gaining or losing altitude
It’s also important that we compare the same timeframe across each flight. Each sample in the flightdata table was given 3 fields to help identify its position relative to the runway.
flight_num | iter_from_td | time_from_td | dist_from_td | agl
---|---|---|---|---
1695604080 | -1 | -0.22508 | -0.003235 | 0.193172
1695604080 | 0 | 0.00000 | 0.000000 | 0.173555
1695604080 | 1 | 0.22508 | 0.003175 | 0.226681
All three fields decrease steadily as the plane approaches the runway, reach zero at the moment of touchdown, and continue into negative values as the plane rolls down the runway. iter_from_td is a simple sample counter, time_from_td is the time in seconds, and dist_from_td is the distance in nautical miles.
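As a rough sketch, these fields could be derived per flight something like this (the actual import pipeline may detect the touchdown point differently, and the distance calculation is only described in a comment):

```python
import pandas as pd

def add_touchdown_fields(flight: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive the *_from_td fields for one flight's samples."""
    flight = flight.sort_values("ftime").reset_index(drop=True)

    # Assume touchdown is the sample where AGL first reaches its minimum
    td = int(flight["agl"].idxmin())

    # Positive on approach, zero at touchdown, negative during rollout
    flight["iter_from_td"] = td - flight.index
    flight["time_from_td"] = flight.loc[td, "ftime"] - flight["ftime"]
    # dist_from_td (nm) would similarly be the great-circle distance from each
    # sample's lat/long to the touchdown point, negated after touchdown
    return flight
```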
For PCA, we need to take the samples from each flight just prior to touchdown and arrange them so that each flight becomes a single row, with one column for each parameter at each sample position. For example, in this study I used 300 samples (60 seconds at 5 Hz). 300 samples x 4 parameters leaves us with a matrix that is 1200 columns wide by 76 rows deep (1 row per flight). We start with something that looks like this:
flight_num | iter_from_td | agl
---|---|---
1695604080 | 1 | 0.226681
1695604080 | 2 | 0.389290
1695604080 | 3 | 0.495893
1695604080 | 4 | 0.884472
1695604080 | 5 | 1.218021
And this will be transformed to something like this:
flight_num | agl\|1 | agl\|2 | agl\|3 | agl\|4 | agl\|5 | agl\|6 | agl\|7 | agl\|8 | agl\|9 | agl\|10 | ... | v_alt\|291 | v_alt\|292 | v_alt\|293 | v_alt\|294 | v_alt\|295 | v_alt\|296 | v_alt\|297 | v_alt\|298 | v_alt\|299 | v_alt\|300
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1695604080 | 0.226681 | 0.389290 | 0.495893 | 0.884472 | 1.218021 | 1.473222 | 1.809524 | 2.140974 | 2.540337 | 2.878341 | ... | -2.217522 | -2.206491 | -2.201859 | -2.196597 | -2.185700 | -2.163687 | -2.128049 | -2.071956 | -2.008394 | -1.923710 |
1695605295 | 0.194048 | 0.118274 | -0.069434 | 0.752770 | 3.235868 | 5.445023 | 7.413692 | 9.236372 | 10.689563 | 11.859430 | ... | -1.245180 | -1.355882 | -2.180436 | -2.453055 | -1.943104 | -1.569088 | -2.347954 | -2.973040 | -2.301677 | -1.083545 |
1695605323 | 0.446178 | 0.443276 | 0.414198 | 0.931855 | 1.558640 | 2.002722 | 2.770974 | 3.552762 | 4.316129 | 5.068889 | ... | -4.112582 | -4.124018 | -4.134045 | -4.154707 | -4.173733 | -4.200887 | -4.220607 | -4.232209 | -4.223803 | -4.198871 |
1695605546 | 0.217129 | 0.187803 | 0.206922 | 1.118811 | 2.320607 | 3.584817 | 4.698702 | 6.104926 | 7.492855 | 8.932585 | ... | -2.567677 | -2.540776 | -2.508448 | -2.481683 | -2.477198 | -2.535953 | -2.640717 | -2.690482 | -2.679015 | -2.649963 |
4 rows × 1200 columns
Notice that the agl parameter’s first 5 row values in the first dataframe become columns for the corresponding flight in the second dataframe named pv. The flight_num field is used as an index so that the results can later be joined back to labels, snapshots, or any other metadata that might be needed.
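A minimal sketch of this pivot, assuming the per-sample data lives in a dataframe named `df` with the column names shown earlier:

```python
import pandas as pd

params = ["agl", "ias", "pitch", "v_alt"]

# Keep the last 300 samples (60 seconds at 5 Hz) before touchdown.
# `df` is assumed to hold one row per sample, keyed by flight_num.
approach = df[df["iter_from_td"].between(1, 300)]

# Pivot so each flight becomes a single row with one column per (parameter, sample)
pv = approach.pivot_table(index="flight_num", columns="iter_from_td", values=params)

# Flatten the MultiIndex columns into names like "agl|1" ... "v_alt|300"
pv.columns = [f"{p}|{i}" for p, i in pv.columns]
```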
Step 2: PCA
Before we can run the PCA, we need to scale our data around the mean. Technically this isn’t absolutely necessary, but it puts every column on a comparable scale and often improves the performance of the model. I used sklearn’s StandardScaler.
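A minimal sketch of the scaling and PCA steps, using the pivoted dataframe `pv` built above:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Center and scale each of the 1200 pivoted columns
scaled = StandardScaler().fit_transform(pv)

# Compress the 1200 columns down to 3 principal components
pca = PCA(n_components=3)
pcs = pca.fit_transform(scaled)

pca_df = pd.DataFrame(pcs, index=pv.index, columns=["pc1", "pc2", "pc3"])
```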
Once the PCA has finished, the high dimensional dataframe will be compressed down into a 76 row x 3 column matrix corresponding to the 3 principal components we requested:
flight_num | pc1 | pc2 | pc3
---|---|---|---
1695604080 | -23.094983 | 10.103879 | 6.392800
1695605295 | -3.200182 | 0.599398 | 9.617005
1695605323 | -12.206696 | 9.933152 | -9.304026
Step 3: Use KMeans to Identify Clusters
KMeans will be used to find clusters of similar flights within our 3-dimensional matrix of principal components, but first we need to estimate an appropriate number of clusters. To do this, I ran KMeans against the PCA matrix using cluster counts between 2 and 10 and captured the error:
The point where the elbow bends and adding clusters stops reducing the error significantly gives us an idea of which cluster count to use. Based on the error values shown here, 5 appears to be the optimal cluster size.
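A sketch of that sweep, with the component matrix held in the `pca_df` dataframe from the previous step:

```python
from sklearn.cluster import KMeans

# Run KMeans for each candidate cluster count and record the within-cluster
# sum of squared errors (inertia); plotting inertia vs. k gives the elbow chart
inertia = {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(pca_df)
    inertia[k] = km.inertia_
```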
Next I used the results of the KMeans to add several fields to be used later during plotting and evaluation.
flight_num | pc1 | pc2 | pc3 | Predicted | PredictedRank | EucDistance | Labels
---|---|---|---|---|---|---|---
1696989035 | 66.265656 | -12.238110 | 40.484897 | 3 | 1 | 78.612565 | Fast, High Descent, Unstable
1700456966 | 73.508685 | -12.026897 | 2.010012 | 3 | 1 | 74.513175 | Fast, Hard Landing, No Flare, Unstable
1695703592 | 40.000347 | 58.653822 | -11.592774 | 1 | 2 | 71.935325 | Fast, Float, High Descent, Unstable
Predicted is the cluster label (0-4) that KMeans assigned to each flight. PredictedRank reorders the clusters based on the number of observations in each (1-5), which is useful for standardizing colors, etc. EucDistance is the Euclidean distance of each point from the origin (0,0,0), which tells us approximately how far from "normal" each observation is. Finally, I joined the results to the categorization labels that were assigned previously.
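A rough sketch of how these fields could be derived; the `labels_df` dataframe holding the manually assigned labels is hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

km = KMeans(n_clusters=5, n_init=10, random_state=42).fit(pca_df)

results = pca_df.copy()
results["Predicted"] = km.labels_

# Rank clusters by the number of observations in each (smallest cluster = 1)
sizes = results["Predicted"].value_counts().sort_values()
rank_of = {cluster: rank for rank, cluster in enumerate(sizes.index, start=1)}
results["PredictedRank"] = results["Predicted"].map(rank_of)

# Euclidean distance from the origin of PCA space: a rough "distance from normal"
results["EucDistance"] = np.sqrt((results[["pc1", "pc2", "pc3"]] ** 2).sum(axis=1))

# Join back to the manually assigned categorization labels (hypothetical dataframe)
results = results.join(labels_df["Labels"])
```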
Step 4: Plotting
Finally, all that work pays off. The Plotly library gives us a fully interactive, 3-dimensional scatter plot. Click and drag on the chart to rotate the perspective:
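A minimal sketch of how a chart like this can be built with Plotly Express (the color assignment here is illustrative, not the exact palette described below):

```python
import plotly.express as px

plot_df = results.reset_index()
plot_df["Cluster"] = plot_df["PredictedRank"].astype(str)  # discrete colors

fig = px.scatter_3d(
    plot_df, x="pc1", y="pc2", z="pc3",
    color="Cluster",
    hover_data=["flight_num", "Labels", "EucDistance"],
)
fig.show()
```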
Visually, we can see that the model seems to be working. The 3 extremely fast/high descent flights were correctly classified in red in the left corner. The less egregious, but still abnormally fast flights are in orange. The slower, low descent flights are on the opposite end in light gray, and the “normal” flights are in green in the middle near 0,0,0.
As expected, there is some overlap, particularly on the slower end. This is intuitive because there’s less variance between a very slow flight (45 kts) and a normal flight (60-70 kts) compared to a very fast flight (120 kts). We’ll see this more clearly in the Evaluation section.
Step 5: Evaluation
This is an unsupervised model, which can be challenging to evaluate. The challenge is compounded because manually classifying flight data can be extremely subjective. One pilot’s view of fast may be very different from another’s, and it’s not uncommon to see momentary fluctuations in otherwise normal flights. Many airlines have guidelines that outline normal thresholds, but for this study I unfortunately didn’t set these expectations before we started collecting data.
I’ve broken the evaluation into two parts. The Visual section provides some intuition into how changes in parameters are reflected in the model. The Quantitative section provides a rough estimate of the accuracy of each cluster (matches/total observations) and of the overall model.
Visual Evaluation
Before we look at accuracy, let’s quickly visualize the most severe outliers our model caught. These flights were all captured by cluster 2 (red):

We’ll start by graphing these 3 flights. The light blue shading represents a “normal” range of values for altitude at that moment in time:
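As a rough sketch of one way to build such a chart (the shaded band here is a 10th-90th percentile envelope of AGL across all flights, which is an assumption):

```python
import plotly.graph_objects as go

# Envelope of "normal" AGL values across all flights at each sample position
agl_cols = [c for c in pv.columns if c.startswith("agl|")]
lo, hi = pv[agl_cols].quantile(0.10), pv[agl_cols].quantile(0.90)

x = list(range(1, len(agl_cols) + 1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=x + x[::-1], y=list(hi) + list(lo)[::-1],
                         fill="toself", fillcolor="rgba(173,216,230,0.4)",
                         line=dict(width=0), name="normal range"))
# One of the outlier flights, drawn on top of the band
fig.add_trace(go.Scatter(x=x, y=pv.loc[1696989035, agl_cols], name="1696989035"))
fig.update_xaxes(autorange="reversed", title_text="samples before touchdown")
fig.show()
```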
A line plot is a very effective way to look at a few parameters, but it can break down when more than a few dimensions are needed. A heatmap isn’t as intuitive, but it allows us to look at all values and all parameters simultaneously across the entire timespan. The far left of each heatmap is 60 seconds from touchdown and the far right is at touchdown.
To generate the heatmap, we start by calculating z-scores for each parameter using the original pivoted matrix we prepared for PCA:
flight_num | agl\|1 | agl\|2 | agl\|3 | agl\|4 | agl\|5 | agl\|6 | agl\|7 | agl\|8 | agl\|9 | agl\|10 | ... | v_alt\|291 | v_alt\|292 | v_alt\|293 | v_alt\|294 | v_alt\|295 | v_alt\|296 | v_alt\|297 | v_alt\|298 | v_alt\|299 | v_alt\|300
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1695604080 | -0.360582 | -0.190157 | -0.218723 | -0.475861 | -0.764136 | -0.943757 | -1.001807 | -1.024454 | -1.023507 | -1.034339 | ... | 0.427010 | 0.436121 | 0.453448 | 0.465200 | 0.468491 | 0.480026 | 0.514558 | 0.553893 | 0.576586 | 0.603011 |
1695605295 | -0.399357 | -0.441764 | -0.645536 | -0.559223 | 0.290390 | 0.747728 | 0.946157 | 1.024852 | 1.010024 | 0.954998 | ... | 1.275389 | 1.188343 | 0.472841 | 0.230615 | 0.690041 | 1.018724 | 0.315534 | -0.255165 | 0.316365 | 1.330393 |
1695605323 | -0.099772 | -0.140038 | -0.280401 | -0.445870 | -0.586129 | -0.718257 | -0.667615 | -0.616699 | -0.580382 | -0.549127 | ... | -1.226449 | -1.259611 | -1.295632 | -1.325905 | -1.347074 | -1.365650 | -1.379303 | -1.385738 | -1.389079 | -1.366736 |
3 rows × 1200 columns
And then convert that into a heatmap using Plotly Express’ imshow. The value indicates the number of standard deviations above or below average for that parameter at that moment in time. Red indicates abnormally high values, and blue indicates abnormally low values:
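A condensed sketch of the z-score and imshow steps, reusing `pv` and `params` from the pivot step:

```python
import plotly.express as px
from scipy.stats import zscore

# Column-wise z-scores: how unusual each parameter is at each moment,
# relative to the same moment in every other flight
z = pv.apply(zscore)

# One heatmap per flight: rows are the 4 parameters, columns are the 300 samples
row = z.loc[1696989035].to_numpy().reshape(len(params), -1)
fig = px.imshow(
    row,
    y=params,
    color_continuous_scale="RdBu_r",  # red = abnormally high, blue = abnormally low
    zmin=-3, zmax=3,
    aspect="auto",
)
fig.show()
```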
For the first flight (1696989035), found in the top left corner, we see that its altitude and speed are very high, particularly as it gets closer to the ground. Towards the end of the flight, this causes a significant increase in descent rate (V_ALT). Note that descent rate is negative (i.e. -1,000 ft per minute), which is why blue implies a faster descent rate. Notice also that the third flight (1695703592) looks very different. It has a very high altitude and speed initially, but burns off most of that energy as it gets closer to the ground. Based on this flight’s location in the graph, this behavior appears to be captured by the model, and with more flights we would likely see a cluster for just these types of approaches.
Quantitative Evaluation
To estimate accuracy, I selected the most commonly assigned label for each cluster and counted the number of observations in that cluster that have the label and the number that don’t. For example:
PredictedRank | Color | PrimaryLabel | Autopilot | Bad Glidepath | Crosswind | Early Flare | Fast | Float | Hard Landing | High Descent | Late Flare | Low Descent
---|---|---|---|---|---|---|---|---|---|---|---|---
1 | red | Fast | 0 | 0 | 0 | 0 | 3 | 0 | 1 | 2 | 0 | 0
2 | orange | Fast | 0 | 0 | 0 | 0 | 7 | 2 | 1 | 5 | 0 | 0
3 | lightgray | Slow | 0 | 2 | 2 | 0 | 0 | 1 | 3 | 0 | 0 | 2
4 | lightgreen | Stable | 0 | 1 | 0 | 1 | 3 | 0 | 4 | 7 | 0 | 0
5 | green | Stable | 3 | 2 | 1 | 4 | 2 | 0 | 3 | 0 | 1 | 1
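A rough sketch of this scoring logic (how the primary label was actually chosen is an assumption; here it is simply the most common tag in each cluster):

```python
import pandas as pd

# Split the comma-separated label strings into individual tags and cross-tabulate
exploded = (results.assign(Label=results["Labels"].str.split(", "))
                   .explode("Label"))
counts = pd.crosstab(exploded["PredictedRank"], exploded["Label"])
primary = counts.idxmax(axis=1)  # most common tag per cluster

# Score each cluster by how many of its flights carry the primary label
summary = pd.DataFrame({
    "Observations": results.groupby("PredictedRank").size(),
    "Matches": results.groupby("PredictedRank").apply(
        lambda g: g["Labels"].str.contains(primary[g.name]).sum()),
})
summary["Accuracy"] = 100 * summary["Matches"] / summary["Observations"]
```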
Here are the results after comparing each observation to each cluster’s primary label:
PredictedRank | Color | PrimaryLabel | Observations | Matches | No Match | Accuracy (%)
---|---|---|---|---|---|---
1 | red | Fast | 3 | 3 | 0 | 100.000000
2 | orange | Fast | 9 | 7 | 2 | 77.777778
3 | lightgray | Slow | 11 | 8 | 3 | 72.727273
4 | lightgreen | Stable | 24 | 16 | 8 | 66.666667
5 | green | Stable | 29 | 25 | 4 | 86.206897
All | NaN | NaN | 76 | 59 | 17 | 77.631579
We can see that the model performed well at identifying stable flights and very fast, high-descent flights, but didn’t do as well with the less severe clusters. Overall, we achieved a respectable 77.6% accuracy.
Conclusion
I’m frankly shocked at how effective PCA was at condensing 1,200 columns of flight data into just 3 values. The approach needs more work and a more realistic dataset, but this model seems to have real potential for identifying problematic flights.
That said, with more time, here are a few things I would do differently:
- Add more flights. While the dataset contains plenty of observations, there aren’t enough individual flights to create well-defined clusters. Each approach takes 10-20 minutes, and I just didn’t have the time to devote to building a large, realistic dataset. At a commercial airline, you might see 1 or 2 unstable approaches for every 10,000 flights, but my dataset is closer to 1 in 2. I considered opening up the platform to strangers, but was concerned about the quality of data I would get.
- Better define normal. One challenge with this project is that flights were flown relatively freely, without any guidelines or thresholds. In a simulator, people are less concerned with safety and legality, and the resulting unrealistic 120 kt approaches had an adverse impact on the overall model. I experimented with removing these outliers, but didn’t really have enough observations and opted to just leave them in.
- Add parameters. I realized late in the project, after most of my data had been collected, that it would be useful to add several additional parameters from the simulator. Many similar projects use throttle position to determine how much energy is being added to the aircraft.
- Focus on areas other than the final approach. This is more difficult because the further back in the approach you go, the more variables there are.
This has been a really interesting project and I’ve built a pretty solid infrastructure around it. I look forward to improving on and experimenting more with this dataset for future models.
Additional Areas for Research
- Use KDE to preview the general shapes discovered using PCA (see lecture 6, 49:00)
- Generate a "normal" flight for each cluster and pass that data back to the simulator to illustrate its base characteristics
- Attempt to detect differences in airframe, pilot, weather, and flight style (i.e. high vs. shallow descent)
- Use a probabilistic graphical model and/or random forest to visualize flight characteristics of various flight types
- Try to predict hard (i.e. high-velocity) landings based on descent parameters
- Use linear regression to find optimal values for stable flight (i.e. 2.5 degrees and 65 kts for a 500 ft/min descent rate)