Computational Data Analysis of Simulated FOQA Data
Georgia Tech: ISYE-6740 Fall 2023 Project
- Brian Popp
- Personal (preferred): bpopp@bpopp.net
- Georgia Tech: bpopp7@gatech.edu
About this Project
Modern airplanes are equipped with hardware that records thousands of data parameters multiple times per second (1-16 Hz). This data is transmitted to recording devices over a network of buses and sensors that measure things like:
- Indicated Airspeed (IAS): speed of the air moving over the surface of the airplane’s wing (knots or kts)
- Above Ground Level (AGL): the altitude of the plane above the ground (typically in feet)
Using this data, airlines can evaluate the performance and safety of their pilots and look for things that could indicate higher levels of risk. In the United States, these programs are designated by the FAA as Flight Operational Quality Assurance programs, or FOQA (1).
For this study, I’ve focused on the final approach of each flight, just prior to touchdown. This is typically the most demanding and potentially dangerous phase of flight (3). The following three charts show the altitude (AGL) and airspeed (IAS) of several flights just prior to landing:
With just these two parameters, we can infer a surprising amount about what happened during each descent. In aviation, airspeed is measured in knots (abbreviated kts) and varies greatly depending on the airplane, gross weight, and wind. For the Cessna 172 used in this study, approach speeds are expected to be between 60 and 80 kts. The first flight is very slow (45 kts), and its glidepath becomes erratic close to the ground. The second flight is extremely fast, maintaining speeds of over 120 kts throughout most of its descent. The third flight is nearly perfect, with a consistent speed and a constant rate of descent all the way to the ground.
Problem Statement
Traditional FOQA programs identify problem flights by using simple threshold alerts that look for specific, known issues. For example, an alert could be triggered if a pilot doesn’t lower the gear before descending below 1,000 ft. Like unit tests in software development, the effectiveness of these rulesets is heavily dependent on the domain knowledge of the developer and the exhaustiveness of the tests.
The objective of this study is to use modern analytical models to isolate abnormal flights, like those shown in the Introduction, without these explicitly defined rulesets.
The Data Problem
Airlines are generally not willing to release sensitive safety data into the public domain because doing so could violate the trust of their pilots and potentially expose the company to misguided scrutiny. There are some public datasets, but they are heavily de-identified and it’s difficult to ensure their validity (2).
To work around this issue, I’ve developed a plugin that exports flight recorder data from flights flown in the popular X-Plane 12 flight simulator. This data matches actual FOQA data as closely as possible, and the hope is that these methodologies would translate directly to commercial aviation.
Methodology
Introduction to Data
The X-Plane plugin captures a handful of important parameters for measuring an airplane’s movement and energy during its descent to the runway. The data is currently sampled 5 times per second (5 Hz), and each flight’s data is written to a comma-separated values (CSV) file in the user’s X-Plane directory. After touchdown, this flight data is sent to the GTFOQA website using an anonymous HTTP POST.
Data is checked for validity and some cleanup is done to ensure it meets the guidelines for the research. Once validated, it is imported into a SQLite database using the following schema:
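As an illustrative sketch of that schema, covering only the columns that appear in the sample below (the real gtf_flightdata table has 31 columns, and the database file name here is hypothetical):

```python
import sqlite3

# Illustrative sketch only: a minimal version of gtf_flightdata covering just
# the columns shown in the sample below. The real table has 31 columns.
conn = sqlite3.connect("gtfoqa.db")  # hypothetical file name
conn.executescript("""
CREATE TABLE IF NOT EXISTS gtf_flightdata (
    flight_num    INTEGER NOT NULL,  -- one id per recorded approach
    ftime         REAL,              -- sample time in seconds
    lat           REAL,              -- latitude (deg)
    long          REAL,              -- longitude (deg)
    agl           REAL,              -- altitude above ground level (ft)
    gspd          REAL,              -- ground speed (kts)
    ias           REAL,              -- indicated airspeed (kts)
    pitch         REAL,              -- pitch attitude (deg)
    acc_vert      REAL,              -- vertical acceleration
    dist_from_td  REAL               -- distance from touchdown (nm)
);
""")
```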
This is a sample of the recorded gtf_flightdata showing 1 second of flight for flight number 1701753615:
Current dataset dimensions: (56629, 31)
flight_num | ftime | lat | long | agl | gspd | ias | pitch | acc_vert | dist_from_td
---|---|---|---|---|---|---|---|---|---
1701753615 | 0.000000 | 35.092648 | -89.987892 | 883.082944 | 80.501259 | 78.302086 | -4.605147 | NaN | 2.868947
1701753615 | 0.218624 | 35.092571 | -89.987885 | 881.762357 | 80.658978 | 78.428658 | -4.736148 | NaN | 2.864331
1701753615 | 0.437248 | 35.092484 | -89.987869 | 880.205637 | 80.846345 | 78.580582 | -4.918189 | 0.954363 | 2.859111
1701753615 | 0.655872 | 35.092403 | -89.987854 | 878.691559 | 81.027549 | 78.728859 | -5.009783 | 0.939726 | 2.854252
1701753615 | 0.874496 | 35.092316 | -89.987846 | 877.008519 | 81.225185 | 78.897163 | -4.927802 | 0.961873 | 2.849037
76 total flights
The full dataset is composed of 76 recorded flights flown by me and one other pilot. Most flights were flown conventionally, but some were flown with varying degrees of instability. For each approach, I captured between 500 and 900 observations with over two dozen parameters measuring energy and general flight characteristics (pitch, heading, speed, etc.). This puts the current dataset at 56,629 rows x 31 columns.
At the conclusion of each flight, labels were assigned to identify general characteristics such as whether it was flown fast, slow, or relatively stable. These labels will be used in the evaluation phase to test the accuracy of our models.
Principal Component Analysis with KMeans Clustering
The objective of this unsupervised model is to use Principal Component Analysis (PCA) to reduce the dimensionality of time-series parameter data captured during an airplane’s approach, and then classify each flight based on its flight characteristics. This is a simpler version of the AD-Flight model described by Li in her PhD dissertation, Anomaly detection in airline routine operations using flight data recorder data (4).
Step 1: Data Preparation
Even though we’re capturing dozens of parameters, there are really only a few relevant to our goal of identifying flight instability. For this study, I chose:
- Above Ground Level (agl): the altitude of the airplane above the ground in feet
- Indicated Airspeed (ias): the speed of the aircraft in kts
- Angle of Attack (pitch): the angle of the wings relative to the direction of travel
- Vertical Descent Rate (v_alt): the rate, in feet per minute, at which the aircraft is gaining or losing altitude
It’s also important that we compare the same timeframe across each flight. Each sample in the flightdata table was given 3 fields to help identify its position relative to the runway.
flight_num | iter_from_td | time_from_td | dist_from_td | agl
---|---|---|---|---
1695604080 | -1 | -0.22508 | -0.003235 | 0.193172
1695604080 | 0 | 0.00000 | 0.000000 | 0.173555
1695604080 | 1 | 0.22508 | 0.003175 | 0.226681
All three fields decrease steadily as the plane approaches the runway, reach zero at the moment of touchdown, and continue into negative values as the plane rolls down the runway. iter_from_td is a simple sample counter, time_from_td is the time in seconds, and dist_from_td is the distance in nautical miles.
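As a rough sketch, these fields could be derived per flight something like this (the actual import pipeline may detect the touchdown point differently, and the distance calculation is only described in a comment):

```python
import pandas as pd

def add_touchdown_fields(flight: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive the *_from_td fields for one flight's samples."""
    flight = flight.sort_values("ftime").reset_index(drop=True)

    # Assume touchdown is the sample where AGL first reaches its minimum
    td = int(flight["agl"].idxmin())

    # Positive on approach, zero at touchdown, negative during rollout
    flight["iter_from_td"] = td - flight.index
    flight["time_from_td"] = flight.loc[td, "ftime"] - flight["ftime"]
    # dist_from_td (nm) would similarly be the great-circle distance from each
    # sample's lat/long to the touchdown point, negated after touchdown
    return flight
```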
For PCA, we need to take the samples from each flight just prior to touchdown and arrange them so that each flight becomes a single row, with one column for each parameter at each sample position. For example, in this study I used 300 samples (60 seconds at 5 Hz). 300 samples x 4 parameters leaves us with a matrix that is 1200 columns wide by 76 rows deep (1 row per flight). We start with something that looks like this:
flight_num | iter_from_td | agl
---|---|---
1695604080 | 1 | 0.226681
1695604080 | 2 | 0.389290
1695604080 | 3 | 0.495893
1695604080 | 4 | 0.884472
1695604080 | 5 | 1.218021
And this will be transformed to something like this:
flight_num | agl\|1 | agl\|2 | agl\|3 | agl\|4 | agl\|5 | agl\|6 | agl\|7 | agl\|8 | agl\|9 | agl\|10 | ... | v_alt\|291 | v_alt\|292 | v_alt\|293 | v_alt\|294 | v_alt\|295 | v_alt\|296 | v_alt\|297 | v_alt\|298 | v_alt\|299 | v_alt\|300
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1695604080 | 0.226681 | 0.389290 | 0.495893 | 0.884472 | 1.218021 | 1.473222 | 1.809524 | 2.140974 | 2.540337 | 2.878341 | ... | -2.217522 | -2.206491 | -2.201859 | -2.196597 | -2.185700 | -2.163687 | -2.128049 | -2.071956 | -2.008394 | -1.923710 |
1695605295 | 0.194048 | 0.118274 | -0.069434 | 0.752770 | 3.235868 | 5.445023 | 7.413692 | 9.236372 | 10.689563 | 11.859430 | ... | -1.245180 | -1.355882 | -2.180436 | -2.453055 | -1.943104 | -1.569088 | -2.347954 | -2.973040 | -2.301677 | -1.083545 |
1695605323 | 0.446178 | 0.443276 | 0.414198 | 0.931855 | 1.558640 | 2.002722 | 2.770974 | 3.552762 | 4.316129 | 5.068889 | ... | -4.112582 | -4.124018 | -4.134045 | -4.154707 | -4.173733 | -4.200887 | -4.220607 | -4.232209 | -4.223803 | -4.198871 |
1695605546 | 0.217129 | 0.187803 | 0.206922 | 1.118811 | 2.320607 | 3.584817 | 4.698702 | 6.104926 | 7.492855 | 8.932585 | ... | -2.567677 | -2.540776 | -2.508448 | -2.481683 | -2.477198 | -2.535953 | -2.640717 | -2.690482 | -2.679015 | -2.649963 |
4 rows × 1200 columns
Notice that the agl parameter’s first 5 row values in the first dataframe become columns for the corresponding flight in the second dataframe named pv. The flight_num field is used as an index so that the results can later be joined back to labels, snapshots, or any other metadata that might be needed.
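A minimal sketch of this pivot, assuming the per-sample data lives in a dataframe named `df` with the column names shown earlier:

```python
import pandas as pd

params = ["agl", "ias", "pitch", "v_alt"]

# Keep the last 300 samples (60 seconds at 5 Hz) before touchdown.
# `df` is assumed to hold one row per sample, keyed by flight_num.
approach = df[df["iter_from_td"].between(1, 300)]

# Pivot so each flight becomes a single row with one column per (parameter, sample)
pv = approach.pivot_table(index="flight_num", columns="iter_from_td", values=params)

# Flatten the MultiIndex columns into names like "agl|1" ... "v_alt|300"
pv.columns = [f"{p}|{i}" for p, i in pv.columns]
```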
Step 2: PCA
Before we can run the PCA, we need to scale our data around the mean. Technically this isn’t absolutely necessary, but it puts every column on a comparable scale and often improves the performance of the model. I used sklearn’s StandardScaler.
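A minimal sketch of the scaling and PCA steps, using the pivoted dataframe `pv` built above:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Center and scale each of the 1200 pivoted columns
scaled = StandardScaler().fit_transform(pv)

# Compress the 1200 columns down to 3 principal components
pca = PCA(n_components=3)
pcs = pca.fit_transform(scaled)

pca_df = pd.DataFrame(pcs, index=pv.index, columns=["pc1", "pc2", "pc3"])
```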
Once the PCA has finished, the high dimensional dataframe will be compressed down into a 76 row x 3 column matrix corresponding to the 3 principal components we requested:
flight_num | pc1 | pc2 | pc3
---|---|---|---
1695604080 | -23.094983 | 10.103879 | 6.392800
1695605295 | -3.200182 | 0.599398 | 9.617005
1695605323 | -12.206696 | 9.933152 | -9.304026
Step 3: Use KMeans to Identify Clusters
KMeans will be used to find clusters of similar flights within our 3-dimensional matrix of principal components, but first we need to estimate an appropriate number of clusters. To do this, I ran KMeans against the PCA matrix using cluster counts between 2 and 10 and captured the error:
The point where the elbow bends and adding clusters stops reducing the error significantly gives us an idea of which cluster count to use. Based on the error values shown here, 5 appears to be the optimal cluster size.
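A sketch of that sweep, with the component matrix held in the `pca_df` dataframe from the previous step:

```python
from sklearn.cluster import KMeans

# Run KMeans for each candidate cluster count and record the within-cluster
# sum of squared errors (inertia); plotting inertia vs. k gives the elbow chart
inertia = {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(pca_df)
    inertia[k] = km.inertia_
```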
Next I used the results of the KMeans to add several fields to be used later during plotting and evaluation.
flight_num | pc1 | pc2 | pc3 | Predicted | PredictedRank | EucDistance | Labels
---|---|---|---|---|---|---|---
1696989035 | 66.265656 | -12.238110 | 40.484897 | 3 | 1 | 78.612565 | Fast, High Descent, Unstable
1700456966 | 73.508685 | -12.026897 | 2.010012 | 3 | 1 | 74.513175 | Fast, Hard Landing, No Flare, Unstable
1695703592 | 40.000347 | 58.653822 | -11.592774 | 1 | 2 | 71.935325 | Fast, Float, High Descent, Unstable
Predicted is the cluster label (0-4) that KMeans assigned to each flight. PredictedRank reorders the clusters based on the number of observations in each (1-5), which is useful for standardizing colors, etc. EucDistance is the Euclidean distance of each point from the origin (0,0,0), which tells us approximately how far from "normal" each observation is. Finally, I joined the results to the categorization labels that were assigned previously.
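A rough sketch of how these fields could be derived; the `labels_df` dataframe holding the manually assigned labels is hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

km = KMeans(n_clusters=5, n_init=10, random_state=42).fit(pca_df)

results = pca_df.copy()
results["Predicted"] = km.labels_

# Rank clusters by the number of observations in each (smallest cluster = 1)
sizes = results["Predicted"].value_counts().sort_values()
rank_of = {cluster: rank for rank, cluster in enumerate(sizes.index, start=1)}
results["PredictedRank"] = results["Predicted"].map(rank_of)

# Euclidean distance from the origin of PCA space: a rough "distance from normal"
results["EucDistance"] = np.sqrt((results[["pc1", "pc2", "pc3"]] ** 2).sum(axis=1))

# Join back to the manually assigned categorization labels (hypothetical dataframe)
results = results.join(labels_df["Labels"])
```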
Step 4: Plotting
Finally, all that work pays off. The Plotly library gives us a fully interactive, 3-dimensional scatter plot. Click and drag on the chart to rotate the perspective:
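A minimal sketch of how a chart like this can be built with Plotly Express (the color assignment here is illustrative, not the exact palette described below):

```python
import plotly.express as px

plot_df = results.reset_index()
plot_df["Cluster"] = plot_df["PredictedRank"].astype(str)  # discrete colors

fig = px.scatter_3d(
    plot_df, x="pc1", y="pc2", z="pc3",
    color="Cluster",
    hover_data=["flight_num", "Labels", "EucDistance"],
)
fig.show()
```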
Visually, we can see that the model seems to be working. The 3 extremely fast/high descent flights were correctly classified in red in the left corner. The less egregious, but still abnormally fast flights are in orange. The slower, low descent flights are on the opposite end in light gray, and the “normal” flights are in green in the middle near 0,0,0.
As expected, there is some overlap, particularly on the slower end. This is intuitive because there’s less variance between a very slow flight (45 kts) and a normal flight (60-70 kts) compared to a very fast flight (120 kts). We’ll see this more clearly in the Evaluation section.
Step 5: Evaluation
This is an unsupervised model, which can be challenging to evaluate. The challenge is compounded because manually classifying flight data can be extremely subjective. One pilot’s view of fast may be very different from another’s, and it’s not uncommon to see momentary fluctuations in otherwise normal flights. Many airlines have guidelines that outline normal thresholds, but for this study I unfortunately didn’t set these expectations before we started collecting data.
I’ve broken the evaluation into two parts. The Visual section provides some intuition into how changes in parameters are reflected in the model. The Quantitative section provides a rough estimate of the accuracy of each cluster (matches/total observations) and of the overall model.
Visual Evaluation
Before we look at accuracy, let’s quickly visualize the most severe outliers our model caught. These flights were all captured by cluster 2 (red):

We’ll start by graphing these 3 flights. The light blue shading represents a “normal” range of values for altitude at that moment in time:
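As a rough sketch of one way to build such a chart (the shaded band here is a 10th-90th percentile envelope of AGL across all flights, which is an assumption):

```python
import plotly.graph_objects as go

# Envelope of "normal" AGL values across all flights at each sample position
agl_cols = [c for c in pv.columns if c.startswith("agl|")]
lo, hi = pv[agl_cols].quantile(0.10), pv[agl_cols].quantile(0.90)

x = list(range(1, len(agl_cols) + 1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=x + x[::-1], y=list(hi) + list(lo)[::-1],
                         fill="toself", fillcolor="rgba(173,216,230,0.4)",
                         line=dict(width=0), name="normal range"))
# One of the outlier flights, drawn on top of the band
fig.add_trace(go.Scatter(x=x, y=pv.loc[1696989035, agl_cols], name="1696989035"))
fig.update_xaxes(autorange="reversed", title_text="samples before touchdown")
fig.show()
```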
A line plot is a very effective way to look at a few parameters, but it can break down when more than a few dimensions are needed. A heatmap isn’t as intuitive, but it allows us to look at all values and all parameters simultaneously across the entire timespan. The far left of each heatmap is 60 seconds from touchdown and the far right is at touchdown.
To generate the heatmap, we start by calculating z-scores for each parameter using the original pivoted matrix we prepared for PCA:
flight_num | agl\|1 | agl\|2 | agl\|3 | agl\|4 | agl\|5 | agl\|6 | agl\|7 | agl\|8 | agl\|9 | agl\|10 | ... | v_alt\|291 | v_alt\|292 | v_alt\|293 | v_alt\|294 | v_alt\|295 | v_alt\|296 | v_alt\|297 | v_alt\|298 | v_alt\|299 | v_alt\|300
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1695604080 | -0.360582 | -0.190157 | -0.218723 | -0.475861 | -0.764136 | -0.943757 | -1.001807 | -1.024454 | -1.023507 | -1.034339 | ... | 0.427010 | 0.436121 | 0.453448 | 0.465200 | 0.468491 | 0.480026 | 0.514558 | 0.553893 | 0.576586 | 0.603011 |
1695605295 | -0.399357 | -0.441764 | -0.645536 | -0.559223 | 0.290390 | 0.747728 | 0.946157 | 1.024852 | 1.010024 | 0.954998 | ... | 1.275389 | 1.188343 | 0.472841 | 0.230615 | 0.690041 | 1.018724 | 0.315534 | -0.255165 | 0.316365 | 1.330393 |
1695605323 | -0.099772 | -0.140038 | -0.280401 | -0.445870 | -0.586129 | -0.718257 | -0.667615 | -0.616699 | -0.580382 | -0.549127 | ... | -1.226449 | -1.259611 | -1.295632 | -1.325905 | -1.347074 | -1.365650 | -1.379303 | -1.385738 | -1.389079 | -1.366736 |
3 rows × 1200 columns
And then convert that into a heatmap using Plotly Express’ imshow. The value indicates the number of standard deviations above or below average for that parameter at that moment in time. Red indicates abnormally high values, and blue indicates abnormally low values:
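A condensed sketch of the z-score and imshow steps, reusing `pv` and `params` from the pivot step:

```python
import plotly.express as px
from scipy.stats import zscore

# Column-wise z-scores: how unusual each parameter is at each moment,
# relative to the same moment in every other flight
z = pv.apply(zscore)

# One heatmap per flight: rows are the 4 parameters, columns are the 300 samples
row = z.loc[1696989035].to_numpy().reshape(len(params), -1)
fig = px.imshow(
    row,
    y=params,
    color_continuous_scale="RdBu_r",  # red = abnormally high, blue = abnormally low
    zmin=-3, zmax=3,
    aspect="auto",
)
fig.show()
```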
For the first flight (1696989035), found in the top left corner, we see that its altitude and speed are very high, particularly as it gets closer to the ground. Towards the end of the flight, this causes a significant increase in descent rate (V_ALT). Note that descent rate is negative (i.e. -1,000 ft per minute), which is why blue implies a faster descent rate. Notice also that the third flight (1695703592) looks very different. It has a very high altitude and speed initially, but burns off most of that energy as it gets closer to the ground. Based on this flight’s location in the graph, this behavior appears to be captured by the model, and with more flights we would likely see a cluster for just these types of approaches.
Quantitative Evaluation
To estimate accuracy, I selected the most commonly assigned label for each cluster and counted the number of observations in that cluster that have the label and the number that don’t. For example:
PredictedRank | Color | PrimaryLabel | Autopilot | Bad Glidepath | Crosswind | Early Flare | Fast | Float | Hard Landing | High Descent | Late Flare | Low Descent
---|---|---|---|---|---|---|---|---|---|---|---|---
1 | red | Fast | 0 | 0 | 0 | 0 | 3 | 0 | 1 | 2 | 0 | 0
2 | orange | Fast | 0 | 0 | 0 | 0 | 7 | 2 | 1 | 5 | 0 | 0
3 | lightgray | Slow | 0 | 2 | 2 | 0 | 0 | 1 | 3 | 0 | 0 | 2
4 | lightgreen | Stable | 0 | 1 | 0 | 1 | 3 | 0 | 4 | 7 | 0 | 0
5 | green | Stable | 3 | 2 | 1 | 4 | 2 | 0 | 3 | 0 | 1 | 1
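A rough sketch of this scoring logic (how the primary label was actually chosen is an assumption; here it is simply the most common tag in each cluster):

```python
import pandas as pd

# Split the comma-separated label strings into individual tags and cross-tabulate
exploded = (results.assign(Label=results["Labels"].str.split(", "))
                   .explode("Label"))
counts = pd.crosstab(exploded["PredictedRank"], exploded["Label"])
primary = counts.idxmax(axis=1)  # most common tag per cluster

# Score each cluster by how many of its flights carry the primary label
summary = pd.DataFrame({
    "Observations": results.groupby("PredictedRank").size(),
    "Matches": results.groupby("PredictedRank").apply(
        lambda g: g["Labels"].str.contains(primary[g.name]).sum()),
})
summary["Accuracy"] = 100 * summary["Matches"] / summary["Observations"]
```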
Here are the results after comparing each observation to each cluster’s primary label:
PredictedRank | Color | PrimaryLabel | Observations | Matches | No Match | Accuracy (%)
---|---|---|---|---|---|---
1 | red | Fast | 3 | 3 | 0 | 100.000000
2 | orange | Fast | 9 | 7 | 2 | 77.777778
3 | lightgray | Slow | 11 | 8 | 3 | 72.727273
4 | lightgreen | Stable | 24 | 16 | 8 | 66.666667
5 | green | Stable | 29 | 25 | 4 | 86.206897
All | NaN | NaN | 76 | 59 | 17 | 77.631579
We can see that the model performed well at identifying stable flights and very fast, high-descent flights, but didn’t do as well with the less severe clusters. Overall, we achieved a respectable 77.6% accuracy.
Conclusion
I’m frankly shocked at how effective PCA was at condensing 1,200 columns of flight data into just 3 values. The approach needs more work and a more realistic dataset, but this model seems to have real potential for identifying problematic flights.
That said, with more time, here are a few things I would do differently:
- Add more flights. While the dataset contains plenty of observations, there aren’t enough individual flights to create well-defined clusters. Each approach takes 10-20 minutes, and I just didn’t have the time to devote to building a large, realistic dataset. At a commercial airline, you might see 1 or 2 unstable approaches for every 10,000 flights, but my dataset is closer to 1 in 2. I considered opening up the platform to strangers, but was concerned about the quality of data I would get.
- Better define normal. One challenge with this project is that flights were flown relatively freely, without any guidelines or thresholds. In a simulator, people are less concerned with safety and legality, and the resulting unrealistic 120 kt approaches had an adverse impact on the overall model. I experimented with removing these outliers, but didn’t really have enough observations and opted to just leave them in.
- Add parameters. I realized late in the project, after most of my data had been collected, that it would be useful to add several additional parameters from the simulator. Many similar projects use throttle position to determine how much energy is being added to the aircraft.
- Focus on areas other than the final approach. This is more difficult because the further back in the approach you go, the more variables there are.
This has been a really interesting project and I’ve built a pretty solid infrastructure around it. I look forward to improving on and experimenting more with this dataset for future models.
Additional Areas for Research
- Use KDE to preview the general shapes discovered using PCA (see lecture 6, 49:00)
- Generate a "normal" flight for each cluster and pass that data back to the simulator to illustrate its base characteristics
- Attempt to detect differences in airframe, pilot, weather, and flight style (i.e. high vs. shallow descent)
- Use a probabilistic graphical model and/or random forest to visualize flight characteristics of various flight types
- Try to predict hard (i.e. high-velocity) landings based on descent parameters
- Use linear regression to find optimal values for stable flight (i.e. 2.5 degrees and 65 kts for a 500 ft/min descent rate)