International Journal of AI for Materials and Design
Utilizing AI for NTSB UAV accident categorization
aviation incidents. These reports, covering accidents from April 2006 to August 2023, were obtained through the NTSB Aviation Investigation Search platform.15 The dataset includes various fields such as event dates, probable causes, and geographic coordinates, which are crucial for analyzing and categorizing the accidents.

The collected data underwent several preprocessing steps to ensure its quality and usability. Dynamic encoding detection was employed to accurately read files with different encodings, preventing data corruption.16 Text sanitization was performed using Python and the Unidecode library, which converts text to ASCII, making it uniform and easier to process.17,18 Python's NumPy and Pandas libraries, together with error-handling mechanisms, were used to manage missing values, inconsistent formats, and other anomalies in the dataset, ensuring the integrity and reliability of the processed data.19,20

3.1. AI categorization

OpenAI's GPT-4 application programming interface (API) was utilized to categorize the probable causes of UAV accidents.21 The categorization process involved feeding the cleaned and sanitized report text into the GPT-4 model, which then assigned to each accident report an aviation occurrence category defined by the CAST/ICAO Common Taxonomy Team (CICTT).22 This approach leveraged the advanced NLP capabilities of GPT-4 to interpret and classify the narrative descriptions of the accidents, thereby automating the categorization process and reducing manual effort.23

3.2. Visualization

Various data visualization tools and techniques were used to illustrate the findings and trends in the UAV accident data. Libraries such as Matplotlib, Seaborn, and Folium were employed to create charts, graphs, and maps.12-14 These visualizations helped in identifying geographic distributions, temporal patterns, and key insights from the data. For instance, Matplotlib and Seaborn were used to generate detailed plots showing seasonal variations and accident trends over the years, while Folium was utilized to create interactive maps highlighting accident hotspots across the United States.

3.3. Python scripts for NTSB accident report analysis

The categorization script, NTSB_analysis_with_gpt4_V3.0.py, employs OpenAI's GPT-4 model to categorize UAV accident reports from the NTSB.23 The process begins with data preprocessing, where the script reads raw accident reports and cleans and normalizes the text to ensure uniformity. This preprocessing step is crucial for removing inconsistencies and preparing the text for analysis. After preprocessing, the sanitized text is input into the GPT-4 model, which categorizes each report into predefined aviation occurrence categories based on the narrative descriptions. The categorized data is then saved for subsequent analysis.

The visualization script, NTSB_reports_visualisation_and_map_V1.1.py, focuses on visualizing the categorized UAV accident data.23 The script begins by loading the categorized accident data. It then uses data visualization tools, namely Matplotlib, Seaborn, and Folium, to generate visual representations of the data. The visualizations include line graphs and bar charts to illustrate temporal trends and accident frequencies over time. In addition, interactive maps created with Folium highlight geographic distributions and accident hotspots. These visualizations enable the identification of key insights, trends, and patterns in the accident data, facilitating a deeper understanding of UAV accident occurrences.

Together, these scripts streamline the processes of categorizing and visualizing NTSB accident reports, enhancing the efficiency and accuracy of data analysis.

4. Results

Following preprocessing, the OpenAI API was used to assign accident categories from the CICTT taxonomy to the NTSB UAV accident reports.22 Table 1 lists the codes and names of the categories assigned by the AI using the ICAO list.

The analysis revealed that the primary classification of UAV accidents is system and component failure, specifically categorized as “System and Component Failure or Malfunction (SCF-NP).” This category encompasses issues such as loss of control, transmission failures,

Table 1. UAV accident codes and category names

Code    Category name
FUEL    Fuel related
ICE     Icing
RAMP    Ground handling
WSTRW   Wind shear or thunderstorm
LOC-G   Loss of control – ground
SEC     Security related
SCF-PP  System/component failure or malfunction (powerplant)
MAC     Airprox/TCAS alert/loss of separation/near midair collisions/midair collisions
NAV     Navigation errors
LOC-I   Loss of control – inflight
ARC     Abnormal runway contact
SCF-NP  System/component failure or malfunction (non-powerplant)
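As an illustration of the preprocessing steps described in Section 3 (encoding detection, conversion of text to ASCII, and handling of missing values), a minimal sketch is given below. It is not the authors' script. To remain dependency-free, it substitutes standard-library stand-ins for the cited tools: a fixed list of candidate encodings replaces a dedicated detector, `unicodedata` normalization replaces Unidecode (which transliterates a wider range of characters), and plain dictionaries replace Pandas data frames. All function names, the encoding list, and the field names in the usage note are assumptions made for illustration.

```python
import unicodedata

# Assumed candidate encodings, tried in order. latin-1 accepts any byte
# sequence, so it acts as the final fallback.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_bytes(raw):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Not reached while latin-1 is in the list; kept as a safeguard.
    return raw.decode("utf-8", errors="replace"), "utf-8"


def to_ascii(text):
    """Approximate ASCII conversion (stand-in for Unidecode).

    Accented letters are decomposed and their base letter is kept;
    characters with no ASCII decomposition are dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def clean_record(record):
    """Sanitize string fields and map empty strings to None."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            text = " ".join(to_ascii(value).split())  # collapse whitespace
            cleaned[key] = text or None
        else:
            cleaned[key] = value  # numbers, dates, and None pass through
    return cleaned
```

For example, `clean_record({"ProbableCause": "  Café   crash ", "City": ""})` yields a record whose `ProbableCause` is `"Cafe crash"` and whose `City` is `None`, so that downstream steps can treat missing values uniformly.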
Volume 2 Issue 1 (2025) 3 doi: 10.36922/ijamd.8544
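The categorization step of Section 3.1 can likewise be sketched under stated assumptions. The paper does not reproduce the prompt or the response handling used with GPT-4, so the prompt wording, the function names, and the strict reply validation below are hypothetical. The model call itself is passed in as a callable (`ask_model`, an assumed name) so that the sketch does not depend on any particular client library. Only the category codes and names are taken from Table 1.

```python
# Codes and names as listed in Table 1.
CICTT_CATEGORIES = {
    "FUEL": "Fuel related",
    "ICE": "Icing",
    "RAMP": "Ground handling",
    "WSTRW": "Wind shear or thunderstorm",
    "LOC-G": "Loss of control - ground",
    "SEC": "Security related",
    "SCF-PP": "System/component failure or malfunction (powerplant)",
    "MAC": "Airprox/TCAS alert/loss of separation/near midair "
           "collisions/midair collisions",
    "NAV": "Navigation errors",
    "LOC-I": "Loss of control - inflight",
    "ARC": "Abnormal runway contact",
    "SCF-NP": "System/component failure or malfunction (non-powerplant)",
}


def build_prompt(narrative):
    """Build a prompt that restricts the answer to the Table 1 codes.

    The wording is an assumption, not the prompt used in the study.
    """
    options = "\n".join(
        f"{code}: {name}" for code, name in CICTT_CATEGORIES.items()
    )
    return (
        "Assign exactly one occurrence category code to the UAV accident "
        "narrative below. Reply with the code only.\n\n"
        f"Categories:\n{options}\n\nNarrative:\n{narrative}"
    )


def parse_category(reply):
    """Return the code if the whole reply is a Table 1 code, else None."""
    candidate = reply.strip().strip(".\"'").upper()
    return candidate if candidate in CICTT_CATEGORIES else None


def categorize(narrative, ask_model):
    """Send the prompt through ask_model (any str -> str callable)."""
    return parse_category(ask_model(build_prompt(narrative)))
```

Rejecting any reply that is not exactly one known code, rather than searching the reply for a code, is a deliberate choice in this sketch: several codes (for example ICE or FUEL) are also ordinary words, so a substring search could mislabel a free-text answer. Reports for which `categorize` returns `None` could then be retried or reviewed manually.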

