
International Journal of AI for Materials and Design
Utilizing AI for NTSB UAV accident categorization


aviation incidents. These reports, covering accidents from April 2006 to August 2023, were obtained through the NTSB Aviation Investigation Search platform.15 The dataset includes various fields such as event dates, probable causes, and geographic coordinates, which are crucial for analyzing and categorizing the accidents.

The collected data underwent several preprocessing steps to ensure its quality and usability. Dynamic encoding detection was employed to accurately read files with different encodings, preventing data corruption.16 Text sanitization was performed using Python and the Unidecode library, which converts text to ASCII, making it uniform and easier to process.17,18 Python's NumPy and Pandas libraries, together with error-handling mechanisms, were used to manage missing values, inconsistent formats, and other anomalies in the dataset, ensuring the integrity and reliability of the processed data.19,20

3.1. AI categorization

OpenAI's GPT-4 application programming interface (API) was utilized to categorize the probable causes of UAV accidents.21 The categorization process involved feeding the cleaned and sanitized report text data into the GPT-4 model, which then assigned an aviation occurrence category from the CAST/ICAO Common Taxonomy Team (CICTT) taxonomy to each accident report.22 This approach leveraged the advanced NLP capabilities of GPT-4 to accurately interpret and classify the narrative descriptions of the accidents, thereby automating the categorization process and reducing manual effort.23

3.2. Visualization

Various data visualization tools and techniques were used to illustrate the findings and trends in the UAV accident data. Libraries such as Matplotlib, Seaborn, and Folium were employed to create charts, graphs, and maps.12-14 These visualizations helped in identifying geographic distributions, temporal patterns, and key insights from the data. For instance, Matplotlib and Seaborn were used to generate detailed plots showing seasonal variations and accident trends over the years, while Folium was utilized to create interactive maps highlighting accident hotspots across the United States.

3.3. Python scripts for NTSB accident report analysis

The categorization script, NTSB_analysis_with_gpt4_V3.0.py, employs OpenAI's GPT-4 model to categorize UAV accident reports from the NTSB.23 The process begins with data preprocessing, where the script reads raw accident reports and cleans and normalizes the text to ensure uniformity. This preprocessing step is crucial for removing inconsistencies and preparing the text for analysis. After preprocessing, the sanitized text is input into the GPT-4 model, which categorizes each report into predefined aviation occurrence categories based on the narrative descriptions. The categorized data is then saved for subsequent analysis.

The visualization script, NTSB_reports_visualisation_and_map_V1.1.py, focuses on visualizing the categorized UAV accident data.23 The script begins by loading the categorized accident data. It then uses various data visualization tools, such as Matplotlib, Seaborn, and Folium, to generate visual representations of the data. The visualizations include line graphs and bar charts to illustrate temporal trends and accident frequencies over time. In addition, interactive maps created with Folium highlight geographic distributions and accident hotspots. These visualizations enable the identification of key insights, trends, and patterns in the accident data, facilitating a deeper understanding of UAV accident occurrences.

Together, these scripts streamline the processes of categorizing and visualizing NTSB accident reports, enhancing the efficiency and accuracy of data analysis.

4. Results

Following this, the OpenAI API was used to assign accident categories from the CICTT taxonomy to the NTSB UAV accident reports.22 Table 1 lists the names and codes of the categories assigned by the AI using the ICAO list. The analysis revealed that the primary classification of UAV accidents is system and component failure, specifically categorized as “System and Component Failure or Malfunction (SCF-NP).” This category encompasses issues such as loss of control, transmission failures,

Table 1. UAV accident codes and category names
Code      Category name
FUEL      Fuel related
ICE       Icing
RAMP      Ground handling
WSTRW     Wind shear or thunderstorm
LOC-G     Loss of control – ground
SEC       Security related
SCF-PP    System/component failure or malfunction (powerplant)
MAC       Airprox/TCAS alert/loss of separation/near midair collisions/midair collisions
NAV       Navigation errors
LOC-I     Loss of control – inflight
ARC       Abnormal runway contact
SCF-NP    System/component failure or malfunction (non-powerplant)
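
The preprocessing steps described above (reading files of unknown encoding and reducing the text to ASCII) can be illustrated with a minimal sketch. It is not the authors' script. The list of candidate encodings and both function names are assumptions, and the transliteration uses the standard-library `unicodedata` module as a simplified stand-in for Unidecode, so it drops characters that Unidecode would transliterate.

```python
import unicodedata

# Assumption: encodings tried in order. The paper does not state which
# detector or which encodings its script relies on.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


def to_ascii(text):
    """Approximate ASCII form of `text` (simplified stand-in for Unidecode).

    NFKD splits accented letters into a base letter plus a combining mark;
    the ASCII encode step then discards whatever is not plain ASCII.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of whitespace so every report has a uniform layout.
    return " ".join(ascii_text.split())
```

For example, the bytes of "café" encoded as cp1252 fail to decode as UTF-8, are picked up by the second candidate, and are then reduced to "cafe" by `to_ascii`.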

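
The categorization step of Section 3.1 can be sketched as follows. This is an illustration under stated assumptions rather than the published script: the prompt wording, the function names, and the `ask_model` callable are hypothetical, and only the category codes and names come from Table 1. The model call is injected as a plain function so that the prompt construction and reply validation can be exercised without network access.

```python
import re

# Codes and names taken from Table 1 (ASCII hyphen used in place of the dash).
CICTT_CATEGORIES = {
    "FUEL": "Fuel related",
    "ICE": "Icing",
    "RAMP": "Ground handling",
    "WSTRW": "Wind shear or thunderstorm",
    "LOC-G": "Loss of control - ground",
    "SEC": "Security related",
    "SCF-PP": "System/component failure or malfunction (powerplant)",
    "MAC": "Airprox/TCAS alert/loss of separation/near midair collisions/midair collisions",
    "NAV": "Navigation errors",
    "LOC-I": "Loss of control - inflight",
    "ARC": "Abnormal runway contact",
    "SCF-NP": "System/component failure or malfunction (non-powerplant)",
}


def build_prompt(narrative):
    """Build a classification prompt (hypothetical wording)."""
    options = "\n".join(f"{code}: {name}" for code, name in CICTT_CATEGORIES.items())
    return (
        "Classify the UAV accident narrative below into exactly one "
        "CICTT occurrence category. Reply with the code only.\n\n"
        f"Categories:\n{options}\n\nNarrative:\n{narrative}"
    )


def parse_category(reply):
    """Return the first valid upper-case code found in the reply, else None."""
    for token in re.findall(r"[A-Z]+(?:-[A-Z]+)?", reply):
        if token in CICTT_CATEGORIES:
            return token
    return None


def categorize(narrative, ask_model):
    """Send the prompt through `ask_model` (prompt -> reply text) and validate."""
    return parse_category(ask_model(build_prompt(narrative)))
```

In practice `ask_model` would wrap a call to the OpenAI API. Validating the reply against the fixed code list means an unexpected answer is recorded as `None` and can be reviewed manually instead of being stored as a category.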

            Volume 2 Issue 1 (2025)                         3                              doi: 10.36922/ijamd.8544
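
The charts and maps described in Sections 3.2 and 3.3 rest on a few simple aggregations of the categorized records. The sketch below shows one way to compute them with the standard library only. The record field names (`event_date`, `category`, `latitude`, `longitude`), the function names, and the one-degree grid size are assumptions, not details taken from the published script.

```python
import math
from collections import Counter


def accidents_per_year(records):
    """Yearly accident counts, sorted by year (input for a line graph)."""
    counts = Counter(r["event_date"].year for r in records)
    return dict(sorted(counts.items()))


def accidents_per_category(records):
    """(category, count) pairs, most frequent first (input for a bar chart)."""
    return Counter(r["category"] for r in records).most_common()


def hotspot_cells(records, cell_deg=1.0):
    """Count accidents per latitude/longitude grid cell, densest cell first.

    Records without coordinates are skipped. Each cell is identified by
    its south-west corner.
    """
    cells = Counter()
    for r in records:
        lat, lon = r.get("latitude"), r.get("longitude")
        if lat is None or lon is None:
            continue
        corner = (
            math.floor(lat / cell_deg) * cell_deg,
            math.floor(lon / cell_deg) * cell_deg,
        )
        cells[corner] += 1
    return cells.most_common()
```

The yearly and per-category counts could be passed to Matplotlib or Seaborn for plotting, and the grid cells give a rough indication of where markers on a Folium map would cluster.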