and a transpose operator, denoted by Q, K, V, and T, respectively, as shown in Equation XI:28,31

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{XI}$$

where q_i = N_q(ε_i) ∈ Q, k_i = N_k(ε_i) ∈ K, and v_i = N_v(ε_i) ∈ V. Each neural network N_q, N_k, and N_v contains weight matrices that perform linear transformations on the input AM state embedding ε_i. These transformations are applied as shown in Equation XII:28

$$q_i = \varepsilon_i W_Q, \quad k_i = \varepsilon_i W_K, \quad v_i = \varepsilon_i W_V \tag{XII}$$

where ε_i is the input embedding, and W_Q, W_K, and W_V are the weight matrices for the queries, keys, and values, respectively. In the multi-head attention mechanism, the outputs of all attention heads are concatenated and linearly transformed to produce the final attention output, as shown in Equation XIII:28

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(h_1, \cdots, h_n)\,W^{O} \tag{XIII}$$

where W^O is the output weight matrix that projects the concatenated outputs back to the dimensions of the input embeddings, h_j represents the output of the j-th attention head, and n is the number of attention heads.
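To make Equations XI-XIII concrete, the following sketch shows one possible implementation of the multi-head attention described here. It is an illustration rather than the authors' code: PyTorch, the embedding dimension d_model = 64, the number of heads n_heads = 4, and the tensor layout (batch, T, d_model) are all assumptions.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal sketch of Equations XI-XIII: projections W_Q, W_K, W_V (Eq. XII),
    scaled dot-product attention per head (Eq. XI), and output projection W^O (Eq. XIII)."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0      # d_k = d_model / n_heads, as described in the text
        self.d_k = d_model // n_heads
        self.n_heads = n_heads
        self.W_Q = nn.Linear(d_model, d_model, bias=False)   # queries
        self.W_K = nn.Linear(d_model, d_model, bias=False)   # keys
        self.W_V = nn.Linear(d_model, d_model, bias=False)   # values
        self.W_O = nn.Linear(d_model, d_model, bias=False)   # concatenated-head projection

    def forward(self, eps, mask=None):
        # eps: (batch, T, d_model) sequence of AM state embeddings ε_1..ε_T
        B, T, _ = eps.shape

        def split(x):  # reshape to (batch, heads, T, d_k)
            return x.view(B, T, self.n_heads, self.d_k).transpose(1, 2)

        Q, K, V = split(self.W_Q(eps)), split(self.W_K(eps)), split(self.W_V(eps))

        scores = Q @ K.transpose(-2, -1) / math.sqrt(self.d_k)   # (B, heads, T, T)
        if mask is not None:                                     # optional masking (see Figure 6)
            scores = scores.masked_fill(mask, float("-inf"))
        heads = torch.softmax(scores, dim=-1) @ V                # h_1..h_n, each of width d_k

        concat = heads.transpose(1, 2).reshape(B, T, self.n_heads * self.d_k)
        return self.W_O(concat)                                  # Concat(h_1,…,h_n) W^O
```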
Multi-head attention allows the AMTransformer to explore various complexities of AM processes captured by the AM state embeddings at different positions. Each attention head focuses on different parts of the embedding concatenations, capturing diverse dynamical dependencies. AM processes have a hierarchical structure, with each point, line, or layer contributing to the overall shape and properties of complex manufactured objects. The AMTransformer captures this complexity through multi-head self-attention mechanisms, allowing the model to simultaneously attend to different levels of abstraction. As the model employs multi-head masked attention layers, the dimension of the query and key for each layer is obtained by dividing the dimension of the AM embedding vector by the number of parallel attention layers, which is equivalent to the number of heads. The masking in the attention mechanism enables the decoder to selectively focus on past and present AM states during training. This masking mechanism ensures that the AMTransformer learns to predict the future AM states in the concatenations based solely on the relevant spatial and temporal dynamical dependencies from the AM states that have already occurred at that time, without any information from the future, as illustrated in Figure 6. In addition, masking enables the AMTransformer to handle multiple sequence positions in parallel, up to the number of heads.

Figure 6. An example of a masked attention mechanism applied to melt pool states, showing how dynamical dependencies are captured. When the query is ε_{i−1}, the decoder masks future information, focusing only on the previous sequence. The black arrows signify transitions, while the blue arrows represent dependencies captured through attention.
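As a concrete illustration of this masking (building on the MultiHeadAttention sketch above; the sequence length T = 6 and the dimensions are assumed), an upper-triangular Boolean mask hides every future position before the softmax:

```python
import torch

T = 6                                             # assumed AM state sequence length
# True above the diagonal marks the future positions that must be hidden
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)

eps_seq = torch.randn(2, T, 64)                   # a batch of AM state embedding sequences
attn = MultiHeadAttention(d_model=64, n_heads=4)  # the sketch class defined above
out = attn(eps_seq, mask=causal_mask)             # position i attends only to ε_1, …, ε_i
```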
The last part of the decoder is the feed-forward neural network. In each decoder layer of the architecture, a feed-forward neural network is used to apply a non-linear transformation to every individual AM state in the input sequence. This part enhances the model's representational ability by introducing non-linear processing and facilitating interaction between different AM states in the input sequence. This non-linearity is crucial because it allows the AMTransformer to capture intricate patterns and relationships inherent in the data that cannot be represented by linear transformations alone. The output of the transformer consists of predicted embedding vectors, which the decoder of the AM state embedder reconstructs back into the original physical space.
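Such a position-wise feed-forward block is commonly built from two linear layers with a non-linearity in between, applied to each AM state embedding independently. In the sketch below, the hidden width d_hidden = 256 and the ReLU activation are assumptions; the text does not specify them.

```python
import torch.nn as nn

class PositionWiseFeedForward(nn.Module):
    """Applies the same non-linear transformation to every AM state embedding
    in a sequence of shape (batch, T, d_model)."""

    def __init__(self, d_model: int = 64, d_hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),   # expand each embedding
            nn.ReLU(),                      # assumed non-linearity
            nn.Linear(d_hidden, d_model),   # project back to the embedding dimension
        )

    def forward(self, eps):
        # nn.Linear acts on the last dimension, so the block is applied position-wise
        return self.net(eps)
```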
The transformer is trained by minimizing the loss function shown in Equation XIV:

$$L_T = \sum_{i=1}^{N} \sum_{j=1}^{T} l_T\!\left(\hat{\varepsilon}_i^{\,j}, \varepsilon_i^{\,j}\right) \tag{XIV}$$

where L_T stands for the total loss function of the transformer, ε_i^j represents the actual embedded vector for the j-th AM state in the i-th AM state sequence, ε̂_i^j represents the transformer's predicted embedding for ε_i^j, N represents the number of AM state sequences, T represents the length of an AM state sequence, that is, the number of AM state embeddings constituting an AM state sequence, and l_T represents the loss incurred for each future AM state embedding prediction.
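Reading Equation XIV this way, as a per-prediction loss l_T summed over all N sequences and T positions, one training step could be sketched as follows. The use of mean-squared error for l_T, the next-step target construction, and the model/optimizer interface are assumptions, not details stated in the text.

```python
import torch
import torch.nn as nn

def training_step(model, eps_batch, optimizer):
    """One sketch of a training step.

    model:     a stack of masked attention + feed-forward blocks that predicts
               the next AM state embedding ε̂_i^j from the embeddings seen so far
    eps_batch: (N, T + 1, d_model) actual AM state embedding sequences
    """
    inputs, targets = eps_batch[:, :-1, :], eps_batch[:, 1:, :]   # shift by one step
    preds = model(inputs)                                          # ε̂, shape (N, T, d_model)

    # Equation XIV: L_T accumulates l_T over sequences i and positions j
    l_T = nn.MSELoss(reduction="sum")          # assumed per-prediction loss
    loss = l_T(preds, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```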
5. Case study

In this section, we present a case study that demonstrates the proposed AMTransformer using LPBF melt pool experimental data. The objective of this case study is to assess the effectiveness of the AMTransformer, with a specific focus on predicting future melt pools.