and a transpose operator, denoted by Q, K, V, and T, respectively, as shown in Equation XI:28,31

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{XI}$$

where q_i = N_q(ε_i) ∈ Q, k_i = N_k(ε_i) ∈ K, and v_i = N_v(ε_i) ∈ V. Each neural network N_q, N_k, and N_v contains weight matrices that perform linear transformations on the input AM state embedding ε_i. These transformations are applied as shown in Equation XII:28

$$q_i = \varepsilon_i W_Q, \quad k_i = \varepsilon_i W_K, \quad v_i = \varepsilon_i W_V \tag{XII}$$

where ε_i is the input embedding, and W_Q, W_K, and W_V are the weight matrices for the queries, keys, and values, respectively. In the multi-head attention mechanism, the outputs of all attention heads are concatenated and linearly transformed to produce the final attention output, as shown in Equation XIII:28

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(h_1, \cdots, h_n)\,W^{O} \tag{XIII}$$

where W^O is the output weight matrix that projects the concatenated outputs back to the dimensions of the input embeddings, h_j represents the output of the j-th attention head, and n is the number of attention heads.
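To make Equations XI-XIII concrete, the following sketch shows one possible implementation of the multi-head attention described here. It is an illustration rather than the authors' code: PyTorch, the embedding dimension d_model = 64, the number of heads n_heads = 4, and the tensor layout (batch, T, d_model) are all assumptions.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal sketch of Equations XI-XIII: projections W_Q, W_K, W_V (Eq. XII),
    scaled dot-product attention per head (Eq. XI), and output projection W^O (Eq. XIII)."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0      # d_k = d_model / n_heads, as described in the text
        self.d_k = d_model // n_heads
        self.n_heads = n_heads
        self.W_Q = nn.Linear(d_model, d_model, bias=False)   # queries
        self.W_K = nn.Linear(d_model, d_model, bias=False)   # keys
        self.W_V = nn.Linear(d_model, d_model, bias=False)   # values
        self.W_O = nn.Linear(d_model, d_model, bias=False)   # concatenated-head projection

    def forward(self, eps, mask=None):
        # eps: (batch, T, d_model) sequence of AM state embeddings ε_1..ε_T
        B, T, _ = eps.shape

        def split(x):  # reshape to (batch, heads, T, d_k)
            return x.view(B, T, self.n_heads, self.d_k).transpose(1, 2)

        Q, K, V = split(self.W_Q(eps)), split(self.W_K(eps)), split(self.W_V(eps))

        scores = Q @ K.transpose(-2, -1) / math.sqrt(self.d_k)   # (B, heads, T, T)
        if mask is not None:                                     # optional masking (see Figure 6)
            scores = scores.masked_fill(mask, float("-inf"))
        heads = torch.softmax(scores, dim=-1) @ V                # h_1..h_n, each of width d_k

        concat = heads.transpose(1, 2).reshape(B, T, self.n_heads * self.d_k)
        return self.W_O(concat)                                  # Concat(h_1,…,h_n) W^O
```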
Multi-head attention allows the AMTransformer to explore various complexities of AM processes captured by the AM state embeddings at different positions. Each attention head focuses on different parts of the embedding concatenations, capturing diverse dynamical dependencies. AM processes have a hierarchical structure, with each point, line, or layer contributing to the overall shape and properties of complex manufactured objects. The AMTransformer captures this complexity through multi-head self-attention mechanisms, allowing the model to simultaneously attend to different levels of abstraction. As the model employs multi-head masked attention layers, the dimension of the query and key for each layer is obtained by dividing the dimension of the AM embedding vector by the number of parallel attention layers, which is equivalent to the number of heads. The masking in the attention mechanism enables the decoder to selectively focus on past and present AM states during training. This masking mechanism ensures that the AMTransformer learns to predict the future AM states in the concatenations based solely on the relevant spatial and temporal dynamical dependencies from the AM states that have already occurred at that time, without any information from the future, as illustrated in Figure 6. In addition, masking enables the AMTransformer to handle multiple sequence positions in parallel, up to the number of heads.

Figure 6. An example of a masked attention mechanism applied to melt pool states, showing how dynamical dependencies are captured. When the query is ε_{i−1}, the decoder masks future information, focusing only on the previous sequence. The black arrows signify transitions, while the blue arrows represent dependencies captured through attention.
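As a concrete illustration of this masking (building on the MultiHeadAttention sketch above; the sequence length T = 6 and the dimensions are assumed), an upper-triangular Boolean mask hides every future position before the softmax:

```python
import torch

T = 6                                             # assumed AM state sequence length
# True above the diagonal marks the future positions that must be hidden
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)

eps_seq = torch.randn(2, T, 64)                   # a batch of AM state embedding sequences
attn = MultiHeadAttention(d_model=64, n_heads=4)  # the sketch class defined above
out = attn(eps_seq, mask=causal_mask)             # position i attends only to ε_1, …, ε_i
```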
The last part of the decoder is the feed-forward neural network. In each decoder layer of the architecture, a feed-forward neural network is used to apply a non-linear transformation to every individual AM state in the input sequence. This part enhances the model's representational ability by introducing non-linear processing and facilitating interaction between different AM states in the input sequence. This non-linearity is crucial because it allows the AMTransformer to capture intricate patterns and relationships inherent in the data that cannot be represented by linear transformations alone. The output of the transformer consists of predicted embedding vectors, which the decoder of the AM state embedder reconstructs back into the original physical space.
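Such a position-wise feed-forward block is commonly built from two linear layers with a non-linearity in between, applied to each AM state embedding independently. In the sketch below, the hidden width d_hidden = 256 and the ReLU activation are assumptions; the text does not specify them.

```python
import torch.nn as nn

class PositionWiseFeedForward(nn.Module):
    """Applies the same non-linear transformation to every AM state embedding
    in a sequence of shape (batch, T, d_model)."""

    def __init__(self, d_model: int = 64, d_hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),   # expand each embedding
            nn.ReLU(),                      # assumed non-linearity
            nn.Linear(d_hidden, d_model),   # project back to the embedding dimension
        )

    def forward(self, eps):
        # nn.Linear acts on the last dimension, so the block is applied position-wise
        return self.net(eps)
```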
The transformer is trained by minimizing the loss function shown in Equation XIV:

$$L_T = \sum_{i=1}^{N} \sum_{j=1}^{T} l_T\!\left(\hat{\varepsilon}_i^{\,j}, \varepsilon_i^{\,j}\right) \tag{XIV}$$

where L_T stands for the total loss function of the transformer, ε_i^j represents the actual embedded vector for the j-th AM state in the i-th AM state sequence, ε̂_i^j represents the transformer's predicted embedding for ε_i^j, N represents the number of AM state sequences, T represents the length of an AM state sequence, that is, the number of AM state embeddings constituting an AM state sequence, and l_T represents the loss incurred for each future AM state embedding prediction.
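Reading Equation XIV this way, as a per-prediction loss l_T summed over all N sequences and T positions, one training step could be sketched as follows. The use of mean-squared error for l_T, the next-step target construction, and the model/optimizer interface are assumptions, not details stated in the text.

```python
import torch
import torch.nn as nn

def training_step(model, eps_batch, optimizer):
    """One sketch of a training step.

    model:     a stack of masked attention + feed-forward blocks that predicts
               the next AM state embedding ε̂_i^j from the embeddings seen so far
    eps_batch: (N, T + 1, d_model) actual AM state embedding sequences
    """
    inputs, targets = eps_batch[:, :-1, :], eps_batch[:, 1:, :]   # shift by one step
    preds = model(inputs)                                          # ε̂, shape (N, T, d_model)

    # Equation XIV: L_T accumulates l_T over sequences i and positions j
    l_T = nn.MSELoss(reduction="sum")          # assumed per-prediction loss
    loss = l_T(preds, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```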
5. Case study

In this section, we present a case study that demonstrates the proposed AMTransformer using LPBF melt pool experimental data. The objective of this case study is to assess the effectiveness of the AMTransformer, with a specific focus on predicting future melt pools.