# transformer-torch

**Repository Path**: aiworkstep/transformer-torch

## Basic Information

- **Project Name**: transformer-torch
- **Description**: A PyTorch-based transformer
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2026-01-13
- **Last Updated**: 2026-01-13

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Transformer

A PyTorch implementation of the transformer, adapted from [The Annotated Transformer](http://nlp.seas.harvard.edu/2018/04/03/attention.html).

- ### block

```
embedding
  - Embeddings(vocab_size, embedding_size)
  - PositionalEncoding(embedding_size, max_len)

multi-head attention
  - self-attention(q, k, v, mask)
  - MultiHeadedAttention(head, hidden_size)
    - q/k/v -> linear -> [batch_size, seq_len, head, dim_k] -> [bs, h, sl, dk]
    - softmax(q * k' / sqrt(d_k)) * v   [bs, h, sl, dk]
    - concat                            [bs, sl, dim_k]
  - the attention weights are kept so they can be visualized later

encoder
  - x, mask
  - EncoderLayer
    - multi-head attention
    - feed-forward (hidden_size, ff_size): max(0, x*W1 + b1)*W2 + b2
    - Add & Norm

decoder
  - memory, x, src_mask, tgt_mask
  - DecoderLayer
    - self-att (multi-head, x, tgt_mask)
    - src-att (multi-head, x, memory, src_mask)
    - feed-forward
    - Add & Norm
```

- ### transformer
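The embedding block above (`Embeddings(vocab_size, embedding_size)` plus `PositionalEncoding(embedding_size, max_len)`) can be sketched as follows. This is a minimal sketch in the style of the Annotated Transformer, not this repository's exact code; the hyperparameter values in the usage lines are illustrative:

```python
import math
import torch
import torch.nn as nn

class Embeddings(nn.Module):
    """Token embedding scaled by sqrt(embedding_size)."""
    def __init__(self, vocab_size, embedding_size):
        super().__init__()
        self.lut = nn.Embedding(vocab_size, embedding_size)
        self.d_model = embedding_size

    def forward(self, x):
        return self.lut(x) * math.sqrt(self.d_model)

class PositionalEncoding(nn.Module):
    """Fixed sinusoidal positional encoding added to the embeddings."""
    def __init__(self, embedding_size, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, embedding_size)
        position = torch.arange(0, max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, embedding_size, 2).float()
                             * -(math.log(10000.0) / embedding_size))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dims
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dims
        self.register_buffer("pe", pe.unsqueeze(0))   # [1, max_len, embedding_size]

    def forward(self, x):
        # x: [batch_size, seq_len, embedding_size]
        return x + self.pe[:, : x.size(1)]

# usage (illustrative sizes)
emb = Embeddings(vocab_size=100, embedding_size=16)
pos = PositionalEncoding(embedding_size=16)
tokens = torch.randint(0, 100, (2, 5))  # [batch_size, seq_len]
out = pos(emb(tokens))                  # [2, 5, 16]
```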
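The multi-head attention steps in the block list — project q/k/v, reshape to `[bs, h, sl, dk]`, apply `softmax(q*k'/sqrt(d_k)) * v`, concat, and keep the attention weights for visualization — can be sketched like this (a sketch following the Annotated Transformer's shapes, not necessarily this repository's exact code):

```python
import math
import torch
import torch.nn as nn

def attention(q, k, v, mask=None):
    """softmax(q @ k^T / sqrt(d_k)) @ v; also returns the attention weights."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # [bs, h, sl, sl]
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)    # block masked positions
    p_attn = scores.softmax(dim=-1)
    return p_attn @ v, p_attn

class MultiHeadedAttention(nn.Module):
    def __init__(self, head, hidden_size):
        super().__init__()
        assert hidden_size % head == 0
        self.d_k = hidden_size // head
        self.head = head
        # three input projections for q/k/v plus one output projection
        self.linears = nn.ModuleList(nn.Linear(hidden_size, hidden_size)
                                     for _ in range(4))
        self.attn = None  # attention weights kept for later visualization

    def forward(self, q, k, v, mask=None):
        bs = q.size(0)
        # q/k/v -> linear -> [bs, sl, head, d_k] -> [bs, head, sl, d_k]
        q, k, v = (lin(x).view(bs, -1, self.head, self.d_k).transpose(1, 2)
                   for lin, x in zip(self.linears, (q, k, v)))
        x, self.attn = attention(q, k, v, mask)
        # concat heads back: [bs, sl, head * d_k]
        x = x.transpose(1, 2).contiguous().view(bs, -1, self.head * self.d_k)
        return self.linears[-1](x)

# usage (illustrative sizes): self-attention, so q = k = v = x
mha = MultiHeadedAttention(head=4, hidden_size=32)
x = torch.randn(2, 5, 32)
out = mha(x, x, x)
```

After a forward pass, `mha.attn` holds the `[bs, head, sl, sl]` weight tensor, which is what the README means by keeping the attention values for visualization.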
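The encoder layer's feed-forward formula `max(0, x*W1 + b1)*W2 + b2` and the "Add & Norm" step can be sketched as below. The class names `PositionwiseFeedForward` and `SublayerConnection` are taken from the Annotated Transformer, not from this repository's listing, and the sketch uses the original paper's post-norm order `LayerNorm(x + sublayer(x))` (the Annotated Transformer itself normalizes first):

```python
import torch
import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    """feed-forward: max(0, x @ W1 + b1) @ W2 + b2"""
    def __init__(self, hidden_size, ff_size):
        super().__init__()
        self.w1 = nn.Linear(hidden_size, ff_size)
        self.w2 = nn.Linear(ff_size, hidden_size)

    def forward(self, x):
        return self.w2(self.w1(x).relu())  # relu == max(0, ...)

class SublayerConnection(nn.Module):
    """Add & Norm: residual connection around a sublayer, then LayerNorm."""
    def __init__(self, hidden_size):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, x, sublayer):
        return self.norm(x + sublayer(x))

# usage (illustrative sizes): one feed-forward sublayer wrapped in Add & Norm
ff = PositionwiseFeedForward(hidden_size=32, ff_size=64)
addnorm = SublayerConnection(hidden_size=32)
x = torch.randn(2, 5, 32)
out = addnorm(x, ff)
```

An `EncoderLayer` then chains two such sublayers (multi-head attention, then feed-forward), and a `DecoderLayer` chains three (self-att, src-att, feed-forward), each wrapped in Add & Norm.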
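The decoder's `tgt_mask` is what keeps self-attention causal: position i may only attend to positions ≤ i. A minimal sketch of such a mask (the helper name `subsequent_mask` follows the Annotated Transformer and is not taken from this repository's listing):

```python
import torch

def subsequent_mask(size):
    """Lower-triangular boolean mask: True where attention is allowed,
    i.e. position i can see positions 0..i but nothing after it."""
    return torch.tril(torch.ones(size, size, dtype=torch.bool))

# usage: a length-4 target sequence
m = subsequent_mask(4)
```

This mask is broadcast against the `[bs, h, sl, sl]` score tensor, and masked positions are set to a large negative value before the softmax so they receive (near-)zero weight.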