diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md
deleted file mode 100644
index f3fe9db..0000000
--- a/.github/ISSUE_TEMPLATE.md
+++ /dev/null
@@ -1,41 +0,0 @@
----
-name: Issue Report
-about: Report a bug or suggest a feature
-title: '[ISSUE] '
-labels: ''
-assignees: ''
----
-
-## Type
-Please select the type of this issue:
-- [ ] Bug Report
-- [ ] Feature Request
-
-## Description
-A clear and concise description of the issue or suggestion.
-
-## Steps to Reproduce (for bugs)
-1.
-2.
-3.
-
-## Expected Behavior (for bugs)
-What you expected to happen.
-
-## Actual Behavior (for bugs)
-What actually happened.
-
-## Solution Proposal (for features)
-Describe the solution you'd like.
-
-## Alternatives Considered (for features)
-Describe any alternative solutions or features you've considered.
-
-## Environment (for bugs)
-- Python version:
-- AstrAI version (or commit hash):
-- OS:
-- GPU (if applicable):
-
-## Additional Context
-Add any other context, screenshots, or logs here.
\ No newline at end of file
diff --git a/.gitignore b/.gitignore
index eae218e..f8f1d17 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,5 +11,6 @@
!LICENSE
!pyproject.toml
+!.github/ISSUE_TEMPLATE/*
!.github/workflows/lint.yml
!.github/workflows/tests.yml
\ No newline at end of file
diff --git a/README.md b/README.md
index 593a410..395aa89 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,6 @@
-
-
AstrAI
+
A lightweight Transformer training & inference framework
@@ -14,7 +13,6 @@
-
@@ -130,7 +128,7 @@ For major changes, please open an issue first to discuss what you would like to
- **GitHub Issues**: [Issue Tracker](https://github.com/ViperEkura/AstrAI/issues)
- **Discussions**: [GitHub Discussions](https://github.com/ViperEkura/AstrAI/discussions)
-- **HuggingFace**: [Model Hub](https://huggingface.co/ViperEk/AstrAI)
+- **HuggingFace**: [Model Hub](https://huggingface.co/ViperEk)
### License
diff --git a/assets/docs/README-zh-CN.md b/assets/docs/README-zh-CN.md
index 36dc289..4c87f4c 100644
--- a/assets/docs/README-zh-CN.md
+++ b/assets/docs/README-zh-CN.md
@@ -1,7 +1,6 @@
-
-
AstrAI
+
-
## 📖 目录
@@ -131,7 +129,7 @@ python scripts/demo/generate_ar.py
- **GitHub Issues**: [问题追踪](https://github.com/ViperEkura/AstrAI/issues)
- **Discussions**: [GitHub 讨论区](https://github.com/ViperEkura/AstrAI/discussions)
-- **HuggingFace**: [模型中心](https://huggingface.co/ViperEk/AstrAI)
+- **HuggingFace**: [模型中心](https://huggingface.co/ViperEk)
### 许可证
diff --git a/assets/docs/introduction.md b/assets/docs/introduction.md
index b7fc12f..9d5e4a6 100644
--- a/assets/docs/introduction.md
+++ b/assets/docs/introduction.md
@@ -1,12 +1,54 @@
## Model Introduction
-
-
### 1. Model Architecture
This model uses the Transformer architecture with a GQA (Grouped-Query Attention) mechanism (q_head=24, kv_head=4), which reduces KV cache memory compared to traditional MHA (although a KV cache is not yet implemented). The model stacks 24 Transformer blocks for a total of 1.0 billion parameters. The Transformer is an autoregressive model: it attends over all previous tokens to produce the probability distribution of the next token.
-
+```mermaid
+flowchart TB
+ subgraph Layers["Transformer Layers"]
+ direction TB
+ A[Input Embedding] --> B[Transformer Block\nLayer 1]
+ B --> C[Transformer Block\nLayer ...]
+    C --> D[Transformer Block\nLayer 24]
+ D --> E[RMSNorm]
+ E --> F[Linear]
+ F --> G[SoftMax]
+ end
+
+ subgraph TransformerBlock["Transformer Block"]
+ direction TB
+ H[x] --> I[RMSNorm]
+ I --> J[Linear → Q/K/V]
+ J --> K[Q]
+ J --> L[K]
+ J --> M[V]
+ K --> N[RoPE]
+ L --> O[RoPE]
+ N --> P["Q @ K^T / sqrt(d)"]
+ O --> P
+ P --> Q[Masked SoftMax]
+ Q --> R[S @ V]
+ M --> R
+ R --> S[Linear]
+ S --> T[+]
+ H --> T
+ T --> U[RMSNorm]
+ U --> V[Linear]
+ V --> W[SiLU]
+ V --> X[×]
+ W --> X
+ X --> Y[Linear]
+ Y --> Z[+]
+ T --> Z
+ Z --> AA[x']
+ end
+
+ classDef main fill:#e6f3ff,stroke:#0066cc;
+ classDef block fill:#fff2e6,stroke:#cc6600;
+ class Layers main;
+ class TransformerBlock block;
+```
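As a rough sketch of the GQA attention step in the diagram above (function name, weight shapes, and the NumPy implementation are illustrative, not the project's actual API; the real block also applies RoPE and an output projection):

```python
import numpy as np

def gqa_attention(x, wq, wk, wv, q_heads=24, kv_heads=4):
    """Grouped-Query Attention sketch: q_heads query heads share kv_heads K/V heads.

    x  : (seq, d_model) input activations
    wq : (d_model, d_model) query projection
    wk, wv : (d_model, kv_heads * head_dim) shared K/V projections
    """
    seq, d_model = x.shape
    head_dim = d_model // q_heads
    group = q_heads // kv_heads  # query heads per shared KV head (24 / 4 = 6)

    q = (x @ wq).reshape(seq, q_heads, head_dim)
    k = (x @ wk).reshape(seq, kv_heads, head_dim)
    v = (x @ wv).reshape(seq, kv_heads, head_dim)

    # Broadcast each KV head across its group of query heads
    k = np.repeat(k, group, axis=1)  # (seq, q_heads, head_dim)
    v = np.repeat(v, group, axis=1)

    # Q @ K^T / sqrt(d), per head
    scores = np.einsum('qhd,khd->hqk', q, k) / np.sqrt(head_dim)

    # Causal mask: position i may only attend to positions <= i
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)

    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)

    out = np.einsum('hqk,khd->qhd', probs, v)  # S @ V
    return out.reshape(seq, d_model)
```

The memory saving comes from `wk`/`wv` producing only `kv_heads` heads, which is also what shrinks the KV cache once one is implemented.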
What is an autoregressive model? After a sentence is split into tokens, the model predicts the probability distribution of the next token: given the context (the sequence of tokens that have already appeared), it assigns each candidate next token a probability.
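The autoregressive loop described above can be sketched as follows (`next_token_logits` is a hypothetical stand-in for a model forward pass, not a function in this repository):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def generate(next_token_logits, prompt, steps=5):
    """Greedy autoregressive decoding: repeatedly feed the growing token
    sequence back into the model and append the most probable next token."""
    tokens = list(prompt)
    for _ in range(steps):
        probs = softmax(next_token_logits(tokens))  # distribution over the vocab
        tokens.append(int(np.argmax(probs)))        # greedy pick; sampling also works
    return tokens
```

Each step conditions on the full sequence so far, which is exactly why a KV cache pays off: without one, every step recomputes attention over all previous tokens.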
diff --git a/assets/images/logo.png b/assets/images/logo.png
new file mode 100644
index 0000000..a8987fb
Binary files /dev/null and b/assets/images/logo.png differ
diff --git a/assets/images/project_logo.png b/assets/images/project_logo.png
deleted file mode 100644
index 25cecdb..0000000
Binary files a/assets/images/project_logo.png and /dev/null differ
diff --git a/assets/images/structure.png b/assets/images/structure.png
deleted file mode 100644
index 5f4f3a2..0000000
Binary files a/assets/images/structure.png and /dev/null differ