docs: update documentation content

This commit is contained in:
ViperEkura 2026-03-31 15:18:49 +08:00
parent 4ead0a20cf
commit e7721eafc6
8 changed files with 52 additions and 54 deletions


@@ -1,41 +0,0 @@
---
name: Issue Report
about: Report a bug or suggest a feature
title: '[ISSUE] '
labels: ''
assignees: ''
---
## Type
Please select the type of this issue:
- [ ] Bug Report
- [ ] Feature Request
## Description
A clear and concise description of the issue or suggestion.
## Steps to Reproduce (for bugs)
1.
2.
3.
## Expected Behavior (for bugs)
What you expected to happen.
## Actual Behavior (for bugs)
What actually happened.
## Solution Proposal (for features)
Describe the solution you'd like.
## Alternatives Considered (for features)
Describe any alternative solutions or features you've considered.
## Environment (for bugs)
- Python version:
- AstrAI version (or commit hash):
- OS:
- GPU (if applicable):
## Additional Context
Add any other context, screenshots, or logs here.

.gitignore vendored (1 change)

@@ -11,5 +11,6 @@
!LICENSE
!pyproject.toml
!.github/ISSUE_TEMPLATE/*
!.github/workflows/lint.yml
!.github/workflows/tests.yml


@@ -1,7 +1,6 @@
<div align="center">
<!-- <img src="assets/images/project_logo.png" width="auto" alt="Logo"> -->
<h1>AstrAI</h1>
<img src="assets/images/logo.png" width="auto" alt="Logo">
<p>
<strong>A lightweight Transformer training & inference framework</strong>
</p>
@@ -14,7 +13,6 @@
<img src="https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fapi.github.com%2Frepos%2FViperEkura%2FAstrAI&query=%24.stargazers_count&label=stars&suffix=%20stars&color=76bad9" alt="stars">
<img src="https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fapi.github.com%2Frepos%2FViperEkura%2FAstrAI&query=%24.forks_count&label=forks&suffix=%20forks&color=76bad9" alt="forks">
</div>
<br>
<div align="center">
@@ -22,7 +20,7 @@
<a href="assets/docs/README-zh-CN.md">中文</a>
<a href="https://github.com/ViperEkura/AstrAI/issues">Issue Tracker</a>
<a href="https://github.com/ViperEkura/AstrAI/discussions">Discussions</a>
-<a href="https://huggingface.co/ViperEk/AstrAI">HuggingFace</a>
+<a href="https://huggingface.co/ViperEk/">HuggingFace</a>
</div>
<br>
@@ -130,7 +128,7 @@ For major changes, please open an issue first to discuss what you would like to
- **GitHub Issues**: [Issue Tracker](https://github.com/ViperEkura/AstrAI/issues)
- **Discussions**: [GitHub Discussions](https://github.com/ViperEkura/AstrAI/discussions)
-- **HuggingFace**: [Model Hub](https://huggingface.co/ViperEk/AstrAI)
+- **HuggingFace**: [Model Hub](https://huggingface.co/ViperEk)
### License


@@ -1,7 +1,6 @@
<div align="center">
<!-- <img src="../images/project_logo.png" width="auto" alt="Logo"> -->
<h1>AstrAI</h1>
<img src="../images/logo.png" width="auto" alt="Logo">
<div>
<a href="../../README.md">English</a>
@@ -28,9 +27,8 @@
<a href="#chinese">中文</a>
<a href="https://github.com/ViperEkura/AstrAI/issues">问题追踪</a>
<a href="https://github.com/ViperEkura/AstrAI/discussions">讨论区</a>
-<a href="https://huggingface.co/ViperEk/AstrAI">HuggingFace</a>
+<a href="https://huggingface.co/ViperEk">HuggingFace</a>
</div>
<br>
## 📖 目录
@@ -131,7 +129,7 @@ python scripts/demo/generate_ar.py
- **GitHub Issues**: [问题追踪](https://github.com/ViperEkura/AstrAI/issues)
- **Discussions**: [GitHub 讨论区](https://github.com/ViperEkura/AstrAI/discussions)
-- **HuggingFace**: [模型中心](https://huggingface.co/ViperEk/AstrAI)
+- **HuggingFace**: [模型中心](https://huggingface.co/ViperEk)
### 许可证


@@ -1,12 +1,54 @@
## Model Introduction
### 1. Model Architecture
This model uses the Transformer architecture with a GQA (grouped-query attention) mechanism (q_head=24, kv_head=4), which reduces KV-cache memory compared to traditional MHA (although KV caching is not yet implemented). The model is built by stacking 24 Transformer blocks, for a total of 1.0 billion parameters. The Transformer is autoregressive: it attends over all previous tokens to produce a probability distribution over the next token.
![structure](../images/structure.png)
```mermaid
flowchart TB
subgraph Layers["Transformer Layers"]
direction TB
A[Input Embedding] --> B["Transformer Block<br>Layer 1"]
B --> C["Transformer Block<br>Layer ..."]
C --> D["Transformer Block<br>Layer 24"]
D --> E[RMSNorm]
E --> F[Linear]
F --> G[SoftMax]
end
subgraph TransformerBlock["Transformer Block"]
direction TB
H[x] --> I[RMSNorm]
I --> J[Linear → Q/K/V]
J --> K[Q]
J --> L[K]
J --> M[V]
K --> N[RoPE]
L --> O[RoPE]
N --> P["Q @ K^T / sqrt(d)"]
O --> P
P --> Q[Masked SoftMax]
Q --> R[S @ V]
M --> R
R --> S[Linear]
S --> T[+]
H --> T
T --> U[RMSNorm]
U --> V[Linear]
V --> W[SiLU]
V --> X[×]
W --> X
X --> Y[Linear]
Y --> Z[+]
T --> Z
Z --> AA[x']
end
classDef main fill:#e6f3ff,stroke:#0066cc;
classDef block fill:#fff2e6,stroke:#cc6600;
class Layers main;
class TransformerBlock block;
```
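The attention path in the diagram (Q @ K^T / sqrt(d), masked softmax, S @ V) combined with the GQA head sharing described above (24 query heads over 4 KV heads) can be sketched in plain NumPy. This is an illustrative sketch under the stated head counts, not AstrAI's implementation; the function name `gqa_attention` is hypothetical:

```python
import numpy as np

def gqa_attention(q, k, v):
    """Grouped-query attention: groups of query heads share one KV head.

    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads          # query heads per KV head (24 / 4 = 6)
    k = np.repeat(k, group, axis=0)          # expand KV heads to match the Q heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # causal mask: position i may only attend to positions j <= i
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # numerically stable softmax over the last axis
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v                         # (n_q_heads, seq, d)
```

The memory saving comes from the cache side: a KV cache would store only 4 heads of K and V instead of 24, a 6x reduction, while attention quality is preserved by letting each group of 6 query heads read the same KV head.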
What is an autoregressive model? After a sentence is split into tokens, the model predicts a probability distribution over the next token: given the context (the sequence of tokens seen so far), it assigns a probability to each candidate next token.
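Generation then reduces to a loop: feed the context, pick a next token from the distribution, append it, and repeat. A minimal greedy-decoding sketch, where `logits_fn` and the toy model stand in for a real Transformer forward pass (both names are hypothetical, not AstrAI APIs):

```python
def generate(logits_fn, prompt, max_new_tokens=8):
    """Greedy autoregressive decoding: repeatedly append the most likely next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = logits_fn(tokens)  # scores over the vocabulary, given the context
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens

# Toy "model" over a 5-token vocabulary: always predicts (last token + 1) mod 5.
def toy_model(tokens):
    return [1.0 if i == (tokens[-1] + 1) % 5 else 0.0 for i in range(5)]

generate(toy_model, [0], max_new_tokens=3)  # → [0, 1, 2, 3]
```

In practice the argmax is often replaced by sampling (temperature, top-k, top-p), but the loop structure is the same.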

BIN  assets/images/logo.png (new file; binary file not shown — after: 281 KiB)

Binary file not shown (before: 228 KiB).

Binary file not shown (before: 590 KiB).