docs: update documentation content
parent 4ead0a20cf
commit e7721eafc6
@@ -1,41 +0,0 @@
---
name: Issue Report
about: Report a bug or suggest a feature
title: '[ISSUE] '
labels: ''
assignees: ''
---

## Type
Please select the type of this issue:
- [ ] Bug Report
- [ ] Feature Request

## Description
A clear and concise description of the issue or suggestion.

## Steps to Reproduce (for bugs)
1.
2.
3.

## Expected Behavior (for bugs)
What you expected to happen.

## Actual Behavior (for bugs)
What actually happened.

## Solution Proposal (for features)
Describe the solution you'd like.

## Alternatives Considered (for features)
Describe any alternative solutions or features you've considered.

## Environment (for bugs)
- Python version:
- AstrAI version (or commit hash):
- OS:
- GPU (if applicable):

## Additional Context
Add any other context, screenshots, or logs here.
@@ -11,5 +11,6 @@

!LICENSE
!pyproject.toml
!.github/ISSUE_TEMPLATE/*
!.github/workflows/lint.yml
!.github/workflows/tests.yml
@@ -1,7 +1,6 @@
<div align="center">
<!-- <img src="assets/images/project_logo.png" width="auto" alt="Logo"> -->

<h1>AstrAI</h1>
<img src="assets/images/logo.png" width="auto" alt="Logo">
<p>
<strong>A lightweight Transformer training & inference framework</strong>
</p>
@@ -14,7 +13,6 @@
<img src="https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fapi.github.com%2Frepos%2FViperEkura%2FAstrAI&query=%24.stargazers_count&label=stars&suffix=%20stars&color=76bad9" alt="stars">
<img src="https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fapi.github.com%2Frepos%2FViperEkura%2FAstrAI&query=%24.forks_count&label=forks&suffix=%20forks&color=76bad9" alt="forks">
</div>

<br>

<div align="center">
@@ -22,7 +20,7 @@
<a href="assets/docs/README-zh-CN.md">中文</a> •
<a href="https://github.com/ViperEkura/AstrAI/issues">Issue Tracker</a> •
<a href="https://github.com/ViperEkura/AstrAI/discussions">Discussions</a> •
<a href="https://huggingface.co/ViperEk/AstrAI">HuggingFace</a>
<a href="https://huggingface.co/ViperEk/">HuggingFace</a>
</div>

<br>
@@ -130,7 +128,7 @@ For major changes, please open an issue first to discuss what you would like to

- **GitHub Issues**: [Issue Tracker](https://github.com/ViperEkura/AstrAI/issues)
- **Discussions**: [GitHub Discussions](https://github.com/ViperEkura/AstrAI/discussions)
- **HuggingFace**: [Model Hub](https://huggingface.co/ViperEk/AstrAI)
- **HuggingFace**: [Model Hub](https://huggingface.co/ViperEk)

### License
@@ -1,7 +1,6 @@
<div align="center">
<!-- <img src="../images/project_logo.png" width="auto" alt="Logo"> -->

<h1>AstrAI</h1>
<img src="../images/logo.png" width="auto" alt="Logo">

<div>
<a href="../../README.md">English</a> •
@@ -28,9 +27,8 @@
<a href="#chinese">中文</a> •
<a href="https://github.com/ViperEkura/AstrAI/issues">Issue Tracker</a> •
<a href="https://github.com/ViperEkura/AstrAI/discussions">Discussions</a> •
<a href="https://huggingface.co/ViperEk/AstrAI">HuggingFace</a>
<a href="https://huggingface.co/ViperEk">HuggingFace</a>
</div>

<br>

## 📖 Table of Contents
@@ -131,7 +129,7 @@ python scripts/demo/generate_ar.py

- **GitHub Issues**: [Issue Tracker](https://github.com/ViperEkura/AstrAI/issues)
- **Discussions**: [GitHub Discussions](https://github.com/ViperEkura/AstrAI/discussions)
- **HuggingFace**: [Model Hub](https://huggingface.co/ViperEk/AstrAI)
- **HuggingFace**: [Model Hub](https://huggingface.co/ViperEk)

### License
@@ -1,12 +1,54 @@
## Model Introduction

### 1. Model Architecture

This model uses the Transformer architecture with grouped-query attention (GQA; q_head=24, kv_head=4), which shrinks the KV cache relative to traditional multi-head attention (MHA), although a KV cache is not currently implemented. The model stacks 24 Transformer blocks and has 1.0 billion parameters. It is autoregressive: at each step it attends over all previous tokens to produce the probability distribution of the next token.
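The memory argument for GQA can be checked with a little arithmetic. The sketch below is illustrative only: `kv_cache_bytes` is a hypothetical helper, and the head dimension, sequence length, and 2-byte element size are assumptions that do not come from the project. Only the layer count and head counts are taken from the text above.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: each layer stores one tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


N_LAYERS = 24    # from the text
Q_HEADS = 24     # from the text
KV_HEADS = 4     # from the text
HEAD_DIM = 64    # assumption, for illustration only
SEQ_LEN = 2048   # assumption, for illustration only

# MHA keeps one K/V head per query head; GQA keeps only KV_HEADS of them.
mha = kv_cache_bytes(N_LAYERS, Q_HEADS, HEAD_DIM, SEQ_LEN)
gqa = kv_cache_bytes(N_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

ratio = mha // gqa                 # how much smaller the GQA cache is
group_size = Q_HEADS // KV_HEADS   # query heads sharing each K/V head

print(f"MHA: {mha / 2**20:.0f} MiB, GQA: {gqa / 2**20:.0f} MiB, ratio: {ratio}x")
```

The ratio depends only on q_head / kv_head, so it stays the same whatever head dimension or sequence length is assumed.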

![structure](../images/structure.png)
```mermaid
flowchart TB
subgraph Layers["Transformer Layers"]
direction TB
A[Input Embedding] --> B[Transformer Block\nLayer 1]
B --> C[Transformer Block\nLayer ...]
C --> D[Transformer Block\nLayer 24]
D --> E[RMSNorm]
E --> F[Linear]
F --> G[SoftMax]
end

subgraph TransformerBlock["Transformer Block"]
direction TB
H[x] --> I[RMSNorm]
I --> J[Linear → Q/K/V]
J --> K[Q]
J --> L[K]
J --> M[V]
K --> N[RoPE]
L --> O[RoPE]
N --> P["Q @ K^T / sqrt(d)"]
O --> P
P --> Q[Masked SoftMax]
Q --> R[S @ V]
M --> R
R --> S[Linear]
S --> T[+]
H --> T
T --> U[RMSNorm]
U --> V[Linear]
V --> W[SiLU]
V --> X[×]
W --> X
X --> Y[Linear]
Y --> Z[+]
T --> Z
Z --> AA[x']
end

classDef main fill:#e6f3ff,stroke:#0066cc;
classDef block fill:#fff2e6,stroke:#cc6600;
class Layers main;
class TransformerBlock block;
```
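The attention path in the diagram (Q @ K^T / sqrt(d), masked softmax, then S @ V) can be sketched in plain Python for a single head. This is an illustrative sketch, not the project's implementation: the name `causal_attention` is hypothetical, the Q/K/V projections and RoPE are omitted, and the inputs are plain lists of vectors rather than tensors.

```python
import math


def causal_attention(q, k, v):
    """Single-head causal attention over lists of vectors (seq_len x d)."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Score position i against positions 0..i only: the causal mask
        # hides every later position.
        scores = [
            sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Softmax over the visible scores (max subtracted for stability).
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value vectors.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out


# Tiny example: two positions, two dimensions.
example = causal_attention(
    q=[[1.0, 0.0], [0.0, 1.0]],
    k=[[1.0, 0.0], [0.0, 1.0]],
    v=[[1.0, 2.0], [3.0, 4.0]],
)
```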

What is an autoregressive model? After a sentence is split into tokens, the model predicts the next token from the tokens that have already appeared: given that context, it computes a probability for every candidate next token.
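The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_logits` is a hypothetical stand-in for a Transformer forward pass, and `generate` is not the project's API. A real model would produce the scores from learned weights.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(context, vocab_size):
    """Hypothetical stand-in for the model: favours (last token + 1) mod vocab_size."""
    favoured = (context[-1] + 1) % vocab_size
    return [4.0 if tok == favoured else 0.0 for tok in range(vocab_size)]


def generate(prompt, steps, vocab_size=8, greedy=True, seed=0):
    """Autoregressive loop: predict a distribution, pick a token, append, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        probs = softmax(toy_logits(tokens, vocab_size))
        if greedy:
            # Take the most probable token.
            next_tok = max(range(vocab_size), key=probs.__getitem__)
        else:
            # Sample a token in proportion to its probability.
            next_tok = rng.choices(range(vocab_size), weights=probs, k=1)[0]
        tokens.append(next_tok)
    return tokens
```

Each generated token is appended to the context before the next prediction, which is what makes the process autoregressive.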
Binary file not shown. After: Size 281 KiB
Binary file not shown. Before: Size 228 KiB
Binary file not shown. Before: Size 590 KiB