ViperEkura/AstrAI
cfa3cf7daa
AstrAI/khaosz/data
ViperEkura
7623b1e5fd
feat(khaosz/data/tokenizer): optimize the BPE tokenizer's preprocessing and training configuration
2025-12-22 20:02:10 +08:00
__init__.py
feat(data): refactor dataset loading logic and fix a counting error
2025-11-28 20:59:24 +08:00
dataset.py
refactor(data): move the memory-mapped file loading logic into a standalone MmapFileHander class
2025-12-15 09:12:42 +08:00
mmap.py
fix(mmap): fix the sample-count and key calculation logic and strengthen error handling
2025-12-15 09:27:29 +08:00
sampler.py
fix(data): change how Sampler computes its length to avoid premature initialization
2025-12-10 18:57:53 +08:00
tokenizer.py
feat(khaosz/data/tokenizer): optimize the BPE tokenizer's preprocessing and training configuration
2025-12-22 20:02:10 +08:00