Related papers
base: foundational, general-purpose research
- hourglass transformers: 2021. Hierarchical Transformers Are More Efficient Language Models | lucidrains/simple-hierarchical-transformer — interleaves vanilla layers with shortened (downsampled) layers for autoregressive, GPT-style language modeling 🤞
- 2023.5 MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers (evaluated separately on text, image, and audio modeling) | lucidrains/MEGABYTE-pytorch
PS: the idea is similar to the simple LM notes I had put together myself~
Further reading: 2024.12 Byte Latent Transformer: Patches Scale Better Than Tokens | paper code
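The shared idea across the hourglass and MEGABYTE papers above is working on shortened sequences: bytes are grouped into fixed-size patches, a global model runs over one embedding per patch, and a small local model predicts the bytes inside each patch. A minimal numpy sketch of just the patchify step (the patch size and padding byte are illustrative assumptions, not the papers' exact settings):

```python
import numpy as np

def patchify(byte_seq, patch_size=4, pad_byte=0):
    """Group a byte sequence into fixed-size patches (MEGABYTE-style).

    Pads the tail with pad_byte so the length divides evenly, then
    reshapes to (num_patches, patch_size): the global model would see
    one embedding per row, the local model one row at a time.
    """
    seq = np.asarray(byte_seq, dtype=np.uint8)
    pad = (-len(seq)) % patch_size
    seq = np.concatenate([seq, np.full(pad, pad_byte, dtype=np.uint8)])
    return seq.reshape(-1, patch_size)

patches = patchify(list(range(10)), patch_size=4)
print(patches.shape)        # (3, 4)
print(patches[2].tolist())  # [8, 9, 0, 0]
```

With patch size P, the global model's sequence length shrinks by a factor of P, which is where the efficiency claim in both papers comes from.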
audio speech: application-oriented research
- ⭐️2023.10 UniAudio: An Audio Foundation Model Toward Universal Audio Generation (inspired by MEGABYTE, applied here to speech modeling) | paper code (the code can be extended with new training tasks; music data has already been added)
- ⭐️2024. Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis (the paper introduces Dual-AR, GFSQ, and Firefly-GAN (FF-GAN), a reworked EVA-GAN; the details need to be understood alongside the code) | paper code
Attachments:
- Colab notebook on wiring fishspeech TTS into achatbot: https://github.com/weedge/doraemon-nb/blob/main/achatbot_fishspeech_tts.ipynb
- hands-on notes: https://github.com/weedge/doraemon-nb/blob/main/fish_speech_tts.ipynb
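On the GFSQ mentioned in the Fish-Speech item above: as I understand it, it is a grouped variant of Finite Scalar Quantization (FSQ), where channels are split into groups and each group is quantized independently by rounding every dimension to a small fixed set of levels. A minimal numpy sketch of the per-group FSQ step (the level counts and tanh bounding are illustrative assumptions; a real implementation also needs a straight-through estimator so gradients flow through the rounding):

```python
import numpy as np

def fsq_quantize(z, levels):
    """Finite Scalar Quantization: bound each dim, then round.

    Dimension i is squashed into [-(L_i-1)/2, (L_i-1)/2] with tanh and
    rounded to the nearest integer, giving prod(levels) possible codes.
    """
    half = (np.asarray(levels, dtype=float) - 1) / 2
    return np.round(np.tanh(z) * half)

def fsq_index(q, levels):
    """Map a quantized vector to a single integer codebook index."""
    levels = np.asarray(levels)
    digits = (q + (levels - 1) / 2).astype(int)   # shift into [0, L_i - 1]
    bases = np.concatenate(([1], np.cumprod(levels[:-1])))
    return int(np.dot(digits, bases))

q = fsq_quantize(np.array([0.0, 10.0, -10.0]), levels=[5, 5, 5])
print(q.tolist())               # [0.0, 2.0, -2.0]
print(fsq_index(q, [5, 5, 5]))  # 22
```

The grouped version would simply split the channel dimension into G chunks and apply this per chunk, emitting one code index per group.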