The RWKV Language Model (and my LM tricks)

RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced as "RwaKuv", from 4 major params: R W K V)

RWKV is an RNN with Transformer-level LLM performance that can also be trained directly like a GPT transformer (parallelizable). You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

So it combines the best of RNN and transformer: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).

Raven 14B (finetuned on Alpaca+ShareGPT+.) Demo:

RWKV-4-World is the best model: it handles generation, chat, and code in 100+ world languages, and it also has the best English zero-shot and in-context learning ability.
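The relationship between the "GPT" mode and the "RNN" mode can be illustrated with a toy single-channel recurrence. This is only a sketch under stated assumptions: it uses a decayed, exponentially weighted average in the spirit of RWKV's time mixing, not the actual model code. The function names (`rnn_step`, `rnn_mode`, `gpt_mode`), the decay `w`, the current-token bonus `u`, and the sample values are all hypothetical, and the real model adds numerical-stability tricks, many channels, and further layers.

```python
import math

def rnn_step(state, k, v, w, u):
    """Advance one token. state = (a, b): decayed sums of exp(k)*v and exp(k)."""
    a, b = state
    bonus = math.exp(u + k)              # extra weight for the current token
    out = (a + bonus * v) / (b + bonus)  # weighted average over past + current
    decay = math.exp(-w)
    ek = math.exp(k)
    # The new state depends only on the old state and the current token.
    return out, (decay * a + ek * v, decay * b + ek)

def rnn_mode(ks, vs, w, u, state=(0.0, 0.0)):
    """Process tokens one at a time, carrying the state forward."""
    outs = []
    for k, v in zip(ks, vs):
        out, state = rnn_step(state, k, v, w, u)
        outs.append(out)
    return outs, state

def gpt_mode(ks, vs, w, u):
    """Compute each position directly from the whole sequence.
    No position depends on another position's result, so the outer loop
    could run in parallel (here it is a plain loop for clarity)."""
    T = len(ks)
    outs = []
    for t in range(T):
        num = math.exp(u + ks[t]) * vs[t]
        den = math.exp(u + ks[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + ks[i])
            num += weight * vs[i]
            den += weight
        outs.append(num / den)
    # Closed-form state after all T tokens, ready to hand to rnn_step.
    a = sum(math.exp(-(T - 1 - i) * w + ks[i]) * vs[i] for i in range(T))
    b = sum(math.exp(-(T - 1 - i) * w + ks[i]) for i in range(T))
    return outs, (a, b)

# Hypothetical toy inputs: 4 tokens, one channel.
ks = [0.1, -0.3, 0.5, 0.2]
vs = [1.0, 2.0, -1.0, 0.5]
w, u = 0.5, 0.3

# "GPT" mode over the first 3 tokens yields the state for the "RNN" mode...
par_outs, par_state = gpt_mode(ks[:3], vs[:3], w, u)
# ...which then continues token by token from that state alone.
next_out, _ = rnn_step(par_state, ks[3], vs[3], w, u)

# Reference: the same sequence processed purely step by step.
seq_outs, seq_state = rnn_mode(ks, vs, w, u)
```

In this sketch `next_out` agrees with `seq_outs[3]` up to floating-point rounding, which is the property the text describes: the prompt can be processed in one parallel pass, and generation then needs only the fixed-size state rather than the full history.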