Apr 11, 2024
The size of LLMs
Since the release of ChatGPT in November 2022, major tech companies have quickly mobilized to build their own competing models. With each successive release, labs try to make their models more capable. Largely through ever-larger parameter counts, the models have become stronger, more creative, and more useful (or "intelligent") with their given training data. Here is a look at the sizes of the main Large Language Models as of March 2024.
Dataset
Model | Lab | Parameters (B) |
---|---|---|
Llama 3 | Meta AI | |
Meta GPT | Meta AI | 2000 |
Ajax GPT | Apple | 200 |
GPT-5 | OpenAI | 2000 |
GPT-6 | OpenAI | |
Olympus | Amazon | 2000 |
AuroraGPT (ScienceGPT) | ANL | 1000 |
Grok-2 | xAI | |
mixtral-8x22b | Mistral AI | 176 |
Sailor | Sail | 7 |
JetMoE-8B | MIT | 8 |
Eurus | Tsinghua | 70 |
Command-R+ | Cohere | 104 |
Viking | Silo AI | 33 |
OLMo-Bitnet-1B | Nous Research | 1 |
Aurora-M | International | 15.5 |
ReALM-3B | Apple | 3 |
Qwen1.5-MoE-A2.7B | Alibaba | 14.3 |
Grok-1.5 | xAI | 314 |
Jamba | AI21 | 52 |
DBRX | MosaicML | 132 |
Stable Code Instruct 3B | Stability AI | 2.7 |
EvoLLM-JP | Sakana AI | 10 |
RakutenAI-7B | Rakuten Group | 7 |
Parakeet | Independent | 0.378 |
RWKV-v5 EagleX | RWKV | 7.52 |
MM1 | Apple | 30 |
RFM-1 | Covariant | 8 |
Command-R | Cohere | 35 |
DeepSeek-VL | DeepSeek-AI | 7 |
AnyGPT | Fudan University | 7 |
Stable Beluga 2.5 | Stability AI | 70 |
Inflection-2.5 | Inflection AI | 1200 |
Apollo | SRIBD/CUHK | 7 |
Claude 3 Opus | Anthropic | 2000 |
Hawk | Google DeepMind | 7 |
Griffin | Google DeepMind | 14 |
BitNet b1.58 | Microsoft | 70 |
Samba-1 | SambaNova | 1400 |
Cosmo-1B | HF | 1.8 |
Poro | Silo AI | 34.2 |
StarCoder 2 | HF/ServiceNow | 15 |
530B | ByteDance | 530 |
175B | ByteDance | 175 |
Mistral Small | Mistral AI | 7 |
Mistral Large | Mistral AI | 540 |
Hanooman | Reliance | 40 |
Ask | Apple | 20 |
Reka Edge | Reka AI | 7 |
Reka Flash | Reka AI | 21 |
Gemma | Google DeepMind | 7 |
Gemini 1.5 Pro | Google DeepMind | 1500 |
Qwen-1.5 | Alibaba | 72 |
GOODY-2 | BRAIN | |
Natural-SQL-7B | ChatDB | 7 |
Sea-Lion | AI Singapore | 7.5 |
TimesFM | | 0.2 |
OLMo | Allen AI | 7 |
FLOR-6.3B | Cerebras | 6.3 |
Weaver | AIWaves.cn | 34 |
miqu 70b | Mistral AI | 70 |
iFlytekSpark-13B | iFlyTek | 13 |
Xinghuo 3.5 (Spark) | iFlyTek | 200 |
MGIE | Apple | 7 |
CodeLlama-70B | Meta AI | 70 |
RWKV-v5 Eagle 7B | RWKV | 7.52 |
MaLA-500 | LMU | 10 |
MambaByte | Cornell | 0.972 |
DeepSeek-Coder | DeepSeek-AI | 33 |
FuseLLM | Tencent | 7 |
Fuyu-Heavy | Adept | 120 |
GLM-4 | Zhipu AI (Tsinghua) | 200 |
DeepSeekMoE | DeepSeek-AI | 16 |
DeepSeek | DeepSeek-AI | 67 |
LLaMA Pro | Tencent | 8.3 |
TinyLlama | SUTD/Independent | 1.1 |
DocLLM | JPMorgan | 7 |
Unified-IO 2 | Allen AI | 7 |
WaveCoder-DS-6.7B | Microsoft | 6.7 |
YunShan | Huawei | 7 |
PanGu-Pi | Huawei | 7 |
YAYI 2 | Wenge | 30 |
Emu2 | BAAI | 37 |
MedLM | Google DeepMind | |
SOLAR-10.7B | Upstage AI | 10.7 |
DeciLM-7B | Deci | 7.04 |
Mistral-medium | Mistral AI | 180 |
mixtral-8x7b-32kseqlen | Mistral AI | 46.7 |
StripedHyena 7B | Together | 7.65 |
NexusRaven-V2 13B | Nexusflow.ai | |
Gemini Ultra 1.0 | Google DeepMind | 1500 |
Mamba | CMU | 2.8 |
LVM-3B | Berkeley/JHU | 3 |
SeaLLM-13b | Alibaba | 13 |
pplx-70b-online | Perplexity | 70 |
SeamlessM4T-Large v2 | Meta AI | 2.3 |
Q-Transformer | Google DeepMind | |
Yuan 2.0 | IEIT | 102.6 |
MEDITRON | EPFL | 70 |
Transformers-Arithmetic | Microsoft | 0.1 |
Starling-7B | Berkeley | 7 |
Inflection-2 | Inflection AI | 1200 |
Claude 2.1 | Anthropic | 130 |
TÜLU 2 | Allen AI | 70 |
Orca 2 | Microsoft | 13 |
Phi-2 | Microsoft | 2.7 |
Florence-2 | Microsoft | 0.771 |
Mirasol3B | Google DeepMind | 3 |
OtterHD-8B | NTU | 8 |
Gauss | Samsung | 7? |
Grok-1 | xAI | 314 |
Grok-0 | xAI | 33 |
Yi-34B | 01-ai | 34.4 |
GPT-4 Turbo | OpenAI | |
Kimi Chat | Moonshot AI | 100 |
jina-embeddings-v2 | Jina AI | 0.435 |
Fuyu | Adept | 8 |
ERNIE 4.0 | Baidu | 1000 |
Zephyr | Hugging Face H4 | 7.3 |
PaLI-3 | Google DeepMind | 5 |
Retro 48B | NVIDIA | 48 |
Ferret | Apple | 13 |
Lemur | XLANG Lab | 70 |
AceGPT | KAUST/Shenzhen | 13 |
Yasa-1 | Reka AI | |
RT-X | Google DeepMind | 55 |
MotionLM | Waymo | 0.09 |
GAIA-1 | Wayve | 9 |
Qwen | Alibaba | 72 |
Llama 2 Long | Meta AI | 70 |
LeoLM | Hessian AI/LAION | 13 |
Mistral 7B | Mistral AI | 7.3 |
Kosmos-2.5 | Microsoft | 1.3 |
Baichuan 2 | Baichuan | 13 |
BOLT2.5B | ThirdAI | 2.5 |
DeciLM | Deci | 5.7 |
MoLM | IBM | 8 |
NExT-GPT | Singapore | 7 |
Phi-1.5 | Microsoft | 1.3 |
UniLM | Apple | 0.034 |
Persimmon-8B | Adept | 8 |
FLM-101B | BAAI | 101 |
Falcon 180B | TII | 180 |
Hunyuan | Tencent | 100 |
phi-CTNL | Independent | 0.1 |
Jais | Inception | 13 |
Code Llama 34B | Meta AI | 34 |
IDEFICS | Hugging Face | 80 |
Raven | UI/NVIDIA | 11 |
DukunLM | AzaleAI | 13 |
WizardLM | Microsoft | 70 |
Platypus | Boston University | 70 |
Japanese StableLM Alpha 7B | Stability AI | 7 |
Stable Code 3B | Stability AI | 2.7 |
Med-Flamingo | Stanford | 8.3 |
Alfred-40B-0723 | LightOn | 40 |
LLaMA-2-7B-32K | Together | 7 |
Med-PaLM M | Google DeepMind | 540 |
BTLM-3B-8K | Cerebras | 3 |
Stable Beluga 2 | Stability AI | 70 |
Stable Beluga 1 | Stability AI | 65 |
Meta-Transformer | Shanghai AI Laboratory/CUHK | 2 |
Llama 2 | Meta AI | 70 |
WormGPT | (Undisclosed) | 6 |
Claude 2 | Anthropic | 130 |
LongLLaMA | IDEAS/DeepMind | 7 |
xTrimoPGLM | Tsinghua | 100 |
XGen | Salesforce | 7 |
Zhinao (Intellectual Brain) | 360 | 100 |
Yasa | Reka AI | |
Kosmos-2 | Microsoft | 1.6 |
AudioPaLM | | 340 |
Inflection-1 | Inflection AI | 120 |
Phi-1 | Microsoft | 1.3 |
InternLM | Shanghai AI Laboratory/SenseTime | 104 |
BlenderBot 3x | Meta AI | 175 |
Orca | Microsoft | 13 |
PassGPT | ETH Zürich | |
DIDACT | Google DeepMind | |
LTM-1 | Magic | |
GPT-4 MathMix | OpenAI | 1800 |
PandaGPT | Cambridge/Tencent | 13 |
Falcon | TII | 40 |
202305-refact2b-mqa-lion | Refact | 1.6 |
Guanaco | UW | 65 |
LIMA | Meta AI | 65 |
Formosa (FFM) | Asus/TWS | 176 |
CodeT5+ | Salesforce | 16 |
PaLM 2 | | 340 |
StarCoder | HF/ServiceNow | 15.5 |
MPT | MosaicML | 7 |
Pi | Inflection AI | 60? |
GPT-2B-001 | NVIDIA | 2 |
Titan | Amazon | 200 |
WizardLM | Microsoft | 7 |
MPT | MosaicML | 1.3 |
StableLM | Stability AI | 65 |
Dolly 2.0 | Databricks | 12 |
Pythia | EleutherAI | 12 |
Koala-13B | Berkeley | 13 |
C1.2 | Character.ai | 33 |
BloombergGPT | Bloomberg | 50 |
OpenFlamingo-9B | LAION | 8.3 |
GPT4All-LoRa | Nomic | 7 |
Cerebras-GPT | Cerebras | 13 |
PanGu-Sigma | Huawei | 1085 |
CoLT5 | | 5.2 |
Med-PaLM 2 | Google DeepMind | |
GPT-4 | OpenAI | 1760 |
Alpaca | Stanford | 7 |
Jurassic-2 | AI21 | 178 |
GPT-NeoX-Chat-Base-20B | Together | 20 |
Kosmos-1 | Microsoft | 1.6 |
LLaMA-65B | Meta AI | 65 |
MOSS | Fudan University | 16 |
Palmyra | Writer | 20 |
Luminous Supreme Control | Aleph Alpha | 70 |
Toolformer+Atlas 11B+NLLB 54B | Meta AI | 6.7 |
Multimodal-CoT | Amazon | 0.738 |
FLAME | Microsoft | 0.06 |
Med-PaLM 1 | Google DeepMind | 540 |
OPT-IML | Meta AI | 175 |
RL-CAI | Anthropic | 52 |
ERNIE-Code | Baidu | 0.56 |
RT-1 | | 0.035 |
ChatGPT (gpt-3.5-turbo) | OpenAI | 20 |
text-davinci-003 | OpenAI | |
GPT-JT | Together | 6 |
RWKV-4 | RWKV | 14 |
Galactica | Meta AI | 120 |
SED | DeepMind | |
mT0 | BigScience | 13 |
BLOOMZ | BigScience | 176 |
PACT | Microsoft | |
Flan-T5 | | 11 |
Flan-PaLM | | 540 |
U-PaLM | | 540 |
VIMA | NVIDIA | 0.2 |
OpenChat | Tsinghua | 13 |
WeLM | | 10 |
CodeGeeX | Tsinghua | 13 |
Sparrow | DeepMind | 70 |
PaLI | | 17 |
NeMo Megatron-GPT 20B | NVIDIA | 20 |
Z-Code++ | Microsoft | 0.71 |
Atlas | Meta AI | 11 |
BlenderBot 3 | Meta AI | 175 |
GLM-130B | Tsinghua | 130 |
AlexaTM 20B | Amazon Alexa AI | 20 |
6.9B FIM | OpenAI | 6.9 |
‘monorepo-Transformer’ | | 0.5 |
PanGu-Coder | Huawei | 2.6 |
NLLB | Meta AI | 54.5 |
J-1 RBG | AI21 | 178 |
BLOOM (tr11-176B-ml) | BigScience | 176 |
Minerva | | 540 |
GODEL-XL | Microsoft | 2.7 |
YaLM 100B | Yandex | 100 |
Unified-IO | Allen AI | 2.8 |
Perceiver AR | DeepMind | 1 |
LIMoE | | 5.6 |
GPT-4chan | Independent | 6 |
Diffusion-LM | Stanford | 0.3 |
UL2 20B | | 20 |
Gato (Cat) | DeepMind | 1 |
LaMDA 2 | | 137 |
OPT-175B | Meta AI | 175 |
Tk-Instruct | Hugging Face | 11 |
InCoder | Meta AI | 6.7 |
NOOR | TII | 10 |
mGPT | Sber | 13 |
PaLM-Coder | | 540 |
PaLM | | 540 |
SeeKeR | Meta AI | 2.7 |
CodeGen | Salesforce | 16 |
VLM-4 | LightOn | 10 |
CM3 | Meta AI | 13 |
Luminous | Aleph Alpha | 200 |
Chinchilla | DeepMind | 70 |
GPT-NeoX-20B | EleutherAI | 20 |
ERNIE 3.0 Titan | Baidu | 260 |
XGLM | Meta AI | 7.5 |
Fairseq | Meta AI | 13 & 1100 |
Gopher | DeepMind | 280 |
GLaM | | 1200 |
Anthropic-LM 52B | Anthropic | 52 |
RETRO | DeepMind | 7.5 |
BERT-480 | | 480 |
BERT-200 | | 200 |
Cedille FR-Boris | Coteries | 6 |
MT-NLG | Microsoft/NVIDIA | 530 |
FLAN | | 137 |
Command xlarge | Cohere | 52.4 |
PLATO-XL | Baidu | 11 |
Macaw | Allen AI | 11 |
CodeT5 | Salesforce | 0.7 |
Codex | OpenAI | 12 |
Jurassic-1 | AI21 | 178 |
BlenderBot 2.0 | Meta AI | 9.4 |
GPT-J | EleutherAI | 6 |
LaMDA | | 137 |
ruGPT-3 | Huawei/Sberbank | 1.3 |
Switch | | 1600 |
GPT-3 | OpenAI | 175 |
Megatron-11B | Meta AI | 11 |
Meena | | 2.6 |
T5 | | 11 |
RoBERTa | Meta AI | 0.355 |
GPT-2 | OpenAI | 1.5 |
BERT | | 0.3 |
GPT-1 | OpenAI | 0.117 |
ULMFiT | Fast.ai | 0.1 |

Visualization: https://lifearchitect.ai/models/
Data sources
Dr. Alan D Thompson of https://lifearchitect.ai/
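
For readers who want to work with the dataset themselves, here is a minimal Python sketch of one way to parse rows in the `Model | Lab | Parameters |` layout and rank models by size. The helper names (`parse_params`, `parse_rows`, `largest`) are made up for this illustration, and `SAMPLE` holds only a handful of rows copied from the table. Two handling choices are assumptions rather than anything the data source specifies: a blank Parameters cell is treated as "unknown", and a cell with several numbers or a question mark (such as `13 & 1100` or `7?`) is reduced to the largest number it contains.

```python
import re

# A few rows copied from the table, in its "Model | Lab | Parameters |" layout.
SAMPLE = """\
GPT-3 | OpenAI | 175 |
Gauss | Samsung | 7? |
Fairseq | Meta AI | 13 & 1100 |
GPT-4 Turbo | OpenAI | |
Falcon 180B | TII | 180 |
"""


def parse_params(cell):
    """Return the largest number in a Parameters cell (billions), or None if blank.

    Assumption: uncertain or multi-valued cells ("7?", "13 & 1100") are
    reduced to their largest number.
    """
    numbers = re.findall(r"\d+(?:\.\d+)?", cell)
    return max(float(n) for n in numbers) if numbers else None


def parse_rows(text):
    """Split pipe-delimited lines into (model, lab, params) tuples."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) < 3 or not cells[0]:
            continue  # skip blank or malformed lines
        rows.append((cells[0], cells[1], parse_params(cells[2])))
    return rows


def largest(rows, n=3):
    """Return the n rows with the highest known parameter counts."""
    known = [r for r in rows if r[2] is not None]
    return sorted(known, key=lambda r: r[2], reverse=True)[:n]


if __name__ == "__main__":
    for model, lab, params in largest(parse_rows(SAMPLE)):
        print(f"{model} ({lab}): {params:g}B parameters")
```

Rows with no published size, such as GPT-4 Turbo, are kept by `parse_rows` but left out of the ranking, so an undisclosed count is never mistaken for zero.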