Well, this is very impressive!
Cerebras Systems Inference is capable of serving LLAMA 3.1 70B at 450 tokens/sec and LLAMA 3.1 8B at 1,850 tokens/sec. I don't even know how this is possible, tbh.
Try it yourself and see how fast it is
• inference.cerebras.ai
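If you want to put a number on it yourself, here's a minimal timing sketch. It assumes Cerebras exposes an OpenAI-compatible endpoint at api.cerebras.ai with a model id like "llama3.1-8b" and that your key is in CEREBRAS_API_KEY; check the docs at inference.cerebras.ai for the exact base URL, model names, and auth setup.

```python
# Rough tokens/sec estimate against Cerebras Inference (assumptions noted above).
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # assumed env var name
)

start = time.time()
stream = client.chat.completions.create(
    model="llama3.1-8b",                      # assumed model id
    messages=[{"role": "user", "content": "Explain speculative decoding in one paragraph."}],
    stream=True,
)

chunks = 0
for chunk in stream:
    # Each streamed chunk carries roughly one token, so this is a coarse count.
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1

elapsed = time.time() - start
print(f"~{chunks / elapsed:.0f} tokens/sec over {elapsed:.2f}s")
```

Note this measures end-to-end wall clock including network latency and time to first token, so the real decode throughput is a bit higher than what it prints.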
#CerebrasSystems #LLAMA #LLM #AIML
@Dagmawi_Babi