Well, this is very impressive!
Cerebras Systems Inference is capable of serving LLAMA 3.1 70B at 450 tokens/sec and LLAMA 3.1 8B at 1,850 tokens/sec. I don't even know how this is possible, tbh.
Try it yourself and see how fast it is
• inference.cerebras.ai
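If you want to put a number on it yourself, here's a minimal timing sketch. It assumes Cerebras exposes an OpenAI-compatible endpoint at api.cerebras.ai with a model id like "llama3.1-8b" and that your key is in CEREBRAS_API_KEY; check the docs at inference.cerebras.ai for the exact base URL, model names, and auth setup.

```python
# Rough tokens/sec estimate against Cerebras Inference (assumptions noted above).
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # assumed env var name
)

start = time.time()
stream = client.chat.completions.create(
    model="llama3.1-8b",                      # assumed model id
    messages=[{"role": "user", "content": "Explain speculative decoding in one paragraph."}],
    stream=True,
)

chunks = 0
for chunk in stream:
    # Each streamed chunk carries roughly one token, so this is a coarse count.
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1

elapsed = time.time() - start
print(f"~{chunks / elapsed:.0f} tokens/sec over {elapsed:.2f}s")
```

Note this measures end-to-end wall clock including network latency and time to first token, so the real decode throughput is a bit higher than what it prints.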
#CerebrasSystems #LLAMA #LLM #AIML
@Dagmawi_Babi