Last week, Groq introduced the ability to make lightning-fast queries and perform other tasks with leading large language models (LLMs) directly on its website. Users can type or speak their queries and receive replies at roughly 1,256 tokens per second, far faster than what GPU-based chips from companies such as Nvidia typically deliver.
At that rate, responses appear almost instantaneous: a few-hundred-token reply is generated in well under half a second. It is also a step up from the already impressive 800 tokens per second Groq demonstrated in April.
Groq's site engine uses Meta's open-source Llama3-8b-8192 as its default LLM, a model known for fast, efficient handling of language queries and tasks. Beyond the default, Groq also supports Llama3-70b, Google's Gemma, and Mistral models, with plans to support more in the future.
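The same models are reachable programmatically. Below is a minimal sketch of querying the default Llama3-8b-8192 model through Groq's Python SDK; the GROQ_API_KEY environment variable and the prompt text are illustrative assumptions, not details from the announcement.

```python
# Minimal sketch: send a chat request to Groq's hosted Llama3-8b-8192 model.
# Assumes the Groq Python SDK is installed (pip install groq) and that an API
# key is available in the GROQ_API_KEY environment variable.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

completion = client.chat.completions.create(
    # The default model on Groq's site; llama3-70b-8192 or a Gemma/Mistral
    # variant could be substituted here.
    model="llama3-8b-8192",
    messages=[
        {"role": "user", "content": "Explain in one sentence what makes token throughput matter for chatbots."}
    ],
)

print(completion.choices[0].message.content)
```

Because the request runs on Groq's inference hardware, the reply for a short prompt like this typically streams back in a fraction of a second at the throughput figures cited above.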