What is an LLM?
A Large Language Model (LLM) is a type of neural network trained on large amounts of text to predict the most likely next token in a sequence. Despite the apparent simplicity of this training objective, models trained at sufficient scale on diverse data develop the ability to reason, write, code, translate, summarize, answer questions, and follow complex instructions — without being explicitly programmed for any of these tasks.
The “large” in LLM refers to two things: the size of the training dataset (typically hundreds of billions to trillions of tokens drawn from the internet, books, code repositories, and other sources) and the number of parameters in the model (billions to hundreds of billions or more numerical weights that encode the model’s learned representations). Scale is not only a quantitative difference: larger models have been observed to show capabilities that were absent or unreliable in smaller ones, although how abruptly these capabilities appear is still debated.
Major LLMs include OpenAI’s GPT-4 series, Anthropic’s Claude series, Google’s Gemini, and Meta’s Llama. Proprietary models are typically accessed via API, while open-weight models such as Llama can also be downloaded and self-hosted. Either way, they serve as the foundation for thousands of downstream applications, from customer support chatbots to coding assistants to document analysis tools.
How LLMs Work
At the core of nearly every modern LLM is the transformer architecture, introduced in the 2017 paper “Attention Is All You Need.” Attention mechanisms predate that paper; the transformer’s key innovation was to build the whole model around self-attention, dropping the recurrence used by earlier sequence models. Self-attention lets each position in the input sequence dynamically weight how much it should attend to every other position when predicting the next token. This allows the model to capture long-range dependencies in text far more effectively than previous architectures.
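The weighting step described above can be sketched in a few lines of plain Python. This is a minimal illustration of scaled dot-product attention for a single head, not a full transformer layer: the function names and the toy vectors are made up for the example, and it omits the learned query/key/value projections and the causal mask that real decoder models use.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for one head, without masking.

    Each argument is a list of equal-length vectors, one per token.
    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this token's query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy 2-dimensional vectors for three tokens (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how one token distributes its attention over all three tokens; every row sums to 1.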
Training has two main phases:
- Pre-training: The model is trained on massive text corpora using next-token prediction as the objective. This is computationally expensive — frontier models cost tens to hundreds of millions of dollars to pre-train — but it produces a general-purpose model with broad knowledge and language capabilities.
- Fine-tuning and alignment: The pre-trained model is further trained, typically with supervised fine-tuning on example responses followed by Reinforcement Learning from Human Feedback (RLHF) and related techniques, to make it more helpful, honest, and safe. This is what transforms a raw language predictor into a useful assistant.
When you send a message to an LLM, it processes your input as a sequence of tokens, computes attention across all of them, and generates the output one token at a time — each predicted token becoming part of the context for predicting the next one.
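The token-by-token loop can be illustrated with a toy stand-in for the model. In this sketch, `NEXT_TOKEN_PROBS` is a hypothetical hand-written table keyed only by the most recent token, and decoding is greedy (always take the most probable token). A real LLM conditions on the entire context and usually samples from the distribution, but the shape of the loop is the same.

```python
# Hypothetical next-token probabilities, keyed by the most recent token.
# A real LLM computes these from the whole context with a neural network.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 0.8, "cat": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.6, "sat": 0.4},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def predict_next(context):
    """Greedy decoding: return the most probable next token."""
    probs = NEXT_TOKEN_PROBS.get(context[-1], {"<end>": 1.0})
    return max(probs, key=probs.get)

def generate(prompt_tokens, max_new_tokens=10):
    """Generate one token at a time, feeding each back into the context."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        token = predict_next(context)
        if token == "<end>":
            break
        context.append(token)  # the new token becomes part of the context
    return context

print(generate(["<start>", "the"]))  # ['<start>', 'the', 'cat', 'sat']
```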
What LLMs Are Good At
LLM capabilities that are reliably useful in business contexts include:
- Text transformation: Summarizing, rewriting, translating, formatting, and extracting structured information from unstructured text. Frontier models tend to perform these tasks accurately, and the results have high practical value.
- Code generation and review: Writing, explaining, debugging, and refactoring code across dozens of programming languages. Modern LLMs are effective coding collaborators for experienced engineers.
- Reasoning over provided documents: When given specific source material, LLMs can synthesize, compare, and extract insights reliably — especially with well-designed prompts.
- Following complex instructions: Multi-step, conditional instructions with many constraints are handled well by frontier models, which is what makes them useful as the reasoning core of agentic systems.
- Classification and routing: LLMs can reliably categorize inputs (support ticket triage, sentiment analysis, intent detection) at high throughput, and for many use cases at lower upfront cost than building and training a custom classifier, since no labeled training set is needed to get started.
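As one illustration of the classification and routing item above, the sketch below constrains the model to a fixed label set and validates its reply before routing on it. Everything here is an assumption made for the example: the label names, the prompt wording, and `fake_llm`, which stands in for a real model call so the code runs on its own. In practice `call_llm` would wrap whichever provider API you use.

```python
LABELS = ("billing", "technical", "account", "other")

def build_prompt(ticket_text):
    """Ask for exactly one label from a fixed set."""
    return (
        "Classify the support ticket into exactly one of these labels: "
        + ", ".join(LABELS)
        + ".\nReply with the label only.\n\nTicket: "
        + ticket_text
    )

def classify(ticket_text, call_llm):
    """call_llm is any function that takes a prompt string and returns text."""
    reply = call_llm(build_prompt(ticket_text))
    label = reply.strip().lower().rstrip(".")
    # Never trust free-form output: fall back to "other" on anything unexpected.
    return label if label in LABELS else "other"

def fake_llm(prompt):
    """Stand-in for a real model call, used only to make the sketch runnable."""
    ticket = prompt.rsplit("Ticket: ", 1)[1].lower()
    return "Billing." if "invoice" in ticket else "I am not sure"

print(classify("My invoice shows the wrong amount", fake_llm))  # billing
print(classify("Hello there", fake_llm))                        # other
```

Validating the reply against the known label set keeps a malformed or off-script response from being routed as if it were a real category.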
What LLMs Get Wrong
Building useful AI products requires an unsentimental understanding of LLM failure modes. The most consequential ones:
- Hallucination: LLMs sometimes generate plausible-sounding but factually incorrect content — citations that don’t exist, statistics that are fabricated, quotes that were never said. This isn’t a bug to be patched; it’s a structural property of the next-token prediction approach. Any production use case where factual accuracy matters needs grounding mechanisms (retrieval, tool use, fact-checking) or human review.
- Knowledge cutoff: On their own, LLMs know nothing about events after their training data cutoff. They may answer questions about post-cutoff events with false confidence unless they are told they can’t know or are given current information in the prompt or through retrieval.
- Arithmetic and precise computation: LLMs are not calculators. They often get arithmetic wrong, especially with multi-step calculations. For numerical work, use code execution tools rather than asking the model to calculate.
- Sycophancy: Models are trained on human feedback, which means they can be optimized to say what users want to hear rather than what’s accurate. This manifests as agreeing with incorrect premises or changing answers when pushed back on, even when the original answer was right.
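For the arithmetic failure mode listed above, one common pattern is to have the model emit an expression and let ordinary code compute the result. The sketch below is a minimal, hypothetical calculator tool built on Python’s standard `ast` module. It accepts only numbers and basic operators, and it is an illustration of the idea rather than a hardened sandbox.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def evaluate(expression):
    """Evaluate a plain arithmetic expression without using eval()."""
    def walk(node):
        # Only int/float literals are allowed (bools and strings are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        # Names, function calls, attribute access, etc. are all refused.
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

# The model proposes the expression; deterministic code produces the number.
print(evaluate("1234 * 5678 + 91"))
```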
The practical implication: design AI systems with an assumption of LLM fallibility. Build in verification layers, ground outputs in source documents, and don’t use LLMs for tasks that require guaranteed precision without a human review step.
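A verification layer can be very simple. As a sketch under stated assumptions, the function below checks that every passage a model puts in straight double quotes actually appears in the source document, ignoring case and whitespace differences. The function names and sample strings are hypothetical, and exact-match checking is only one narrow form of grounding; it will not catch paraphrased fabrications.

```python
import re

def _normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())

def unsupported_quotes(answer, source):
    """Return quoted passages in `answer` that do not appear in `source`."""
    source_norm = _normalize(source)
    quotes = re.findall(r'"([^"]+)"', answer)  # straight double quotes only
    return [q for q in quotes if _normalize(q) not in source_norm]

source = "Revenue grew 12% in 2023. Headcount stayed flat."
answer = 'The report says "revenue grew 12% in 2023" and "headcount doubled".'

flagged = unsupported_quotes(answer, source)
print(flagged)  # ['headcount doubled']
```

Anything flagged this way can be routed to a human reviewer or sent back to the model with a request to cite the source text.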
Related Terms and Concepts
Automation, Workflow Automation, SaaS, Disruption, Disruptive Technology, Product Development