
Getting Started with LLMOps: Turning LLMs into Real Productivity

22 min read · Max Zhang · LLMOps
AI Agent · Dify · RAG · Prompt Engineering

Two years ago, my company decided to integrate AI capabilities into our product line. A few backend engineers spent two months wiring up the GPT-4 API and getting conversations running. And then? It fell apart on day one — not because the model was bad, but because nobody knew how to modify prompts, control outputs, read logs, or switch models.

This is probably the typical situation most teams encounter when they first try to "put AI into production": the model is powerful, but the engineering capability to support it is lacking. This is exactly what LLMOps solves.


1. First, Understand What an LLM Is

Before talking about LLMOps, you need to understand what we're actually managing. What exactly is a large language model, and how does it work? Understanding these basics will help you avoid many detours in real applications.

1.1 Token: LLM's "Word Count"

When dealing with LLMs, the first thing to understand is the token. You can think of a token as the "word count" of the LLM world: the model doesn't process text character by character, it processes it token by token.

Chinese example:
"你好世界" → about 4-6 tokens (depending on implementation)
"hello world" → about 2 tokens

English text splits roughly by word; Chinese splits by character or short phrase

Tokens matter because billing is based on token usage. Inputting "Who are you?" costs money, and outputting "I am an AI assistant" also costs money. Drop in a few hundred-page PDF, and the token costs can be much higher than you'd expect.

Take OpenAI's GPT-4 as an example:

Input: ~$0.01 - $0.03 per 1K tokens
Output: ~$0.03 - $0.06 per 1K tokens

1K tokens ≈ 750 English words, or 400-500 Chinese characters

A medium-length Chinese paragraph (500 characters) consumes about 300-500 tokens

So in real projects, controlling token consumption is key to cost optimization. Common approaches include:

  • Summarizing input documents, keeping only key information
  • Setting max_tokens to limit output length
  • Using caching to avoid recomputing identical content
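For a ballpark figure before you ship, the per-request cost can be sketched in a few lines. The per-1K prices below are illustrative placeholders taken from the ranges quoted above, not live rates:

```python
# Back-of-the-envelope token cost estimator. The per-1K prices are
# illustrative placeholders (see the ranges above), not live rates.
INPUT_PRICE_PER_1K = 0.03   # USD per 1K input tokens (assumed)
OUTPUT_PRICE_PER_1K = 0.06  # USD per 1K output tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request, in USD."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A 500-character Chinese paragraph ≈ 400 input tokens, with a
# 200-token reply:
print(round(estimate_cost(400, 200), 4))  # → 0.024
```

Multiply that per-request number by your projected daily request volume and the monthly bill stops being a surprise.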

1.2 From Text to Numbers: Embedding

What an LLM actually does is predict the next token. It takes a bunch of text, converts it into numbers (vectors), then calculates a probability distribution based on those numbers and selects the most likely next token.

This conversion process is called Embedding — embedding text into a high-dimensional vector space. Semantically similar words are also close in distance in vector space.

"Cat" and "Dog" → close (both pets, both have four legs)
"Cat" and "Car" → far apart (completely different semantics)

But sometimes there are "surprises":
"Programmer" and "996" → might be close (they co-occur heavily in Chinese tech-culture text)

Embedding is the foundation of RAG (Retrieval-Augmented Generation). Retrieval is essentially finding the "closest" content in vector space.
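That "closeness" is usually measured with cosine similarity. A minimal sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and the values here are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim "embeddings" (real ones are much higher-dimensional):
cat = [0.9, 0.8, 0.1]
dog = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.95]

# "Cat" should land closer to "Dog" than to "Car":
print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))  # → True
```

Vector databases run essentially this comparison, heavily optimized, across millions of stored vectors.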

Vector databases are tools specifically designed to store these embeddings. Common vector databases include:

  • Pinecone: Cloud service, hassle-free but paid
  • Milvus: Open source, can be deployed locally
  • Chroma: Lightweight, great for personal projects
  • Weaviate: Feature-rich, supports hybrid retrieval

1.3 The Model's "Thinking": Reasoning and Agents

A base LLM's core capability is continuing text: you give it a passage, and it carries on writing. But when you ask it "Help me book a flight to Shanghai tomorrow," it can only respond "Okay, let me help you book that"; it cannot actually book the ticket for you.

This is where the Agent comes in. An Agent adds three things to a plain LLM:

  • Tool calling capability: Can execute code, query databases, call APIs
  • Task planning capability: Can break complex tasks into subtasks
  • Memory capability: Short-term (context window), long-term (vector database)

How an Agent works roughly:

Traditional LLM:
User → Model → Text Response

Agent:
User → Model (Thinking: What does the user want? What should I do?)
         ↓
      Planning (Break task into steps)
         ↓
      Tool Calling (Search, query database, execute code...)
         ↓
      Observation (What did the tool return?)
         ↓
      Re-planning (Adjust next step based on results)
         ↓
      ... Loop until complete
         ↓
      Final Response

Agents matter because they solve the LLM's "ability to take action" problem. LLMs know a lot, but they can't directly manipulate the external world. Agents let LLMs truly "get moving" through tool calling.
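The think → act → observe loop above can be sketched in a few lines. The "LLM" here is a scripted stub standing in for a real model API call, and the tool registry is a toy:

```python
# Sketch of the agent loop. `scripted_llm` is a stub; a real agent
# would call a model API at this step and parse its decision.
def scripted_llm(history):
    # Decide the next action from the conversation so far.
    if not any(role == "observe" for role, _ in history):
        return {"action": "tool", "tool": "calculator", "input": "2+3"}
    return {"action": "final", "answer": "2 + 3 = 5"}

TOOLS = {"calculator": lambda expr: str(eval(expr))}  # toy tool registry

def run_agent(user_request, llm=scripted_llm, max_steps=5):
    history = [("user", user_request)]
    for _ in range(max_steps):               # loop until done or budget spent
        decision = llm(history)              # think / plan
        if decision["action"] == "final":
            return decision["answer"]        # final response
        result = TOOLS[decision["tool"]](decision["input"])  # tool calling
        history.append(("observe", result))  # observation fed back in
    return "stopped: step budget exhausted"

print(run_agent("What is 2+3?"))  # → 2 + 3 = 5
```

Note the `max_steps` guard: production agents always need a budget, or a confused model can loop on tool calls indefinitely.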


2. Why Software Development Needs LLMOps

2.1 What Happens Without LLMOps

Without LLMOps, LLM application development probably looks like this:

1. Backend engineer studies API documentation
2. Writes a bunch of prompts embedded in code
3. Calls GPT-4 / Claude for functionality
4. Goes live and discovers unstable outputs
5. Manually modifies prompts, goes live again
6. Logs? None. Monitoring? None.
7. Want to switch models? Refactor everything
8. Users grow, don't know how to scale
9. Model provider raises prices, helpless

This isn't building a product; it's gambling: gambling that model outputs stay stable, that prompts never need changes, that users won't ask tricky questions.

I've seen too many teams where engineers are on edge every day after the first AI feature goes live:

  • Don't know what questions users asked
  • Don't know what the model responded with
  • Don't know why it suddenly went off-topic
  • Can only guess when accidents happen

2.2 What LLMOps Solves

LLMOps (Large Language Model Operations) is a lifecycle management platform for LLM-based applications, covering development, deployment, monitoring, maintenance, and more.

Its core value is letting the platform absorb the complexity so that users are left with only the simple parts:

Step | Without LLMOps | With LLMOps | Time Saved
Frontend App Development | Integrate and wrap LLM capabilities yourself, significant dev time | Build directly on the LLMOps backend via API/WebApp | -80%
Prompt Engineering | Debug via API or Playground only | Visual prompt editor, WYSIWYG | -25%
Data Prep & Embedding | Write code for long-text processing and embedding | Upload text/files on the platform | -80%
App Logs & Analytics | Write code to view logs and access the database | Platform provides real-time logs & analytics | -70%
AI Plugin Development | Write code to create and integrate AI plugins | Visual tools for quick integration | -50%
AI Workflow Development | Write code for each workflow step | Visual workflow orchestration | -80%
Model Switching | Change code and refactor API calls | One-click model switch on the platform | -90%
Performance Monitoring | Roll your own monitoring, not professional | Built-in monitoring for key metrics | -60%

In other words, LLMOps encapsulates all the engineering grunt work so developers can focus on business logic.

2.3 Core Features of an LLMOps Platform

A complete LLMOps platform typically includes these core features:

Application Management

  • Create, edit, delete AI applications
  • App configuration (model selection, parameter tuning)
  • Multi-app management and comparison

Prompt Engineering

  • Visual prompt editor
  • Prompt version control
  • Prompt template marketplace
  • A/B testing capability

Knowledge Base Management

  • Document upload and parsing
  • Multiple chunking strategies
  • Embedding configuration
  • Knowledge base versioning

Data & Analytics

  • Conversation log recording
  • Token consumption statistics
  • Answer quality evaluation
  • User feedback collection

Model Integration

  • Multi-model support (OpenAI, Claude, local models...)
  • Model performance comparison
  • Cost analysis

3. Core Concepts Overview

3.1 Prompt Engineering

A Prompt is how we communicate with LLMs, but it's not that simple. Good prompts and bad prompts can produce vastly different output quality.

Bad Prompt:
"Translate"

Good Prompt:
"You are a professional Chinese-English translator. Translate the following Chinese paragraph into English, maintaining accuracy of professional terminology, with a formal but not stiff tone. For proper nouns, add the original text in parentheses after the translation."

Input: [User Text]

Good prompts need:

  • Role definition: What role should the model play ("You are a senior architect")
  • Task description: What to do ("Review this code for me")
  • Output format: What format of result ("Output in a table with issue location, severity, fix suggestion")
  • Constraints: Any limitations ("Do not exceed 500 words")

Several practical Prompt Engineering tips:

1. Few-shot prompting: Show the model examples

Don't just say "Classify this text"
Instead:
"Text: This phone's battery life is terrible, needs charging three times a day. Classification: Negative

Text: The screen display is amazing, great for watching movies. Classification: Positive

Text: The packaging box is quite elegant. Classification:"

After seeing examples, the model better understands what you want

2. Chain of Thought: Let the model think step by step

Don't ask: "What's wrong with this code?"

Instead ask:
"Please analyze this code following these steps:
1. First understand the code's functionality
2. Identify potential security risks
3. Evaluate performance impact
4. Propose improvements

Please answer step by step"

3. Structured Output: Make output more controllable

Don't ask: "Summarize this article"

Instead ask:
"Please summarize the article in JSON format, including:
{
  "title": "Article Title",
  "summary": "Summary no more than 100 characters",
  "keywords": ["Keyword 1", "Keyword 2", "Keyword 3"],
  "sentiment": "Positive/Negative/Neutral"
}"

3.2 RAG: Let the Model "Understand" Your Documents

RAG (Retrieval-Augmented Generation) is currently the most popular LLM application architecture.

Why do we need RAG? Because an LLM's knowledge is static — it doesn't know about your company's internal affairs or the latest product documentation content. You can't make the model "remember" all the knowledge in the world, but you can let it "look up references" when answering questions.

The RAG approach is simple:

1. Prepare your documents (PDF, web pages, database...)
2. Split documents into chunks
3. Embed each chunk, store in vector database
4. When user asks a question, embed the question too
5. Find the "most relevant" content in the vector database
6. Send relevant content + user question to LLM together
7. LLM generates answer based on this context
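The seven steps can be sketched end to end. To keep the example self-contained, the "embedding" here is a toy word-overlap stand-in; a real system would call an embedding model and compare vectors with cosine similarity:

```python
# Toy RAG pipeline: word sets and Jaccard overlap stand in for real
# embeddings and cosine similarity (illustration only).
def embed(text):
    return {w.strip("?.,").lower() for w in text.split()}

def similarity(a, b):
    return len(a & b) / len(a | b)

# Steps 1-3: chunk documents and index their "vectors".
chunks = [
    "Refunds are processed within 7 days.",
    "The API rate limit is 60 requests per minute.",
    "Passwords must be at least 12 characters.",
]
index = [(c, embed(c)) for c in chunks]

# Steps 4-5: embed the question and retrieve the closest chunk.
question = "What is the API rate limit?"
best_chunk, _ = max(index, key=lambda item: similarity(embed(question), item[1]))

# Steps 6-7: retrieved context + question becomes the LLM prompt.
prompt = f"Context: {best_chunk}\nQuestion: {question}\nAnswer using only the context."
print(best_chunk)  # → The API rate limit is 60 requests per minute.
```

Swapping `embed` and `similarity` for a real embedding model plus a vector database turns this sketch into the production architecture diagrammed below.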

A classic RAG flow:

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Document   │───▶│  Document   │───▶│  Embedding  │
│    Store    │    │   Splitting │    └──────┬──────┘
└─────────────┘    └─────────────┘           │
                                              ▼
                         ┌──────────────────────────┐
                         │    Vector Database        │
                         │ (Stores document vectors) │
                         └──────────────────────────┘
                                              ▲
┌─────────────┐    ┌─────────────┐           │
│  User Query │───▶│ Query Embed │───────────┘
└─────────────┘    └─────────────┘           │
                                              ▼
                         ┌──────────────────────────┐
                         │  Retrieve Most Relevant  │
                         └──────────────────────────┘
                                              │
                                              ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Final Answer│◀───│  LLM Generate│◀───│  Context    │
└─────────────┘    └─────────────┘    └─────────────┘

Common RAG application scenarios:

  • Enterprise knowledge base Q&A: Employees ask about company policies, processes, bot finds answers from internal documents
  • Customer service assistance: Users ask about products, find relevant info from product manuals
  • Technical documentation Q&A: Developers ask about API usage, retrieve from technical docs

3.3 Fine-tuning and LoRA

Prompt Engineering and RAG are both about using models, but sometimes you need to train the model — this is Fine-tuning.

Ways to use models:
- Prompt Engineering: Adjust input, don't touch the model
- RAG: Give the model "cheat sheets", don't touch the model

Ways to train models:
- Fine-tuning: Train the model with specific data, let it "learn" new knowledge
- LoRA: Low-Rank Adaptation, more efficient fine-tuning method

When to use fine-tuning instead of RAG?

Scenario | Recommended Approach
Need the model to learn a speaking style | Fine-tuning
Need the model to "remember" large amounts of knowledge | RAG
Need the model to follow specific task formats | Fine-tuning
Knowledge updates frequently | RAG
Training data is easy to obtain | Fine-tuning
Need real-time knowledge | RAG

The core idea of LoRA (Low-Rank Adaptation) is: don't modify the model's main backbone parameters, but additionally train a small set of "adapter" parameters. This greatly reduces the cost and time of fine-tuning.

To use an analogy: the model is a building's framework, LoRA is hanging some hooks on the framework — hang clothes on the hooks and you can change outfits. No need to rebuild the entire building, just change the hooks.
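The savings are easy to see in parameter counts. For a weight matrix of size d×k, LoRA trains only two thin adapters B (d×r) and A (r×k) with a small rank r; the sizes below are illustrative, not taken from any specific model:

```python
# Parameter-count illustration of the LoRA idea: instead of updating a
# full d×k weight matrix, train two thin adapters B (d×r) and A (r×k).
d, k, r = 4096, 4096, 8            # hidden sizes and rank are illustrative

full_params = d * k                # parameters touched by full fine-tuning
lora_params = d * r + r * k        # parameters touched by LoRA

print(full_params)                 # → 16777216
print(lora_params)                 # → 65536
print(full_params // lora_params)  # → 256
```

For this one matrix, LoRA updates 256× fewer parameters; applied across every attention layer, that is why LoRA fine-tuning fits on a single consumer GPU.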

3.4 AI Agent Interaction Patterns

Traditional software interaction pattern: Human → Software → Data. Each software has different interfaces and operation methods, users need to learn.

AI Agent era: Human → Agent → Data. The interaction interface is unified, users just need to talk to the Agent.

Traditional mode:
User → Excel (handle spreadsheets)
User → Photoshop (handle images)
User → Email client (handle emails)
User → Data analysis tool (make charts)
User → PPT tool (make presentations)
... Need to learn each software

Agent mode:
User → Agent → "Analyze this month's sales data, make charts, and generate a PPT"
User → Agent → "Remove the background from this photo, then adjust the color"
User → Agent → "Send an email to client A explaining the order delay, tone should be sincere"

The Agent receives natural language, breaks down tasks, calls appropriate tools, and returns results. The underlying complexity is hidden; what users perceive is just an intelligent assistant that understands human language and helps them get things done.


4. Build Your First AI Application with Dify

After all that theory, let's get practical. Dify is one of the most popular open-source LLMOps platforms today, with the philosophy of "making AI application development as simple as building with blocks."

Dify's characteristics:

  • Open source, can be deployed locally
  • Supports multiple models (OpenAI, Claude, local open-source models...)
  • Visual workflow orchestration
  • Complete logging and analytics

4.1 Dify's Core Modules

Dify breaks AI applications into several core modules:

Application Module:
├── Prompt
│   └── Supports variables, templates, conditionals
├── Memory
│   ├── Short-term (conversation context window)
│   └── Long-term (vector database)
├── Tools
│   ├── Built-in tools (Google search, Wikipedia, calculator...)
│   └── Custom tools (call your own API)
├── Knowledge Base
│   └── Supports PDF, text, web pages, Notion...
├── Opening / Suggested Questions
│   └── Improves user interaction experience
└── Content Moderation
    └── Built-in sensitive word filtering

Workflow Module:
├── Start Node
├── End Node
├── LLM Node (call model)
├── Knowledge Retrieval Node
├── Code Executor Node (online Python/JS execution)
├── Conditional Branch Node
├── HTTP Request Node
├── Template Transformation Node
├── Variable Aggregation Node
└── ...

4.2 A Simple Chatbot

Steps to make a chatbot with Dify:

Step 1: Create Application

1. Select "Chat Assistant" type
2. Set app name: "Tech Support Assistant"
3. Set opening: "Hello, I'm tech support. What technical questions can I help you with?"
4. Set suggested questions: "How to reset password?", "What to do if API call errors out?"...

Step 2: Configure Prompt

You are a professional technical support engineer familiar with all aspects of our company's products.

When users ask questions:
1. First understand the user's question; if unclear, ask for more details
2. If the question involves product features, prioritize retrieving relevant info from the knowledge base
3. If you can't find the answer, honestly tell the user and suggest contacting human support
4. Keep answers concise, professional, and easy to understand; avoid excessive technical jargon
5. If the question involves code, provide complete runnable code examples

Remember: You represent the company's image; every answer should demonstrate professionalism.

Step 3: Connect Knowledge Base

1. Create knowledge base: "Tech Support Knowledge Base"
2. Upload documents:
   - Product user manual (PDF)
   - FAQ (Markdown)
   - API documentation (web scrape)
3. Configure chunking strategy:
   - Recommend 300-500 characters per chunk
   - Preserve paragraph context
4. Click "Embed," system processes automatically

Step 4: Publish

1. Click "Publish"
2. Choose publishing method:
   - WebApp: Generate a webpage, can share directly with users
   - API: Generate API address and key, let your system call it
3. Set access permissions (public/require login/whitelist)

That's it — just these steps, and you have a customer service bot backed by a product knowledge base. No code writing needed, no need to understand AI internals.
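If you publish via API, your own backend calls the bot over HTTP. The sketch below builds (but does not send) a request for Dify's chat-messages endpoint; the request shape follows Dify's published API, while the key, host, and user ID are placeholders:

```python
# Build a request for Dify's chat-messages endpoint. The API key, host,
# and user ID are placeholders; the body shape follows Dify's API docs.
import json
import urllib.request

def build_chat_request(api_key, query, user_id,
                       base_url="https://api.dify.ai/v1"):
    payload = {
        "inputs": {},                  # app-level variables, if any
        "query": query,                # the user's question
        "response_mode": "blocking",   # or "streaming" for SSE
        "user": user_id,               # stable ID for per-user analytics
    }
    return urllib.request.Request(
        f"{base_url}/chat-messages",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("app-xxxx", "How do I reset my password?", "user-123")
print(req.full_url)  # → https://api.dify.ai/v1/chat-messages
```

Sending it with `urllib.request.urlopen(req)` (or any HTTP client) returns the bot's answer as JSON, with the same logging and analytics you get in the WebApp.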

4.3 A More Advanced Workflow

Suppose you want to build an "Article Analysis Assistant": user drops in an article URL, the assistant automatically fetches content, summarizes key points, extracts keywords, and generates image suggestions.

Workflow Design:

Start
  │
  ▼
HTTP Request (fetch URL content)
  │
  ▼
LLM (extract article body, remove ads and irrelevant content)
  │
  ├──▶ LLM (generate summary)            Output: summary within 100 characters
  ├──▶ LLM (keyword extraction)          Output: 3-5 keywords
  └──▶ LLM (generate image suggestions)  Output: suggested image style and description
  │        (the three branches run in parallel)
  ▼
Variable Aggregation (assemble final result)
  │
  ▼
┌─────────────────────────────────────┐
│  Final Output:                      │
│  - Summary                          │
│  - Keywords                         │
│  - Image suggestions                │
└─────────────────────────────────────┘
  │
  ▼
 End

Drag and drop to implement in Dify — no code required. Complex workflows are broken into simple nodes, each doing one thing, then combined into complete functionality.

4.4 Real Case: Customer Service Bot Results

I previously built an internal customer service bot for my team using Dify with pretty good results:

Scenario: Answering employee questions about company IT systems

Knowledge base content:
- IT help center documents (50+ articles)
- Conference room booking guide
- Printer usage guide
- VPN connection guide
- ...

Pre-launch concerns:
❌ Problem: Employee questions might not be in the knowledge base
✅ Solution: When "answer not found," guide to human support and log the question

❌ Problem: Answers might be inaccurate, misleading employees
✅ Solution: Enable "content moderation," sensitive answers need human confirmation

❌ Problem: Employees ask about real-time info (like today's server status)
✅ Solution: Connect to company Status Page API, Agent can query in real-time

Post-launch results:
- 70% of questions resolved self-service
- Average response time reduced from 30 minutes (human) to 1 minute
- Employee satisfaction improved
- IT colleagues have more time for complex issues

5. Common AI Application Architectures

Depending on the scenario, LLM application architectures vary. There's no best architecture, only the most suitable one.

5.1 Simple Conversation Type

Best for: Customer service chat, Q&A bots, casual conversation

User Input → [Prompt + Conversation History] → LLM → Response

Characteristics:

  • Simplest architecture, good for beginners
  • No external knowledge base involved
  • Depends on model's own capabilities
  • Fast response

Limitations:

  • Model doesn't know latest information
  • Answers may be inaccurate (hallucination problem)
  • Can't access user's private data

5.2 RAG Type

Best for: Document Q&A, knowledge base queries, enterprise intranet assistants

User Question
    │
    ▼
Embed User Question
    │
    ▼
Vector Retrieval ←── Document Split + Embedding
    │               (preprocessed vector database)
    ▼
Get Top-K Relevant Documents
    │
    ▼
Concatenate: User Question + Relevant Docs → LLM → Response

Characteristics:

  • Can leverage external knowledge
  • Answers traceable to sources
  • Supports real-time knowledge base updates
  • Relatively simple to deploy

Limitations:

  • Depends on retrieval quality
  • Context window length constraints
  • Document chunking strategy affects results

5.3 Agent Type

Best for: Complex tasks, automated workflows, multi-step operations

User Request
    │
    ▼
Agent (LLM + Tools + Planning)
    │
    ├──▶ Plan next step
    ├──▶ Select and call tools
    │     - Web search
    │     - Database query
    │     - API calls
    │     - Code execution
    ├──▶ Observe tool return results
    │
    └──▶ Decide whether to continue or end
              │
              ▼
           Loop until complete
              │
              ▼
           Final Response

Characteristics:

  • Can execute multi-step tasks
  • Can access external systems
  • Supports complex reasoning
  • Can handle open-ended tasks

Limitations:

  • Higher cost (multiple calls)
  • Uncertain execution time
  • Need to design fallback strategies

5.4 Multi-Agent Collaboration Type

Best for: Complex systems, cascading tasks, requiring multi-domain expert collaboration

                    User Request
                        │
                        ▼
               ┌─────────────────┐
               │  Orchestrator   │
               │ Agent (Understand│
               │ task, delegate)  │
               └────────┬────────┘
                        │
         ┌──────────────┼──────────────┐
         │              │              │
         ▼              ▼              ▼
   ┌──────────┐  ┌──────────┐  ┌──────────┐
   │ Data     │  │Analysis  │  │ Report   │
   │ Agent    │  │Agent     │  │Agent     │
   │(get data)│  │(analyze) │  │(generate)│
   └──────────┘  └──────────┘  └──────────┘
         │              │              │
         └──────────────┴──────────────┘
                        │
                        ▼
               ┌─────────────────┐
               │   Final Output  │
               └─────────────────┘

Characteristics:

  • Clear division of labor, each Agent specializes
  • Can execute independent tasks in parallel
  • Easy to scale and maintain
  • Suitable for complex business processes

6. Practical Pitfall Guide

6.1 Prompt-Related Pitfalls

Pitfall 1: Prompt written but doesn't work

Sometimes prompts work great in testing but are unstable in production because LLM outputs have randomness. Solutions:

① Add output format constraints
   Not: "Summarize this passage"
   But: "Summarize in JSON format, including summary and keywords fields"

② Specify a few output examples (Few-shot)
   Show model 2-3 examples first, then have it process your question

③ Generate multiple results and pick the best
   Set temperature lower, or generate multiple and vote

④ Explicitly require model to "think out loud"
   "Before answering, list your reasoning steps"

Pitfall 2: Context too long

Too many embedded documents, context fills up, model "can't read it all." Solutions:

① Document chunking should be reasonable
   - Smaller is not always better; preserve semantic integrity
   - Recommend 300-500 characters per chunk
   - Overlap between chunks to avoid context breaks

② Use Re-ranking to filter
   First retrieve 20 candidates via vector search
   Then use a more precise model to filter to most relevant 3-5

③ Limit context length
   Set max_tokens to prevent overly long output
   Summarize input to control length
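Item ① above, fixed-size chunks with overlap, can be sketched in a few lines. The chunk size and overlap values are the ones recommended above, applied to characters for simplicity:

```python
# Fixed-size chunking with overlap, so a sentence cut at a boundary
# still appears whole in one of the neighboring chunks.
def chunk_text(text, size=400, overlap=50):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap     # step back by `overlap` each time
    return chunks

doc = "x" * 1000
chunks = chunk_text(doc, size=400, overlap=50)
print(len(chunks))     # chunks start at 0, 350, 700 → 3
print(len(chunks[1]))  # → 400
```

Production splitters (e.g. those built into Dify or LangChain) refine this by preferring to cut at paragraph and sentence boundaries rather than at a raw character offset.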

6.2 Knowledge Base Pitfalls

Pitfall 3: Can't retrieve relevant content

Embedding model and query semantics may not match. Solutions:

① Try different embedding models
   Recommended for Chinese scenarios:
   - text-embedding-3-small (OpenAI)
   - BGE (BAAI open source, strong Chinese performance)
   - M3E (Moka open source)

② Hybrid retrieval
   Combine vector retrieval + keyword retrieval (BM25)
   Take union or intersection of both results

③ Optimize document structure
   - Make each chunk semantically self-contained
   - Avoid one chunk covering multiple topics
   - Add clear headings and summaries

Pitfall 4: Knowledge base updates are troublesome

After documents update, need to re-embed; costs add up. Solutions:

① Incremental updates
   Only reprocess changed documents
   Use version control to track which documents updated

② Separate dynamic and static
   Static knowledge (product manuals) → Embed
   Dynamic info (inventory counts) → Real-time database queries

③ Design knowledge base with update frequency in mind
   High-frequency content managed separately
   Don't mix with low-frequency content

6.3 Cost Pitfalls

Pitfall 5: Token costs skyrocket

No rate limiting, no caching, no context length control. Solutions:

① Load test before launch
   Estimate average token consumption per request
   Project total consumption for daily active users

② Implement request caching
   Same question (hash match) → return cached result
   Cache hit rate can reach 30-50%

③ Set token limits
   Per-request max_tokens limit
   Per-user daily token limit
   Auto-degrade or prompt when exceeded

④ Monitoring and alerts
   Set cost threshold alerts
   Auto circuit-break when over budget
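Item ② above, caching by a hash of the normalized question, is a few lines of code. This in-memory dict is a sketch; production systems would use Redis or similar with a TTL:

```python
# Cache model answers keyed by a hash of the normalized question, so
# repeat questions skip the model call entirely. In-memory sketch only;
# production would use Redis or similar with an expiry.
import hashlib

_cache = {}

def cached_ask(question, call_model):
    key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
    if key in _cache:
        return _cache[key]          # cache hit: zero tokens spent
    answer = call_model(question)   # cache miss: pay for one call
    _cache[key] = answer
    return answer

calls = []
def fake_model(q):
    calls.append(q)                 # record each (paid) model call
    return f"answer to: {q}"

cached_ask("How do I reset my password?", fake_model)
cached_ask("how do i reset my password?  ", fake_model)  # normalized hit
print(len(calls))  # → 1
```

Note the normalization (`strip().lower()`): without it, trivially different phrasings of the same question would all miss the cache.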

6.4 Reliability Pitfalls

Pitfall 6: Model provider goes down

Relying on a single model is too risky. Solutions:

① Design fallback strategies
   Primary: GPT-4
   Backup: Claude 3 / Gemini Pro
   Fallback trigger: primary model timeout / error / circuit break

② Use platform to manage multiple models
   Dify supports configuring multiple models
   One-click switch, no code changes

③ Monitor model health status
   Follow API status pages
   Set up automatic alerts
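Item ①, the primary/backup chain, reduces to trying each model in order and moving on when one fails. The model callables below are hypothetical stand-ins for real API clients:

```python
# Try models in priority order; fall through to the next on any error.
# The callables are stand-ins for real API clients (GPT-4, Claude, ...).
def ask_with_fallback(question, models):
    errors = []
    for name, call in models:
        try:
            return name, call(question)
        except Exception as exc:     # timeout / rate limit / outage
            errors.append((name, exc))
    raise RuntimeError(f"all models failed: {errors}")

def broken_gpt4(q):
    raise TimeoutError("upstream timeout")   # simulate an outage

models = [
    ("gpt-4", broken_gpt4),                  # primary: down
    ("claude-3", lambda q: "fallback answer"),  # backup: healthy
]
used_model, answer = ask_with_fallback("hello", models)
print(used_model)  # → claude-3
```

In practice you would also wrap each call in a timeout and log which model served each request, so a silent degradation to the backup is visible in monitoring.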

7. Next Stop: Making AI a Real Colleague

LLMOps made me realize something: AI isn't here to take jobs, but to take over tasks that are highly repetitive and low in creativity.

Try using AI now:

  • Help me search for problematic code in the codebase (used to spend ages grepping)
  • Auto-generate unit tests (those boring edge cases)
  • Help me think through architecture for new features (chatting leads to inspiration)
  • Help me write code review comments (things I tend to overlook)

Current AI applications are still pretty primitive — most products are just "calling APIs." But as Agent capabilities grow and LLMOps platforms mature, the real changes are still ahead.

If you haven't started experimenting with AI applications yet, my advice is:

  1. Start playing: Find a platform (Dify, Coze, LangFlow) and build a few demos to feel what AI can really do
  2. Find your scenario: Think of something you do daily that's repetitive and see if AI can help
  3. Go deep on one thing: Whether prompt engineering, RAG, or Agent development — pick one and go deep
  4. Focus on engineering: How to deploy, monitor, iterate — this is what separates "toy" from "product"

Appendix: Quick Reference Glossary

Term | Full Name | One-Line Explanation
LLM | Large Language Model | Model that can understand and generate natural language
Token | Token | Basic unit of text processed by models; English ≈ word, Chinese ≈ character
Embedding | Embedding | Converting text to vectors; semantically similar words get similar vectors
Prompt | Prompt | Input instructions to the model; the core way humans interact with LLMs
RAG | Retrieval-Augmented Generation | Combines a knowledge base with the model for domain-specific answers
Agent | Agent | Intelligent system that autonomously plans, calls tools, and executes complex tasks
Fine-tuning | Fine-tuning | Training the model with specific data to adapt it to new tasks
LoRA | Low-Rank Adaptation | Efficient fine-tuning method that trains small adapter matrices
AIaaS | AI as a Service | AI capabilities offered as public services
LLMOps | Large Language Model Operations | Platforms and practices for engineering LLM applications
Vector DB | Vector Database | Database storing embeddings for similarity retrieval
Re-ranking | Re-ranking | Using a more precise model to filter first-pass retrieval results
Temperature | Temperature | Parameter controlling output randomness; lower = more deterministic
Few-shot | Few-shot | Showing the model examples to teach style or format
CoT | Chain of Thought | Having the model think step by step before answering
Hallucination | Hallucination | Model confidently stating false information
Context Window | Context Window | Maximum number of tokens a model can process in one go
