CLI Reference

Complete reference for the ragnarok command-line interface.

Global Options

ragnarok [OPTIONS] COMMAND [ARGS]

Option	Description
`--version`, `-v`	Show version and exit
`--json`	Output in JSON format
`--no-color`	Disable colored output
`--pii-mode`	PII handling: `hash` (default), `redact`, `full`
`--help`	Show help message

Commands

version

Show the current version.

ragnarok version
# ragnarok-ai v1.5.0

evaluate

Evaluate a RAG pipeline against a test set.

ragnarok evaluate [OPTIONS]

Option	Description
`--demo`	Run demo with NovaTech dataset
`--config`, `-c`	Path to ragnarok.yaml config file
`--testset`, `-t`	Path to testset JSON file
`--output`, `-o`	Output file path for results
`--fail-under`	Fail if average score below threshold (0.0-1.0)
`--limit`, `-n`	Limit number of queries
`--seed`	Random seed for reproducibility

Examples:

# Demo evaluation
ragnarok evaluate --demo

# With config file
ragnarok evaluate --config ragnarok.yaml

# With threshold
ragnarok evaluate --demo --fail-under 0.7

# Limited queries
ragnarok evaluate --demo --limit 5

# Save results
ragnarok evaluate --demo --output results.json

# JSON output for CI
ragnarok evaluate --demo --json
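
The `--fail-under` gate described above can be pictured as a simple comparison between an average and a threshold. The following is only a sketch of that idea, not the tool's implementation: the function name `passes_threshold` is hypothetical, and it assumes that a plain arithmetic mean is used and that an average exactly equal to the threshold passes (the table only says the run fails when the average is below it).

```python
from statistics import mean


def passes_threshold(scores, fail_under):
    """Return True when the average score is not below the threshold.

    Hypothetical helper: mirrors the documented rule "fail if average
    score below threshold (0.0-1.0)". Treating equality as a pass is
    an assumption.
    """
    if not 0.0 <= fail_under <= 1.0:
        raise ValueError("fail_under must be between 0.0 and 1.0")
    if not scores:
        raise ValueError("at least one score is required")
    return mean(scores) >= fail_under


# Illustrative scores only; they are not real evaluation results.
print(passes_threshold([0.9, 0.6, 0.75], 0.7))  # average 0.75, gate passes
print(passes_threshold([0.5, 0.6], 0.7))        # average 0.55, gate fails
```

In CI, a failing gate is reported through the exit code, so the pipeline step fails without any extra scripting.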

generate

Generate a synthetic test set from documents.

ragnarok generate [OPTIONS]

Option	Description
`--demo`	Use NovaTech example dataset
`--docs`, `-d`	Path to documents (JSON or directory)
`--num`, `-n`	Number of questions to generate (default: 10)
`--output`, `-o`	Output file path (default: testset.json)
`--model`, `-m`	Ollama model (default: mistral)
`--seed`, `-s`	Random seed for reproducibility
`--validate`	Validate generated questions
`--dry-run`	Show what would be generated
`--ollama-url`	Ollama API URL

Examples:

# From demo dataset
ragnarok generate --demo --num 10

# From documents directory
ragnarok generate --docs ./knowledge/ --num 50

# From JSON file
ragnarok generate --docs documents.json --model llama3

# Dry run
ragnarok generate --demo --dry-run

benchmark

Track benchmark history and detect regressions.

ragnarok benchmark [OPTIONS]

Option	Description
`--demo`	Run demo with simulated runs
`--list`, `-l`	List all recorded configurations
`--history`, `-H`	Show history for a config name
`--output`, `-o`	Output file for results
`--fail-under`	Fail if average below threshold
`--dry-run`	Show what would be benchmarked
`--storage`, `-s`	Path to storage file

Examples:

# Run demo
ragnarok benchmark --demo

# List configurations
ragnarok benchmark --list

# View history
ragnarok benchmark --history my-rag-config

# With threshold
ragnarok benchmark --demo --fail-under 0.7

judge

Evaluate responses using LLM-as-Judge.

ragnarok judge [OPTIONS]

Option	Description
`--context`, `-c`	Context text for evaluation
`--question`, `-q`	Question to evaluate
`--answer`, `-a`	Answer to evaluate
`--file`, `-f`	JSON file with items to evaluate
`--criteria`	Comma-separated criteria (default: all)
`--model`, `-m`	Ollama model (default: Prometheus 2)
`--fail-under`	Fail if average below threshold
`--output`, `-o`	Output file for results
`--ollama-url`	Ollama API URL

Criteria:

faithfulness — Is the answer grounded in context?
relevance — Does the answer address the question?
hallucination — Does the answer contain fabricated info?
completeness — Are all aspects covered?
all — All criteria (default)

Examples:

# Single evaluation
ragnarok judge \
  --context "Paris is the capital of France." \
  --question "What is the capital of France?" \
  --answer "Paris"

# From file
ragnarok judge --file items.json

# Select criteria
ragnarok judge --file items.json --criteria faithfulness,relevance

# With threshold
ragnarok judge --file items.json --fail-under 0.7

# JSON output
ragnarok judge --file items.json --json
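
When `judge` is called from a script, building the argument list programmatically avoids shell-quoting mistakes in the context, question, and answer strings. The sketch below uses only the flags documented in the table above; the helper name `build_judge_command` is hypothetical, and validating criteria on the client side is a convenience assumed here, not something the CLI is known to require.

```python
import shlex

# Criteria names taken from the "Criteria" list above.
KNOWN_CRITERIA = {"faithfulness", "relevance", "hallucination", "completeness", "all"}


def build_judge_command(context, question, answer, criteria=None, fail_under=None):
    """Build an argv list for `ragnarok judge` (hypothetical helper)."""
    argv = [
        "ragnarok", "judge",
        "--context", context,
        "--question", question,
        "--answer", answer,
    ]
    if criteria:
        unknown = set(criteria) - KNOWN_CRITERIA
        if unknown:
            raise ValueError(f"unknown criteria: {sorted(unknown)}")
        # --criteria takes a comma-separated list.
        argv += ["--criteria", ",".join(criteria)]
    if fail_under is not None:
        argv += ["--fail-under", str(fail_under)]
    return argv


argv = build_judge_command(
    "Paris is the capital of France.",
    "What is the capital of France?",
    "Paris",
    criteria=["faithfulness", "relevance"],
)
# shlex.join renders the list as a safely quoted shell command line.
print(shlex.join(argv))
```

The resulting list can be passed directly to `subprocess.run`, which skips the shell entirely.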

dataset

Manage and compare dataset versions.

ragnarok dataset COMMAND [OPTIONS]

dataset diff

Compare two dataset versions to detect changes.

ragnarok dataset diff <v1_path> <v2_path> [OPTIONS]

Option	Description
`--key`, `-k`	Field to use as item key (default: metadata.id or content hash)
`--ignore-metadata`	Ignore metadata changes in comparison
`--show`, `-n`	Limit number of items shown (default: 10)
`--output`, `-o`	Export diff report to JSON file
`--fail-on-change`	Exit with error if changes detected (for CI)

Examples:

# Compare two testset versions
ragnarok dataset diff testset_v1.json testset_v2.json

# Ignore metadata changes
ragnarok dataset diff v1.json v2.json --ignore-metadata

# Export diff report
ragnarok dataset diff v1.json v2.json --output diff_report.json

# CI gating: fail if dataset changed
ragnarok dataset diff baseline.json current.json --fail-on-change

Output:

  RAGnarok-AI Dataset Diff
  ========================================

  v1: testset_v1.json
      hash=a1b2c3d4e5f6g7h8  items=50
  v2: testset_v2.json
      hash=x9y8z7w6v5u4t3s2  items=52

  ----------------------------------------
  Summary
  ----------------------------------------
    Added:     2
    Removed:   0
    Modified:  3
    Unchanged: 47
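
To make the summary categories concrete, here is a minimal sketch of how a keyed diff can classify items as added, removed, modified, or unchanged. It is an illustration under stated assumptions, not the tool's algorithm: the function names and the `question` field are hypothetical, the key falls back from `metadata.id` to a SHA-256 content hash as the `--key` default suggests, and `ignore_metadata` mimics the `--ignore-metadata` flag.

```python
import hashlib
import json


def content_hash(item):
    """Stable hash of an item, used as a key when no metadata.id exists."""
    payload = json.dumps(item, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def item_key(item):
    """Prefer metadata.id, otherwise fall back to the content hash."""
    meta_id = item.get("metadata", {}).get("id")
    return str(meta_id) if meta_id is not None else content_hash(item)


def strip_metadata(item):
    """Return a copy of the item without its metadata field."""
    return {k: v for k, v in item.items() if k != "metadata"}


def diff_datasets(old_items, new_items, ignore_metadata=False):
    """Count added, removed, modified, and unchanged items by key."""
    old = {item_key(i): i for i in old_items}
    new = {item_key(i): i for i in new_items}
    view = strip_metadata if ignore_metadata else (lambda i: i)
    shared = old.keys() & new.keys()
    modified = sum(1 for k in shared if view(old[k]) != view(new[k]))
    return {
        "added": len(new.keys() - old.keys()),
        "removed": len(old.keys() - new.keys()),
        "modified": modified,
        "unchanged": len(shared) - modified,
    }


# Hypothetical items; the real testset schema may differ.
v1 = [
    {"metadata": {"id": 1}, "question": "A"},
    {"metadata": {"id": 2}, "question": "B"},
]
v2 = [
    {"metadata": {"id": 1}, "question": "A"},
    {"metadata": {"id": 2}, "question": "B (reworded)"},
    {"metadata": {"id": 3}, "question": "C"},
]
print(diff_datasets(v1, v2))
```

Note that when the content hash is the key, an edited item cannot be matched to its old version and shows up as one removal plus one addition, which is one reason a stable `metadata.id` is useful.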

dataset info

Show dataset metadata and statistics.

ragnarok dataset info <path>

Example:

ragnarok dataset info testset.json

plugins

Manage and list available plugins.

ragnarok plugins [OPTIONS]

Option	Description
`--list`, `-l`	List all available plugins
`--type`, `-t`	Filter by type: llm, vectorstore, framework, evaluator
`--local`	Only show local adapters
`--info`, `-i`	Show info for a specific plugin

Examples:

# List all plugins
ragnarok plugins --list

# Filter by type
ragnarok plugins --list --type llm

# Local only
ragnarok plugins --list --local

# Plugin info
ragnarok plugins --info ollama

JSON Output

All commands support `--json` for machine-readable output:

ragnarok evaluate --demo --json

Response envelope:

{
  "command": "evaluate",
  "status": "pass",
  "version": "1.5.0",
  "data": { ... },
  "warnings": [],
  "errors": []
}

Status values:

pass — Evaluation passed threshold
fail — Evaluation failed threshold
success — Command completed successfully
error — Command failed
dry_run — Dry run completed
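
A script that consumes this output can parse the envelope and branch on `status`. The sketch below relies only on the envelope fields and status values listed above; the helper name `summarize_envelope` is hypothetical, treating `dry_run` as a non-failure is an assumption, and `data` is left empty in the sample because its contents are not specified here.

```python
import json

# Status values listed in the reference above.
KNOWN_STATUSES = {"pass", "fail", "success", "error", "dry_run"}
# Assumption: these statuses mean the command did not fail.
OK_STATUSES = {"pass", "success", "dry_run"}


def summarize_envelope(raw):
    """Parse a --json response envelope into a small summary dict."""
    envelope = json.loads(raw)
    status = envelope["status"]
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    return {
        "command": envelope["command"],
        "status": status,
        "ok": status in OK_STATUSES,
        "warnings": list(envelope.get("warnings", [])),
        "errors": list(envelope.get("errors", [])),
    }


# Sample envelope mirroring the documented shape; "data" is a placeholder.
sample = json.dumps({
    "command": "evaluate",
    "status": "pass",
    "version": "1.5.0",
    "data": {},
    "warnings": [],
    "errors": [],
})
summary = summarize_envelope(sample)
print(summary["command"], summary["status"], summary["ok"])
```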

Exit Codes

Code	Meaning
0	Success
1	Runtime failure or threshold not met
2	Invalid arguments or missing files
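
Wrapper scripts often need to turn these codes into readable messages. The following sketch maps the three documented codes and runs a child process with `subprocess.run`; the helper names are hypothetical, and a Python one-liner stands in for `ragnarok` so the example runs without the tool installed.

```python
import subprocess
import sys

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {
    0: "success",
    1: "runtime failure or threshold not met",
    2: "invalid arguments or missing files",
}


def describe_exit(code):
    """Translate an exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_and_describe(argv):
    """Run a command and return (exit code, meaning) without raising."""
    completed = subprocess.run(argv, check=False)
    return completed.returncode, describe_exit(completed.returncode)


# Stand-in child process that exits with code 2; in practice argv would be
# something like ["ragnarok", "evaluate", "--config", "ragnarok.yaml"].
code, meaning = run_and_describe([sys.executable, "-c", "import sys; sys.exit(2)"])
print(code, meaning)
```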

Configuration File

Create ragnarok.yaml:

testset: ./testset.json
output: ./results.json
fail_under: 0.8

metrics:
  - precision
  - recall
  - mrr
  - ndcg

criteria:
  - faithfulness
  - relevance
  - hallucination
  - completeness

ollama_url: http://localhost:11434

Use with:

ragnarok evaluate --config ragnarok.yaml

CLI options override config file values.
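
The precedence rule can be sketched as a dictionary merge in which CLI values win whenever they were actually supplied. This is an illustration only: the function name `resolve_settings` is hypothetical, the config is shown as an already-parsed dict rather than loaded from YAML, and it assumes CLI options map to the same key names as the config file (for example `--fail-under` to `fail_under`), with `None` meaning "not passed".

```python
def resolve_settings(config, cli_options):
    """Merge config file values with CLI options; CLI options win.

    Options left as None are treated as "not given on the command line"
    and do not override the config file.
    """
    merged = dict(config)
    merged.update({k: v for k, v in cli_options.items() if v is not None})
    return merged


# Values from the ragnarok.yaml example above, as a parsed dict.
config = {
    "testset": "./testset.json",
    "output": "./results.json",
    "fail_under": 0.8,
}
# Equivalent of passing only --fail-under 0.7 on the command line.
cli_options = {"fail_under": 0.7, "output": None}

settings = resolve_settings(config, cli_options)
print(settings)
```

Here `fail_under` comes from the command line while `testset` and `output` keep their config file values.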

monitor

Production monitoring daemon commands.

monitor start

Start the monitoring daemon.

ragnarok monitor start [OPTIONS]

Option	Description
`--port`, `-p`	Port to listen on (default: 9090)
`--host`	Host to bind to (default: 0.0.0.0)
`--db`	Path to SQLite database
`--retention`	Days to keep raw traces (default: 7)
`--foreground`, `-f`	Run in foreground

Examples:

# Start in background
ragnarok monitor start

# Start on custom port
ragnarok monitor start --port 8080

# Run in foreground for debugging
ragnarok monitor start --foreground

monitor stop

Stop the running daemon.

ragnarok monitor stop

monitor status

Show daemon status and basic metrics.

ragnarok monitor status

Output:

Monitor Status: RUNNING
------------------------------------
  PID:              12345
  Uptime:           2h 34m
  Traces collected: 12,566
  Success rate:     99.8%
  Latency P50:      234ms
  Latency P99:      1234ms
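
For readers unfamiliar with the P50 and P99 figures, the sketch below computes latency percentiles with the nearest-rank method. How the monitor actually computes its percentiles is not documented here, so the method is an assumption, the function name `percentile` is hypothetical, and the latency values are made up for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value covering pct% of samples."""
    if not values:
        raise ValueError("at least one value is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing to limit floating-point rounding in the rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up request latencies in milliseconds.
latencies_ms = [250, 120, 1234, 180, 234]
print("P50:", percentile(latencies_ms, 50), "ms")
print("P99:", percentile(latencies_ms, 99), "ms")
```

P50 is the median request, while P99 is dominated by the slowest outliers, which is why the two numbers can differ by a large factor.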

monitor stats

Show detailed statistics.

ragnarok monitor stats [OPTIONS]

Option	Description
`--period`, `-p`	Time period: 1h, 24h, 7d (default: 24h)

Examples:

# Last 24 hours (default)
ragnarok monitor stats

# Last hour
ragnarok monitor stats --period 1h

# Last 7 days, JSON output
ragnarok monitor stats --period 7d --json

Environment Variables

Variable	Description
`OLLAMA_HOST`	Ollama API URL
`NO_COLOR`	Disable colored output
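
The sketch below shows one plausible way these variables interact with the corresponding flags. The precedence order (an explicit `--ollama-url` value, then `OLLAMA_HOST`, then a default) is an assumption rather than documented behavior, the default URL is taken from the configuration example on this page, the function names are hypothetical, and `NO_COLOR` is interpreted per the common convention that any non-empty value disables color.

```python
import os

# Default taken from the ollama_url value in the configuration example.
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def color_enabled(no_color_flag=False, env=None):
    """Color is off if --no-color was passed or NO_COLOR is non-empty."""
    env = os.environ if env is None else env
    return not no_color_flag and not env.get("NO_COLOR")


def resolve_ollama_url(cli_url=None, env=None):
    """Assumed precedence: CLI flag, then OLLAMA_HOST, then the default."""
    env = os.environ if env is None else env
    return cli_url or env.get("OLLAMA_HOST") or DEFAULT_OLLAMA_URL


# Explicit env dicts keep the example independent of the real environment.
print(color_enabled(env={"NO_COLOR": "1"}))
print(resolve_ollama_url(env={"OLLAMA_HOST": "http://gpu-box:11434"}))
```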

Next Steps

Monitoring Guide — Production monitoring setup
GitHub Action — CI/CD integration
Quick Start — Getting started