Safe AI Workflows: Mitigating Hallucinations & Overreach

#ai-safety #automation #ai-llm #best-practices #prompt-engineering

As we integrate Large Language Models (LLMs) into our technical workflows, we encounter a critical friction point: Trust.

While AI agents are powerful "completors," they often suffer from Parametric Knowledge Dominance: the tendency to rely on what is encoded in their pre-trained (and often outdated) weights rather than inspecting the actual state of your system. This leads to "blind recommendations" that can break configurations or introduce subtle bugs.

This guide outlines the problems inherent in AI technical assistance and provides a robust "Audit-Verify-Plan" protocol to ensure safety.

🚨 The Core Problem: "Blind" Recommendations

An AI model doesn't inherently "know" that its knowledge is outdated. When you ask for help with a tool like Git, Starship, or Docker, it defaults to the syntax and behavior described in the documentation it was trained on, which may be months or years out of date.

The Failure Mode

- The Trigger: You ask for a fix (e.g., "Fix my Git config").
- The Assumption: The AI assumes a standard environment and standard syntax based on its training data.
- The Hallucination: It generates a command or configuration block that looks correct but is syntactically invalid for your specific version or conflicts with your existing setup.
- The Result: "Blind changes" that break your workflow.

Case Study: A recent interaction involved an AI recommending standard Git/Starship configurations without first running git status or checking the starship.toml file. This resulted in a recommendation that ignored the user's specific constraints, leading to a loss of trust.

🛡️ The Solution: The "Audit-Verify-Plan" Protocol

To safely use AI for systems engineering, we must force it out of "Generator Mode" and into "Auditor Mode".

Phase 1: Audit (Read-Only)

Goal: Establish ground truth before any logic is applied.

Constraint: The AI is strictly forbidden from proposing changes.
Required Actions:
- Run read-only shell commands (ls -R, cat config_file, git status).
- Map the current directory structure.
- Identify version numbers (node -v, cargo --version).
User Prompt:

"Do not propose changes yet. First, audit the current state of [System] using shell commands and report your findings."
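
If you script part of the audit yourself, or wrap an agent's shell access, a conservative allowlist helps keep Phase 1 genuinely read-only. This is a hypothetical sketch: `ALLOWED`, `is_read_only`, and `run_audit` are illustrative names, and an allowlist like this is a guardrail, not a security boundary. Commands run without a shell, so pipes and redirections are never interpreted.

```python
import shlex
import subprocess

# Hypothetical allowlist. None means "any arguments"; otherwise the full
# argument tuple must match exactly, so `git push` or `node script.js` are refused.
ALLOWED = {
    "ls": None,
    "cat": None,
    "pwd": None,
    "git": {("status",), ("--version",)},
    "node": {("-v",), ("--version",)},
    "cargo": {("--version",)},
}


def is_read_only(command: str) -> bool:
    """True only if the command matches the read-only allowlist."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        return False
    allowed_args = ALLOWED[argv[0]]
    return allowed_args is None or tuple(argv[1:]) in allowed_args


def run_audit(commands: list[str]) -> dict[str, str]:
    """Run allowlisted commands (no shell) and collect their output; refuse everything else."""
    report = {}
    for command in commands:
        if not is_read_only(command):
            report[command] = "REFUSED: not on the read-only allowlist"
            continue
        try:
            result = subprocess.run(shlex.split(command), capture_output=True, text=True)
        except FileNotFoundError:
            report[command] = "NOT INSTALLED"
            continue
        report[command] = result.stdout if result.returncode == 0 else result.stderr
    return report
```

For example, `run_audit(["git status", "node -v", "git push"])` returns real output for the first two commands, if those tools are installed. It returns a refusal for `git push`, which never executes.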

Phase 2: Verify (Grounding)

Goal: Check internal knowledge against external reality.

The Problem: LLMs can hallucinate syntax for libraries and tools that changed after their training cutoff or that were sparsely represented in their training data.
The Fix: Manual RAG (Retrieval-Augmented Generation).
Required Actions:
- Explicitly read documentation files if available locally.
- Perform web searches for current documentation (e.g., "PostgreSQL 16 syntax").
User Prompt:

"Before writing code, read the file docs/migration-guide.md to confirm the syntax."
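
Grounding also means confirming that the docs you retrieved describe the version you found in Phase 1. Here is a minimal sketch with two hypothetical helpers, `parse_version` and `docs_match`. It assumes the documentation states which major version it covers, as "PostgreSQL 16 syntax" does. Comparing only the major version is a deliberate simplification, because some tools also change syntax in minor releases.

```python
import re


def parse_version(output: str) -> tuple[int, int, int]:
    """Pull the first dotted version number out of a `--version` style string."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"no version number in {output!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def docs_match(installed_output: str, documented_major: int) -> bool:
    """True if the docs target the same major version that is installed."""
    return parse_version(installed_output)[0] == documented_major


# Example strings in the shape these tools typically print; your output may differ.
print(parse_version("git version 2.43.0"))       # (2, 43, 0)
print(docs_match("psql (PostgreSQL) 16.1", 16))  # True
print(docs_match("psql (PostgreSQL) 14.9", 16))  # False
```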

Phase 3: Plan (Human-in-the-Loop)

Goal: Consent and Safety.

Constraint: No execution without explicit approval.
Required Actions:
- The AI must propose the exact plan or commands it intends to run.
- It must explain the why behind each change.
User Prompt:

"Based on the audit, propose a plan. Do not execute until I say 'Proceed'."
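
On the tooling side, the "Proceed" gate can be modeled as a plan object that carries a rationale for every step. The gate refuses to run unless the reply is exactly the agreed keyword. This is a hypothetical sketch using the illustrative names `Step`, `Plan`, and `approved`. It is not how any particular agent implements approval.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    command: str    # the exact command the AI intends to run
    rationale: str  # the "why" behind this change


@dataclass
class Plan:
    steps: list[Step] = field(default_factory=list)

    def render(self) -> str:
        """Numbered, human-readable plan for the user to review."""
        return "\n".join(
            f"{i}. {step.command}\n   Why: {step.rationale}"
            for i, step in enumerate(self.steps, start=1)
        )


def approved(plan: Plan, user_reply: str) -> bool:
    """Allow execution only for a non-empty plan, with every step justified,
    after the user replies exactly 'Proceed'."""
    every_step_justified = all(step.rationale.strip() for step in plan.steps)
    return bool(plan.steps) and every_step_justified and user_reply.strip() == "Proceed"


plan = Plan([Step("git stash", "Save uncommitted work before editing the config")])
print(plan.render())
print(approved(plan, "looks fine"))  # False: only the exact keyword counts
print(approved(plan, "Proceed"))     # True
```

Requiring the exact keyword is intentionally strict. A casual "sure" or "ok" in the middle of a conversation should not count as consent to run commands.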

🧠 "Persona" Prompting

You can "prime" an AI agent to adopt this behavior automatically. Copy and paste these prompts at the start of your session.

For CLI / Agentic Environments

Use this for agents that have direct tool access (like Gemini CLI, Claude Code).

**ACTIVATING AUDITOR PERSONA**
You are now acting as a **Cautious Technical Auditor**.
1.  **Skepticism:** Do not trust your internal training data for syntax or versions.
2.  **Evidence:** You must `read_file` or `run_shell_command` to verify the state of *every* file before editing it.
3.  **Safety:** If a command is destructive (delete, overwrite), you must explain the risk and ask for confirmation.
4.  **No Blind Edits:** Never generate a `replace` or `write_file` block without first reading the file's current content.
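
Rule 4 can also be enforced in code rather than left to the model's discretion. The sketch below is a hypothetical wrapper. `GuardedEditor` borrows the `read_file`/`write_file` names from the prompt for readability, but it does not reproduce how any agent actually implements those tools. It stores a SHA-256 hash of each file when the file is read. It then refuses to write a file that was never read, or one that changed on disk after it was read.

```python
import hashlib
from pathlib import Path


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class GuardedEditor:
    """Hypothetical 'no blind edits' wrapper around file reads and writes."""

    def __init__(self) -> None:
        # resolved path -> hash of the content at the last read or write
        self._known: dict[Path, str] = {}

    def read_file(self, path: str) -> str:
        text = Path(path).read_text(encoding="utf-8")
        self._known[Path(path).resolve()] = _digest(text)
        return text

    def write_file(self, path: str, new_text: str) -> None:
        key = Path(path).resolve()
        if key not in self._known:
            raise PermissionError(f"{path}: blind edit refused, read the file first")
        current = Path(path).read_text(encoding="utf-8")
        if _digest(current) != self._known[key]:
            raise PermissionError(f"{path}: file changed since it was read, re-read it")
        Path(path).write_text(new_text, encoding="utf-8")
        self._known[key] = _digest(new_text)
```

The stale-content check matters as much as the read check. If you edit `starship.toml` by hand mid-session, the agent must look at it again before overwriting your change. Because unread paths are rejected, this sketch also blocks creating new files. That case would need its own explicitly approved path.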

For Web UI / Chat Environments

Use this for web interfaces (ChatGPT, Claude, Gemini) where the AI cannot see your files.

**ACTIVATING CONSULTANT MODE**
You are an expert technical consultant. I need help with a configuration task, but you do not have access to my system.
1.  **Do NOT Assume:** Do not assume standard configurations, file paths, or versions.
2.  **Ask First:** Before providing any code or commands, ask me to paste the relevant configuration files, error logs, or version numbers.
3.  **Verify Docs:** If you have browsing capabilities, you MUST search for the *current* official documentation for the specific version I am using. If not, explicitly ask me to paste the relevant docs.
4.  **Stop & Think:** If you are unsure, state "I need more context" rather than guessing a solution.
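
In a chat UI, you are the AI's only sensor. It helps to answer the "Ask First" questions in one consistent, paste-ready block. The sketch below uses a hypothetical `build_context_bundle` helper. You supply the versions and file contents you collected yourself; the values shown are examples. Review the result for secrets such as tokens or internal hostnames before you paste it anywhere.

```python
FENCE = "`" * 3  # built at runtime so this example does not end the surrounding code block


def build_context_bundle(versions: dict[str, str], files: dict[str, str], error_log: str = "") -> str:
    """Format versions, config files, and an optional error log as Markdown for a chat UI."""
    parts = ["## Versions"]
    parts += [f"- {tool}: {version}" for tool, version in sorted(versions.items())]
    for name, content in files.items():
        parts += [f"## {name}", FENCE, content.rstrip("\n"), FENCE]
    if error_log:
        parts += ["## Error log", FENCE, error_log.rstrip("\n"), FENCE]
    return "\n".join(parts)


print(build_context_bundle(
    versions={"starship": "1.17.1", "git": "2.43.0"},  # example values
    files={"starship.toml": "add_newline = false\n"},
))
```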

✅ Summary Checklist

Before letting an AI touch your system:

- Audit First: Did the AI inspect the files it wants to change?
- Context Provided: Did I point it to the relevant docs or reference files?
- Version Check: Does it know which version of the software I'm running?
- Plan Approval: Did I see the specific commands it plans to run?
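
If you drive an agent from your own scripts, the checklist can also serve as a hard gate. This is a trivial, hypothetical sketch. `CHECKLIST` mirrors the four questions above, and `unanswered` returns whichever ones you have not yet confirmed.

```python
CHECKLIST = {
    "audit_first": "Did the AI inspect the files it wants to change?",
    "context_provided": "Did I point it to the relevant docs or reference files?",
    "version_check": "Does it know which version of the software I'm running?",
    "plan_approval": "Did I see the specific commands it plans to run?",
}


def unanswered(answers: dict[str, bool]) -> list[str]:
    """Questions not yet confirmed; apply changes only when this list is empty."""
    return [question for key, question in CHECKLIST.items() if not answers.get(key, False)]


pending = unanswered({"audit_first": True, "version_check": True})
for question in pending:
    print("Not yet:", question)
```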
