Safe AI Workflows: Mitigating Hallucinations & Overreach

As we integrate Large Language Models (LLMs) into our technical workflows, we run into a critical friction point: trust.

While AI agents are powerful "completers," they often suffer from Parametric Knowledge Dominance: the tendency to rely on the knowledge stored in their pre-trained (and often outdated) weights rather than inspecting the actual state of your system. This leads to "blind recommendations" that can break configurations or introduce subtle bugs.

This guide outlines the problems inherent in AI technical assistance and provides a robust "Audit-Verify-Plan" protocol to ensure safety.


🚨 The Core Problem: "Blind" Recommendations

An AI model doesn't inherently "know" it's outdated. When you ask for help with a tool like Git, Starship, or Docker, it defaults to whatever version of the documentation was in its training data, which may be years behind the version installed on your machine.

The Failure Mode

  1. The Trigger: You ask for a fix (e.g., "Fix my Git config").
  2. The Assumption: The AI assumes a standard environment and standard syntax based on its training data.
  3. The Hallucination: It generates a command or configuration block that looks correct but is syntactically invalid for your specific version or conflicts with your existing setup.
  4. The Result: "Blind changes" that break your workflow.

Case Study: A recent interaction involved an AI recommending standard Git/Starship configurations without first running git status or checking the starship.toml file. This resulted in a recommendation that ignored the user's specific constraints, leading to a loss of trust.


🛡️ The Solution: The "Audit-Verify-Plan" Protocol

To safely use AI for systems engineering, we must force it out of "Generator Mode" and into "Auditor Mode".

Phase 1: Audit (Read-Only)

Goal: Establish ground truth before any logic is applied.
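
One way to picture the audit phase is a small script that only runs commands which report state. The sketch below is illustrative, not a prescribed implementation: the probe list, `run_probe`, and `audit` are hypothetical names chosen for this example, and the tools listed are simply the ones mentioned in this guide.

```python
import shutil
import subprocess
from typing import Callable, Dict, List, Optional

# Illustrative probes: each one reports state and edits no files or config.
# Replace the list with whatever tools your task actually touches.
READ_ONLY_PROBES: Dict[str, List[str]] = {
    "git_version": ["git", "--version"],
    "git_status": ["git", "status", "--short"],
    "starship_version": ["starship", "--version"],
}


def run_probe(argv: List[str]) -> Optional[str]:
    """Run one probe; return its stdout, or None if the tool is missing or fails."""
    if shutil.which(argv[0]) is None:
        return None  # tool not installed: record "unknown" rather than guess
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def audit(
    probes: Dict[str, List[str]],
    runner: Callable[[List[str]], Optional[str]] = run_probe,
) -> Dict[str, Optional[str]]:
    """Collect ground truth as a name -> output mapping, before any advice is given."""
    return {name: runner(argv) for name, argv in probes.items()}
```

Calling `audit(READ_ONLY_PROBES)` returns a dictionary of facts; a value of `None` means "not verified", which is a cue to ask rather than assume.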

Phase 2: Verify (Grounding)

Goal: Check internal knowledge against external reality.
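
As a rough illustration of grounding, the sketch below compares the version a recommendation assumes against the version the audit actually reported. The function names and the three-way `ok` / `mismatch` / `unknown` result are assumptions made for this example; real tools may need stricter version parsing.

```python
import re
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number (e.g. 2.39.2) from tool output."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def check_assumption(installed_output: str, minimum_required: str) -> str:
    """Compare what is installed against what a recommendation assumes.

    Returns "ok", "mismatch", or "unknown" (when either version cannot be parsed).
    """
    installed = parse_version(installed_output)
    required = parse_version(minimum_required)
    if installed is None or required is None:
        return "unknown"  # no evidence: treat as unverified, not as a pass
    # Tuples compare numerically, so 2.9 is correctly older than 2.10.
    return "ok" if installed >= required else "mismatch"
```

For example, `check_assumption("git version 2.39.2", "2.40")` returns `"mismatch"`, which should stop the recommendation until current documentation for the installed version has been consulted.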

Phase 3: Plan (Human-in-the-Loop)

Goal: Consent and safety. The AI proposes a change, and a human approves it before anything is applied.
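
A minimal sketch of the plan phase, assuming the proposed change is a full replacement of one text file: show the diff, ask for confirmation, and only then write. `render_plan`, `apply_with_consent`, the `confirm` callback, and the `.bak` backup are all hypothetical choices for this example.

```python
import difflib
from pathlib import Path
from typing import Callable


def render_plan(path: Path, proposed: str) -> str:
    """Return a unified diff between the file on disk and the proposed content."""
    current = path.read_text(encoding="utf-8") if path.exists() else ""
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"{path} (current)",
        tofile=f"{path} (proposed)",
    )
    return "".join(diff)


def apply_with_consent(
    path: Path, proposed: str, confirm: Callable[[str], bool]
) -> bool:
    """Write `proposed` to `path` only if the human approves the rendered plan."""
    plan = render_plan(path, proposed)
    if not plan:
        return False  # nothing would change, so there is nothing to approve
    if not confirm(plan):
        return False  # human declined: leave the file untouched
    if path.exists():
        # Keep a copy of the previous content so the change can be undone.
        backup = path.with_name(path.name + ".bak")
        backup.write_text(path.read_text(encoding="utf-8"), encoding="utf-8")
    path.write_text(proposed, encoding="utf-8")
    return True
```

In an interactive session, `confirm` could print the diff and return `input("Apply? [y/N] ").lower() == "y"`; passing it in as a callback keeps the approval step explicit and easy to test.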


🧠 "Persona" Prompting

You can "prime" an AI agent to adopt this behavior automatically. Copy and paste the prompt that matches your environment at the start of your session.

For CLI / Agentic Environments

Use this for agents that have direct tool access (such as Gemini CLI or Claude Code).

**ACTIVATING AUDITOR PERSONA**
You are now acting as a **Cautious Technical Auditor**.
1.  **Skepticism:** Do not trust your internal training data for syntax or versions.
2.  **Evidence:** You must `read_file` or `run_shell_command` to verify the state of *every* file before editing it.
3.  **Safety:** If a command is destructive (delete, overwrite), you must explain the risk and ask for confirmation.
4.  **No Blind Edits:** Never generate a `replace` or `write_file` block without first reading the file's current content.
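
A prompt is only a request, so if you build your own tooling it can help to enforce rule 4 in code as well. The sketch below is a hypothetical wrapper, not the API of any particular agent: `GuardedWorkspace` and `BlindEditError` are invented for this example, and the method names merely echo the tool names used in the prompt above.

```python
from pathlib import Path
from typing import Set, Union


class BlindEditError(RuntimeError):
    """Raised when a write is attempted on an existing file that was never read."""


class GuardedWorkspace:
    """Hypothetical file-access layer that refuses edits to unread files."""

    def __init__(self) -> None:
        self._seen: Set[Path] = set()

    def read_file(self, path: Union[str, Path]) -> str:
        resolved = Path(path).resolve()
        text = resolved.read_text(encoding="utf-8")
        self._seen.add(resolved)  # remember that current content was inspected
        return text

    def write_file(self, path: Union[str, Path], content: str) -> None:
        resolved = Path(path).resolve()
        # New files are allowed; existing files must have been read first.
        if resolved.exists() and resolved not in self._seen:
            raise BlindEditError(f"refusing to overwrite unread file: {resolved}")
        resolved.write_text(content, encoding="utf-8")
        self._seen.add(resolved)
```

The design choice here is to fail loudly rather than silently read the file on the agent's behalf, so that a blind edit shows up as an error instead of being quietly corrected.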

For Web UI / Chat Environments

Use this for web interfaces (ChatGPT, Claude, Gemini) where the AI cannot see your files.

**ACTIVATING CONSULTANT MODE**
You are an expert technical consultant. I need help with a configuration task, but you do not have access to my system.
1.  **Do NOT Assume:** Do not assume standard configurations, file paths, or versions.
2.  **Ask First:** Before providing any code or commands, ask me to paste the relevant configuration files, error logs, or version numbers.
3.  **Verify Docs:** If you have browsing capabilities, you MUST search for the *current* official documentation for the specific version I am using. If not, explicitly ask me to paste the relevant docs.
4.  **Stop & Think:** If you are unsure, state "I need more context" rather than guessing a solution.
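
Because a chat model cannot see your files, the "Ask First" step ends with you pasting context by hand. A small helper can assemble that context consistently; the sketch below is one possible layout, and `build_context_block` and its plain-text format are assumptions for illustration only.

```python
from typing import Dict


def build_context_block(versions: Dict[str, str], files: Dict[str, str]) -> str:
    """Format tool versions and file contents into one block to paste into a chat."""
    lines = ["Versions:"]
    for tool, version in sorted(versions.items()):
        lines.append(f"- {tool}: {version}")
    for name, content in sorted(files.items()):
        lines.append("")  # blank line between sections
        lines.append(f"File: {name}")
        lines.append(content.rstrip("\n"))
    return "\n".join(lines)
```

For instance, `build_context_block({"git": "2.39.2"}, {"starship.toml": "add_newline = false\n"})` produces a short block listing the Git version followed by the file's contents, ready to paste in reply to the consultant's first question.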

✅ Summary Checklist

Before letting an AI touch your system: