What Meta's January 2026 policy says
On January 15, 2026, Meta updated the WhatsApp Business Platform usage policies to explicitly restrict the use of AI on the platform. The core requirement: AI-powered experiences on WhatsApp Business Platform must perform concrete, defined business tasks. Open-ended conversational AI — including general-purpose chat, companionship, entertainment AI, and broad-scope "AI that can help with anything" patterns — is not permitted.
The policy defines task-specific AI by what it does, not by the underlying technology. An LLM-powered bot that handles order tracking queries, appointment scheduling, and FAQ resolution within a defined knowledge base is task-specific. The same LLM-powered bot with a system prompt that says "you are a helpful AI assistant for [BrandName], answer any customer question" is open-ended.
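A minimal illustration of that distinction, using the same hypothetical model behind both bots. The brand name and task list are placeholders, not language from Meta's policy:

```python
# Two prompts for the same LLM. "AcmeRetail" and the task list are
# hypothetical placeholders for illustration.

# Task-specific: behavior is bounded by an explicit task list.
TASK_SPECIFIC_PROMPT = (
    "You are the order-support bot for AcmeRetail. You handle exactly "
    "three tasks: order status lookups, product availability checks, "
    "and appointment scheduling. If a request falls outside these "
    "tasks, do not answer it; reply: 'I'll connect you with our "
    "support team.'"
)

# Open-ended: the pattern the policy prohibits. Nothing bounds behavior.
OPEN_ENDED_PROMPT = (
    "You are a helpful AI assistant for AcmeRetail. "
    "Answer any customer question."
)
```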
Meta's operational definition for compliance purposes: a business must be able to articulate what their AI does, in one paragraph, and demonstrate that the AI's behavior is bounded by that description. Bots that can't produce this scope statement — and many pre-2026 Arabic bots can't — are presumptively open-ended.
Why the 2023–2025 Arabic bot wave is over-exposed
Between 2023 and 2025, a wave of Arabic-language chatbot deployments on WhatsApp Business Platform followed a common architecture: integrate an LLM (usually GPT-4 or a fine-tuned equivalent) into a WhatsApp gateway, add a brand persona prompt, and deploy. The marketing narrative was "ChatGPT for your business on WhatsApp" — and it worked well enough that hundreds of MENA businesses deployed it.
The problem is the system prompt pattern that drove these deployments. A typical 2023 deployment prompt: "أنت مساعد ذكاء اصطناعي لشركة [البراند]. ساعد العملاء في أي أسئلة لديهم عن الشركة والمنتجات والخدمات." ("You are an AI assistant for [Brand]. Help customers with any questions they have about the company, products, and services.")
That prompt produces an open-ended assistant. The phrase "any questions" is the policy violation. The bot will answer questions outside the defined business scope because nothing in the prompt tells it not to. A customer asking about competitor pricing, asking for general cooking advice because the brand is a food company, or asking the bot to draft a complaint letter will get a response — which is exactly the open-ended behavior the policy prohibits.
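One way to make this failure concrete is to probe a deployed bot with exactly those out-of-scope requests and check whether it answers. A minimal sketch; send_to_bot and the fallback marker are hypothetical stand-ins for the deployment's own gateway and scripted reply:

```python
# Out-of-scope probes drawn from the examples above.
PROBES = [
    "How does your pricing compare to your competitors'?",
    "What's a good recipe for kabsa?",               # general cooking advice
    "Draft a complaint letter about my last order.",
]

FALLBACK_MARKER = "سأوصلك بفريق الدعم"  # "I'll connect you with the support team"

def probe_boundaries(send_to_bot):
    """send_to_bot is a hypothetical callable wrapping the live bot."""
    for probe in PROBES:
        reply = send_to_bot(probe)
        verdict = "fallback OK" if FALLBACK_MARKER in reply else "ANSWERED (open-ended)"
        print(f"{probe!r} -> {verdict}")
```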
Arabic prompt engineering in 2023 was a less mature discipline than its English counterpart. The corpus of Arabic-language system prompt design patterns was smaller, and the Arabic-language AI deployment community was still converging on best practices. The result: Arabic system prompts from that period are, on average, less constrained than English ones. They rely on the LLM's general judgment about what's appropriate for a business context rather than on explicit scope constraints.
How enforcement works
Meta enforces the AI policy reactively, not proactively. The enforcement trigger is customer complaints, specifically reports of AI experiences on WhatsApp that users find misleading, harmful, or inappropriate. Meta's enforcement system accumulates these signals and initiates review when a threshold is crossed for a specific WhatsApp Business Account (WABA).
The first enforcement wave in February–March 2026 targeted companion AI and entertainment chatbots — bots that were explicitly designed for social interaction rather than business purposes. These were unambiguous policy violations, and enforcement was rapid and consistent.
The second wave, slower and still ongoing, targets open-ended business bots. The complaint pattern is different — customers aren't reporting harm, they're reporting confusion or frustration when the bot behaves unexpectedly. A bot that gives a customer advice about a competitor's product, or provides medical information when asked by a customer of a health brand, or drafts a legal complaint when asked by a customer who's unhappy with a purchase — these generate complaint signals that accumulate toward enforcement.
When Meta acts, the notification goes to the WABA, not to the bot specifically. The business has 7–14 days to identify the offending flow, remove or remediate it, and document the remediation for re-application. First-time actions typically resolve within 2–4 weeks for businesses that move promptly.
The four audit stages
The TaskSpec audit framework is structured around the four things Meta's reviewers will assess on re-application:
Stage 1: Scope statement. One paragraph, in writing, describing what the bot does. This is the artifact Meta wants on re-application, and it's the diagnostic lens for all subsequent audit stages. Most non-compliant bots fail here because the scope statement either doesn't exist or is written in vague terms that could justify open-ended behavior ("helps customers with their needs").
A passing scope statement: "This bot handles order status inquiries, product availability questions, and appointment scheduling for [Brand]'s retail locations. It escalates to a human agent for complaints, returns, and any query outside these three topics." A failing scope statement: "This bot helps [Brand]'s customers with questions about their orders, products, and anything else they need help with."
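One way to keep the scope statement from drifting into vagueness is to mirror it in structured configuration that the bot's routing layer actually consumes, so the prose document and the runtime scope can't diverge. A sketch, with hypothetical field and brand names:

```python
from dataclasses import dataclass

@dataclass
class ScopeStatement:
    """Machine-readable mirror of the one-paragraph scope statement."""
    brand: str
    in_scope_tasks: list[str]       # everything the bot is allowed to do
    escalation_triggers: list[str]  # always handed to a human agent
    fallback_reply: str             # scripted reply for everything else

ACME_SCOPE = ScopeStatement(
    brand="AcmeRetail",  # hypothetical
    in_scope_tasks=["order status", "product availability", "appointment scheduling"],
    escalation_triggers=["complaint", "return"],
    fallback_reply="سأوصلك بفريق الدعم لدينا.",  # "I'll connect you with our support team."
)
```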
Stage 2: Flow inventory. Every conversational flow is mapped — trigger event, conversation path, exit condition. A trigger event is how a customer initiates a specific flow. A conversation path is the sequence of bot turns within that flow. An exit condition is how the flow ends — either task completion or a defined handoff to a human agent or fallback response.
A flow without a clear exit condition is a flow operating open-ended. If a customer's request falls outside scope and the bot continues generating responses rather than executing a defined fallback, the flow has no exit condition. This is the most common audit finding in pre-2026 Arabic bots.
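The inventory can live as a small table in code, which makes the missing-exit-condition finding mechanical to detect. A sketch with hypothetical flow entries; a flow whose exit_condition is None is the open-ended case just described:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Flow:
    trigger: str                   # how a customer enters the flow
    path: str                      # the bounded sequence of bot turns
    exit_condition: Optional[str]  # task completion, handoff, or fallback

FLOWS = [
    Flow(trigger="order-status keyword or menu tap",
         path="ask for order number -> look up status -> confirm",
         exit_condition="status delivered, or human handoff if lookup fails"),
    Flow(trigger="any unrecognized request",
         path="LLM generates a reply from general knowledge",
         exit_condition=None),  # no exit condition: operating open-ended
]

def find_open_ended(flows: list[Flow]) -> list[Flow]:
    """The most common audit finding: flows with no defined exit."""
    return [f for f in flows if f.exit_condition is None]
```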
Stage 3: Boundary test. For each flow, what does the bot do when a user request falls outside scope? Three acceptable exit patterns: handoff to human agent ("I'll connect you with our support team"), fallback content (a scripted response that acknowledges the limitation and offers the in-scope alternatives), or polite refusal ("That's outside what I can help with — here's what I can do"). One unacceptable pattern: the LLM freelances a response based on general knowledge.
The boundary test is where LLM-backed bots with soft system prompts fail most consistently. The LLM's default behavior when encountering an out-of-scope request is to try to be helpful — which means generating a response rather than executing a fallback. Explicit fallback instructions in the system prompt, in the language the bot is deployed in, are required to override this default behavior.
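Architecturally, the reliable fix is to route each turn before the LLM can free-generate: classify the request, run the LLM only for in-scope intents, and dispatch everything else to one of the three acceptable exits. A minimal sketch; classify and run_llm_flow are hypothetical project functions, and the Arabic strings are illustrative wording:

```python
IN_SCOPE = {"order_status", "product_availability", "appointment"}
ESCALATE = {"complaint", "return"}

HANDOFF = "سأوصلك بفريق الدعم لدينا."  # exit 1: "I'll connect you with our support team."
FALLBACK = ("هذا خارج ما أستطيع المساعدة فيه، لكن يمكنني متابعة طلبك، "
            "التحقق من توفر منتج، أو حجز موعد.")  # exits 2/3: refusal plus in-scope options

def handle_turn(message: str, classify, run_llm_flow) -> str:
    intent = classify(message)                # keyword rules or an intent model
    if intent in IN_SCOPE:
        return run_llm_flow(intent, message)  # LLM constrained to the matched task
    if intent in ESCALATE:
        return HANDOFF                        # defined human handoff
    return FALLBACK                           # scripted fallback, never free generation
```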
Stage 4: Prompt and guardrail review. For LLM-backed flows, the system prompt is reviewed for: explicit scope definition (what the bot does and doesn't do), explicit fallback instruction (what to say when a request falls outside scope), and explicit boundary language (phrases the bot should never respond to, topics it should never engage with). Arabic deployments need these elements in Arabic — translated guardrails in English appended to Arabic system prompts produce inconsistent behavior in our testing.
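A prompt skeleton carrying all three elements, written in Arabic rather than appended in English, might look like the sketch below. The wording is illustrative and the brand is hypothetical, not a template Meta has endorsed:

```python
SYSTEM_PROMPT = (
    # 1. Explicit scope definition: what the bot does and doesn't do.
    # "You are AcmeRetail's order-service bot. Your only tasks: order
    #  status, product availability, appointment booking. Do not answer
    #  any other topic."
    "أنت بوت خدمة الطلبات لشركة AcmeRetail. "
    "مهامك فقط: حالة الطلب، توفر المنتجات، حجز المواعيد. "
    "لا تجب عن أي موضوع آخر.\n"
    # 2. Explicit fallback instruction: what to say out of scope.
    # "If the request is outside these tasks, reply verbatim: 'I can
    #  help with order status, product availability, and appointments.'"
    "إذا كان الطلب خارج هذه المهام، أجب حرفيا: "
    "\"يمكنني المساعدة في حالة الطلب وتوفر المنتجات وحجز المواعيد.\"\n"
    # 3. Explicit boundary language: topics never to engage with.
    # "Never discuss: competitor pricing, medical or legal advice, or
    #  drafting letters on the customer's behalf."
    "لا تناقش أبدا: أسعار المنافسين، النصائح الطبية أو القانونية، "
    "أو صياغة الرسائل نيابة عن العميل."
)
```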
Passing vs. failing patterns
The diagram at the top of this page shows the distinction. Passing patterns are scope-by-design: the bot's architecture starts with a defined task scope, and the LLM is constrained to operate within it. Failing patterns are scope-as-afterthought: the LLM is given general instructions to be helpful, with scope constraints either absent or insufficiently specific to override the LLM's default helpful behavior.
The critical insight: passing bots and failing bots can use the same LLM. A GPT-4-powered order management bot with explicit Arabic-language scope constraints passes. A GPT-4-powered "helpful business assistant" without scope constraints fails. The technology is the same; the architecture is different.
What remediation looks like
Most failing MENA Arabic bots need 2–3 changes in combination:
Scope-tighten the system prompt. Replace vague helpfulness language with explicit task definition. Replace "help customers with their questions" with "handle [specific tasks]. When a customer asks about anything else, respond with: [fallback language in Arabic]." (A remediated example is sketched after this list.)
Add explicit fallback flow. Define a specific response pattern for out-of-scope requests. The fallback should acknowledge the limitation naturally in Arabic, not robotically. It should offer the in-scope alternatives. It should not say "I can't help with that" without providing a path forward.
Write a scope statement document. One paragraph describing the bot's task scope, the business functions it serves, the escalation paths for out-of-scope requests, and the human oversight in place. This document is the primary artifact for Meta's re-application process if the bot is ever reviewed.
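Putting the first two changes together against the 2023-era prompt quoted earlier, a remediated version might read as follows. Brand, task list, and wording are placeholders for illustration:

```python
# Before: the open-ended 2023-era pattern.
# "You are an AI assistant for AcmeRetail. Help customers with any
#  questions they have."
BEFORE = "أنت مساعد ذكاء اصطناعي لشركة AcmeRetail. ساعد العملاء في أي أسئلة لديهم."

# After: explicit task definition plus a scripted Arabic fallback that
# acknowledges the limit naturally and offers the in-scope alternatives.
# "You are AcmeRetail's bot for order status, product availability, and
#  appointment booking only. If the customer asks about anything else,
#  reply: 'That's outside what I can help with, but I can track your
#  order, check a product's availability, or book an appointment.
#  Which of these suits you?'"
AFTER = (
    "أنت بوت AcmeRetail لحالة الطلب وتوفر المنتجات وحجز المواعيد فقط. "
    "إذا سأل العميل عن أي شيء آخر، أجب: "
    "\"هذا خارج ما أستطيع المساعدة فيه، لكن يمكنني متابعة طلبك، "
    "التحقق من توفر منتج، أو حجز موعد. أي من هذه يناسبك؟\""
)
```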
Remediation typically takes 1–3 weeks depending on bot complexity and the number of flows that need boundary conditions added. Most MENA Arabic bots have 3–8 distinct flows; the remediation per flow is primarily prompt and configuration work, not infrastructure change.
Next step
The free task-scope check takes 5 minutes. We review your bot's system prompt and top-level flow structure, identify which flows are in the policy-risky band, and tell you what remediation would look like. If the full audit is warranted ($299), the check fee credits toward it.