How to Block Google AI Training Chat Bots on your Website

This article explains how to stop Google from using your website content for AI training.

Artificial Intelligence (AI) has become much better at understanding and interacting with human language in recent years. A prime example of this evolution is Google’s chat bots, which are trained to converse with users naturally and intuitively. However, as a webmaster, you may sometimes want to block such AI from training on your website’s content. Here’s why and how you might take such a step.

Understanding Google Chat Bots

Before learning how to block Google AI training, let’s first look at what Google chatbots are.

Google chatbots are virtual assistants that can communicate with users, much like a human would. They are designed to answer queries, provide information, and assist in different tasks. Utilizing advanced AI, these chatbots can understand and respond to natural language inputs, making them a handy tool for everyday inquiries. Whether getting weather updates or finding a good restaurant nearby, Google chatbots aim to make your digital experience smooth and effortless.

Introducing Google Bard Chat Bot

Google Bard is a fresh face in chatbots, designed by Google to hold more natural conversations with users. Unlike a standard search engine, Bard aims to engage in back-and-forths with users, making information gathering feel like a friendly chat. It initially rode the wave of LaMDA language models, later transitioning to the PaLM model. This new buddy was Google’s answer to rising stars in the AI chatbot scene (like ChatGPT), aiming to keep the tech giant’s foothold firm in the ever-evolving digital landscape.

Advancing to Future Google Chat Bots

Google Gemini is a next-gen foundation model by Google, aiming to compete with OpenAI’s GPT-4 model. It’s a collection of large language models powering various applications based on user preferences, from chatbots to text summarization and generation. Gemini can also assist in coding and creating original images per user requests. Initially available to selected developers, it’s planned to be offered to companies through Google Cloud’s Vertex AI service, marking a significant step in enhancing Google’s AI capabilities and competitive edge in the generative AI domain.

How does Google Obtain Information to Train its Chat Bots?

Google introduced a dedicated robots.txt user agent token, Google-Extended, that lets site owners control whether their content is used to train its AI models and chatbots. Google gathers vast amounts of text and information from the web, which are then used to improve the understanding and responses of its AI chatbots like Bard and Gemini. Google-Extended does not have its own HTTP user agent string: the crawling itself is done with Google’s existing user agents, and the token only acts as a control in robots.txt. By continuously feeding new data into these models, Google aims to keep them updated and able to handle various user queries effectively.

The user agent token to use in robots.txt is Google-Extended.

You can check the full list of Google crawler user agents in Google’s documentation.

Reasons to Block or Not to Block Google AI Crawler

The Google AI training token (Google-Extended) is separate from the Google indexing crawler (Googlebot), which is why each has its own user agent token in robots.txt. Blocking Google-Extended therefore has no impact on SEO. For the ethical arguments for and against blocking AI training bots, read the ChatGPT Blocking article.

How to Block Google AI Training Bot Crawling

You will need to add a directive in your “robots.txt” file (for WordPress, you can use the Yoast SEO plugin):

User-agent: Google-Extended
Disallow: /

These lines tell Google not to use your website’s content for AI training, through the “Google-Extended” token that Google created for this purpose.
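One way to sanity-check the rule before publishing it is to parse it locally. The sketch below uses Python’s standard urllib.robotparser module; the is_allowed helper and the example URL are illustrative assumptions, and Python’s parser only approximates how Google itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The directive from the article, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper for local testing only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.yourdomain.com/any-page"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "Google-Extended", page))  # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", page))        # True
```

The second check illustrates why SEO is unaffected: the rule only matches the Google-Extended token, so Googlebot is still allowed.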

If you add the directive through the Yoast SEO robots.txt feature, the file should look like this:

User-agent: *
Disallow: /wp-content/uploads/wpo-plugins-tables-list.json

# ---------------------------
User-agent: *

Sitemap: https://www.yourdomain.com/sitemap_index.xml
# ---------------------------

User-agent: Google-Extended
Disallow: /

If you want to block both Google AI training and the ChatGPT crawler:

User-agent: *
Disallow: /wp-content/uploads/wpo-plugins-tables-list.json

# ---------------------------
User-agent: *

Sitemap: https://www.yourdomain.com/sitemap_index.xml
# ---------------------------

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
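If you manage robots.txt by hand or with a script rather than a plugin, the same groups can be appended programmatically. This is a minimal sketch, assuming you already have the file contents as a string; the append_ai_blocks function and AI_TOKENS list are hypothetical names introduced here for illustration.

```python
# Tokens blocked in the article's example; extend the list as needed.
AI_TOKENS = ["GPTBot", "Google-Extended"]


def append_ai_blocks(robots_txt: str, tokens=AI_TOKENS) -> str:
    """Return robots_txt with a 'Disallow: /' group appended for every
    token that does not already have a User-agent line in the file."""
    existing = {
        line.split(":", 1)[1].strip().lower()
        for line in robots_txt.splitlines()
        if line.strip().lower().startswith("user-agent:")
    }
    groups = [
        f"User-agent: {token}\nDisallow: /\n"
        for token in tokens
        if token.lower() not in existing
    ]
    if not groups:
        # Nothing to add: every token is already named in the file.
        return robots_txt
    # Keep the original rules untouched and separate groups by blank lines.
    return robots_txt.rstrip("\n") + "\n\n" + "\n".join(groups)
```

Calling append_ai_blocks on the result a second time returns it unchanged, so the function is safe to run repeatedly. It only checks whether a token is already named, not what rules that token’s group contains.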

If you find any mistakes or have ideas for improvement, please email us using the address on the Contact page.

