AI Search & GEO FAQs

Commonly asked questions and answers about AI Search & GEO.

Can AI models distinguish between sponsored and non-sponsored content?

AI models can sometimes distinguish sponsored from organic content, but it depends on how clearly it's marked. If labeled with tags like "sponsored" or "advertisement," such content can be filtered out during dataset curation. However, when content is not clearly labeled, especially if it takes the form of high-quality guides, reports, or whitepapers, it may still influence the model, particularly if it comes from a recognized brand or authoritative source.

Can the AI models crawl non-text content, such as YouTube videos or Instagram posts?

AI models don't directly crawl YouTube or Instagram in real time. Instead, they typically learn from text derivatives such as video transcripts, captions, or OCR-extracted text from images. Sometimes you may see YouTube links in Sources, but this just means the AI model is reading the title and description of the video, rather than watching the actual video. Access depends on the company: Google can use YouTube data, Meta can use Instagram data, and OpenAI has licensing deals with platforms like Reddit. Unless there's a direct agreement, most social media or video app content is not included in training sets.

Do the AI models crawl app content?

AI models typically don't crawl apps unless there's a direct licensing or data agreement in place, like OpenAI's licensing deal with Reddit or Meta's Llama using public content from Instagram. While the focus is web-based content, specific apps can become sources if a direct partnership exists.

Are AI models trained on images, videos and text?

Not all AI models are trained on text, images, and video. Many popular ones, like ChatGPT and Claude, were originally trained only on text and then later extended to handle images. Google’s Gemini is one of the few big models designed from the start to handle text, images, audio, and even video. Today, most consumer-facing AI you use is text-first, with only some (like ChatGPT with vision and Gemini) able to look at pictures or process other media. Full video and audio understanding is still emerging and isn’t common across all AI models yet.

Do the AI models crawl advertising content?

Advertising content can appear in training data if it's publicly available on the web, but models don't prioritize it. In fact, banner and video ads are often easy to detect and filter out. However, ad-like text (for example, promotional blog posts) may still be included and can influence model behavior if it's not clearly distinguished from editorial or informational content.

How do AI prompts compare to traditional search when it comes to location-based terms?

In traditional search, users typically enter keywords plus "near me" without typing their location, since the browser usually has location settings enabled. When using AI models, users tend to include their location in the prompt. If you want to track a product category that differs greatly across regions, we recommend creating unique trackers for each key location and product category you want to track.

Are AI models biasing answers based on known information about the user's age, location, preferences etc?

Yes, personalization is evolving. For example, in April 2025, OpenAI expanded ChatGPT's memory capabilities, allowing it to retain facts about a user across sessions (such as name, preferences, or prior context). This enables more personalized answers. Other assistants are developing similar features. However, unless explicitly shared or remembered, models don't automatically bias results by age, location, or preferences. We recommend building test personas and generic baselines to compare outputs across different user contexts.

How do LLMs (Large Language Models) work?

Check out our blog post and our video explaining how LLMs work.

Do AI models give the same answer to everyone?

AI models are probabilistic, meaning they generate responses dynamically rather than retrieving pre-written answers from a database. Each time an AI model processes a query, it uses probabilistic sampling to select words and phrases, which introduces natural variation in how it formulates responses, similar to how two people might explain the same concept using different words and examples. Additionally, AI models' responses are influenced by the conversation context and history, meaning that previous messages in a thread can shape subsequent answers, causing the same question to yield different responses depending on what was discussed earlier. The models also receive periodic updates and improvements, so answers may evolve over time as the underlying system is refined. The specific phrasing, length, structure, and emphasis of responses will vary naturally across users and conversations, making each interaction somewhat unique while staying true to the model's training and safety guidelines.
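To make the idea of probabilistic sampling concrete, here is a minimal Python sketch of temperature-based sampling over a handful of candidate words. The candidate words and their scores are invented for illustration; real models sample from tens of thousands of tokens using scores the network produces.

```python
import math
import random

def sample_next_token(logits, temperature=0.8):
    """Pick the next token by temperature-scaled probabilistic sampling.

    logits: dict mapping candidate tokens to raw (made-up) model scores.
    Higher temperature flattens the distribution (more variation);
    temperature near 0 approaches greedy, deterministic selection.
    """
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_s = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # random.choices draws one token proportionally to its probability
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Hypothetical scores for three candidate next words
tokens = {"quick": 2.0, "fast": 1.5, "rapid": 0.5}
print(sample_next_token(tokens))
```

Running this repeatedly returns "quick" most often but not always, which is exactly why the same question can produce differently worded answers.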

What does GEO mean?

GEO stands for Generative Engine Optimization, a new marketing discipline focused on increasing brand visibility and favorable positioning in AI-generated responses from large language models like ChatGPT, Gemini, and Claude. Just as Search Engine Optimization (SEO) emerged to help brands appear prominently in Google search results, GEO represents the evolution of digital marketing for an AI-first world where consumers increasingly ask AI assistants for recommendations instead of clicking through traditional search results. The fundamental goal of GEO is to ensure that when someone asks an AI model for product recommendations, advice, or information in your category, your brand gets mentioned more frequently and in more favorable positions compared to competitors.

What are sources or citations?

Sources are web pages that the LLM retrieves for additional information to supplement its base knowledge, thereby helping to generate a better response.

To understand sources, we need to first understand when and why sources are used. When an AI model cannot generate a response solely from its base knowledge, it uses a technique called RAG (retrieval-augmented generation) to search for the answer.

RAG combines three key components:

  1. Retrieval (searching for relevant information)

  2. Augmentation (supplementing its base knowledge with the information found in retrieval)

  3. Generation (creating responses with an AI model)

Think of RAG like an open-book exam versus a closed-book exam:

  • Without RAG: The AI relies solely on what it learned during training (like memorizing facts for a closed-book test)

  • With RAG: The AI first searches through its knowledge base to find relevant information, then uses that information to generate an informed response (like having reference materials during an open-book test)

How it works in practice

  1. User prompts an AI model.

  2. The model determines if it knows enough information from its base knowledge to generate a response.

  3. If it can’t find all the information it needs using its base knowledge, the AI runs multiple queries in a search index (Retrieval).

  4. The AI evaluates results and visits many URLs, often many more than a human would, gathering more information to add to its existing knowledge (Augmentation).

  5. The AI generates a response based on both its training AND the retrieved information (Generation), citing selected sources.
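The steps above can be sketched in a few lines of Python. Everything here is a stand-in for illustration: the tiny in-memory knowledge base, the keyword-overlap scoring, and the placeholder `generate` function. A real system would query a search index and call an LLM API instead.

```python
# Toy sketch of the Retrieval -> Augmentation -> Generation loop.
KNOWLEDGE_BASE = [
    {"url": "https://example.com/geo-guide",
     "text": "GEO optimizes brand visibility in AI answers."},
    {"url": "https://example.com/seo-basics",
     "text": "SEO optimizes ranking in search engine results."},
]

def retrieve(query, index):
    """Retrieval: rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(doc["text"].lower().split())), doc)
              for doc in index]
    return [doc for score, doc in
            sorted(scored, key=lambda pair: pair[0], reverse=True)
            if score > 0]

def augment(query, docs):
    """Augmentation: prepend retrieved text to the prompt as extra context."""
    context = "\n".join(f"[{d['url']}] {d['text']}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

def generate(prompt, docs):
    """Generation: placeholder for the LLM call, citing the sources used."""
    citations = ", ".join(d["url"] for d in docs)
    return f"(model answer based on prompt) Sources: {citations}"

docs = retrieve("What is GEO visibility?", KNOWLEDGE_BASE)
answer = generate(augment("What is GEO visibility?", docs), docs)
print(answer)
```

The URLs cited at the end of the toy answer play the same role as the Sources you see attached to an AI model's response.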

There are different types of sources that an LLM will mention in its response:

  • Citation

  • References

  • More

  • Shopping

What's the difference between GEO and SEO?

GEO (Generative Engine Optimization) and SEO (Search Engine Optimization) both aim to increase brand visibility online, but they target fundamentally different user behaviors and optimize for different outcomes in an evolving digital landscape.

SEO (Search Engine Optimization)

  • Focus: Improving a website's ranking position in traditional search engine results pages (SERPs) like Google, optimizing for keywords, backlinks, page speed, and hundreds of ranking factors to ensure your site appears at the top of the list when users search for relevant terms.

  • Goal: Drive traffic to a brand's website.

  • Measurement: Rankings, impressions, and click-through rates.

GEO (Generative Engine Optimization)

  • Focus: Increasing brand visibility and favorable positioning within AI-generated responses from large language models like ChatGPT, Gemini, and Claude, optimizing to be mentioned, recommended, and cited when users ask AI for advice or information.

  • Goal: To be the answer the AI provides rather than one option in a list to click through.

  • Measurement: AI Brand Score, brand share of voice, word and sentiment analysis.

What does LLM mean?

LLM stands for Large Language Model, a type of artificial intelligence system trained on massive amounts of text data to understand, generate, and manipulate human language with remarkable sophistication. These models, which include systems like GPT (Generative Pre-trained Transformer) that powers ChatGPT, Google's Gemini, Anthropic's Claude, and Meta's Llama, use neural networks with billions or even trillions of parameters to learn patterns in language, enabling them to perform tasks like answering questions, writing content, translating languages, summarizing documents, and generating recommendations. The "large" in Large Language Model refers to both the enormous datasets these systems are trained on (often hundreds of billions of words from books, websites, and other text sources) and the massive number of parameters—the adjustable weights that determine how the model processes and generates text.

How important is recency to what the models surface?

We are seeing AI models put more weight on recency, but the effect is not as pronounced as in Google's PageRank-based ranking. We expect that to change over time, with recency playing a more important role as training cycles get faster.

One key thing to remember is that these models are being built to generate intelligence, whereas Google optimized for relevance. Just because something is old doesn't mean it isn't intelligent. However, new information can change a model's opinion if it carries sufficient weight versus the historic information.
