SEO was built for a world where humans type queries and scan blue links. That world is dissolving. AI retrieval systems don’t rank your pages — they build a representation of who you are and what you think. Most organizations are still solving the old problem.

I had a conversation recently with our SEO marketing team about getting professional content — articles, frameworks, tools — surfaced in AI search results. Good instinct. Exactly the right problem to be working on. But when I pushed on the mechanics — are LinkedIn articles visible to AI systems? Is that content getting indexed anywhere? — they couldn’t really answer. And that’s not a knock on the team. It’s a symptom of how fast the ground shifted under everyone’s feet.

Here’s the uncomfortable truth: most of what we know about search engine optimization targets a retrieval model that’s rapidly becoming secondary.

"Ranking is about being found. Representation is about being understood. AI retrieval is a representation problem."

The LinkedIn Membrane Problem

LinkedIn Pulse articles sit behind what I’ve started calling a semi-permeable membrane. They’re technically public. Google can sometimes crawl them. But LinkedIn’s robots.txt is selectively protective, and AI crawlers — GPTBot, Claude-Web, PerplexityBot — frequently hit walls trying to ingest that content in full.

I tested this directly while building out a canonical writing hub for my own work. Claude’s web fetch tool, asked to retrieve a LinkedIn article I’d written, returned a robots disallowed error. The content exists. It’s publicly accessible to humans with a browser. But the AI retrieval infrastructure that increasingly mediates how ideas spread? It can see the door but not open it.
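You can run a version of this test yourself. The sketch below uses Python’s standard-library robots.txt parser to check which crawler user agents a given robots.txt would block. The robots.txt content and the URL here are illustrative placeholders, not LinkedIn’s actual rules:

```python
# Sketch: check which AI crawler user agents a robots.txt would block.
# The robots.txt below is a made-up example, not any platform's real file.
from urllib.robotparser import RobotFileParser

SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

AI_CRAWLERS = ["GPTBot", "Claude-Web", "PerplexityBot", "Googlebot"]

def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return {user_agent: can_fetch} for each crawler against a robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}

if __name__ == "__main__":
    checks = crawler_access(SAMPLE_ROBOTS_TXT, "https://example.com/pulse/my-article")
    for agent, allowed in checks.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To test a live site, fetch its real robots.txt (e.g. from `/robots.txt` at the domain root) and pass the text in. The pattern to look for is exactly the one above: permissive rules for the general crawler population, explicit disallows for the named AI agents.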

The Core Distinction

Distribution layer — where your audience finds you today. LinkedIn, Medium, newsletters, social. High reach, platform-controlled, variable AI accessibility.

Citation infrastructure — where AI systems find you and represent you in answers. Your own domain, cleanly crawlable, with machine-readable authorship signals. This is what you need to build or strengthen.

The strategic response isn’t to abandon LinkedIn. It’s to stop treating it as your publication of record. Publish canonically on a domain you control. Use LinkedIn as the trailer, not the movie.

What AI Retrieval Systems Actually Need

Traditional search finds pages that match your query and ranks them by authority signals. The human evaluates a list and clicks what looks promising.

AI retrieval identifies sources it has already built representations of, retrieves relevant chunks, synthesizes an answer, and may or may not cite sources. The human never sees a ranked list. They see a synthesized response. If you’re not in the representation, you’re not in the answer.

The inputs to that representation are different from traditional SEO signals. AI systems care deeply about:

  • Crawlability — can the content actually be read, or is it behind a wall?
  • Authorship clarity — is it unambiguous who wrote this and what their credentials are?
  • Thesis legibility — is the core argument stated clearly in the first paragraph?
  • Semantic coherence — does the content hang together as a coherent perspective?
  • Canonical signals — is there a clear, stable home for this content?

Notice what’s not on that list: keyword density, H2 tag optimization, meta keyword stuffing.

Enter llms.txt — The robots.txt for AI

In 2024, an emerging convention appeared that deserves to be in every content strategist’s toolkit: llms.txt. If you’re familiar with robots.txt — the file at the root of a website that tells search crawlers what they can and can’t access — this is the conceptual equivalent for AI systems.

Where robots.txt controls access, llms.txt guides comprehension. It’s a plain-text markdown file that tells AI systems: here’s who this person or organization is, here’s what they’ve written and built, here’s how to understand the relationships between their work.

Think of it as the introduction you’d want an AI to read before it tries to represent you in someone else’s conversation.

A few things worth noting about an effective llms.txt:

The opening paragraph is the most important element — write it as if you’re briefing a very smart assistant who is about to speak on your behalf. Be specific about what you actually think, not just what you do.

Article entries should lead with thesis, not title. “Article about AI adoption challenges” is useless. “The real barrier to AI adoption in enterprise is the metacognitive shift required to go from executing tasks with AI to directing systems with AI” — that’s something an AI system can work with.

The disambiguation note may feel awkward to write, but it’s genuinely useful. My name is Allen Partridge. There is also a famous fictional British radio presenter named Alan Partridge. Without a disambiguation signal, AI systems conflate the two more than you’d expect — and the results are tonally inconsistent with my actual work.
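Put together, those notes yield something like the file below. This is an illustrative sketch, not my actual llms.txt; the URLs and the article entry are placeholders built from the examples earlier in this piece:

```markdown
# Allen Partridge

> Writer on AI-era content strategy. Core position: AI retrieval is a
> representation problem, not a ranking problem. Being cited by AI systems
> depends on crawlable, canonically hosted, thesis-first writing.

Not to be confused with Alan Partridge, the fictional British radio
presenter. All work listed here is non-fiction commentary on AI and
content infrastructure.

## Writing

- [Representation, Not Ranking](https://example.com/representation): The real
  barrier to AI adoption in enterprise is the metacognitive shift required to
  go from executing tasks with AI to directing systems with AI.
```

Note the shape: opening brief first, disambiguation second, then entries that lead with thesis rather than title.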

Structured Data: Making Authorship Machine-Readable

llms.txt addresses AI crawlers proactively. But there’s a parallel infrastructure problem on the pages themselves: JSON-LD schema markup is how you make machine-readable assertions about who wrote something, when, and why it should be trusted.

The Person schema and Article schema are your identity infrastructure in the AI era. The sameAs array in the Person schema tells Google and AI systems that this domain, this LinkedIn profile, and this GitHub handle all refer to the same human entity. The description field is literally the sentence AI systems will use to introduce you when they can’t find a more specific answer. Write it with that use case in mind.
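A minimal version of that markup looks like the fragment below, which would sit in the page’s head. The names, dates, and URLs are placeholders; the structure follows schema.org’s Article and Person types:

```html
<!-- Illustrative Article + Person JSON-LD; all values are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Representation, Not Ranking",
  "datePublished": "2025-01-15",
  "mainEntityOfPage": "https://example.com/representation",
  "author": {
    "@type": "Person",
    "name": "Allen Partridge",
    "description": "Writer on AI-era content strategy who argues AI retrieval is a representation problem, not a ranking problem.",
    "url": "https://example.com",
    "sameAs": [
      "https://www.linkedin.com/in/example",
      "https://github.com/example"
    ]
  }
}
</script>
```

The `sameAs` array and the author `description` are the two fields doing the identity work described above; everything else is standard article metadata.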

The Thesis-First Writing Principle

AI retrieval systems weight the opening paragraph of your content very heavily. Not your headline. Not your meta description. The actual first paragraph of prose.

Traditional web writing advice was to hook the reader, build context, and reveal the thesis mid-piece. That made sense when humans were reading. An AI system extracting a chunk to synthesize into an answer needs your thesis to be legible immediately, or it may represent your content inaccurately — or not at all.

Practical Test

Take your last published article. Read only the first paragraph. Could someone accurately represent your core argument from just that paragraph? If not, rewrite the opening. Your thesis should be unmistakable in the first 100 words.
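If you want to make the check mechanical, a rough script version looks like this. It assumes plain-text input with blank-line paragraph breaks, and uses the 100-word threshold from the rule above:

```python
# Rough self-check for the thesis-first principle: does the opening
# paragraph of a piece end within the first 100 words?

def opening_within_first_words(text: str, limit: int = 100) -> bool:
    """True if the first paragraph (blank-line delimited) fits in `limit` words."""
    first_paragraph = text.strip().split("\n\n")[0]
    return len(first_paragraph.split()) <= limit

article = (
    "AI retrieval is a representation problem, not a ranking problem.\n\n"
    "The rest of the piece elaborates on that claim..."
)
print(opening_within_first_words(article))
```

A word count is obviously a proxy, not a measure of legibility — the real test is still whether a stranger could state your argument from that paragraph alone.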

The Canonical Domain Principle

Everything above points to a single strategic conclusion: you need a canonical domain — a place you control, that AI systems can freely crawl, that carries your authorship schema, and that exists as the stable home of your ideas regardless of what any platform decides to do.

This doesn’t have to be elaborate. A clean, fast, static site is arguably better than a heavy CMS for this purpose — faster load, cleaner markup, more predictable crawl behavior. The workflow: publish canonically on your domain first. Post excerpts and links to LinkedIn, wherever your audience lives. The platform distribution gets you readers today. The canonical domain gets you representation in AI answers over time.
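One concrete mechanic worth knowing in that workflow: on platforms that support it when you cross-post (LinkedIn Pulse, notably, does not expose this), a canonical link tag on the syndicated copy points crawlers back to your domain as the publication of record. The URL here is a placeholder:

```html
<!-- On the syndicated copy, declaring your own domain as the original: -->
<link rel="canonical" href="https://example.com/representation">
```

Even where the platform won’t carry the tag, your own page should declare its canonical URL, so the stable home of the piece is unambiguous.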

The Larger Shift

This is a transition from discoverability to representation. Being discoverable meant appearing in a ranked list. Being well-represented means that when someone asks an AI system about your domain of expertise, the system has an accurate, coherent model of your perspective.

That rewards clearly-stated ideas, unambiguous authorship, accessible content, and a body of work that hangs together as a coherent intellectual project. In other words, it rewards good thinking and good writing — and punishes keyword-optimized engagement bait.

There’s something clarifying about that. The organizations that will be well-represented in AI-mediated answers are the ones that have something genuine to say, say it clearly, and say it somewhere the AI can actually read.

That’s not a technical problem. It’s a thinking problem. The infrastructure just has to not get in the way.