...Are we heading into a future where websites become more semantic and AI-friendly, rather than purely optimized for human experience?
I’m working on a Chrome extension to make AI feel more integrated into the browser experience. The problem is that a site’s raw DOM is not very AI-friendly, and feeding it to a model gets expensive fast, because most of the tokens are spent just figuring out what the “real content” is inside messy markup.
This isn’t only the result of years of JavaScript dependency. It’s also because, until now, the web has been built almost entirely around humans interacting with UIs. That’s starting to change. We now have LLMs as a third “consumer” of the web, and they need a more structured, content-focused way to interact with pages.
So are we heading into a future where websites become more semantic and AI-friendly, rather than purely optimized for human experience? Most likely. But even if that shift is inevitable, it won’t happen overnight. Until new paradigms deeply change how sites are developed, we still need to deal with old-school scraping.
To be honest, my idea isn’t radically new. It’s an extension that lets me use the LLM of my choice to interact with websites in Chrome. The real challenge is making that interaction run on an optimized, content-focused version of the page rather than throwing the raw DOM at the model.
I’m especially worried about token consumption, both as a cost problem (for customers) and as an environmental one. If we want AI features to be affordable and scalable, we need smarter ways to extract what matters from a page without burning tokens on noise. That means building efficient ways to parse and clean site DOMs.
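To make that concrete, here’s a minimal sketch of the kind of cleanup I mean (illustrative, not my extension’s actual pipeline): clone the DOM, strip the obvious noise, keep whatever looks like the main content, and collapse whitespace before anything reaches the model. The selector lists are just examples.

```typescript
// Illustrative pre-processing pass: remove noisy elements from a cloned DOM
// and return plain text, so far fewer tokens reach the model.
const NOISE_SELECTORS = [
  "script", "style", "noscript", "iframe", "svg",
  "nav", "footer", "aside", "[aria-hidden='true']",
];

function extractReadableText(doc: Document): string {
  // Work on a clone so the live page is untouched.
  const clone = doc.cloneNode(true) as Document;
  clone.querySelectorAll(NOISE_SELECTORS.join(",")).forEach((el) => el.remove());

  // Prefer an explicit content landmark when the markup provides one.
  const root =
    clone.querySelector("main, article, [role='main']") ?? clone.body;

  // Collapse whitespace so the text is as token-cheap as possible.
  return (root.textContent ?? "").replace(/\s+/g, " ").trim();
}
```

Even a naive pass like this strips most of the markup noise before the model ever sees the page.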
Mozilla’s Readability.js, the library behind Firefox’s Reader View, is a great example. It works wonderfully for news articles. But its scope is limited, and my extension aims to handle a much wider range of websites and platforms. For that, the solution needs to be more robust.
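For reference, the library is published as the @mozilla/readability package, and this is its standard usage. It shows how little it needs when a page is article-shaped:

```typescript
import { Readability } from "@mozilla/readability";

// Readability mutates the document it is given, so always hand it a clone.
const article = new Readability(
  document.cloneNode(true) as Document
).parse();

if (article) {
  console.log(article.title);        // extracted headline
  console.log(article.textContent);  // clean text, great for article-like pages
}
```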
My approach is to rely on AI to identify patterns across common platforms and tech stacks. Technically, this is “simple” work: writing rules for predictable layouts and structures. But it’s extremely tedious, and almost impossible to do by hand at the scale the web demands. This is exactly the kind of task where AI can help: not by inventing a totally new concept, but by expanding something we already know works across far more scenarios.
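To give a flavor of what those rules could look like, here’s a toy sketch. The rule shape, platform name, and selectors are all invented for illustration; they’re not the actual format my framework uses:

```typescript
// Hypothetical shape of a per-platform extraction rule. Names and selectors
// are made up for illustration; the real rules are what the AI-assisted
// process is meant to generate and a human to review.
interface ExtractionRule {
  platform: string;              // e.g. a recognizable stack or CMS
  match: (url: URL) => boolean;  // does this rule apply to the page?
  contentSelector: string;       // where the "real content" lives
  dropSelectors: string[];       // platform-specific noise to remove
}

const exampleRules: ExtractionRule[] = [
  {
    platform: "generic-blog",
    match: (url) => url.pathname.includes("/blog/"),
    contentSelector: "article",
    dropSelectors: [".share-buttons", ".newsletter-signup", ".related-posts"],
  },
];

function applyRule(doc: Document, rule: ExtractionRule): string {
  const clone = doc.cloneNode(true) as Document;
  rule.dropSelectors.forEach((sel) =>
    clone.querySelectorAll(sel).forEach((el) => el.remove())
  );
  const root = clone.querySelector(rule.contentSelector) ?? clone.body;
  return (root.textContent ?? "").replace(/\s+/g, " ").trim();
}
```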
I’m building a protocol for local training with manual supervision, with the goal of covering around 70% of Chrome traffic by targeting the most common platforms and recognizable stacks, then expanding from there. So far, the results look promising. The extension (and the framework behind it) is still a work in progress, but I’m hoping to release it soon.
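To give a rough idea of what “manual supervision” means here, a reviewed extraction could be recorded as something like the sketch below. The field names and verdict values are just placeholders, not the real protocol:

```typescript
// Hypothetical record of one manually reviewed extraction; field names and
// verdict values are invented for illustration.
interface ReviewedExample {
  url: string;                       // page the candidate rule ran on
  ruleId: string;                    // which generated rule was used
  extractedText: string;             // what that rule pulled out
  verdict: "accepted" | "rejected";  // the human reviewer's call
}

// Accepted examples double as regression tests; rejected ones feed back into
// rule generation. A simple acceptance rate per rule shows where to focus.
function acceptanceByRule(examples: ReviewedExample[]): Map<string, number> {
  const totals = new Map<string, { ok: number; all: number }>();
  for (const ex of examples) {
    const t = totals.get(ex.ruleId) ?? { ok: 0, all: 0 };
    t.all += 1;
    if (ex.verdict === "accepted") t.ok += 1;
    totals.set(ex.ruleId, t);
  }
  return new Map([...totals].map(([id, t]) => [id, t.ok / t.all]));
}
```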
Stay tuned and happy holidays!