AI countermeasures

July 7, 2024

Via Manton Reece, Jeremy Keith on how human-curated bookmarks could help adapt to a web filled with AI content:

It used to be that when you found a source you trusted, you bookmarked it. Browsers still have bookmarking functionality but most people rely on search. Maybe it’s time for a bookmarking revival.

Already after Manton shared this, it is clear Keith is not interested in a more tempered approach.

I could not disagree with Manton more when he says:

I get the distrust of AI bots but I think discussions to sabotage crawled data go too far, potentially making a mess of the open web. There has never been a system like AI before, and old assumptions about what is fair use don’t really fit.

Bollocks. This is exactly the kind of techno-determinism that boils my blood.

It’s rather interesting to read about the various approaches and debate on how to fight back against the encroachment of AI through various prompt poisoning attacks. Does it matter? Is it effective? Will AI companies take steps to counteract prompt injection?

I’m by no means giving a blank check to AI companies, but I did manage to create a prompt to output the content of these paragraph tags Jeremy Keith uses with <p style="display: none" hidden>, although I had difficulty reproducing it again.

It’s a question of ethics. What’s to say a company could create an extension to suck up all of your casual web browsing and send it back to some data broker who then sends it to another AI company to obscure the source of the scraping?

But to what end? At best they seem like an Easter egg to put a speed bump on mass scraping. It doesn’t really stop a regular user from manually downloading the page. Perhaps some AI models already ignore any content with hidden in the p tag.

The outrage has led to the situation where people are editing their robots.txt to block search engines making it harder for their content to be indexed by search engines. Meaning their SEO takes a tumble. The consequence means doing a straight search for page title and author may not always provide a complete result in the search results.

We would like to show you a description here but the site won’t allow us. No information is available for this page. Learn why

This is where using a different search engine can be more helpful (Brave or Yandex), unless the site owner has taken much more drastic actions. Does this harm the user? No, but it sort of breaks the expectation for how most people expect your website to work. while I do respect someone’s autonomy to do what they wish with their website, why increase the friction? Instead, those of us who aren’t as bothered should share what we find on our own sites where we can let the robots read away all they wish. Sharing behind the walled gardens of social media can also be helpful, but a public website means there is no barrier to either human or robot.