colonial@lemmy.world to Programming@programming.dev · 2 years ago

Intentionally corrupting LLM training data?

Inspired by the comments on this Ars article, I’ve decided to program my website to “poison the well” when it gets a request from GPTBot.

The intuitive approach is just to generate some HTML like this:

<p>
<!-- Twenty pages of random words -->
</p>
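A minimal sketch of generating that kind of filler in Python. It assumes a small hypothetical word list (`WORDS`) and takes a word count as a stand-in for "twenty pages". The function name `random_words_html` is illustrative only. It says nothing about whether such filler is actually effective against training; it only produces the markup.

```python
import html
import random
from typing import Optional

# Hypothetical vocabulary; any word list would do for this sketch.
WORDS = ["apple", "quantum", "river", "lantern", "seven", "marble", "orbit", "velvet"]


def random_words_html(n_words: int, seed: Optional[int] = None) -> str:
    """Return one <p> element containing n_words words drawn at random from WORDS."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))
    # Escape defensively in case the word list ever contains <, > or &.
    return "<p>\n" + html.escape(text) + "\n</p>"


if __name__ == "__main__":
    print(random_words_html(12, seed=42))
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the seeding local, so the generator does not disturb any other code that uses `random`.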

(I also considered just hardcoding twenty megabytes of “FUCK YOU,” but that’s a little juvenile for my taste.)

Unfortunately, I’m not very familiar with ML beyond a few basic concepts, so I’m unsure if this would get me the most bang for my buck.

What do you smarter people on Lemmy think?

(I’m aware this won’t do much, but I’m petty.)

heeplr@feddit.de · 2 years ago

Show the 20 pages of random words to your users, right?

Any dev worth their salt is going to check the user-agent string for GPTBot.

That said, it's a perfect recipe for getting companies to spoof browsers.
- TootSweet@lemmy.world · 2 years ago
  Yeah, and even if OpenAI uses user agents that identify that bot as GPTBot, there's no guarantee other scrapers will be so kind.
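The user-agent check heeplr describes can be sketched without committing to any particular web framework. This assumes the request headers arrive as a plain mapping and that the crawler's User-Agent contains the token `GPTBot`. The names `user_agent`, `is_gptbot` and `choose_body` are hypothetical. As both comments point out, anything that omits the token, whether another scraper or a spoofed browser string, falls straight through to the real page.

```python
from typing import Mapping

# Token assumed to appear in the crawler's User-Agent header (matched case-insensitively).
BOT_TOKEN = "gptbot"


def user_agent(headers: Mapping[str, str]) -> str:
    """Look up the User-Agent header; HTTP header names are case-insensitive."""
    for name, value in headers.items():
        if name.lower() == "user-agent":
            return value
    return ""


def is_gptbot(headers: Mapping[str, str]) -> bool:
    """True only if the client *claims* to be GPTBot; trivially spoofable."""
    return BOT_TOKEN in user_agent(headers).lower()


def choose_body(headers: Mapping[str, str], real_page: str, decoy: str) -> str:
    """Serve the decoy to self-identified GPTBot requests and the real page to everyone else."""
    return decoy if is_gptbot(headers) else real_page


if __name__ == "__main__":
    print(choose_body({"User-Agent": "GPTBot/1.0"}, "<p>real</p>", "<p>decoy</p>"))
```

The lookup loops over the headers instead of calling `headers["User-Agent"]` because a plain dict is case-sensitive, while servers and proxies may normalise header names to lower case.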
