On Bluesky, your data is also being used to train AI, even though the company says it does not do so itself.
A researcher published a dataset of posts scraped from Bluesky users, but has since deleted it.
- November 28, 2024
- Updated: December 2, 2024 at 11:53 AM
Bluesky, the decentralized social network, is at the center of controversy following the recent publication of a dataset on Hugging Face, a community platform for artificial intelligence.
According to 404 Media, the dataset contained one million posts along with user information, which researcher Daniel van Strien obtained by scraping the network through its Firehose API. Van Strien justified the dataset as a way to “develop artificial intelligence models, analyze trends in social networks, and study posting patterns,” although he ended up deleting it after realizing that “this approach violated the principles of transparency and consent in data collection.”
The dataset included sensitive metadata, such as users’ decentralized identifiers (DIDs) and specific search tools, which raised concerns among many users about possible misuse of that information. Although Bluesky says that it does not train AI models on its users’ data, it admits that it “cannot enforce this policy outside of our systems” and that the decision lies with external developers. The company also promises to continue working with engineers and lawyers to address the issue.
Brief update on our ongoing efforts to allow users to specify consent (or not) for AI training: 🧵
— Bluesky (@bsky.app) November 27, 2024, 2:52
The open and decentralized nature of Bluesky, which is based on the Authenticated Transfer (AT) protocol, makes it easy for third parties to access public content. This approach contrasts with platforms like Twitter, where Elon Musk restricted API access and raised its cost, supposedly to curb indiscriminate scraping.
Pedro Domínguez
Publicist and audiovisual producer in love with social networks. I spend more time thinking about which videogames I will play than playing them.