
Introducing the BeeHive Moderation Service (BMS) for Bluesky


Introduction

"Trust and safety" has quickly become an unreliable cornerstone of the platforms we socialize on, and as a result, users are losing trust in their safety.

When we use social media platforms, we should be able to do so under a set of good-faith beliefs, like...

  • That users are contributing quality content with the intent to engage with us over it
  • That users are sharing content that is not only authentic, but contextualized
  • That users are honoring and prioritizing transparency, truth, and accuracy

But once you come back to reality, you quickly understand that's...not at all how it works. Social media platforms are simply that - platforms, open to anyone's gain and manipulation, and the smartest actors know how to use them for their own benefit. As a result, most social media platforms are absolutely flooded with inauthentic or unwanted activity - spam, advertisements, porn, harassment, phishing; you name it.

Over time, different platforms have made claims of "improving authenticity" or "removing bots," but few have been truly successful. When Elon Musk, for example, acquired X (formerly Twitter), he pledged to rid it of bots, saying:

“We will defeat the spam bots or die trying”

Musk tweeted this in 2022, a few months before he officially bought the platform. Since that commitment and the subsequent purchase, Musk has bragged about "notable improvements," yet as a user, I'm still seeing bots in 2025, three years post-promise.

The larger consumer problem with trust and safety

The first thing to address is: what is "trust and safety"? What does it mean, what does it do, and what is it supposed to do?

Let me try to explain this in a relatable, creative way. 

When you get a babysitter for your kids, ideally you're not supposed to think about the babysitter. At some point you vetted their qualifications or got a trustworthy vouch for them, you met them, and you trusted that they didn't intend to murder/molest/abduct your child enough to leave them alone together. Now you're supposed to let the babysitter do their thing while you go to dinner, see a movie, or attend that conference, and if everything goes to plan, you'll come home to everything in order and nobody scarred for life.

Trust and safety associates are, in many ways, the babysitters of social media platforms. They sit in the background, reviewing reports, monitoring trends, and (ideally) trying to keep their platform a place you feel is worth engaging with. In the most ideal circumstances, you should be able to trust "Trust and Safety" because they should be invested in protecting your user experience in ways that have positive outcomes for you, the user.

Over the last couple of years, however, varying levels of misuse of "Trust and Safety" efforts have led platform users to begin losing trust in the safety results those efforts generate.

They Zuck'd it up

In a letter dated Aug. 26, 2024, Mark Zuckerberg (Meta) told the U.S. House of Representatives Judiciary Committee that the Biden administration had pressured the company to "censor" COVID-19 content during the pandemic.

"In 2021, senior officials from the Biden Administration, including the White House, repeatedly pressured our teams for months to censor certain COVID-19 content, including humor and satire, and expressed a lot of frustration with our teams when we didn't agree. I believe the government pressure was wrong, and I regret we were not more outspoken about it. I also think we made some choices that, with the benefit of hindsight and new information, we wouldn't make today."

Notice that part of the quote, "including humor and satire"? This would turn out to be one of the biggest trust-losers for "Trust and Safety".

During the pandemic, told to stay inside their homes, not to travel, and to obscure their faces with masks, Americans turned to the internet to entertain and educate them. They turned to Facebook to keep up to date with the lives they weren't able to be a part of. They turned to Instagram to share some of their loneliest moments in all the photogenic bliss possible. They turned to Messenger to tell their loved ones all the things they were missing.

Meta's trust and safety team, in response, used that privilege to abuse them. They called their memes "misinformation"; they called the snarky remarks shared mid-morning-dirt-snake "medical disinformation". They suppressed those critical of the government, and they muted and hid those terrified of a world where we never leave lockdown. In no uncertain terms, Meta's cooperation with the Biden administration broke the "trust" in trust and safety between itself and millions of users around the world, and on behalf of platforms and companies everywhere.

While we've thankfully moved past the COVID lockdowns, the experience many Americans had with suppression on these platforms has raised apprehension toward "trust and safety" in the hands of large companies. Users no longer post content without thinking, "Is this going to be misconstrued? Is this going to get me banned or restricted? Is this going to be moderated against an agenda?". Users aren't signing up for platforms anymore assuming that the platform cares in any genuine way about them. Users don't hear "trust and safety" and think "wow, an entire team working to keep me safe here <3".

"Trust and safety" has become the internet's "HR", and users are making different choices of what platforms they use as a result of the impediment in trust.

So what's different about Bluesky?

Bluesky is a decentralized social media platform that gives users more control over their data and interactions. In a decentralized system, control isn’t held by one central company (like X or Meta), but rather distributed among users and different servers. This allows people to make more decisions about how their data is used and how they engage with the platform.

For example, on Bluesky:

  • Users can select or create their own servers (known as “instances”) with specific rules and communities.
  • They can customize their experience by choosing algorithms that show them the content they prefer, rather than having content dictated by a central algorithm like on X.
  • Moderation is stackable, modular, and configurable.

What's different about moderating on Bluesky?

Bluesky did a great job of explaining this, which I must unfortunately cannibalize to a degree.

Bluesky is open

Bluesky employs the AT Protocol, an open network of services accessible to anyone, which allows the backend architecture of a large-scale social network to be more transparent and participatory. These services form a pipeline: data travels from independent account hosts, through a data firehose, and on to the various application indexes.

[Diagram: AT Protocol core architecture - account hosts feed a firehose, which feeds application indexes]

This event-driven architecture is similar to other high-scale systems, where you might traditionally use tools like Kafka. However, Bluesky's open system allows anyone to run a piece of the backend. This means that there can be many hosts, firehoses, and indexes, all operated by different entities and exchanging data with each other.

[Diagram: the same architecture with many independent hosts, firehoses, and indexes]
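If you're technically inclined, here's a minimal sketch of what "tapping the firehose" looks like, assuming Node.js and the ws WebSocket package; it connects to Bluesky's public relay stream (com.atproto.sync.subscribeRepos) and just counts frames, since fully decoding the DAG-CBOR events would require a CBOR/CAR library.

```typescript
// Minimal sketch (not production code): subscribing to a relay's firehose.
// Assumes Node.js with the "ws" package installed. Each frame is DAG-CBOR
// encoded; a real consumer would decode it, here we only count frames.
import WebSocket from "ws";

const RELAY = "wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos";
const ws = new WebSocket(RELAY);

let frames = 0;

ws.on("open", () => console.log("connected to the firehose"));
ws.on("message", () => {
  frames += 1;
  if (frames % 1000 === 0) console.log(`${frames} repo events received`);
});
ws.on("error", (err) => console.error("firehose error:", err));
```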

Why would you want to run one of these services?

  • You’d run a PDS (Personal Data Server) if you want to self-host your data and keys to get increased control and privacy (the sketch after this list shows how to find the PDS behind any account).
  • You’d run a Relay if you want a full copy of the network, or to crawl subsets of the network for targeted applications or services.
  • You’d run an AppView if you want to build custom applications with tailored views and experiences, such as a custom view for microblogging or for photos.
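
Because hosting is a separate, inspectable layer, anyone can check which PDS currently hosts a given account. The sketch below is illustrative only; it assumes a did:plc identity (did:web identities resolve differently) and uses two public endpoints, com.atproto.identity.resolveHandle on Bluesky's public AppView and the plc.directory registry.

```typescript
// Illustrative sketch: find the PDS that hosts an account, given its handle.
// Assumes a did:plc identity; did:web identities are resolved differently.
async function findPds(handle: string): Promise<string | undefined> {
  // 1. Resolve the handle to a DID via Bluesky's public AppView.
  const res = await fetch(
    `https://public.api.bsky.app/xrpc/com.atproto.identity.resolveHandle?handle=${handle}`
  );
  const { did } = (await res.json()) as { did: string };

  // 2. Fetch the DID document from the PLC directory and read the PDS entry.
  const doc = (await (await fetch(`https://plc.directory/${did}`)).json()) as {
    service?: { id: string; type: string; serviceEndpoint: string }[];
  };
  return doc.service?.find((s) => s.type === "AtprotoPersonalDataServer")
    ?.serviceEndpoint;
}

// Example: the official Bluesky account.
findPds("bsky.app").then((pds) => console.log("hosted at:", pds));
```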

Moderation is decentralized

In conventional social media platforms, moderation is frequently integrated with other components of the system, including hosting, algorithms, and the user interface. This integration can diminish the adaptability of social networks when businesses undergo ownership changes or when policies are altered due to financial or political influences, thereby limiting users' options to either accept the changes or discontinue using the service.

Decentralized moderation provides a safeguard against these risks. It relies on three principles.

  • Separation of roles. Moderation services operate separately from other services – particularly hosting and identity – to limit the potential for overreach. A moderation service cannot harm your platform identity as the result of a moderative action, and your identity-keeper cannot use your identity to punish you for moderative cause.
  • Distributed operation. Multiple organizations providing moderation services reduces the risk of a single entity failing to serve user needs. 
  • Interoperation. Users can choose between their preferred clients and associated moderation services without losing access to their communities and content. 

In the AT Protocol, the PDS stores and manages user data, but it isn’t designed to handle moderation directly. A PDS could remove or filter content, but Bluesky chose not to rely on this for two main reasons.

  • First, users can easily switch between PDS providers thanks to the account-migration feature. This means any takedowns performed by a PDS might only have a short-term effect, as users could move their data to another provider and bring it back online.
  • Second, data hosting services aren't always the best equipped to deal with the challenges of content moderation, and those with local expertise and community building skills who want to participate in moderation may lack the technical capacity to run a server.

This is different from ActivityPub servers (Mastodon, if you're not familiar), which manage both data hosting and moderation as a bundled service, and do not make it as easy to switch servers as the AT Protocol does.

By separating data storage from moderation, each service provider can focus on doing what it does best, without concerns about conflict or overlap.

Where moderation is applied

Moderation on Bluesky is done by a dedicated service called the Labeler (or “Labeling service”).

Labelers produce “labels” which are associated with specific pieces of user-generated content, such as individual posts, accounts, lists, or feeds. These labels make an assertion about the content, such as whether it contains sensitive material, is unpleasant, or is misleading.
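
To make that concrete, here's roughly what a label looks like on the wire, based on the com.atproto.label.defs lexicon; the DID, URI, and value shown are placeholders, not real records.

```typescript
// Approximate shape of a label (per com.atproto.label.defs); values are placeholders.
interface Label {
  src: string;   // DID of the labeler that issued the label
  uri: string;   // what is being labeled: an at:// record URI or an account DID
  cid?: string;  // optionally pins the label to a specific version of the record
  val: string;   // the label value itself, e.g. "missing-context"
  neg?: boolean; // true when this label negates a previously issued one
  cts: string;   // when the label was created
  exp?: string;  // optional expiry
}

const example: Label = {
  src: "did:plc:examplelabeler",                             // hypothetical labeler DID
  uri: "at://did:plc:someuser/app.bsky.feed.post/3kexample", // hypothetical post
  val: "missing-context",
  cts: new Date().toISOString(),
};
```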

These labels get synced to the AppViews where they can be attached to responses at the client’s request.

[Diagram: full architecture, with labelers publishing labels that AppViews attach to content]

The clients read those labels to decide what to hide, blur, or drop. Since the clients choose their labelers and how to interpret the labels, they can decide which moderation systems to support. A user's chosen labelers do not have to be broadcast, except to the AppView and PDS which fulfill their requests. A user subscribing to a labeler is not public, though the PDS and AppView can privately infer which users are subscribed to which services.
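
As a sketch of what "at the client's request" means in practice, a client can tell the AppView which labelers it trusts via the atproto-accept-labelers header when fetching content. The post URI and the second DID below are hypothetical; the first DID is the one used by Bluesky's own labeler.

```typescript
// Sketch: asking the AppView to attach labels from specific labelers.
// The post URI and the second labeler DID are placeholders.
async function fetchPostWithLabels() {
  const appview = "https://public.api.bsky.app";
  const uri = "at://did:plc:someuser/app.bsky.feed.post/3kexample"; // hypothetical

  const res = await fetch(
    `${appview}/xrpc/app.bsky.feed.getPosts?uris=${encodeURIComponent(uri)}`,
    {
      headers: {
        // Bluesky's own labeler, plus a hypothetical third-party labeler.
        "atproto-accept-labelers":
          "did:plc:ar7c4by46qjdydhdevvrndac, did:plc:examplelabeler",
      },
    }
  );
  const { posts } = (await res.json()) as { posts?: { labels?: unknown[] }[] };
  // Each returned post carries a `labels` array the client can act on.
  console.log(posts?.[0]?.labels ?? []);
}

fetchPostWithLabels();
```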

Bluesky hard-codes its own labeling system, which is the only mandatory moderation service. Past that, you can (and Bluesky recommends you do) add additional moderation services.

[Diagram: applications subscribing to additional moderation services]

Enter, the BeeHive Moderation Service

Born from a request by an existing client working in governance and public safety, the BeeHive Moderation Service is our effort to connect with users of Bluesky and, in the long term, other decentralized social media networks.

Through the BeeHive Moderation Service, we hope to provide context where it matters.

Defining trust and safety through context, not force

On most centralized social media platforms, trust and safety operations have a finite and forceful result.

What I mean by this is: if the content reviewer at X or Meta doesn't think your meme is funny, they can just...hit the delete button. And that's that - it's gone, for everyone. Sure, you can appeal it in some cases, but they can also just ignore you (and typically will). If you're unsatisfied, you can't get rid of them - they're "built in" to the platform. You're stuck with them, no matter what. You're only allowed to see and share what they are, indirectly, okay with you seeing and sharing - subject, naturally, to paid and unpaid bias.

Bluesky's approach to moderation is comparatively different, which is part of why we saw value in creating a moderation service for it. Moderation labels on Bluesky are opt-in/opt-out, and controllable.

Here's how this works: as a moderation service, we publish label titles and definitions, along with some other metadata, which is picked up by the AppView and exposed to the user as controls.

[Screenshot: our label definitions surfaced as moderation controls in the Bluesky app]
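
For the curious, the metadata behind those controls comes from a declaration record the labeler publishes (app.bsky.labeler.service). The sketch below shows roughly what such a record looks like; the identifiers, wording, and settings are illustrative and not our production configuration.

```typescript
// Rough shape of a labeler's declaration record (app.bsky.labeler.service).
// Identifiers, wording, and settings here are illustrative only.
const labelerDeclaration = {
  $type: "app.bsky.labeler.service",
  createdAt: new Date().toISOString(),
  policies: {
    labelValues: ["missing-context", "unverified-cause"],
    labelValueDefinitions: [
      {
        identifier: "missing-context",
        severity: "inform",      // informational badge rather than an alert
        blurs: "none",           // don't blur the content itself
        defaultSetting: "warn",  // users can change this to hide or ignore
        adultOnly: false,
        locales: [
          {
            lang: "en",
            name: "Missing context",
            description:
              "This content appears to leave out context that changes how it reads.",
          },
        ],
      },
    ],
  },
};

console.log(JSON.stringify(labelerDeclaration, null, 2));
```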

Users can then control what action occurs when we apply a label - whether the content is hidden, marked with a visual warning, shown with a content badge, or ignored. This lets users benefit from our labels only to the degree that they agree with our decisions. If a user uses us for a few weeks and finds they agree with what we flag as a scam but less often agree with what we flag as misinformation, they can turn off our misinformation flags at will, adopting that liability for themselves.
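
Purely as an illustration (this isn't Bluesky's actual client code), the client-side mapping from a user's per-label preference to a rendering decision is about as simple as:

```typescript
// Illustrative only: turning a user's per-label preference into a rendering action.
// The preference names mirror the options Bluesky exposes; the action names are ours.
type LabelPreference = "hide" | "warn" | "show" | "ignore";
type RenderAction = "drop" | "blur" | "badge" | "none";

function actionForLabel(pref: LabelPreference): RenderAction {
  switch (pref) {
    case "hide":   return "drop";  // don't render the labeled content at all
    case "warn":   return "blur";  // render behind a warning the user can click through
    case "show":   return "badge"; // render normally, with a content badge
    case "ignore": return "none";  // act as though the label isn't there
  }
}

console.log(actionForLabel("warn")); // -> "blur"
```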

As we see content ourselves that requires moderation, and as our service users bring potentially unlabeled or mislabeled content to our attention, we contextualize it using a series of labels - things like "missing-context", "hate-speech", "unverified-cause", and more. In the AppView, this translates to the advisory labels our users see attached to content and accounts. If a user is confused about what we mean by a label, they can tap it to get more information. This also fits better with what we do as an active service: investigating, exploring data, and finding fact.

How do I use it?

First, visit the Bluesky account assigned to the BeeHive Moderation Service.

Then, either tap or click the "Subscribe to Labeler" button.

You could also leave a ❤️ if you'd like to be extra kind.

[Screenshot: the "Subscribe to Labeler" button on our Bluesky profile]

Once you subscribe, Bluesky will unlock our labels for you and allow you to configure your moderation and filtration preferences. If you want to use this the way we recommend, you don't need to make any changes, but you're welcome to adjust them as you see fit.

And that's all for now!

We're still making changes to the labels we apply and why we apply them, and working to harden the policies that back those decisions. As we moderate on Bluesky, we'll be sharing transparency-related information here; it's all for the positive in the long term. We hope you enjoy the additional level of moderation and overwatch we can provide through this, and invite you to let us know what more you'd like to see. 
