Want to know exactly what Twitter’s fleet of text-combing, dictionary-parsing bots defines as “mean”? Starting any day now, you’ll have instant access to that data—at least, whenever a stern auto-moderator says you’re not tweeting politely.
On Wednesday, members of Twitter’s product-design team confirmed that a new automatic prompt will begin rolling out to all Twitter users on every platform and device, activating whenever a post’s language crosses Twitter’s threshold of “potentially harmful or offensive language.” This follows a number of limited-user tests of the notices beginning in May of last year. Soon, any robo-moderated tweets will be interrupted with a notice asking, “Want to review this before tweeting?”
Earlier tests of this feature, unsurprisingly, had their share of issues. “The algorithms powering the [warning] prompts struggled to capture the nuance in many conversations and often didn’t differentiate between potentially offensive language, sarcasm, and friendly banter,” Twitter’s announcement states. The news post clarifies that Twitter’s systems now account for, among other things, how often two accounts interact with each other—meaning, I’ll likely get a flag for sending curse words and insults to a celebrity I never talk to on Twitter, but I would likely be in the clear sending those same sentences via Twitter to friends or Ars colleagues.
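For illustration only, here is a minimal sketch of how that kind of relationship-aware gating might work, assuming a classifier produces an offensiveness score and the system can count recent exchanges between two accounts. The function name, parameters, and thresholds are all hypothetical; Twitter has not published its actual logic.

```python
# Hypothetical sketch (not Twitter's actual code): gate the
# "Want to review this before tweeting?" prompt on both a language
# score and how often the two accounts already talk to each other,
# so friendly banter between regulars is left alone.

def should_prompt(offense_score: float, prior_interactions: int,
                  base_threshold: float = 0.8) -> bool:
    """Return True if the tweet should trigger a review prompt.

    offense_score: 0.0-1.0 output of some language classifier (assumed).
    prior_interactions: count of recent back-and-forth exchanges
        between the author and the mentioned account (assumed signal).
    """
    # Each prior exchange raises the bar for flagging, capped below 1.0,
    # so established relationships tolerate saltier language.
    threshold = min(0.99, base_threshold + 0.02 * prior_interactions)
    return offense_score >= threshold

# A harsh message to a stranger gets flagged...
print(should_prompt(0.85, prior_interactions=0))   # True
# ...but the same message to a frequent contact does not.
print(should_prompt(0.85, prior_interactions=10))  # False
```

The design choice mirrors what Twitter describes: the same sentence can be flagged or ignored depending on the relationship between sender and recipient, not just its wording.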
Additionally, Twitter admits that its systems previously needed updates to “account for situations in which language may be reclaimed by underrepresented communities and used in non-harmful ways.” We hope the data points used to make those determinations don’t go so far as to check a Twitter account’s profile photo, especially since troll accounts typically use fake or stolen images. (Twitter has yet to clarify how it makes determinations for these aforementioned “situations.”)
This isn’t Twitter’s first nudge of its kind: the service already encourages users to “read” an article linked by another Twitter user before “re-tweeting” it. In other words: if you see a juicy headline and slap the RT button, you could unwittingly share something you may not agree with. Yet this change seems like an undersized bandage on a bigger Twitter problem: how the service incentivizes rapid, constant posting in pursuit of likes and interactions, honesty and civility be damned.
And no nag notice is likely to fix Twitter’s struggles with inauthentic actors and trolls who continue to game the system and poison the site’s discourse. The biggest example remains what you find when clicking through to heavily “liked” and heavily replied posts, usually from high-profile or “verified” accounts: Twitter commonly bumps drive-by replies to the top of these threads, often from accounts with suspicious activity and a lack of organic interactions.
Perhaps Twitter could take the lessons from this nag-notice rollout to heart, particularly about weighting interactions based on a confirmed back-and-forth relationship between accounts. Or the company could drop algorithm-driven weighting of posts altogether, especially the kind that pushes nonfollowed content into a user’s feed, and go back to the better days of purely chronological content, so that we can more easily shrug our shoulders at the BS.