Predictive Model Identifies Wikipedia Arguments that Will Never Get Resolved

A joint study involving researchers from MIT, the University of Michigan and the Wikimedia Foundation has identified why so many Wikipedia disputes go unresolved and developed predictive tools to help improve editorial deliberations. In a paper presented at the recent ACM Conference on Computer-Supported Cooperative Work and Social Computing, Jane Im from the University of Michigan's School of Information, Amy Zhang and David Karger from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Christopher Schilling from Wikimedia presented a model that predicts, within a week of a dispute's initiation, whether a Request for Comment (RfC) will go stale, with 75 percent accuracy.

Content disputes on Wikipedia can be handled in a number of ways. But at any point, editors who disagree can open an RfC by writing up a proposal or question on the relevant article talk page and inviting comment from the broader community through posts to various noticeboards. The research focused on this approach.

Any editor can initiate an RfC, and any editor (usually an experienced one) who didn't participate in the discussion and is considered neutral can close it. After 30 days, a bot automatically removes the RfC template, with or without resolution. An RfC can be closed formally, with a summary statement by the closer; closed informally, because of overwhelming agreement among participants; or left "stale," meaning its template is removed without resolution.

The researchers compiled a database of 7,316 RfCs from English Wikipedia dating from 2011 to 2017. The data included closing statements, author account information and general reply structure. The researchers also interviewed 10 of the website's most frequent "closers" to better understand their motivations and considerations when resolving a dispute.

In an analysis of the dataset, the researchers found that about 58 percent of RfCs were formally closed. Of the remaining 42 percent, more than three-quarters (78 percent) had no participant activity to informally end the RfC; in other words, a full third of all RfCs in the dataset were left stale.
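The "full third" figure follows from multiplying the two percentages above. A minimal Python sketch of that arithmetic, using only the rounded figures quoted in this article (the function and variable names are illustrative, and the stale count is an estimate derived from rounded percentages, not a number reported in the paper):

```python
# Figures as quoted in the article (rounded percentages).
TOTAL_RFCS = 7316
FORMALLY_CLOSED_SHARE = 0.58
# Share of the NOT formally closed RfCs that also had no informal ending.
NO_INFORMAL_END_SHARE = 0.78


def stale_share(formally_closed: float, no_informal_end: float) -> float:
    """Return the fraction of all RfCs that were left stale."""
    return (1.0 - formally_closed) * no_informal_end


share = stale_share(FORMALLY_CLOSED_SHARE, NO_INFORMAL_END_SHARE)
# Rough estimate only: the input percentages are rounded.
approx_count = round(share * TOTAL_RFCS)

print(f"stale share of all RfCs: {share:.1%}")
print(f"approximate number of stale RfCs: {approx_count}")
```

The result, roughly 33 percent, matches the article's "full third of all RfCs."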

Major issues included "poorly articulated initial statements by inexperienced discussion initiators, lack of interest from third-party experienced Wikipedia editors and excessive bickering or contentiousness during the discussion," according to the paper.

"It was surprising to see a full third of the discussions were not closed," said Zhang, a Ph.D. candidate in CSAIL, in a statement. "On Wikipedia, everyone's a volunteer. People are putting in the work, and they have interest ... and editors may be waiting on someone to close so they can get back to editing. We know, looking through the discussions, the job of reading through and resolving a big deliberation is hard, especially with back and forth and contentiousness. [We hope to] help that person do that work."

The "help" provided by the team came in the form of a machine learning model to predict whether a given RfC would close or go stale. The model was developed through an analysis of 60-plus features of the text, Wikipedia page and editor account information. Those details included information on the number of comments, the maximum and average age of participants as well as the difference in their ages, the cognitive tone of the RfC and the sum of edit counts of participants, among many other aspects.
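To make a few of the features named above concrete, here is a hedged Python sketch that computes them from a toy RfC record. It is not the researchers' code: the `Participant` and `Rfc` classes and the feature names are hypothetical, "age" is assumed to mean account age in days, and "difference in their ages" is assumed to mean the gap between the oldest and newest accounts.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Participant:
    account_age_days: int  # assumption: "age" refers to account age
    edit_count: int


@dataclass
class Rfc:
    comments: list[str]
    participants: list[Participant]


def extract_features(rfc: Rfc) -> dict[str, float]:
    """Compute a handful of illustrative features for one RfC."""
    ages = [p.account_age_days for p in rfc.participants]
    has_participants = len(ages) > 0
    return {
        "num_comments": len(rfc.comments),
        "max_account_age_days": max(ages) if has_participants else 0,
        "mean_account_age_days": mean(ages) if has_participants else 0,
        # assumption: "difference in ages" = oldest minus newest account
        "account_age_range_days": (max(ages) - min(ages)) if has_participants else 0,
        "total_edit_count": sum(p.edit_count for p in rfc.participants),
    }


# Hypothetical example record, invented for illustration.
example = Rfc(
    comments=["Support the proposal.", "Oppose, per the sourcing policy."],
    participants=[
        Participant(account_age_days=100, edit_count=50),
        Participant(account_age_days=400, edit_count=1000),
        Participant(account_age_days=1000, edit_count=20),
    ],
)
features = extract_features(example)
print(features)
```

A feature dictionary like this could then be fed to whatever classifier is chosen; the article does not say which model type the researchers used, so none is shown here.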

When "trained and tested" on the entire dataset, the best model achieved 75 percent accuracy, an improvement of 8 percentage points over a baseline of simply predicting that a given RfC wouldn't go stale.
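The baseline comparison can be reproduced roughly: if about a third of RfCs go stale, always predicting "not stale" is right about 67 percent of the time. A minimal Python sketch, assuming a hypothetical sample of 100 RfCs split 67/33 to mirror the reported proportions (the labels and the helper name are illustrative, not taken from the paper):

```python
from collections import Counter


def majority_baseline_accuracy(labels: list[str]) -> float:
    """Accuracy obtained by always predicting the most common label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical sample: 67 RfCs that did not go stale, 33 that did.
labels = ["not_stale"] * 67 + ["stale"] * 33

baseline = majority_baseline_accuracy(labels)
model_accuracy = 0.75  # accuracy reported in the article
gain_points = (model_accuracy - baseline) * 100

print(f"baseline accuracy: {baseline:.0%}")
print(f"gain over baseline: {gain_points:.0f} percentage points")
```

Comparing against a majority-class baseline matters here because the classes are imbalanced: a model that learned nothing could still score well above 50 percent.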

One day, the researchers predicted, the model could be used by RfC initiators to track the discussion as it unfolds. "We think it could be useful for editors to know how to target their interventions," Zhang said. "They could post [the RfC] to more [Wikipedia forums] or invite more people if it looks like it's in danger of not being resolved."

The model could also be used for other community platforms involving large-scale discussions and deliberations, the researchers noted, such as planning forums for community projects, where participants weigh in on various proposals. As Zhang explained, "People are discussing [the proposals] and voting on them, so the tools can help communities better understand the discussions ... and would [also] be useful for the implementers of the proposals."

As an outcome of their project, the researchers have introduced Wikum, a tool that helps users break down a large threaded discussion into manageable chunks to tag, group and summarize. "The work of [a] closer is pretty tough," Zhang said, "so there's a shortage of people looking to close these discussions, especially difficult, longer, and more consequential ones. This could help reduce the barrier to entry [for editors to become closers] and help them collaborate to close RfCs."

The paper on the research project is openly available through researcher Jane Im's website.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
