adding in the overflow users

define retention scores

looping it

sophisticated survival analysis

making account periods df

Blog post

Wikimedia's ORES platform has long used AI to score revisions on their quality. However, there is no way to score a user rather than their edits, which is why I created a meta-classifier built on top of ORES to judge a session. One application of being able to judge users from their edits is that after just a few edits we can start to identify which users look strong and give them personalized mentoring. A bot already does this on Wikimedia: "HostBot" each day invites editors who recently joined to a mentoring forum called the "Teahouse". In order not to overwhelm the human respondents at the Teahouse, HostBot uses heuristics to pick a select few. I built an AI-powered version of HostBot to perform the same function, and with community blessing we conducted a 3-month A/B test between HostBot and HostBot-AI. Here are the results.

Classifier Measure

Restricting Samples for Balance

Retention measures

The number one concern for Wikipedia is to "retain" newcomers, that is, to have them continue editing after their initial joining; this is also known as "survival". The official retention measures from the Wikimedia Foundation are defined by a "trial period" (the initial spurt), a "survival period" (when they return), and how many edits in the survival period define a return. [See a detailed explanation](). We looked at 10 retention measures with different parameter definitions of retention. Finally, for each measure, we conducted an independent t-test between the retention data (binary outcomes) of HostBot versus HostBot-AI. The table below outlines the results.
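To make the three parameters concrete, here is a minimal sketch of how one such measure could be computed for a single user. The function name `is_surviving`, the use of weeks as the unit, and the choice to start the survival period exactly when the trial period ends are all assumptions made for illustration; they are not taken from the official definition.

```python
from datetime import datetime, timedelta


def is_surviving(registration, edit_times, trial, survival, min_edits,
                 unit=timedelta(weeks=1)):
    """Return True if the user made at least `min_edits` edits in the survival period.

    Assumptions (hypothetical, for illustration only):
    - `trial` and `survival` are counted in `unit` (weeks by default);
    - the survival period starts when the trial period ends.
    """
    start = registration + trial * unit
    end = start + survival * unit
    # Count only edits that fall inside the half-open window [start, end).
    edits_in_window = sum(1 for t in edit_times if start <= t < end)
    return edits_in_window >= min_edits


# Made-up user: edits on days 0, 2, 10, 12 and 40 after registering.
registered = datetime(2019, 1, 1)
edits = [registered + timedelta(days=d) for d in (0, 2, 10, 12, 40)]

short_term = is_surviving(registered, edits, trial=1, survival=3, min_edits=1)
long_term = is_surviving(registered, edits, trial=8, survival=16, min_edits=1)
```

Under these assumptions, a column such as `is_surviving_1_3_1` would correspond to `trial=1, survival=3, min_edits=1`, evaluated once per user to give the binary outcome that the tests compare.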

2015 Paper Measures

| measure | ai_heur_per_diff | ai_heur_p_value |
|---|---|---|
| is_surviving_1_1_1 | -1.076 | 0.467 |
| is_surviving_1_3_1 | 0.912 | 0.026 |
| is_surviving_3_1_1 | 1.750 | 0.000 |
| is_surviving_3_1_5 | 0.620 | 0.053 |
| is_surviving_4_4_1 | 2.935 | 0.000 |
| is_surviving_4_4_5 | 2.425 | 0.000 |
| is_surviving_8_16_1 | -2.261 | 0.010 |
| is_surviving_8_16_5 | -1.459 | 0.026 |


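Interpretation: on most of the short- and medium-term measures HostBot-AI does better (the is_surviving_1_1_1 difference is not significant, p = 0.467), but on the long-term 8_16 measures the heuristic HostBot does better.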
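The per-measure comparison can be sketched as a two-sample t-test on the 0/1 outcomes of each arm. This is an illustrative sketch, not the analysis code: the function name `compare_retention` and the sample data are made up, reporting the difference in percentage points is an assumption about what `ai_heur_per_diff` denotes, and the p-value uses a normal approximation to the t distribution (reasonable for large samples) so that the example needs only the standard library. In practice a routine such as `scipy.stats.ttest_ind` would be the usual tool.

```python
import math
from statistics import NormalDist, mean, variance


def compare_retention(ai_outcomes, heur_outcomes):
    """Pooled-variance two-sample t statistic on binary retention outcomes.

    Returns (difference in percentage points, t statistic, two-sided p-value).
    The p-value is a normal approximation, an assumption made to avoid
    third-party dependencies; it is close to the t-test p-value for large samples.
    """
    n1, n2 = len(ai_outcomes), len(heur_outcomes)
    m1, m2 = mean(ai_outcomes), mean(heur_outcomes)
    v1, v2 = variance(ai_outcomes), variance(heur_outcomes)
    # Pooled sample variance, as in the equal-variance independent t-test.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t_stat = (m1 - m2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return 100 * (m1 - m2), t_stat, p_value


# Made-up data: 1 = retained, 0 = not retained.
ai = [1] * 60 + [0] * 940      # 6% retained in the hypothetical AI arm
heur = [1] * 40 + [0] * 960    # 4% retained in the hypothetical heuristic arm

diff, t_stat, p_value = compare_retention(ai, heur)
```

With this sign convention a positive difference favors HostBot-AI and a negative one favors the heuristic HostBot, which is how the interpretation above reads the table.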

Survival in Window


Survival % over Time