Wikipedia talk:Autopatrolled/Archive 2

From Wikipedia, the free encyclopedia

Mass stub creation by autopatrolled editors

A couple of recent cases (particularly this one) have highlighted that some editors have created vast numbers of single-source problematic stubs which it is very hard to believe would have survived auto-patrolling. These articles were produced at a rate of 50, 200, or in one case 1000 articles per day, and typically consisted of a single sentence in which a few words were changed. I think it might be a good idea to have some kind of maximum quota on auto-patrolled articles - I have no idea if this is technically feasible, but is it possible to limit autopatrolling to a set number of articles in a given period (e.g., 25 articles in 24 hours), after which the articles are not marked as auto-patrolled? Alternatively, could the NPP feed be set up to flag high-speed and/or near-identical article creation?
My concern is that autopatrolled may be used to avoid WP:MASSCREATION/WP:MEATBOT. Both of the recent cases involved editors working in low-traffic parts of Wiki where NPP is the only place such behaviour would likely be detected. I also think mass, high-speed article creation may be a symptom of unhealthy/compulsive editing that we should try to address early. FOARP (talk) 12:44, 1 April 2021 (UTC)
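As a rough illustration of the quota idea raised above, the following Python sketch marks a creation as autopatrolled only while the editor has fewer than 25 autopatrolled creations in the preceding 24 hours. The class name `AutopatrolQuota`, its method, and the in-memory bookkeeping are hypothetical and do not correspond to any existing MediaWiki feature. Whether something like this is feasible on-wiki remains an open question, as the comment above says.

```python
from collections import deque
from datetime import datetime, timedelta


class AutopatrolQuota:
    """Hypothetical sliding-window quota for one editor.

    At most `limit` creations are autopatrolled within any `window`;
    creations beyond that would fall through to the normal NPP queue.
    Timestamps are assumed to arrive in non-decreasing order.
    """

    def __init__(self, limit=25, window=timedelta(hours=24)):
        self.limit = limit
        self.window = window
        # Timestamps of creations that were autopatrolled and are still in the window.
        self._autopatrolled = deque()

    def should_autopatrol(self, created_at):
        """Return True if this creation still fits within the quota."""
        cutoff = created_at - self.window
        # Forget autopatrolled creations that have aged out of the window.
        while self._autopatrolled and self._autopatrolled[0] <= cutoff:
            self._autopatrolled.popleft()
        if len(self._autopatrolled) < self.limit:
            self._autopatrolled.append(created_at)
            return True
        return False


if __name__ == "__main__":
    quota = AutopatrolQuota()
    start = datetime(2021, 4, 1, 12, 0)
    # Thirty creations, one minute apart: only the first 25 are autopatrolled.
    decisions = [quota.should_autopatrol(start + timedelta(minutes=i)) for i in range(30)]
    print(sum(decisions), "autopatrolled;", decisions.count(False), "sent to NPP")
```

Under this sketch an editor creating 50 or more articles a day would still have the first 25 skipped by NPP, so a quota limits how much goes unreviewed but does not by itself flag the pattern.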

We have Wikipedia:List of Wikipedians by article count and Wikipedia:Database reports/Top new article reviewers, but is there anywhere for "top new article creators"? These articles are the reason that autopatrolling was created, and NPP isn't necessarily going to find all problems - if an editor is creating large numbers of near-identical articles it would be better to have editors with relevant knowledge check a small number of them more closely and if there are errors, unreliable sources, or content that would require ongoing maintenance, give advice on how they can be fixed. Peter James (talk) 15:46, 1 April 2021 (UTC)
Hi Peter James, I am concerned particularly by what's happened with two of the top three editors on the List of Wikipedians by article count in the past week (the other one is no longer very active on Wiki). I agree that analysis by knowledgeable editors is always best, but in the case of Carlossuarrez46's Iranian "village" articles this analysis only happened years after their creation, after tens of thousands of them - which to a native Persian speaker were obviously not real villages - had been created. I think this was far too late and some intervention earlier would have been better. NPP isn't going to find all problems, but we would hopefully have seen some flagging up of unreliable sourcing, dubious statements in the articles about no population being reported, and general lack of content. Also, the energy of these editors might have been guided along more healthy, collaborative, and productive avenues through earlier interaction with other editors. FOARP (talk) 08:55, 2 April 2021 (UTC)
The sources are not obviously reliable or unreliable, and where there are differences with the Turkish Wikipedia, the English articles I checked were closer to the population stated in a government website from 2020. For the places in Iran, it isn't clear that a place with no population reported is not a village, particularly as most of the places that can be identified look like villages in Google satellite view; in the UK since 2011 some parishes don't have the population reported and instead it is added to others (because an output area can cover multiple parishes or parts of parishes) so it could be something similar. Their names look more plausible than some in the United States or even the UK; how likely is it that the reviewer would be a native Persian speaker? Instead of limiting autopatrol, which is likely to lead to WP:MEATBOT-like patrolling or a backlog, I suggest taking a small number of articles and looking at them more closely, with input from editors involved in content creation in the relevant areas who are not involved in NPP. Peter James (talk) 11:04, 2 April 2021 (UTC)
I'm not trying to re-litigate the Turkish village issue (though an earlier conversation about that would obviously have been better) or even the Iranian "village" issue (though, again, earlier would have been better). The consensus in the AN/ANI discussions about both was in any case very clear. I am concerned about mass creation in general. Looking at the top ten all-time creators of articles on EN-wiki:
  • 1) Not editing much any more, appears to regret their mass-created articles.
  • 2) Posted a very concerning message on their talk page; is now back.
  • 3) Retired, facing Arbcom proceedings to desysop.
  • 4) Not editing much any more.
  • 5) A deactivated bot, written by someone who has quit.
  • 6) Blocked indef for massive Copyvio.
  • 7) A deactivated bot, written by someone who doesn't edit much any more.
  • 8) Desysopped for inactivity, hasn't edited since 2019.
  • 9) Still active.
  • 10) Blocked indef for sockpuppeting.
Even allowing for natural attrition, I cannot look at the above (editors who between them wrote 562,179 articles) and not see evidence that mass article creation may be a symptom of underlying problems, and that at least some level of checking/intervention might have directed them along more productive avenues. Autopatrolled in at least two of the above cases prevented that. I agree that knowledgeable input is better - but how is that ever going to happen if the creation activity is taking place on a very low-traffic part of wiki and the user is auto-patrolled?
(PS - for comparison, of nos 300-309 on that list, two had quit and one was indef blocked) FOARP (talk) 12:38, 2 April 2021 (UTC)
I've looked at the patrol log and it's likely that these articles would have survived just as other stubs such as Great Work, Cornwall have. Peter James (talk) 17:27, 2 April 2021 (UTC)
The present patrol log doesn't flag mass-creation; if it did, the editors might ask if the requirements of WP:MASSCREATION are met. Without it being flagged, no-one will know that the article is mass-created. Looking at what's happened to the top-ten article creators, all of whom were mass-creators, can we really say that this should not be an issue of concern? FOARP (talk) 19:28, 2 April 2021 (UTC)
The patrol log is not the place to flag it; a separate list is needed. Peter James (talk) 19:40, 3 April 2021 (UTC)
@FOARP, "natural attrition" means that about 70% of people who create an account never make any edits at all. Of the ones who manage to make their first edit, only 30% make a second edit. Many of the people (and bots) in this list were here in the early days, and very few people have continued to be active for 15 or more years. I don't see this list as evidence of a problem. I see this list as having a surprising proportion of success stories: bots that weren't needed any longer, people who contributed significantly and then found other things to do with their lives, and editors who are still here. WhatamIdoing (talk) 05:53, 3 April 2021 (UTC)
I honestly don't see how anyone can look at that list (at most 2 still active, 2 indef blocks, 2 desysops, 4 retirements/disengagements, out of ten of our - by one measure - "best" editors) and see "success". FOARP (talk) 08:08, 3 April 2021 (UTC)
Only one desysop, and that was for inactivity. One of the indef blocks was not the user's main account. Bots used for specific tasks and not ongoing maintenance have to stop when their tasks are complete. Peter James (talk) 19:40, 3 April 2021 (UTC)
Carlos is almost certain to be desysopped given the current votes at Arbcom, and has retired in any case, which means he will be desysopped regardless. FOARP (talk) 20:25, 3 April 2021 (UTC)
  • I'm not sure what gets past NPP these days but surely the content and sourcing of Ab Azhdun would have raised some sort of flag, even among editors unfamiliar with the region? –dlthewave 23:09, 3 April 2021 (UTC)
    A reliable source was used, and the content was thought to be verifiable, but what happened was that the source was misunderstood and it was assumed that all of the places were populated places or at least formerly populated (even if the census doesn't record the population) and that the census was sufficient for official recognition. The Persian article for the relevant concept is linked to our article Hamlet (place) but that isn't the definition used for the census. Some of the places satisfy the relevant guidelines, it's just that we don't have a source to say which fit the usual definition of populated places and which are only farms or industrial sites. Peter James (talk) 10:08, 4 April 2021 (UTC)
To be clear, this place (the name of which appears to mean something like "Father Abzhan" in Arabic, which since this is Khuzestan appears to be the relevant language) might well have been somewhere where people did live or were recorded as living, it just isn't an actual community, it could well just be a house. But if someone is banging away creating 1 every 90 seconds for 1-2 hours a day for years, is that really something that should be allowed to go on without any possibility of intervention? Is autopatrolled a licence to create 50-100 articles a day for years with no consensus or sense-check? FOARP (talk) 10:33, 4 April 2021 (UTC)
There are villages with similar names. Articles can be created without consensus (most new articles are), there are just the content policies and guidelines. How likely is it that a patroller would know that something typically translated as "hamlet" has another definition in the census? Peter James (talk) 11:38, 4 April 2021 (UTC)
It's understandable that the "village" mistranslation would have slipped through, but surely the patroller would have known that being listed in a census table doesn't constitute "legal recognition"? –dlthewave 12:32, 4 April 2021 (UTC)
Depends what the census says it is. Peter James (talk) 12:43, 4 April 2021 (UTC)
  • These kinds of mass creations are basically the reason autopatrolled exists, with the assumption that the person getting it can be trusted not to abuse it. Many editors mass-create articles on e.g. geographic places or sportspeople with no issues, and dumping all of these into the new page queue would completely overwhelm NPP. The problem is that over the years, the minimum criteria have been significantly lowered with almost no discussion (see #Revising the minimum criteria); and that for a time some admins were in the habit of giving it to any "trusted" user with minimal review. This exposes an inherent flaw with the permission, in that an editor can be given it because they created 25 fine articles on subject X, then go on to create 1,000 terrible articles on subject Y and nobody will notice. I was involved in several of the cases listed above and a common theme is that they were given autopatrolled way-back-when and then nobody ever checked their creations again. It worries me that with over 5,000 users now having autopatrolled, it only takes a very small percentage of them "going rogue" to create a huge mess.
I think we could substantially reduce the risk associated with autopatrolled by a) revising the minimum criteria, as I suggested above, so that it again is only given to editors who are "highly active"; and b) making it by default a time-limited grant, i.e. for the length of time the editor expects to be creating articles in volume, which is periodically reviewed.
We could also consider a WP:SWEEP-style review of the users who already have autopatrolled to see whether they still need it. – Joe (talk) 08:41, 5 April 2021 (UTC)
If we are going to review the criteria, let's focus on the problem. As I understand it we are concerned about longterm autopatrollers who either don't raise their standards in line with rising community standards, or who become more "efficient and productive" at producing articles, but do so by lowering their standards for sourcing etc. It may be that this is something we can predict by looking at someone's fortieth to fiftieth articles. I suspect it is more likely to be something we can check for by getting a list of currently very active article creators in order by when they became autopatrolled, and checking some random articles of theirs to see if they are still working at an acceptable standard. Alternatively, change the Autopatrolled system so that Autopatrolled only means that 99 out of every 100 of your articles are Autopatrolled and get a report on autopatrolled editors whose hundredth article is proving contentious/getting deleted. Once we've done that for a while we can review the stats to see if there are any patterns among the autopatrollers who turn out to be problematic other than the ones who start cutting corners and mass producing articles. I'm assuming that new page patrol could handle 1% of currently autopatrolled articles coming its way, but we will likely need to call out for some extra volunteers at NPP. ϢereSpielChequers 09:05, 5 April 2021 (UTC)
That's one problem, and such a check sounds like a good idea. But I think there are other, related vulnerabilities in the current system, e.g.: editors who don't actually understand the core content policies, but that isn't shown in their first 25 creations; editors who let their standards slip after getting autopatrolled; POV-pushing or paid editors who deliberately create "clean" articles to get the right, so their subsequent creations are less visible. No doubt all of them are very rare in practice, but I've seen instances of them all, and having a large number of autopatrolled editors turns that small percentage into a bigger risk. If we see NPP as a kind of content quality "firewall" and follow the principle of least privilege, it doesn't make sense to maintain a lifetime exception from that firewall for users whose volume of creation has a negligible practical effect on the size of the NPP queue, or who needed it once but don't any longer. – Joe (talk) 09:58, 5 April 2021 (UTC)
Having one percent of your articles not be autopatrolled would address all the other problems you mention, or at least give us a chance of spotting such editors. Especially if the 1% was a random 1% of someone's creations rather than strictly every 100th. As for the long tail of people who create a trickle of articles, individually yes NPP wouldn't notice any one of them losing the right. But there are over four thousand accounts with the autopatrolled right; if you took the autopatrolled right away from thousands of the less active ones you would deal with tens of thousands more articles through NPP each year, and you might lose some of the existing reviewers because you've just swamped them. I suspect that the damage from speedy patrolling and delays in patrolling would be greater than any gain. When we tightened the BLP rules I remember a bunch of editors lost Autopatrolled status because they were creating unreferenced BLPs. I suspect we could do something similar if we searched for Autopatrolled articles using particular unreliable sources. Or alternatively get someone to create a list of people who have the autopatrolled flag, have created an article in the last month, but who don't have pending changes reviewer. Then call for some admin volunteers to go through that list and change one or other flag. ϢereSpielChequers 15:57, 5 April 2021 (UTC)
Yes, I meant to say that I think that's a good idea (though I wouldn't know how to go about implementing it), but that because the concern is not just users who got autopatrolled a long time ago, we also need to make changes to the current criteria for granting it to avoid these problems simply reoccurring. – Joe (talk) 17:49, 5 April 2021 (UTC)
It would require an IT change, not a big one I assume, but a WMF one or at least a Phabricator one, and that's a site I have long given up on. But I'm sure we could get a bot to sample a random 1% of new autopatrolled articles into a separate list or unseen category. You then need to recruit people to review the list and either remove entries or flag problems. I'm guessing there would be some initial enthusiasm if there was a Signpost article or a mention in the admins' newsletter, but if the ratio of good to bad articles was as high as I suspect it could be, people will quickly lose interest. I'm not sure we need to change the current criteria, even if there are people who pass the current criteria and then lower their standards. It is reasonable to point out to an admin who gave autopatroller to someone who was creating substandard articles, but who could spot the person who was going to lower their standards after getting autopatrol? I have appointed a lot of Autopatrollers, but I do so from lists of people who have created lots of articles and don't have the right. Call me a cynic, but I am much more suspicious of those who ask for the right (and I assume most of those who get it that way are fine). I would hope that bad-faith article creators who qualify for Autopatrol are rare. But there may be ways to pick them up with key word searches, or lists of new articles on living people and extant companies that only exist on EN. In particular there are various peacock terms that one could trawl through; we have tools for that. What we really need is an AI bot that searches for likely bad articles in mainspace - we already have bots that pick up copyvio. I'm pretty sure the technology exists, but getting the WMF to run with it won't be easy. ϢereSpielChequers 21:30, 5 April 2021 (UTC)
Agree that any sampled articles from autopatrolled editors would have to be part of the normal NPP stream otherwise enthusiasm would dry up. Agree that 1% would be a decent ratio (I hope that a random sampling of Carlos's 40,000+ Iranian geostubs would have turned up the problems with them, I also hope that a random sampling of Ruigeroeland's massive copyvios would have detected them as well). FOARP (talk) 19:11, 6 April 2021 (UTC)
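The random 1% sampling discussed in this thread could be sketched roughly as follows in Python. Each new autopatrolled creation is independently routed to review with probability 0.01, rather than strictly every hundredth article, so an editor cannot predict which creations will be looked at. The function name `sample_for_review`, the placeholder titles, and the idea of passing titles in as a plain list are assumptions made for illustration only; no existing bot or MediaWiki API is implied.

```python
import random


def sample_for_review(titles, rate=0.01, rng=None):
    """Return the subset of `titles` randomly selected for NPP review.

    Each title is chosen independently with probability `rate`.
    `rng` may be a seeded random.Random instance for reproducible runs.
    """
    rng = rng or random.Random()
    return [title for title in titles if rng.random() < rate]


if __name__ == "__main__":
    # Hypothetical batch of 40,000 autopatrolled creations by one editor.
    batch = [f"Placeholder stub {n}" for n in range(40_000)]
    picked = sample_for_review(batch, rate=0.01, rng=random.Random(2021))
    # At a 1% rate the expected sample size is about 400 articles.
    print(len(picked), "of", len(batch), "creations sent to the NPP queue")
```

Because the draw is independent per article, the number sampled varies from run to run around the expected 1%. A low-volume editor may go a long time with nothing sampled, while a mass creator is sampled regularly.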

Autopatrolled user creating low-quality GEOstubs

USER:Belanov87 has created nearly 800 stub articles all (?) related to Spanish localities and was recently granted the Autopatrolled bit. I've reviewed a number of these articles lately and seen the following patterns:

  • The articles are cited to only two sources, one of which is 2ua.org, a site of dubious reliability.
  • The other source is the INE database, however some of the articles do not have listings at this website (e.g., Colldarnat), and others show zero population on this database (e.g., Egulbati).
  • Reviewing the articles, they look very much like they were bot-produced or WP:MEATBOT-produced. The English is odd, the rate of production is on the order of one article every ~2 minutes, and, as discussed above, the sources don't always actually mention the locality.

I discussed this with Hog Farm and SandyGeorgia and we thought we ought to raise these concerns here. None of the above appears to be what is expected of someone with the autopatrolled bit. Of course it is very possible that there is a reasonable explanation for all of this, in which case it would be good to hear from Belanov87 here. FOARP (talk) 20:33, 3 December 2021 (UTC)

Agree with the above. I looked at some of these, and some have quite lengthy articles in other-language wikipedias, indicating that they are probably notable, but agree that a lot of these look doubtful. I see a lot of them include the phrasing "in [name] province, Spain, Spain", which indicates that these are either being (semi-?) automated or thrown together formulaically without looking at the content, given the senseless repetition of Spain. With the fast rate of creation, I think it's obvious that minimal effort is going into these. I don't think autopatrolled is designed for high-quantity, low-quality situations. Hog Farm Talk 20:40, 3 December 2021 (UTC)

When I reviewed contributions prior to assigning autopatrolled, I didn't check for the 2-minute rate. I guess I should have and agree that bot-produced work isn't what autopatrolled is for. I shall remove this for now. If there are dissenting views, please say so and we'll discuss it. Schwede66 03:26, 4 December 2021 (UTC)
  • (Edit conflict) I think this is a good lesson in red flags to look out for at WP:NPP and WP:RFP/A; in this case an admin noticed the mass stub creation and thought it was a non-issue. I'm wondering if experienced editors are working off of an older understanding of how these stubs are handled even though the current GEOLAND guideline has been in place for many years. And Schwede66, I trust you'll take this as constructive feedback, I'm not trying to single you out. –dlthewave 03:34, 4 December 2021 (UTC)
    Dlthewave, no worries. Nobody is flawless (well, I for one am not) and there is nothing wrong with pointing something out in a fair way so that others can learn from it. Schwede66 03:39, 4 December 2021 (UTC)
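A creation-rate check of the kind mentioned in this thread (the "2-minute rate") might look something like the Python sketch below. It takes the timestamps of an editor's page creations and reports whether the median gap between consecutive creations falls under a threshold. The function names, the 180-second threshold, and the minimum of 10 articles are illustrative assumptions, not established criteria at WP:RFP/A or WP:NPP.

```python
from datetime import datetime, timedelta
from statistics import median


def median_gap_seconds(timestamps):
    """Median number of seconds between consecutive creations."""
    ordered = sorted(timestamps)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps)


def looks_mass_created(timestamps, threshold_seconds=180, min_articles=10):
    """Flag a run of creations whose median spacing is under the threshold.

    Both parameters are hypothetical defaults chosen for this sketch.
    """
    if len(timestamps) < min_articles:
        return False
    return median_gap_seconds(timestamps) < threshold_seconds


if __name__ == "__main__":
    start = datetime(2021, 12, 1, 9, 0)
    # Twenty creations spaced two minutes apart.
    rapid = [start + timedelta(minutes=2 * i) for i in range(20)]
    print(looks_mass_created(rapid))
```

A flag from a check like this would only be a prompt for a human to look more closely. Fast creation on its own does not show that the articles are poor.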

Administrators will no longer be autopatrolled

A recently closed Request for Comment (RFC) reached consensus to remove Autopatrolled from the administrator user group. If you are an administrator, you may, as with Edit Filter Manager, choose to self-assign this permission. This will be implemented the week of December 13th, but if an administrator wishes to self-assign they may do so now. Additionally, there is some agreement among those discussing implementation to mass message admins a version of this message. To find out when the change goes live, or if you have questions, please go to the Administrators' Noticeboard. Best, Barkeep49 (talk) 19:58, 7 December 2021 (UTC)

Project page move with potential functional repercussions

Per consensus in the move request for the page, I have moved Wikipedia:New pages patrol/Redirect whitelist to Wikipedia:New pages patrol/Redirect autopatrol list. I anticipate that it is possible that there may be bots or other tools that rely on the contents at the former title, and encourage anyone maintaining such properties to update them accordingly. BD2412 T 06:08, 23 April 2022 (UTC)