From Wikipedia, the free encyclopedia

This is a message board for coordinating and discussing bot-related issues on Wikipedia (also including other programs interacting with the MediaWiki software). Although this page is frequented mainly by bot owners, any user is welcome to leave a message or join the discussion here.

If you want to report an issue or bug with a specific bot, follow the steps outlined in WP:BOTISSUE first. This is not the place for requests for bot approvals or for requesting that tasks be done by a bot. General questions about the MediaWiki software (such as the use of templates, etc.) should be asked at Wikipedia:Village pump (technical).

InternetArchiveBot notices about nothing but archive-url additions

Where's the place to "seek out a community discussion" on "a community bot" (whatever that means) about getting a bot to stop leaving a particular kind of pointless message? The InternetArchiveBot does various things, and even leaves some helpful messages, but when it leaves a note on an article's talk page that all it did was provide an archive-url to a cite that didn't have one, this is pointless, annoying bot-spam. We don't need to know that it did something trivial that no one sane would question, and we already know – anyone watching the article already saw the edit, so now they're getting a second watchlist hit for the same thing for no reason.

I went to the bot's talk page, and it isn't editable except by admins. I got to the author/operator's page, which directed me to file a ticket about it at Phabricator. So I did [1]. The response to that was a testy "The bot is currently approved to run with these message.", which is a silly thing to say. All the bots are approved to do what they do or their operator would be in trouble and the bot would be blocked. I was told "The last discussion regarding them had no consensus for change", which means it has been discussed before and other people are tired of these messages, too. "If you feel the bot should stop leaving messages, please seek out a community discussion. This is a community bot". I see a bot requests page, which seems to be only for asking for bots to do stuff, not to stop doing it, and isn't really a discussion page; and the noticeboard, which appears to be for reporting bugs and policy violations.

So, I'm not really sure what the process or venue is. PS: This isn't about ALL InternetArchiveBot notices, just the no-one-will-care pointless ones.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  12:50, 4 October 2017 (UTC)

@SMcCandlish: Can you please provide some example Diffs below of the edits you have concerns with? — xaosflux Talk 12:58, 4 October 2017 (UTC)
Sure, any of these [2]. The bot leaves various messages we do care about, but this one is just wrong. Its instruction that we need to go look at what it did, when all it did was add an archive URL (and the same page watchers already saw it do that), makes no sense. (The same goes if it did that and also marked a dead original URL as such.) We do want to go look when the bot flags actual problems.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  14:26, 4 October 2017 (UTC)
And to answer your other question, per Wikipedia:Bot_policy#Appeals_and_reexamination_of_approvals, this is the appropriate venue to reexamine bot task approvals if you are at an impasse with the operator. — xaosflux Talk 13:10, 4 October 2017 (UTC)
  • WP:VPP would be a good place to discuss. You are proposing a change to IABot after all. I am more than happy to make any changes, but given the runtime of IABot, the changes should have a consensus.—CYBERPOWER (Chat) 13:20, 4 October 2017 (UTC)
    If you think proposing a minor change is an "appeal", then ok. I also don't think this needs any bureaucracy; it's just a common sense matter.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  14:26, 4 October 2017 (UTC)
    VPR would probably be better - this doesn't appear to be about changing policy - or at least leave a link there pointing to here. My initial concern would be if there are out-of-scope edits being made (it does not sound like that is occurring). I agree that if this is really just a request to change the currently approved scope, it needs to have general community consensus measured. — xaosflux Talk 13:48, 4 October 2017 (UTC)
    I was going to guess VPT, but whatever. Why does this need to be a big process at all, though? "Stop spamming us with double watchlist hits" isn't something we really need to hash over at length, is it?  :-) Anyway, that's four separate venues suggested so far (VPR, VPP, bot appeals, and this board).  — SMcCandlish ¢ >ʌⱷ҅ʌ<  14:26, 4 October 2017 (UTC)
    (edit conflict) Because other users do not see it as spam. They see it as a meaningful message with a quick link to verify that the bot placed a meaningful archive there, as well as a quick message on how best to deal with bot mistakes.—CYBERPOWER (Chat) 14:29, 4 October 2017 (UTC)
    We can make a big thread out of it if you want, but see the thread immediately above this one. The argument that watchlist spamming is no big deal has been firmly shot down by the community.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  14:33, 4 October 2017 (UTC)
    (edit conflict) No argument from me that watchlist spamming is a problem; however, up above was a bot task I approved under the misconception that it was a high-priority task, when it turns out it wasn't, vs. a task that has been approved and in operation for 2 years doing what it does. Plus it's not really spamming a person's watchlist unless they have every article on Wikipedia watchlisted.—CYBERPOWER (Chat) 14:38, 4 October 2017 (UTC)
    I understand the nature of the thread above; what I was referring to was, specifically, the admonition to take community concerns about pointless watchlist-hitting seriously.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  15:15, 4 October 2017 (UTC)
    Argument from me. There are tools that have been built for the supposed watchlist spamming; "I don't want to use it" is not a valid opposing argument. What I will agree with is the talk page spamming itself, as sometimes they can get inundated with bot messages. I don't have a strong opinion of it in either direction though since it does provide a better avenue to check the edit's validity. Nihlus 14:52, 4 October 2017 (UTC)
    Much of that sounds like exactly the reasoning that was just shot down again in the thread above this one.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  15:16, 4 October 2017 (UTC)
    @SMcCandlish: It is my view on the matter and has not been "shot down" in any capacity. Please, learn how to politely disagree with someone. Nihlus 16:51, 4 October 2017 (UTC)
    You also reacted with offense when someone in that thread gave you sound advice and a case to examine for why the advice was sound, then you continued to react with umbrage when it was suggested you were misinterpreting the advice as some kind of personal insult. So, I'll pass on this game.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  22:59, 4 October 2017 (UTC)
  • Here is an example of the type of diff SMcCandlish is referring to. It is a notification placed immediately after this edit was made. Primefac (talk) 14:35, 4 October 2017 (UTC) All righty then. Primefac (talk) 14:47, 4 October 2017 (UTC)
    • That's not actually an example, since it has other notices in it, including a claim to have corrected two URLs, which someone may want to check. The example I provided above is, well, an example of what I mean. Here it is again: [3].  — SMcCandlish ¢ >ʌⱷ҅ʌ<  14:43, 4 October 2017 (UTC)
  • @Pyxis Solitary: pinging the user whose report about this issue at WT:TPG inspired this change request in the first place. PS: Since apparently any of four venues will work and this is one of them, I think we can consider this the demanded discussion being open.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  14:39, 4 October 2017 (UTC)
Thank you. Pyxis Solitary talk 05:27, 5 October 2017 (UTC)
  • Clearer statement of the issue: The bot's "I added an archive-url" notice (that the bot did something useful, routine, and virtually unbreakable) is directly equivalent to the watchlist notice of the edit itself, and the talk edit makes another watchlist hit, so that's three notifications about an edit that will never be dangerous. In the particular case of "notify upon archive-url", it grossly violates the spirit though not the exact letter of WP:COSMETICBOT – not in making the actual archive url edit, but in pestering us about it. The entire reason we have the COSMETICBOT rule is the pestering effect, and just the watchlist hits alone were annoying enough to cause this rule to be adopted. Now add talk page spamminess, which impedes talk page usability, wastes editors' time, increases talk page archival maint. overhead, etc. Again, I want to stress that this is not about IAB notices that may actually require human review/intervention. Still want those.

    Simple pseudocode fix: if $CHANGESBOTMADE == $ARCHIVEURL or $CHANGESBOTMADE == ($ARCHIVEURL + $DEADURLYES) then $POSTABOUTIT = no – i.e., only if it's done anything at all beyond that trivia (that trivia plus something non-trivial counts), should it go ahead and post a notice.
     — SMcCandlish ¢ >ʌⱷ҅ʌ<  15:04, 4 October 2017 (UTC), clarified 23:05, 4 October 2017 (UTC)
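The proposed rule can be sketched in Python. This is an illustration of the suggested logic only, not IABot's actual internals; the flag names are invented:

```python
# Invented flags standing in for the kinds of changes IABot can make.
ARCHIVE_URL = "archive_url_added"   # bot added an archive URL to a cite
DEAD_URL_YES = "dead_url_tagged"    # bot also marked the original URL dead

def should_post_talk_notice(changes_made):
    """Post a talk page notice only when the bot did something beyond
    adding an archive URL (optionally also tagging the URL as dead)."""
    trivial = ({ARCHIVE_URL}, {ARCHIVE_URL, DEAD_URL_YES})
    return len(changes_made) > 0 and set(changes_made) not in trivial

should_post_talk_notice({ARCHIVE_URL})                    # False: archive URL only
should_post_talk_notice({ARCHIVE_URL, DEAD_URL_YES})      # False: still trivial
should_post_talk_notice({ARCHIVE_URL, "url_corrected"})   # True: did something non-trivial too
```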

    That is not how I interpret COSMETICBOT at all. COSMETICBOT applies to cluttering watchlists with changes that render no visible change to the page. IABot is making visible changes to the page. COSMETICBOT does not apply, IMO.—CYBERPOWER (Chat) 15:18, 4 October 2017 (UTC)
    My prediction that someone would focus on the exact wording rather than the underlying reasoning and intent of the rule is exactly why I said "the spirit if not the exact letter of WP:COSMETICBOT". I've now clarified that with some emphasis in my earlier post.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  23:03, 4 October 2017 (UTC)
  • For clarity, completely disagree: archivebot's talk page messages are fine as far as I'm concerned. In other words SMcC has no consensus whatsoever on this. See prior (still open) discussion at Wikipedia talk:Talk page guidelines#Deleting bot notices. I'd suggest to close this Bots noticeboard thread for obvious forumshopping. Anyway, I'm not prepared to discuss this same issue in two different places at the same time. --Francis Schonken (talk) 15:23, 4 October 2017 (UTC)
It isn't forum shopping for the simple reason that the topic arose at a relatively narrow locale (WT:TPG) and @SMcCandlish: correctly sought wider input via an appropriate noticeboard, and left a pointer at the original thread saying he had done so. I do the same thing, though I usually mark the original thread closed to prevent this sort of misperception/accusation. I also note there were a couple of hours between opening this thread and later adding the pointer. I try to post both in quick succession to further reduce avoidable controversy. NewsAndEventsGuy (talk) 16:34, 4 October 2017 (UTC)
I'd forgotten about the other thread; that's why the ping to the original raiser of the issue came so late as well. Whether consensus will emerge to make a minor change to this bot's output will be determined by the discussion; it isn't, as Francis seems to suggest, a necessary precondition for the discussion to happen. And no, obviously not forum shopping, since WT:TPG isn't a venue for proposing bot changes (even if respondents to this thread aren't entirely clear what is a good venue for this kind of request). Francis and I have had an on-again-off-again personality dispute dating back to the 2000s, and I've learned not to react much to these jabs from him. PS: The two discussions are actually distinct: this one is about whether to tweak the bot's messaging; the TPG one is about whether to archive or just delete the bot messages when they're old (the off-topic part of it has been closed and pointed at this discussion).  — SMcCandlish ¢ >ʌⱷ҅ʌ<  22:28, 4 October 2017 (UTC)
  • All of the proposed alternative venues have been notified of this discussion, to centralize.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  22:51, 4 October 2017 (UTC)

I am the editor that created the discussion in Wikipedia talk:TPG about bot notices in talk pages. In regards to the InternetArchiveBot and its announcement about modification of external links: what's the point of keeping these notices on an article's talk page after an editor has checked the links and found them okay? Pyxis Solitary talk 05:27, 5 October 2017 (UTC)

I for one find the InternetArchiveBot notices useful. Though the bot is getting better, it doesn't always pick the right archive URL and sometimes misclassifies links as dead; it's also quite possible that a link that it detects needs manual updating. The talk page notices, which as far as I know only show up when the bot adds a (possibly invalid) archive URL, are a useful way of keeping track of what it does, and serve to show that a human has indeed OK'd the bot's changes. The notices also serve as a handy way to get to the bot's interface, which I've used several times. Graham87 08:05, 5 October 2017 (UTC)

If an editor checks the modification(s) by the archive bot and finds them to be okay, the value in the bot notice is changed to "true" which indicates that the modifications were correct. If the editor finds a problem, the value is changed to "false".
So ... if the modifications are reviewed and checked as being okay ... what's the point of keeping the notice on the talk pages? (I changed the indentation of my comment directly above yours because it's easily overlooked.) Pyxis Solitary talk 04:05, 6 October 2017 (UTC)
If the editor finds a problem, the value isn't changed to false ... the checked= value only notes whether the links have been checked or not. As for keeping the notices on the pages, I think they can be treated as any other discussion ... and archived if necessary. I feel a bit uneasy about the idea of removing them, as I do for any talk page message ... but I probably have a more extreme view that talk pages should be a record of all discussions than most of the Wikipedia community. Graham87 15:45, 6 October 2017 (UTC)
The "checked" value can be changed to "failed", if some links are found wanting. In the not too distant past, when I was constantly checking the bot's output that affected articles on my watchlist, I used that value as well as listing the failed links underneath. My practice must have been an outlier, but it was due to the prompting of the talk page message that I checked so carefully, and found much that needed checking, before CyberbotII/IABot improved considerably. In any case, I am someone else who thinks the bot's talk page messages helpful, although they might seem verbose, especially when so many of them do go unchecked, and perhaps are less needed as the bot and the internet archives themselves improve. Dhtwiki (talk) 05:41, 7 October 2017 (UTC)
What's disturbing is that I'm still getting false positives. Hawkeye7 (discuss) 05:52, 7 October 2017 (UTC)
False positive in that IABot declares a link dead when it isn't, or that it thinks an archive snapshot is useful when it isn't? Both can happen and, I think, are both instances of false positives. Dhtwiki (talk) 21:28, 7 October 2017 (UTC)
Both are happening. The former is still too common. It happens when the Bot thinks a site is down but it isn't. It used to occur when a site went down temporarily, but now we're getting into weird cases where the Bot cannot connect but apparently the rest of us can. This is usually the Bot's fault, but not always; in one recent case a site was returning an HTTP error code but still rendering the page okay. The second is less common but does happen; usually it is the internet archive's fault. Hawkeye7 (discuss) 21:15, 17 October 2017 (UTC)

Agree that these posts are a waste of time and bandwidth. I notice ClueBot NG doesn't do the same thing whenever it reverts vandalism. It simply leaves a link in the edit summary asking others to report false positives. I don't see why something similar can't be implemented here - in the example SMC provides, the summary for the edit the bot is referring to simply reads, "Rescuing 1 sources and tagging 0 as dead. #IABot (v1.5.4)". There's plenty of space in there for a link like ClueBot NG leaves. It's one thing to alert users to edits like these, but there's a better way to do it, if it needs to be done at all. Zeke, the Mad Horrorist (Speak quickly) (Follow my trail) 14:13, 7 October 2017 (UTC)

One difference between ClueBot NG and IABot is that the former bot (usually) decides on the basis of, and reports, vandalism that both fits on one screen and whose vandalistic attributes are immediately apparent (e.g. article text replaced by the word "poopy"). IABot is apt to report many decisions per edit, with changes to text that are apt to be widely strewn throughout the article, and whose validity isn't readily apparent. Therefore, IABot's talk page messages bring a needed synopsis that ClueBot NG usually doesn't need. The idea of having a reporting link, though, is a good one, as I'm not sure that reporting failed links to the talk page ever served well as feedback. Dhtwiki (talk) 21:43, 7 October 2017 (UTC)
  • Agree that it may be worth checking IABot's archive links, so talk page message is not spam. Also the changes should be very close together in time, so should not result in multiple watchlist "events". All the best: Rich Farmbrough, 20:28, 17 October 2017 (UTC).
    At present the error rate for the IABot is low, but still too high to trust it, so its actions still really do need checking. Hawkeye7 (discuss) 21:15, 17 October 2017 (UTC)
  • Close as Working as intended. The bot does have misfires from time to time, so putting the summary on the talk page (to try and flag down human attention) is appropriate. However once the report has been reviewed by a human and corrected, there's no need for the post on the talk page any more, so it can safely be archived. This seems like a rule creep that is going to open a lot of worm cans that would be best left alone. Hasteur (talk) 02:21, 18 October 2017 (UTC)
Most reports aren't checked. Even I don't do that any more. And when I did do it, I'd leave notes as a reply, sometimes to encourage further checking. So, no need to archive these messages, at least not faster than normal talk page archiving. Dhtwiki (talk) 22:37, 19 October 2017 (UTC)

Suggest WP:BOTAPPEAL for communications issues. ... When the bot was doing a straightforward edit, the talk page message seemed completely over the top. That is not to say the bot isn't useful and, by the look of it, working well. But I think there are a number of WP:BOTCOMM issues.

  • The bot's talk page message refers to itself as 'I', thereby impersonating a human. This is irksome, at least to me, once one discovers it's a bot.
  • The bot asks to have its work checked ... but then refers to external tools rather than remaining within Wikipedia.
  • It refers to 'this template', but if you edit the page you see this actually means the source.
  • I don't like the fact that this Bots/Noticeboard discussion was not mentioned on the bot's talk page.

I think as it is a bot it would be better if it admitted to being a bot and gave precise instructions. Rather than [this diff] I think I'd prefer to see something along the lines of:
The Internet Archive BOT has made the following changes:
  URL1 (dead) -> Archived URL
It would be helpful if the modifications could be manually reviewed and checked=true set in the sourcecheck template if the edit was successful, or failed if not. For detailed information on InternetArchiveBot see **HOWTO** ... the HOWTO going back to a BOT page or subpage and ensuring the FAQ/HOWTO covers the case of manually checking BOT work first.

  • After someone has checked it with checked=yes the template should simply say: An editor has reviewed this edit and fixed any errors that were found. If you find any issues, DOTHIS. But it should not leave information splattered around, as seems to happen at present when checked=yes.

In summary the BOT looks to be doing some great work, but I think it's really tricky not to fall foul of WP:BOTCOMM and I think that area needs improvement. I'd prefer it didn't make a talk page entry for simple edits, but understand that *might* be considered necessary. Djm-leighpark (talk) 23:07, 9 November 2017 (UTC)

We're not at a stage of WP:BOTAPPEAL. Follow WP:BOTISSUE first. Headbomb {t · c · p · b} 00:30, 10 November 2017 (UTC)
Apologies if I appear not to be following WP:BOTISSUE ... however I did try to discuss this on the operator's page but was told 'no consensus', which brings me to this noticeboard. Following WP:BOTISSUE, and after reviewing this discussion, I feel I have reasonable concerns for expressing that this BOT no longer has consensus for its task, due to the bot's communications edits on article talk pages. I suppose an alternative would be to hide the bot from my watchlist, but that only partially solves the issue. I am against a close as working as intended, as the BOT's communications are annoying me (likely not what was intended) ... and likely other community members as well. I was tempted to say I agree these posts are a waste of time and bandwidth, but I tried to have a hard look at what the edits were saying, and I feel they need to reorganise what they say in a more clear, concise and different way that is not offputting. In essence I am strongly suggesting the BOT and the sourcecheck template be changed in how they interface/interact on the article talk page, and hope that might bring consensus. Thank you. Djm-leighpark (talk) 05:51, 10 November 2017 (UTC)

Helping the vandals do their work

A provocative title for a cautionary tale. Please see User_talk:Ladsgroup#Helping the vandals do their work for the details of a minor episode of janitors cleaning up the crime scene too quickly / early / inappropriately. Shenme (talk) 23:34, 22 October 2017 (UTC)

This isn't the bot's fault. It did its job correctly. The fault lies with the user who failed to revert the vandal all the way. Obviously it was a mistake on the user's part, but you can't blame the bot for something it was programmed to do simply because someone else didn't clean up the mess entirely without noticing.—CYBERPOWER (Trick or Treat) 02:08, 23 October 2017 (UTC)
It often happens that inexpert vandal fighters, bots, and editors who make edits without even noticing the vandalism make it harder to undo, rollback, or easily restore the last good version; but that shouldn't ever really make the vandalism "permanent". The altered or lost text can always be restored. It sometimes just takes more investigation, which should give the investigating editor a better grasp of what makes the article work. So, the time spent doing that shouldn't be regarded as a complete waste; the vandalism may even have contributed to bettering the article. Dhtwiki (talk) 06:40, 23 October 2017 (UTC)
To be honest, I feel IP editors are becoming more and more malicious. The ratio of good IPs to bad ones is leaning more and more towards bad. At this point I feel like we have more disruptive IPs than productive ones. IMO, we should disable IP editing and require the registration of accounts to edit. It would still be the encyclopedia anyone can edit, but you would just need to register first. This would also go a long way towards counteracting block evasion and sockpuppetry.—CYBERPOWER (Trick or Treat) 13:35, 23 October 2017 (UTC)
My recent estimation of IPs is they very often do expert work. My worst experience was with the IP represented by this account, whose persistence, combined with dynamic addressing, could have been tamped down by aggressive range blocking. In the example given here, putting IP edits under automatic pending review status if, as here, there are large changes without edit summaries, might be doable. There might be the possibility of rollbacks to edits that aren't most recent, etc. If there are hard-case IPs that do everything to circumvent restrictions placed on them, would banning IP editing help? How would it stop sockpuppetry via registered accounts combined with ISP hopping? Dhtwiki (talk) 22:09, 30 October 2017 (UTC)
Well, there is the recent cookie-block system that was implemented. Familiar hoppers that have a block on their browser cookie are still blocked. If IP editing was disallowed, we could ramp up the aggressiveness a little by auto-hard-blocking the IP address the hopper ends up on because of their cookie. This in turn may flush out possible sock puppets. In addition to that, we could give administrators another shiny new button. I don't even think we'd need a CU for it: "Block accounts on this IP address." If an administrator clicks it, it gives the potential of swiftly blocking sleeper accounts, if used correctly, without having to spill the actual IP address. Just my thoughts.—CYBERPOWER (Trick or Treat) 23:14, 30 October 2017 (UTC)
Using cookies sounds like something that's easily circumventable, as cookie files are easily and frequently edited by their users. What is the feasibility of putting problem IP edits up for review, as is presently done on a per-article basis? The last time I saw a discussion of preventing IP editing altogether, I got the impression that IP editing is practically a fundamental tenet of Wikipedia, at least for some people. Dhtwiki (talk) 00:00, 7 November 2017 (UTC)

Data mining and use

Hello there, I was wondering if anyone could direct me as to how to get started using and mining Wikipedia database dumps? I have downloaded the latest pages-articles.XML.bz2 version. The goal is to mine for a particular string in order to figure out the relative need for a bot and to build a list of pages that would need to be edited if the string is present within the namespace. (Xaosflux sent me here). Thank you for your help. --TheSandDoctor (talk) 16:15, 24 October 2017 (UTC)

@TheSandDoctor: AutoWikiBrowser has a database scanner in it that can utilize those XML files. You can also use Help:CirrusSearch for some regex phrases as well, as that is built into the site software. Nihlus 18:19, 24 October 2017 (UTC)
Thank you for the ping Nihlus, as well as for pointing me in the right direction. While I am not yet ready to file a BRFA (just discovered some minor kinks to work out in the bot's code/need to make it slightly more intelligent), I now have an idea of roughly how many pages will be affected, so that I can fill out that field when it is time for the BRFA, and I have been able to compile a list of articles that would need changes (so a win-win-win). --TheSandDoctor (talk) 19:27, 24 October 2017 (UTC)
BRFA filed — Thanks again for your help Nihlus, the BRFA has now been filed. --TheSandDoctor (talk) 04:04, 26 October 2017 (UTC)

G13 helper script/semi-automated deletion

Hey, all, I was thinking of writing a script to automatically assess G13 speedy deletion requests, after an influx of them today. Basically, the script would scan Category:Candidates_for_speedy_deletion#Pages_in_category and, for each page there in the Draft namespace, check whether the CSD nomination is G13 (probably by testing for inclusion in Category:Candidates for speedy deletion as abandoned drafts or AfC submissions) and whether the second-most recent edit (i.e. the edit before the nomination) is more than 6 months old, and if so, provide a deletion link on the page. But I don't know if such assistance is too close to MEATBOT-like automation, especially given the use of admin tools, so I figured I'd ask here first in case people think that would need some kind of approval. I figure G13 is low-impact enough (not article space, free refund) and has a simple enough inclusion criterion that it isn't a big deal. Any thoughts? Writ Keeper  18:05, 27 October 2017 (UTC)
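The age test at the heart of that check could be sketched like this. A sketch only: the names are invented, "six months" is approximated as 183 days, and the real script would still need the Draft-namespace and G13-category checks before offering a deletion link:

```python
from datetime import datetime, timedelta

SIX_MONTHS = timedelta(days=183)  # rough stand-in for "6 months"

def g13_age_ok(revision_times, now=None):
    """revision_times: a draft's edit timestamps, newest first, where the
    newest edit is the G13 nomination itself. Returns True only if the
    edit *before* the nomination is more than six months old."""
    now = now or datetime.utcnow()
    if len(revision_times) < 2:
        return False  # nothing before the nomination to measure against
    return now - revision_times[1] > SIX_MONTHS
```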

You should advertise at WT:CSD and WT:MFD (discussion is prob best had at WT:CSD but could be here as well). — xaosflux Talk 19:03, 27 October 2017 (UTC)
Writ Keeper, are you writing a script to delete the page, or simply add a "delete this page" link in the G13 notice? Because I was under the impression Twinkle did the first and the template did the second. In other words, I think I'm badly misinterpreting what you're asking... Primefac (talk) 12:09, 28 October 2017 (UTC)

IABot v1.6

I thought I would point everyone to the significance of v1.6 of IABot. (Trick or Treat) 21:46, 28 October 2017 (UTC)

Cluebot reversion of good-faith edits

I tried to start a discussion regarding Cluebot on the Cluebot talk page and my comments were archived by the bot without response. I'm concerned about Cluebot reverting good-faith edits, and the effect this may have on potential contributors.

Reading through the Cluebot pages and considering the lack of response, and rapid archiving, of my comment -- it is my feeling that discussions of this nature are not welcomed by the bot operator. It seems to me that the wider community ought to have a voice in how Cluebot is operated and should be entitled to review Cluebot's work on an ongoing basis and discuss the bot's settings and edits without having to fill out forms and have the discussion fragmented. I am concerned that the characterization of the 0.1% "false positive rate" used by the bot's proponents, though useful technically, belies the substantial number of good-faith edits this bot is reverting. Since it has been some years since the bot was approved, I think it's appropriate to review the work it is doing in light of the current editing climate and the evolution of the bot itself (and its settings) over the years.

At a minimum, I believe that the bot's operators and proponents have an obligation to take these concerns seriously enough to discuss them.

While mistaken reverts can be undone, the frustration they may cause to a well-meaning, fledgling contributor cannot.

The Uninvited Co., Inc. 19:52, 3 November 2017 (UTC)

Seems Cobi (talk · contribs), the bot's owner, hasn't edited since July 2017. Someone may want to send him an email. Headbomb {t · c · p · b} 20:17, 3 November 2017 (UTC)
In the meantime, did you report the false positive? Headbomb {t · c · p · b} 20:19, 3 November 2017 (UTC)
(edit conflict × 2) @UninvitedCompany: There is a notice on that page that says to report false positives at User:ClueBot NG/FalsePositives and not on that page (this is also in every edit summary for the bot). That's how they track issues and make improvements to the coding of the bot. I see no reason to create a protracted discussion. Nihlus 20:20, 3 November 2017 (UTC)

To answer your two specific questions:

How have the decisions been made over what edits the bot will revert?

— The Uninvited Co., Inc.
The bot uses an artificial neural network to score each edit, and the bot reverts at a threshold calculated to be less than 0.1% false positives. See User:ClueBot NG#Vandalism Detection Algorithm, User:ClueBot NG/FAQ#Why did ClueBot NG classify this edit as vandalism or constructive?, User:ClueBot NG/FAQ#I think ClueBot NG has too many false positives. What do I do about it?.
[Diagram: ClueBot NG edit flow]
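The "<0.1% false positives" figure is a calibration target rather than a fixed property of the network. In general terms (this shows the standard calibration technique, not ClueBot NG's actual code), a revert threshold can be chosen from classifier scores on known-good edits:

```python
import math

def threshold_for_fp_rate(good_edit_scores, max_fp_rate=0.001):
    """Pick a revert threshold so that at most max_fp_rate of known-good
    edits (i.e. false positives) score at or above it."""
    allowed = math.floor(len(good_edit_scores) * max_fp_rate)
    ranked = sorted(good_edit_scores, reverse=True)
    # Set the threshold just above the (allowed+1)-th highest good-edit
    # score, so at most `allowed` good edits would ever be reverted.
    return ranked[allowed] + 1e-9

# With 1000 known-good edits and a 0.1% budget, exactly one good edit
# may fall above the chosen threshold.
scores = [i / 1000 for i in range(1000)]
t = threshold_for_fp_rate(scores, 0.001)
sum(1 for s in scores if s >= t)  # 1 false positive out of 1000
```

The bot then reverts only edits scoring at or above the threshold; tightening the budget catches less vandalism, loosening it reverts more good-faith edits.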

What is the best way to have an open discussion about the way this automation is being conducted and its effect on new contributors?

— The Uninvited Co., Inc.
By giving specific, actionable suggestions whose merits can be discussed and the community can come to a consensus.

-- Cobi(t|c|b) 23:03, 3 November 2017 (UTC)

To reply to your comments here:

I tried to start a discussion regarding Cluebot on the Cluebot talk page and my comments were archived by the bot without response. I'm concerned about Cluebot reverting good-faith edits, and the effect this may have on potential contributors.

— The Uninvited Co., Inc.

False positives are an unfortunate technical inevitability in any system that automatically categorizes user content. Human editors suffer from this failing as well. The only thing that can be done is to figure out where the trade-off should be made. I am certainly open to discussing where that trade-off is, but as you haven't made a proposal yet, I am happy with where it currently is.

Reading through the Cluebot pages and considering the lack of response, and rapid archiving, of my comment

— The Uninvited Co., Inc.

It's the same 7-day archival period you have on your talk page. I was busy and your message at the time didn't appear particularly urgent in nature, and in those 7 days no one else had any thoughts on the matter, so the bot archived it.

it is my feeling that discussions of this nature are not welcomed by the bot operator.

— The Uninvited Co., Inc.

This is a hasty generalization.

It seems to me that the wider community ought to have a voice in how Cluebot is operated and should be entitled to review Cluebot's work on an ongoing basis and discuss the bot's settings and edits without having to fill out forms and have the discussion fragmented.

— The Uninvited Co., Inc.

Free-form discussion is encouraged on the bot's talk page. Or here.

I am concerned that the characterization of the 0.1% "false positive rate" used by the bot's proponents, though useful technically, belies the substantial number of good-faith edits this bot is reverting.

— The Uninvited Co., Inc.

False positive rates are used as standard metrics for any kind of automated classification system. <0.1% means less than one edit is falsely categorized as vandalism out of every thousand edits it examines.
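To make the metric concrete, here is a minimal sketch (not ClueBot NG's actual code) of the rate being described: edits wrongly flagged as vandalism, divided by the total number of edits examined.

```python
# Minimal illustrative sketch, not ClueBot NG's actual code.
def false_positive_rate(false_positives, total_examined):
    """Fraction of all examined edits wrongly categorized as vandalism."""
    return false_positives / total_examined

# "<0.1%" means fewer than one false positive per thousand edits examined:
boundary = false_positive_rate(false_positives=1, total_examined=1000)
print(boundary)  # 0.001, i.e. exactly the 0.1% boundary
```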

Since it has been some years since the bot was approved, I think it's appropriate to review the work it is doing in light of the current editing climate and the evolution of the bot itself (and its settings) over the years.

— The Uninvited Co., Inc.

Review is always welcome so long as it comes with concrete, actionable changes whose merits can be properly discussed. Pull requests are even better.

At a minimum, I believe that the bot's operators and proponents have an obligation to take these concerns seriously enough to discuss them.

— The Uninvited Co., Inc.

We do.

While mistaken reverts can be undone, the frustration they may cause to a well-meaning, fledgling contributor cannot.

— The Uninvited Co., Inc.

Of course, but that is hard to measure objectively. Do you have any good metrics on the frustration caused to well-meaning, fledgling contributors? I'd love to see that data, and be able to tweak things to help those metrics go in the direction we want. -- Cobi(t|c|b) 23:39, 3 November 2017 (UTC)

I sense an attitude that the bot is essentially part of "settled policy" and the burden of change falls upon the shoulders of those individuals raising concerns. I don't think that's appropriate for any bot, let alone one that is so prolific, wide-ranging, and discretionary in what it does. I don't see where there has ever been any informed consent by the editing community at large that the tradeoffs made in the design of the bot are appropriate, let alone any ongoing discussion as the bot has evolved.
In response to your question, I did report the edit using the interface provided.
The fact that the "false positive rate" is a standard metric for systems with similar architecture does not mean that it is the most appropriate or only metric that should be used in community discussion of the bot's performance. I think it would be valuable for the community and the bot operators/designers alike to be aware of other metrics such as the number of good-faith edits reverted by the bot per unit time. It would be interesting to see whether that figure matches the projection one might make using the theoretical false positive rate and the gross reverts per unit time. The Uninvited Co., Inc. 18:01, 6 November 2017 (UTC)
Absolute numbers are not useful; that's why we discuss error rate, which includes both false positives and false negatives. Your discussion does not include the latter. There is a balance between reverting too many valid edits and leaving too many bad edits. Hypothetically, if 10 in every 1000 reverts over some time period are false positives and we raise the threshold and bring it down to 2 in 500 reverts over the same time period, that is 8 good edits preserved but also roughly 490 more vandal edits that someone has to manually review and revert. Who does the burden of reverting those edits fall upon? Where is the line between potentially driving away a new editor and exhausting multiple anti-vandalism editors? What if we instead lowered the threshold and got 30 in 2000 false positives, and thus were fixing 100% more vandalism? This is a system where (broadly speaking) lowering false positives also raises the false negatives. —  HELLKNOWZ  ▎TALK 18:22, 6 November 2017 (UTC)
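Worked through in code, the hypothetical above looks like this (illustrative numbers from the discussion, not real ClueBot NG statistics):

```python
# Illustrative numbers from the hypothetical above, not real statistics.
def split_reverts(total_reverts, false_positives):
    """Split a revert count into (good edits wrongly reverted, vandalism caught)."""
    return false_positives, total_reverts - false_positives

fp_loose, caught_loose = split_reverts(1000, 10)   # looser threshold
fp_strict, caught_strict = split_reverts(500, 2)   # stricter threshold

good_edits_preserved = fp_loose - fp_strict                # newcomers not bitten
vandalism_left_to_humans = caught_loose - caught_strict    # extra manual reverts
print(good_edits_preserved, vandalism_left_to_humans)      # 8 492
```

Raising the threshold spares a handful of good-faith editors but shifts hundreds of reverts back onto human patrollers; that is the balance being discussed.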
We know... disable IP editing altogether and force account creation. Less vandalism, and less opportunity for it. Better false positive and false negative rates as well. :p—CYBERPOWER (Chat) 18:47, 6 November 2017 (UTC)
I think this would be a much better discussion if we actually had such metrics. I believe the absolute number is a good indicator of the extent of the problem even if it isn't relevant technically. And I believe it is relevant technically, because it indicates the amount of potential improvement that could be achieved by refining the parts of the bot outside the Bayesian filter. A careful review of reverted good-faith edits might, for example, reveal some obvious patterns that could be used to tweak the filter threshold, or the logic around it. The Uninvited Co., Inc. 01:06, 7 November 2017 (UTC)
  • Definitions are everything -- The assertion is made: "<0.1% means less than one edit is falsely categorized as vandalism out of every thousand edits it examines."
    No, that's not what it means. It means that fewer than one in a thousand is observed by a human editor to be incorrectly categorized, and that editor then follows the not-so-simple process to report it. On pages no one follows, most of ClueBot's activity goes unmonitored. Rhadow (talk) 15:34, 12 November 2017 (UTC)
    Yes, definitions are everything. We don't calculate that number based on reports. That number is calculated by dividing the training data randomly in half and giving half of the training data to the engine to train it, and then giving the rest of the training data to it as if they were live edits. It has to categorize them correctly with a false positive rate of less than 0.1%. That is, for every 1,000 edits we feed it for testing, only one can be a false positive. And this is just the core engine, before any sanity checks like the rest of that diagram after the "above threshold" box. See this FAQ entry. Please don't make uninformed assertions without doing at least a little bit of research. Nowhere have we ever said that the false positive rate is based on reported false positives, and asserting it like you know it as fact is not an appropriate way of bringing up questions or theories. Neither is it appropriate to assert as true that my factual statements, backed up by process and code that are both publicly reviewable, are definitively wrong. -- Cobi(t|c|b) 22:20, 12 November 2017 (UTC)
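The holdout procedure Cobi describes can be sketched as follows; the threshold "classifier" here is a toy stand-in for the neural network, and all names are illustrative, not ClueBot NG's actual code:

```python
import random

def holdout_false_positive_rate(labeled_edits, train_fn, classify_fn, seed=0):
    """Split labeled (features, is_vandalism) pairs randomly in half,
    train on one half, then score the held-out half as if those edits
    were live and measure the false positive rate."""
    rng = random.Random(seed)
    shuffled = list(labeled_edits)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train, held_out = shuffled[:half], shuffled[half:]

    model = train_fn(train)
    false_positives = sum(
        1 for features, is_vandalism in held_out
        if classify_fn(model, features) and not is_vandalism
    )
    return false_positives / len(held_out)

# Toy usage: a fixed-threshold "classifier" standing in for the real engine.
edits = [({"score": 0.95}, True)] * 50 + [({"score": 0.05}, False)] * 50
fpr = holdout_false_positive_rate(
    edits,
    train_fn=lambda train: None,                   # no real training in this toy
    classify_fn=lambda model, f: f["score"] > 0.9,
)
print(fpr)  # 0.0 -- this toy classifier never flags a good edit
```

The key point is that the published rate comes from held-out labeled data, not from user reports.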
  • Thank you Cobi, for sending us to the definition of the published false positive rate (FPR). This is a second-semester epidemiology statistics exercise, made slightly more complicated by the third-semester definition of training sets used in AI. Publishing a false positive rate (Type I errors) from the training exercise is incomplete if not misleading. It would be more informative to see the whole confusion matrix. ClueBot uses a neural network which, unlike other classification methods, may give superior numeric results, but may never provide an explanation of how it identified a vandal's edit. An outsider needs the whole picture of the results in order to have the same level of confidence you do.
    People would have a higher level of confidence in the protocol if they knew the size and the age of the training set. If the training set is not a valid sample of today's production data, then the 0.1% FPR is meaningless. I would like to see the rate of reported false positives each week or month from the actual data, not what the expected rate was from the training set. Rhadow (talk) 15:18, 13 November 2017 (UTC)
    All of the data is available either on the report website or on the Wikipedia API itself. You are welcome to generate any statistics you like. -- Cobi(t|c|b) 18:33, 13 November 2017 (UTC)
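For instance, the "whole confusion matrix" Rhadow asks for could be tabulated from any set of predicted and actual labels like this (an illustrative sketch, not code from the bot; note that definitions of "rate" vary, and these are the per-class Type I and Type II error rates rather than errors per edit examined):

```python
# Illustrative sketch: tabulate all four cells, not just false positives.
def confusion_matrix(predicted, actual):
    """predicted/actual are parallel lists of booleans (True = vandalism)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

predicted = [True, True, False, False, True]
actual    = [True, False, False, True, True]
m = confusion_matrix(predicted, actual)

# Both error types fall out of the full matrix:
type_i_rate  = m["fp"] / (m["fp"] + m["tn"])  # false positive rate
type_ii_rate = m["fn"] / (m["fn"] + m["tp"])  # false negative rate
print(m, type_i_rate, type_ii_rate)
```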
  • Hello The Uninvited -- You are correct, ClueBot III cleans up its own talk page frequently, so that a casual visitor will find no evidence of complaints.
    And another observation -- the 0.1% denominator means nothing without a discussion of the numerator. There were 3.3 million edits last month. Of those, it looks like ClueBot makes about 30 revisions an hour or 21,000 a month. I rather doubt there are editors looking at 21,000 reversions a month. No more than 210 miscategorized articles are being reported a month. The more ClueBot does, the better the numbers look, because there are no humans to check on it. Rhadow (talk) 15:58, 12 November 2017 (UTC)
    Before talking about calculations, please get your definitions correct. The archival settings for User talk:ClueBot Commons are set to 7 days, a common setting for user talk pages. The archives are there for anyone who wishes to look into the archives, and I am certainly open to anyone who wants to revisit discussions that were archived too soon to do so. Just, if you do so, add something to the conversation, because otherwise there is no value in pulling it from the archives. -- Cobi(t|c|b) 22:32, 12 November 2017 (UTC)
  • Is this report based on a single diff (23 October 2017) of ClueBot making a revert of an edit that a human might judge to be good faith, and so would merely click "undo" rather than "rollback"? The most important editor at Wikipedia is ClueBot because reverting vandalism quickly is key to convincing vandals that their time would be better spent at other websites. The most important person at Wikipedia is Cobi, ClueBot's maintainer. I agree that ClueBot's talk is archived too aggressively but some more generic discussion (WP:VPMISC?) about ClueBot's possible mistakes should occur rather than insisting that Cobi personally respond to each complaint. It is impossible for a bot to revert vandalism without occasionally reverting good-faith edits. Experience shows that it is also impossible for humans. Johnuniq (talk) 21:55, 12 November 2017 (UTC)
  • I'm with Cobi here. It's par for the course when users who are clueless about how bots work, or the work that goes into them, show up demanding that the bot be perfect, but sometimes I really scratch my head when someone persists or piles on with no knowledge of the topic. Bots are never flawless, and neither are humans; getting things right is all about balance. Just like with ClueBot NG, it's similar for me with User:InternetArchiveBot.—CYBERPOWER (Around) 02:29, 13 November 2017 (UTC)
Seconded. This bot is very useful with false positives within acceptable range. Humans are also there to correct its errors. —PaleoNeonate – 07:11, 13 November 2017 (UTC)
(Off-topic) People seem to demand perfection for everything and get annoyed when there's a problem. Today the PRESTO card system was experiencing some difficulties and I see people dumping on the system on Twitter saying it has "nothing but problems" when in reality it works fine 99% of the time. Sounds similar to some of the nonsense I've seen on Wikipedia over the years about CBNG (e.g. "ClueBot is clueless" and what other creatively thought-of insults for the bot that has clearly been a WP:NETPOSITIVE, if we looked at bots that way). SMH. —k6ka 🍁 (Talk · Contributions) 01:15, 17 November 2017 (UTC)

2017 Community Wishlist Survey

The 2017 Community Wishlist Survey is up for proposals (November 6–19). You can make proposals and comment on stuff to help the technical collaboration review and organize the proposals, but the larger community input will happen from Nov 27–Dec 10. Headbomb {t · c · p · b} 15:12, 8 November 2017 (UTC)

Filed my proposal here.—CYBERPOWER (Chat) 16:06, 8 November 2017 (UTC)
Mine's here. Headbomb {t · c · p · b} 18:35, 9 November 2017 (UTC)
This entry would be useful for bot makers. A search using Elasticsearch takes < 1 minute, compared to minutes or hours with database dumps via AWB. -- GreenC 21:05, 9 November 2017 (UTC)

Flicker 0056

Is this a legit bot? I don't recall any BRFAs for it... CHRISSYMAD ❯❯❯¯\_(ツ)_/¯ 13:48, 9 November 2017 (UTC)

It was just a new user experimenting. I have boldly removed some of the garbled syntax and false bot/admin claims. – Jonesey95 (talk) 13:56, 9 November 2017 (UTC)

Residual issues resulting from the Maintenance script bot's edits in 2015

See this discussion on Meta. Old/invalid accounts were renamed & given new "enwiki" names by the Maintenance script bot but the original accounts apparently weren't closed & account info wasn't migrated to the new/valid accounts... So. Editors are continuing to edit under the old/invalid accounts. Shearonink (talk) 16:30, 9 November 2017 (UTC)

Not sure this is a BOTN issue, especially since it's being dealt with at meta. Primefac (talk) 16:34, 9 November 2017 (UTC)
(edit conflict) No comment here. I was just trying to bring it to someone's attention. I've topic banned myself from BOTN for CIR reasons. GMGtalk 16:34, 9 November 2017 (UTC)
Yes, this probably isn't the completely correct place for a notice about it - I admit I don't operate bots, etc. - but it is an ongoing issue affecting Wikipedia-editing today so I thought it might need some more eyes on it. Could people have two Wikipedia accounts - both the original account that was renamed and the new account - and possibly be editing from both? Anyway, I'll wait for an answer on meta then. Shearonink (talk) 16:48, 9 November 2017 (UTC)
  • This isn't a bot related issue, but a part of SUL finalization. —k6ka 🍁 (Talk · Contributions) 01:17, 17 November 2017 (UTC)

Category:Opted-out of message delivery is now Category:Wikipedians who opt out of message delivery

Notification for anyone who uses that category in their bot. Jo-Jo Eumerus (talk, contributions) 11:00, 11 November 2017 (UTC)

Appeal by Δ (BetaCommand)

The community is invited to comment on the appeal lodged by Δ at Arbitration Requests for Clarification and Amendment.

For the arbitration committee - GoldenRing (talk) 11:13, 18 November 2017 (UTC)

Double-redirect tagging

While the discussion at Wikipedia talk:Double redirects#The bots should operate with a delay has pretty much died down without clear consensus, there's been a suggestion that double-redirect-fixing bots should tag the redirects they fix with {{R avoided double redirect}}. This will help alert human editors to redirects that are left pointing to the wrong location as a result of disputed moves or mergers being reverted. Can this be implemented? Pinging bot operators R'n'B, Xqt and Avicennasis. --Paul_012 (talk) 10:12, 21 November 2017 (UTC)

I propose filing a bug at Phabricator so that this proposal can be implemented in the script in the common pywikibot repository.  @xqt 11:49, 21 November 2017 (UTC)
I would certainly oppose a bot adding {{R avoided double redirect}}. Move a page like Proceedings of the Royal Society, and then you'd have 48 redirects tagged with that for no real reason. Headbomb {t · c · p · b} 12:31, 21 November 2017 (UTC)
What if limited to redirects which aren't the result of page moves? My original concern was mostly with pages that were changed into redirects and then reverted. --Paul_012 (talk) 23:51, 21 November 2017 (UTC)
This page is based on the copyrighted Wikipedia article "Wikipedia:Bots/Noticeboard"; it is used under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA). You may redistribute it, verbatim or modified, providing that you comply with the terms of the CC-BY-SA