Wikipedia:Bots/Noticeboard


This is a message board for coordinating and discussing bot-related issues on Wikipedia (including other programs that interact with the MediaWiki software). Although this page is frequented mainly by bot owners, any user is welcome to leave a message or join the discussion here.

If you want to report an issue or bug with a specific bot, follow the steps outlined in WP:BOTISSUE first. This is not the place to request bot approvals or to request that tasks be done by a bot. General questions about the MediaWiki software (such as the use of templates) should be asked at Wikipedia:Village pump (technical).


Behavior of Xqbot

As soon as a page is moved, Xqbot fixes all the resulting double redirects immediately. Also, the links to the fixed target pages are shown with the prefix "en:" in the edit summaries. I don't like this behavior, because it can lead to serious errors when there is page-move vandalism. The bot should return to its old behavior. GeoffreyT2000 (talk, contribs) 23:49, 11 August 2017 (UTC)

GeoffreyT2000, what was the "old" behaviour? I was under the impression this was how it always worked (minus the wikilink change that appears to happen somewhere around 16 June 2017). Primefac (talk) 00:12, 12 August 2017 (UTC)
Ping to Xqt out of courtesy. Primefac (talk) 00:13, 12 August 2017 (UTC)
I would prefer that double redirect repair, especially of cross-namespace moves, be delayed (a 1-2x daily task perhaps?). Not too uncommon to find confused users moving their drafts in circles (user subpage, user rootpage w/ article title, project space, mainspace...). If I catch it in time I can move it back without leaving a trace by retracing their steps with suppressredirect. If I don't, I can't overwrite those redirects, and need to do page swaps and speedy tags. – Train2104 (t • c) 15:54, 13 August 2017 (UTC)
Some delay would likely be good. Personally, I think an hour before repairing would be enough in mainspace, and 24 hours in non-mainspace, but I'll defer to the bot op on what's best. Headbomb {t · c · p · b} 11:20, 19 August 2017 (UTC)
Xqbot uses the move log to find double redirects. New users cannot move pages around, so there is no real risk of vandalism. There are also some sanity checks against potential vandalism; the code is part of the mw:Pywikibot framework and public. Anyway, I introduced some delay in my local repository's generator. Hope this helps.  @xqt 11:06, 26 August 2017 (UTC)
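For anyone curious what such a grace period might look like in practice, here is a minimal Pywikibot-style sketch (not Xqbot's actual code) that reads the move log and skips moves newer than a cutoff; the namespace-dependent delay values are assumptions taken from the suggestions in this thread.

<syntaxhighlight lang="python">
# Minimal sketch only, not Xqbot's code: skip recent moves so that page-move
# vandalism can be reverted before double redirects are "repaired".
from datetime import timedelta

import pywikibot

MAINSPACE_DELAY = timedelta(hours=1)    # delays suggested in the thread above
OTHER_DELAY = timedelta(hours=24)

site = pywikibot.Site('en', 'wikipedia')
now = site.server_time()

for entry in site.logevents(logtype='move', total=500):
    page = entry.page()
    delay = MAINSPACE_DELAY if page.namespace().id == 0 else OTHER_DELAY
    if now - entry.timestamp() < delay:
        continue  # still inside the grace period; leave the redirect alone
    # ...otherwise hand this move to the double-redirect fixer...
</syntaxhighlight>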

Bots without apparent benefit

I am wondering if we have any policy, rules, or consensus on what to do with a bot where a) the bot isn't used by anyone but its operator and b) the operator hasn't edited Wikipedia for anything but the creation of this bot. Basically, a bot which is at best for the convenience of one reader, and at worst not used at all, but still editing every day.

Specifically, I am concerned about the recently approved User:Wiki Feed Bot, operated by User:Fako85. It makes 16 edits a day, to its own space, to subpages of Fako85, and to User:DNNSRNST, which is an editor with one edit (setting up his talk page for this bot, which wasn't even approved at the time). Fako85 has made no edits unrelated to this bot.

The value of having such a bot seems minimal, and I'm not sure that this value is sufficient to outweigh the potential risks (strain on servers? bot account hacking?). Fram (talk) 07:55, 8 September 2017 (UTC)

@Fram: The project's vision is described on the bot's page User:Wiki_Feed_Bot. I think the merit of this project should be judged on what it wants to achieve and not on the current usage. I develop this together with @EdSaperia:. He has made many more edits than I have. He also organised Wikimania 2014 in London. I flew to San Francisco from Amsterdam at my own expense, just to attend the Wikipedia dev summit and learn more about the culture. I'm a professional developer, but I made many improvements to fit the Wikipedia platform better, things that were not obvious from reading the docs. Demanding that developers also edit a lot is too restrictive in my view, but I wonder what the consensus about that is in the community. We're volunteers and we do our best to make it the best we can. We believe in this bot and that it can be useful for editors and readers alike. Many people at the summit liked the idea and we have some enthusiasts in our personal network. We're planning to get more people to use it and will develop the bot further in an agile manner. We'll need permission to edit the user space if we want to test how our bot adds to Wikipedia. If we needed many people to use it before we got the rights, we would be in a catch-22. Fako85 (talk) 08:30, 8 September 2017 (UTC)
You created the bot request in January. This is supposedly for some project. Where is this project discussed? What progress is being made on it? This is all very obscure, and unlikely to reach many people or benefit enwiki in this manner. (I also can't find any evidence of the improvements you made as a developer, but in the end these don't really matter in this discussion anyway.) Fram (talk) 08:49, 8 September 2017 (UTC)
Practice is to let WMF folks worry about performance, and hacked bots can be blocked if that ever happens. If the bot causes an issue with performance, WMF people will contact the operator, or block it if the operator isn't responsive. 16 edits a day, however, is nowhere near enough to even show up on the radar. Pre-emptively denying bots because they might theoretically operate outside their terms of approval is counterproductive when there's no evidence to suggest this will actually happen. Headbomb {t · c · p · b} 11:25, 8 September 2017 (UTC)
  • If there were thousands and thousands of such low-resource-usage, infinitesimal-utility bots, which collectively used up a significant amount of server resources, it would IMO be warranted to demand some sort of "bang for the buck" from bot operators. But to my knowledge that is not the case; and even if it were, it would require a change of policies before applying them to that particular bot (which does not seem to have any unusual risks).
One might argue that Wikipedia is not a code repository or a test server (WP:NOTWEBHOST), and that therefore "convenience bots" should be banned regardless of used resources. But I would argue that the "web hosting" guideline is in place to prevent blatant abuse, and should be restricted to clear-cut cases (someone uploading their vacation photos, a bot mining Bitcoin for its operator...). Otherwise, it sounds like a decision for the WMF, not the community. TigraanClick here to contact me 11:47, 8 September 2017 (UTC)
Fine, it just seems strange that we let people use enwiki as some kind of personal playground, and spend all that time on a BRFA for something without a real use. There is no evidence for most of the claims about some project working on this, and no obvious way for people here to get involved with it, so it seems to me there is little or no reason to let it continue. Oh, and the number of edits isn't the problem as such; it's the potential resources they use by reading the recent changes log extensively each time. I have no idea how heavy such a read is. But the WMF will rarely contact a bot operator here; approved bots are somewhat expected to be a burden on the servers. But what happens when a bot does create such a burden, yet for no real benefit? The WMF won't know it, and we don't care, apparently... Fram (talk) 11:58, 8 September 2017 (UTC)
Does anyone have any data about how heavy [a read of a large portion of the recent changes log] is? That is actually a point that I did not check when calling this a low-resource-usage bot. TigraanClick here to contact me 12:54, 8 September 2017 (UTC)
Not commenting about future policies, but I have the impression that the above case is not violating WP:NOTWEBHOST since they're user pages related to Wikipedia, without promotional links, which are also not fake or pov-fork articles. —PaleoNeonate – 12:28, 8 September 2017 (UTC)
What is outlined at User:Wiki Feed Bot is far from a "personal playground". The project is in development. If it amounts to something, great. If not, no harm was done. I can't see any reason to stop this bot, or halt development. Headbomb {t · c · p · b} 12:36, 8 September 2017 (UTC)
What is outlined, perhaps. But none of this can be found in reality. This was done in February 2016 at Wikitech[1], and abandoned there in May: Fako85 doesn't seem to have done anything else there. Then in January 2017 this came here, to get approved months later after quite a few problems (with respect to respecting our policies), and then ... nothing again. The "project" is the bot. There is no indication that anything is still "in development" at all. And it is not as if it is ready; look at User:Fako85/feed/breaking news central europe, which at least has something related to Central Europe as its first article (but no breaking news in it), and then the Index of Hawaii-related articles (???), Hurricane Irma, ... This thing is very far from being useful, but not really in development either. Fram (talk) 12:52, 8 September 2017 (UTC)
  • This is a single response to many things above. @Tigraan: once a day it goes over all edits from the day before. I talked to ops about this and at the time they didn't raise objections; let me know if this has changed. I can see many possible performance improvements, but we'd prefer to work on better modules and a bigger user group first. @Fram: I think that judging based on one day's results is anecdotal. It's a score-based filter system, and if nothing is there it shows random things. It's a matter of putting in a threshold for the score to prevent output if no breaking news occurred that day. Breaking news is based on this: [2], but the bot shows something instead of nothing if there are no breaking-news clusters. Project progress is tracked here: [3]. The last development was at the end of July, as you can see in the git history. One of my favourite quotes: "we're going slow, because we are going far". Arguing that a project is bad and should stop because it develops slowly would be an argument to stop Wikipedia, as new developments tend to take very long (understandably, in my opinion). In general, I don't understand what you have against this project. The bot can only edit the user space, and only if people place the template themselves. It would never pollute anything; that was one of the changes I had to make for the BAG approval. It is an opt-in thing. @Headbomb: thanks for your support. Fako85 (talk) 14:30, 8 September 2017 (UTC)
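To make the scale of that once-a-day read a bit more concrete, below is an illustrative sketch (not Wiki Feed Bot's actual code) of fetching the previous day's recent changes through the standard MediaWiki API with continuation; whether this constitutes a meaningful load depends on batch sizes and how the results are processed.

<syntaxhighlight lang="python">
# Illustrative only: pull the previous 24 hours of recent changes in batches.
from datetime import datetime, timedelta, timezone

import requests

API = 'https://en.wikipedia.org/w/api.php'
end = datetime.now(timezone.utc)
start = end - timedelta(days=1)

params = {
    'action': 'query',
    'list': 'recentchanges',
    'rcstart': end.strftime('%Y-%m-%dT%H:%M:%SZ'),  # newest bound first
    'rcend': start.strftime('%Y-%m-%dT%H:%M:%SZ'),  # back to 24 hours ago
    'rcprop': 'title|timestamp|ids',
    'rclimit': 'max',
    'format': 'json',
}

changes = []
session = requests.Session()
while True:
    data = session.get(API, params=params).json()
    changes.extend(data['query']['recentchanges'])
    if 'continue' not in data:
        break
    params.update(data['continue'])  # follow the API's continuation token

print(len(changes), 'changes in the last 24 hours')
</syntaxhighlight>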
  • "I think that judging based on one day's results is anecdotal." True. I judged it over many days, and gave one as an example. Other days seem to be similar or, if anything, worse (e.g. here). It looks as if you have put your one-man project into production here way too soon. "the bot shows something instead of nothing if there are no breaking-news clusters" is just weird. Fram (talk) 14:50, 8 September 2017 (UTC)
Fram, this is getting close to WP:DEADHORSE territory. Headbomb {t · c · p · b} 15:25, 8 September 2017 (UTC)
I respectfully dissent, Headbomb. If reading all changes from the previous day causes nontrivial server load, then Fram's continued questioning is entirely valid: this bot's value, although positive, is minimal, and whether this project looks like it is going to go somewhere is a relevant question to ask.
If the load on the servers is insignificant, then yes, I would say to leave the bot's creator alone, let them do their stuff and see what happens. But I do not think a WP:DEADHORSE invocation is justified; it would be more of a WP:BADGER, and even that I fail to see. Or am I missing a key part of the context? TigraanClick here to contact me 16:42, 8 September 2017 (UTC)
The server load is minimal. If it weren't, you'd have heard from devs. See WP:PERF. Headbomb {t · c · p · b} 16:45, 8 September 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Fram: The BOTREQ contains the text: "Currently Wiki Feed does not use the RCStream. We're considering it, but we need some time to implement this as it requires a fair amount of changes to the system.". Maybe it is wise to ask Fako to switch to EventStreams? (((The Quixotic Potato))) (talk) 19:40, 8 September 2017 (UTC)
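For reference, here is a minimal sketch of what consuming EventStreams looks like, assuming the third-party sseclient package used in Wikimedia's own documentation examples; it only illustrates the alternative being suggested, not how Wiki Feed Bot would actually implement it.

<syntaxhighlight lang="python">
# Minimal EventStreams sketch: subscribe to the public recentchange stream
# instead of re-reading a full day's worth of edits via the query API.
import json

from sseclient import SSEClient  # pip install sseclient

STREAM = 'https://stream.wikimedia.org/v2/stream/recentchange'

for event in SSEClient(STREAM):
    if event.event != 'message' or not event.data:
        continue
    change = json.loads(event.data)
    if change.get('wiki') == 'enwiki':
        print(change['timestamp'], change['title'])
</syntaxhighlight>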

If I understand the BOTREQ correctly (specifically the edit dated 12:42, 22 July 2017) then the bot will have to check if all images it is using are still usable every 24hrs. Imagine if a lot of people use this bot, then that would mean a massive amount of requests, right? (((The Quixotic Potato))) (talk) 20:06, 8 September 2017 (UTC)
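On the question of request volume, file-existence checks can at least be batched: the query API accepts up to 50 titles per request for ordinary clients, so a daily re-validation pass does not have to mean one request per image. A hedged sketch (the file names below are hypothetical):

<syntaxhighlight lang="python">
# Illustrative only: check many files for existence in a single API request.
import requests

API = 'https://en.wikipedia.org/w/api.php'
files = ['File:Example.jpg', 'File:Example2.jpg']  # hypothetical titles

resp = requests.get(API, params={
    'action': 'query',
    'titles': '|'.join(files),   # up to 50 titles per request
    'prop': 'imageinfo',
    'format': 'json',
}).json()

for page in resp['query']['pages'].values():
    print(page['title'], 'missing' if 'missing' in page else 'ok')
</syntaxhighlight>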

@The Quixotic Potato: This is moot unless that happens and the devs tell us there is a server load issue. It's highly unlikely this will become an issue. ~ Rob13Talk 02:07, 18 September 2017 (UTC)

Wikipedia:Wikipedia_Signpost/2017-09-06/Humour

For those who have missed it this week, something bot-related. Headbomb {t · c · p · b} 02:52, 11 September 2017 (UTC)

Archiving live links - Redux

I knew I had seen a discussion about this before: it is at Wikipedia:Bots/Noticeboard/Archive 11#Archiving links not dead - good idea? Most of the discussants there (obviously mostly fans of bots) seemed to approve of archiving all the reference links in an article, even the live ones. Some of us less technically oriented editors think the practice can be damaging to articles. Recent example, which is the reason I am bringing it up: With this recent edit to the article Barack Obama, the IABot v1.5.2 archived 392 references, adding 74,894 bytes to the article, and increasing its already huge size by 22.6%, from 330,241 to 405,135 bytes. Is that really something that people here think is a good outcome? (The other editor reverted at my request.) Does the bot offer the option of archiving only the dead links, as some of us non-techie people have requested? --MelanieN (talk) 18:04, 17 September 2017 (UTC)

Actually, rescuing only dead links is the default behavior. The behavior you linked in the diff has to be requested by the user by checking a checkbox. The checkbox clearly states that it's optional. As the tool interface's ToS states, edits made on behalf of the user are the responsibility of the user.—CYBERPOWER (Around) 19:04, 17 September 2017 (UTC)
Thanks for the information. Is there a more appropriate place to discuss whether people should choose that option or not? --MelanieN (talk) 00:39, 18 September 2017 (UTC)
@MelanieN: I am not certain. The best place for this is maybe here or WT:LINKROT.—CYBERPOWER (Chat) 16:37, 20 September 2017 (UTC)
If you start a discussion at LINKROT, please post a reminder here. I thought that this page's archived discussion, referenced above, showed considerable support for not adding links to archives unless the original URLs have died. I, too, have been encountering continuing massive, useless additions of such links. Where it adds little text to a small article, I don't revert. However, adding links in the hundreds, with increased byte counts in the high five figures, to an article with high readership, such as the example given above, I do my best to combat it. At Barack Obama, 3/4 of the citations were given useless added cruft at a cost of 75 kbytes. This has got to stop. Dhtwiki (talk) 23:22, 21 September 2017 (UTC)
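For editors wondering what "archiving only the dead links" means mechanically, here is a rough sketch of the difference between archiving everything and archiving only links that no longer respond. This is not IABot's actual logic, which is considerably more careful; the helper names are made up for illustration.

<syntaxhighlight lang="python">
# Rough sketch only; real dead-link detection uses repeated checks over time
# and soft-404 heuristics before declaring a cited URL dead.
import requests

def looks_dead(url, timeout=10):
    """Crude single-shot liveness check for a cited URL."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        return resp.status_code >= 400
    except requests.RequestException:
        return True

def archive_candidates(urls, archive_all=False):
    """Return the cited URLs that should get an archive link added."""
    return list(urls) if archive_all else [u for u in urls if looks_dead(u)]
</syntaxhighlight>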

HTTPS links: comprehensive source of domains to convert?

I've spotted User:Bender the Bot, User:KolbertBot and maybe others, helpfully converting HTTP links to HTTPS where sites have begun supporting encrypted connections since links were added to articles. It looks as if this is being done a few websites at a time based on prevalence of links to each site and ease of conversion (obviously much easier all round if http://example.com/page corresponds exactly to https://example.com/page without needing to further amend the URL). Has anyone considered using the rulesets established for HTTPS Everywhere to find many, many more sites that can have link conversion applied, including lots of obscure 'long tail' ones that are never going to get noticed by the bot operators? These rulesets are well tested because they are in daily use by HTTPS Everywhere's userbase, so there shouldn't be too many problems encountered where links are broken by the change, even if relatively complex regular expressions have to be applied rather than straightforwardly adding an 's'. See https://www.eff.org/https-everywhere/atlas/ for a list and https://www.eff.org/https-everywhere/rulesets for more info. If this is too complicated, would it be worth instead (or for starters) plundering the resource that is Chrome's HSTS preload list? Each of the sites on it has committed to serving web content through HTTPS only for the long haul, generally redirecting http:// URLs themselves (but thwarted if someone is intercepting traffic on a user's first visit, hence the need for a preload list shipped with the browser), and may have been considered a high-value target for surveillance/man-in-the-middle by the maintainers of the list. Either way, relevant work is being done in this area by outside parties that bot operators here could piggyback on. Beorhtwulf (talk) 16:27, 18 September 2017 (UTC)
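To illustrate what reusing those rulesets might involve, here is a hedged sketch of applying an HTTPS Everywhere-style rewrite rule (a regex from/to pair) to a URL; the single rule shown is a made-up example, and real rules from the EFF's published ruleset files can be considerably more involved (exclusions, multiple targets, and so on).

<syntaxhighlight lang="python">
# Sketch of the idea, not any existing bot's code: apply HTTPS Everywhere-style
# regex rules to upgrade http:// links to https:// where a rule matches.
import re

RULES = [
    # Hypothetical rule for an imaginary site; real rules would be generated
    # from the EFF's ruleset XML referenced above.
    (re.compile(r'^http://(www\.)?example\.org/'), r'https://\1example.org/'),
]

def upgrade_url(url):
    """Return the https version of the URL if a rule matches, else unchanged."""
    for pattern, replacement in RULES:
        if pattern.match(url):
            return pattern.sub(replacement, url, count=1)
    return url

print(upgrade_url('http://www.example.org/page'))  # https://www.example.org/page
</syntaxhighlight>

A simpler starting point, as suggested above, would be to treat a domain's presence on the HSTS preload list as sufficient evidence that plain 's'-insertion is safe for links to that domain.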
