
Wikipedia:Bots/Requests for approval



If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming it may be a good idea to ask someone else to run a bot for you, rather than running your own.


Current requests for approval

JCW-CleanerBot 3

Operator: Headbomb (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 04:04, Sunday, December 10, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available:

Function overview: Remove {{italics title}} from pages with {{Infobox journal}} and {{Infobox magazine}} on them. The functionality is already provided by those infoboxes, so the standalone template is redundant. While this is technically WP:COSMETICBOT, the double italics can confuse people and get copy-pasted into other articles because of highly visible pre-2010 leftovers. It also causes issues if |italic title=no is set, since the two clash and the article title will remain italicized. I plan on running this with genfixes enabled.

Links to relevant discussions (where appropriate):

Wikipedia talk:WikiProject Magazines#Removing pointless italics title templates from articles with a bot
Wikipedia talk:WikiProject Academic Journals#Removing pointless italics title templates from articles with a bot

Edit period(s): One time run

Estimated number of pages affected: One time run, ~715 articles for {{Infobox journal}}, ~433 for {{Infobox magazine}}

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: Find \{\{(Ital|Italic|Italic title|Italic title infobox|Italicised title|Italicisedtitle|Italicize title|Italicized title|Italicizedtitle|Italicizetitle|Italics|Italics title|Italicstitle|ITALICTITLE|Italictitle|Redirect italic title|Title italic)\}\}, replace with nothing. Running only on pages with {{Infobox journal}} and {{Infobox magazine}}
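The find-and-replace above is an AWB rule, but the same logic can be sketched in Python to make the intended behaviour concrete. This is a hypothetical recreation, not the bot's actual code; the infobox check mirrors the "running only on pages with…" restriction:

```python
import re

# Alternation copied from the AWB rule above (no regex metacharacters
# in the template names, so no extra escaping is needed).
ITALIC_TEMPLATES = (
    "Ital|Italic|Italic title|Italic title infobox|Italicised title|"
    "Italicisedtitle|Italicize title|Italicized title|Italicizedtitle|"
    "Italicizetitle|Italics|Italics title|Italicstitle|ITALICTITLE|"
    "Italictitle|Redirect italic title|Title italic"
)
PATTERN = re.compile(r"\{\{(?:%s)\}\}" % ITALIC_TEMPLATES)

def strip_italic_title(wikitext: str) -> str:
    """Remove redundant italic-title templates, but only on pages
    that transclude one of the journal/magazine infoboxes."""
    if ("{{Infobox journal" not in wikitext
            and "{{Infobox magazine" not in wikitext):
        return wikitext
    return PATTERN.sub("", wikitext)
```

Pages without either infobox are left untouched, matching the stated scope of the run.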

Discussion

  • This needs discussion at an appropriate WikiProject (or Village Pump). Gaining consensus shouldn't be done at the BRFA itself. ~ Rob13Talk 14:41, 10 December 2017 (UTC)

UsuallyNonviolentBot 3

Operator: Jc86035 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 08:45, Sunday, December 3, 2017 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s):

Source code available: movepages.py (standard pywikibot)

Function overview: Rename articles to format "Line number (system)" and similar formats

Links to relevant discussions (where appropriate): Wikipedia talk:WikiProject Trains#RfC: Railway line disambiguation

Edit period(s): One-time run

Estimated number of pages affected: about 400

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Pinging RfC participants: Useddenim, Anomalocaris, oknazevad, SMcCandlish, Sb2001, The Bushranger and Dicklyon. Jc86035 (talk) 08:46, 3 December 2017 (UTC)

The bot will be renaming most of the below articles.

List

I have not included rail lines which use comma disambiguation but are disambiguated by place rather than by system, since those may be considered correct unless decided otherwise by the ongoing RfC on comma disambiguation.

There are some ambiguities not addressed by the line disambiguation RfC:

  1. Should U-Bahn and S-Bahn lines be named like "U1 (Berlin)" or "U1 (Berlin U-Bahn)"? Should the ZVV (Zürich) lines be named the same way?
  2. Should lines currently disambiguated by city be disambiguated by system?
  3. Should lines currently disambiguated by country be disambiguated by system?
  4. Should numbered lines where only one article exists for lines named that number (e.g. Line 23, Shanghai Metro) be named just "Line 23" and similar?
  5. Should the Île-de-France tramway lines be named like "Line 11 Express", "Line 11 (Île-de-France tramway)", "Line 11 Express (Île-de-France tramway)", "Line T11 Express", or…?
  6. Should Paris Métro Line 14 (1937–76) be named "Line 14 (Paris Métro, 1937–1976)", "Line 14 (Paris Métro 1937–1976)", "Line 14 (1937–1976)", "Line 14 (1937–1976, Paris Métro)", or…?

Discussion

It's unclear to me why we're renaming these things in this way; the "Paris Métro line 10" style is much clearer, even if it's a descriptive name rather than a proper name (it's a proper name followed by a line designation that railfans also want to call a proper name but which few other people would agree is one, any more than aisle 3 at my local grocery store is "Aisle 3" and a proper-noun phrase. The clerks who work that aisle probably think of it that way in their little microcosm, but the rest of the world does not). Maybe there is no perfect way to name these things, but parenthetic gibberish like this is pretty much the worst. Maybe I should have been more emphatic about that at the RfC, the close of which completely ignored the fact that the preference for crap like "Line 10 (Paris Métro)" is a WP:ILIKEIT demand that directly conflicts with WP:ATDAB policy. Closers are not supposed to count votes and defy policy; they're supposed to discount comments that ignore policy and give more weight to those that make better policy arguments.

Assuming I'm going to be ignored again as I was in the RfC, here's my pair of copper coins in regard to the bot's reasonably planned implementation of the RfC's unreasonably planned mess-making, in the order of the original questions above:

  1. Shorter disambiguation when possible is the rule (don't over-disambiguate).
  2. Same answer as above, when applicable. When it comes to a choice between DAB by system or by city, the RfC said to use system. In retrospect, it would be preferable that if the line is entirely within the city, use the city, since more readers will know the line they want is in a particular city than will be able to correctly ID what transit system it technically belongs to. I would "vote" for that option now, but I'm not sure we want to diverge from the RfC, even if it makes obvious sense to do so, because too many people pitch a fit over train-related naming [and really, really need to give it a rest]. Regardless: in a case where there are two lines in the same city with the same name but in different transit systems, or two lines in the same system but in different cities, then and only then use a long, multi-part disambiguation.
  3. Ditto, but substitute "country" for "city" in all of the above.
  4. Yes, per #1 above. However, this is not likely to affect names that simple. Lots of cities have a "line 23". To avoid stupid results like an article really named just Line 23 [which actually redirects to Widescreen signalling, LOL] when we know there are other transit lines by this name and we just don't have articles on them yet, then pre-emptively disambig by city or system so we don't have to rename it later. This is consistent with the RfC results, which valued [perhaps overvalued] consistency as the no. 1 priority. Regardless, when the name is actually unique (e.g. "AZ1 Trans-Arizona Rocket Rail" or something), then there is no need for disambiguation of any kind.
  5. What's special about Île-de-France? Going down these in order, "Line 11 Express" is unlikely to actually be unique (see #3, above). The second and third examples are unnecessary over-disambiguation unless Île-de-France also has a non-tramway Line 11 or Line 11 Express, respectively. The fourth example looks like a made-up name. But if the actual designation really is T11, we'd not be using just 11 as in the previous examples in that series. Finally, it's unlikely that "Line 11 Express" is a different line than "Line 11", in any transit system; it's just an express version of it during certain hours, making fewer stops, for longer-distance commuters. If we actually had separate articles on them they should be merged. It would be comparable to forking a notable-restaurant article "Juanita's Vegan Bistro" to "Juanita's Vegan Bistro Saturdays" just to cover their busy Saturday buffet and limited menu. We have article sections for a reason.
  6. "Line 14 (Paris Métro, 1937–1976)" would seem to be most consistent with other patterns; we need this level of disambiguation for some sports figures, etc., and commas are used between the disambiguators for clarity.

No matter what is done, I think the results are going to be so awful we'll be revisiting this again within a year and will eventually use sensible descriptive names like "Paris Métro line 10". Especially since most of these are just translations of designations in other languages to begin with.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  09:25, 3 December 2017 (UTC); updated for question #6: 15:08, 3 December 2017 (UTC)

@SMcCandlish: Thanks for your comments. Note that adding "Subway" or "Metro" and other similar "unnecessary" disambiguators is often helpful for the reader rather than only useful for disambiguation (e.g. "Yellow (Coldplay song)"; "My Little Pony: The Movie (2017 film)"). Consistency also helps; in the latter example "Movie" could technically be used as natural disambiguation (omitting "film") but there would only be three articles where that could be done.
The "Paris Métro line 10" style might not work as well for named lines, and it would be inconsistent and weird to do this style of disambiguation only for numbered/lettered lines in a system and use parenthetical disambiguation for the other lines. It's also not explicitly a disambiguator and adding it for consistency as a prefix to named lines (ignoring the Chinese metro lines, where this is unnecessarily done already to some degree) would raise the question of whether every railway line should have this sort of disambiguator (e.g. "National Rail East Coast Main Line"). Finally, sources might not usually mention the operator in front of the line name wherever it's used, so adding it where it's not actually supposed to be part of the name might be a violation of whatever guideline was cited in the comma disambiguation RfC about making up naming styles. Jc86035 (talk) 11:41, 3 December 2017 (UTC)
Misanalysis. Yellow (Coldplay song) and My Little Pony: The Movie (2017 film) are at those titles not to be "extra helpful" but because and only because Yellow (song) and My Little Pony: The Movie (film) are themselves ambiguous and require additional disambiguation. If they were not, they would be at those (and exactly those) shorter titles. As to your second point, the obvious solution would be to stop using parenthetical disambiguation for non-numbered lines that needed disambiguation; this would be more consistent with WP:ATDAB policy, which instructs us to prefer natural disambiguation, tells us to try comma-separated DAB next, and relegates parenthetic to last place (other than made-up descriptive titles, which we only use for things that don't have real names, mostly various events like floods and murders). Re: '[It] would raise the question of whether every railway line should have this sort of disambiguator (e.g. "National Rail East Coast Main Line")' – And ... so what? It's perfectly fine to raise such a question. Last point: We don't care, because we have redirects.

Anyway, this is all moot. I'm not trying to actually re-litigate the RfC, or I would just open another RfC – on the perfectly valid grounds that the close was faulty, because it blatantly ignored policy (not guidelines or WP:PROJPAGES but actual policy) in favor of vote-counting a bunch of WP:ILIKEIT nonsense. I'm actually perfectly content to let the RfC play out as-decided, which is why I answered your five questions. I'm content with that because I believe the ensuing mess will serve as an object lesson; reasoning with specialized-style fallacy thinkers rarely has much effect; what they want has to be proven to be a debacle, then the community overrides it and things go more sensibly with little patience for "do it the way [insert your fandom here] does it off-wiki".
 — SMcCandlish ¢ >ʌⱷ҅ʌ<  11:59, 3 December 2017 (UTC)

What I meant was that it would probably be fine to title those articles "Yellow (Coldplay)" etc. but we don't. WP:ILIKEIT doesn't entirely apply there since the RfC was advertised on VPP for the latter half of its duration and editors who were not part of the WikiProject commented (and some of the MOS sort of has to be based on ILIKEIT anyway, particularly where official style guides disagree; e.g. WP:MOSDASH). Jc86035 (talk) 12:21, 3 December 2017 (UTC)
@SMcCandlish: By the way, could you comment on the sixth point, which I added later? Thanks, Jc86035 (talk) 14:13, 3 December 2017 (UTC)
Done. PS: We don't use "Yellow (Coldplay)" because it doesn't make much sense in natural-ish language; "Yellow" is not a Coldplay, it's a song. WP isn't 100% strict on this; we do have some very poor disambiguators ("John Smith (baseball)", which implies a sporting goods brand, not a baseball player), but they are rare outliers that only exist because of (surprise!) entrenched wikiproject tendentiousness. I don't disagree that the VPP venue was the correct one, but we don't get good turnout on micro-topical things like this due to lack of general-audience interest in specialized trivia.

Wiki-sociological digression: I don't know of a solution to this problem, given the shrinking editorial pool and its shrinking patience with minutiae, other than to be stricter about just following the policies, instead of patiently entertaining constant special-pleading demands along the "my topic is magically different" exceptionalism lines. They don't actually qualify under WP:IAR, so we need them to stop. If we don't collectively, as a community, take steps to put these antics to bed, the end result (in 5 years? 10?) is simply going to be chaos: there'll be too few editors with too little time to prevent combative, insular camps of one-topic editors from WP:OWNing vast categories of articles and forking them off in all directions away from any kind of centralized Wikipedia conventions. At this point in the organizational lifecycle, WP can either become more of an institution with systems and ways of doing things, or it can continue to dysfunctionally pretend it's still in "visionary, wild-and-woolly founders who say fuck all rules" mode. Organizations that refuse to let go of the nostalgia of their formational period to become more codified just end up going through a lot of pain, inefficiency, and even threat to long-term survival until they allow the transition to happen. I could write a book about this, but others already have and it wouldn't be a profitable use of my time given the work involved in writing one.
 — SMcCandlish ¢ >ʌⱷ҅ʌ<  15:08, 3 December 2017 (UTC)

Um, looking at the above, I'm very confused. SMcCandlish...you're saying the RfC was faulty and the closure was ILIKEIT...when the closure was unanimous and those unanimous votes included yours. Or am I missing something at 6 in the morning? - The Bushranger One ping only 11:10, 9 December 2017 (UTC)
  • On the other hand, as a BAG member, I must assess whether there's actual consensus here. The relevant policy is WP:ATDAB with corresponding guideline at WP:NCDAB, both of which say natural disambiguation should be used before parenthetical disambiguation. Can you comment on why I shouldn't interpret the RfC with only eight editors as a local consensus to disregard this guideline? ~ Rob13Talk 14:41, 3 December 2017 (UTC)
    • I'll second that. Needs wider discussion on a Village pump to be sure there really is consensus for this mass page move. Anomie 16:54, 3 December 2017 (UTC)
      @Anomie and BU Rob13: For what it's worth, the disambiguation RfC was advertised on WP:VPP from 11 to 26 November, all participants were in favour of standardizing the disambiguation to use the same format, and many people coming from the village pump commented on the UK station disambiguation RfC on the same page (which I also advertised for the same duration) but not on this RfC. I don't know exactly how WP:ATDAB applies here but it does suggest avoiding combining different disambiguation styles, which is in line with the RfC result to only use parenthetical disambiguation. If you want to open another RfC to confirm the consensus (or redo the RfC) then by all means do so, though I'm not sure if there would be a meaningful increase in participation; when I asked last year if we needed to standardize these titles I was told not to bother with it. Jc86035 (talk) 05:40, 4 December 2017 (UTC)

ZackBot 9

Operator: Zackmann08 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 04:45, Monday, November 27, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Ruby

Source code available: User:ZackBot/taxon

Function overview: Removing TaxonIds in favor of Template:Taxobar.

Links to relevant discussions (where appropriate): Wikipedia:Templates_for_discussion/Log/2016_December_14#Template:TaxonIds

Edit period(s): One time run

Estimated number of pages affected: 6,265

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: From the testing I've done so far, this information has all been added to wikidata so all that needs to be done is to remove the old template and insert the new. First step is to remove the template from the page. Then I check to see if the new template is already there. If it is, then I'm basically done. If not, insert the new template at the very bottom, just above the categories.
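The bot itself is written in Ruby (source at User:ZackBot/taxon), but the steps described can be sketched in Python as follows. This is an illustrative approximation only; the template-matching regexes and the template names ({{TaxonIds}}, {{Taxobar}}) are taken from the request, not from the bot's real logic:

```python
import re

def migrate_taxonids(wikitext: str) -> str:
    """Sketch of the described migration: remove {{TaxonIds}} and, if
    {{Taxobar}} is not already present, insert it just above the first
    category link (i.e. at the very bottom of the article body)."""
    # Step 1: remove the old template (assumes no nested braces in it).
    wikitext = re.sub(r"\{\{TaxonIds[^}]*\}\}\n?", "", wikitext)
    # Step 2: if the new template is already there, we're done.
    if re.search(r"\{\{Taxobar", wikitext, re.IGNORECASE):
        return wikitext
    # Step 3: insert the new template just above the categories.
    m = re.search(r"^\[\[Category:", wikitext, re.MULTILINE)
    if m:
        i = m.start()
        return wikitext[:i] + "{{Taxobar}}\n" + wikitext[i:]
    return wikitext + "\n{{Taxobar}}\n"
```

If a page has no categories at all, this sketch appends the template at the end of the page instead.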

Discussion

See PrimeBOT 21. — JJMC89(T·C) 06:05, 27 November 2017 (UTC)

@JJMC89: it looks like that one has stalled... Would like to try to get mine working if that's ok? Since this is a one-time thing and not a continuously running bot, I don't think it would be a problem to have both being tested at the same time... If that one finishes the work before mine gets operational, at least I'll learn some stuff! :-) --Zackmann08 (Talk to me/What I been doing) 00:03, 28 November 2017 (UTC)

Please comment on whether you believe Wikipedia_talk:Arbitration/Requests#Crosswiki_issues:_Motion_.28November_2017.29 is applicable to this bot task. ~ Rob13Talk 14:30, 28 November 2017 (UTC)

@BU Rob13: to be honest I don't really understand that... Seems like a bunch of rules for how to deal with Wikidata. Doesn't seem to play a role here. --Zackmann08 (Talk to me/What I been doing) 07:00, 30 November 2017 (UTC)
BU Rob13, this task was done manually and can probably be withdrawn/declined. Primefac (talk) 03:47, 10 December 2017 (UTC)

IznoBot

Operator: Izno (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:02, Saturday, November 11, 2017 (UTC)

Automatic, Supervised, or Manual: Supervised/Manual

Programming language(s): WP:AWB

Source code available: AWB

Function overview: WP:Lint <center> inside of {{US Census population}}

Links to relevant discussions (where appropriate): None available

Edit period(s): One-time run

Estimated number of pages affected: 20,000

Namespace(s): Main

Exclusion compliant (Yes/No): AWB default

Rationale: I identified an opportunity to WP:Lint for <center> in |footnote= of Template:US Census population a few weeks ago (to work on our 8 million errors-worth of obsolete HTML tags). Yesterday I took the time to start hacking at this project on User:IznoRepeat. When I got through the list of items I knew about, I went to see how large the problem was and found that there were 20k pages in mainspace alone. I was already concerned about the rate I was making the edits, so I'm here to request a bot flag for a separate account (User:IznoBot) to work on this problem.

Function details: The exact regex I ended with yesterday was the following:

  • Find (with regex): <center>(.*?)</center>
  • Replace: $1\n|align-fn=center

This is an extremely permissive find pattern and I would be willing to modify the regex if desired to look for the exact parameter name (|footnote=). I will be reviewing most/all edits regardless. This exact find and replace is evidenced at [1].

I also plan to run with general fixes on, which suggested several fixes to me yesterday. One edit had its gen fixes accepted as-is; another had suggested gen fixes which I modified manually.

Discussion

That search pattern is indeed too permissive; at the least, change it to start with the parameter you care about: |footnote=<center>. — xaosflux Talk 00:00, 12 November 2017 (UTC)

@Xaosflux: Correct me if I am wrong, but this seems like a WP:COSMETICBOT. It also seems controversial since it is a low priority lint error. If you are going to supervise the edits, Izno, why do you need the bot flag? (Also, I'm not sure I like having my subpages be copied without my knowledge and/or permission.) Nihlus 04:01, 12 November 2017 (UTC)
It may be, I haven't looked at good examples yet. For 20000 repeated edits, it should be a flagged account to avoid watchlist flooding etc (assuming it should happen at all). — xaosflux Talk 04:32, 12 November 2017 (UTC)
That's a fair point; however, I got yelled at in multiple areas about clogging up users' watchlists with my bot when doing medium-priority lint fixes. I don't think a low-priority run would be a good idea. Nihlus 04:35, 12 November 2017 (UTC)
<center> will stop working on Wikimedia wikis at some point in the future (this is a fact), at which point the change is clearly no longer cosmetic. I would call it "egregiously invalid HTML" now given that it's obsolete in the version of HTML that Wikipedia outputs (that is, DOCTYPE html aka HTML 5). This is regardless of its priority for linting, which is assigned by an engineer without solicitation from the community.
I suspect you were having problems mostly because your edits were being made outside the main space (which isn't critical), but maybe I'm not aware of some specific edits. The bot will only run in the mainspace, so "ensuring Wikipedia continues to look beautiful" is the acceptable rationale for most/all people, whereas it is difficult to defend signature cleaning in the same way as it is not outward-facing.
For the flag, Xaosflux covers that nicely. For the supervision, that's due to running gen fixes as well as taking the opportunity to make "better" edits than are suggested for gen fixes, if I identify such (optional behavior; I am happy not to make these suggested changes). I don't expect false positives, but there is always that potential as well. I have no problem with performing the task fully-automated, but you will find no requirement to do so in the policy for the flag. --Izno (talk) 04:55, 12 November 2017 (UTC)
That's fine. Does finding (\| *?footnote *?= *?)<center>(.*?)</center> and replacing with $1$2\n|align-fn=center work for you? --Izno (talk) 04:55, 12 November 2017 (UTC)
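As an illustration of why the tightened pattern is safer, here is the same find/replace expressed with Python's re (AWB's $1/$2 written as \1/\2). The sample wikitext is made up for the demonstration:

```python
import re

# The original, permissive pattern and the tightened one proposed above.
loose = re.compile(r"<center>(.*?)</center>")
tight = re.compile(r"(\| *?footnote *?= *?)<center>(.*?)</center>")

text = "{{US Census population\n|footnote=<center>Source: Census</center>\n}}"
fixed = tight.sub(r"\1\2\n|align-fn=center", text)
# `tight` only rewrites <center> when it directly follows |footnote=,
# so stray <center> tags elsewhere on the page are left alone, whereas
# `loose` would match (and unwrap) every <center>…</center> pair.
```

With the tightened pattern, a `<center>` outside the template's |footnote= parameter is untouched; the loose pattern would strip it.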
@SSastry (WMF): Anything coming in the future with these types of lint errors, considering they are a low priority? Nihlus 05:16, 12 November 2017 (UTC)
We prioritized the linter categories in relation to the goal of replacing Tidy. At this time, on the parsing team, we don't have any immediate parsing-related work that depends on the other linter categories. It is up to wikis what they wish to do with these issues. But, I know that some UI folks and designers at the foundation prefer that the obsolete tags not be used (see phab:T175709). Editors on the Italian Wikipedia have been replacing the obsolete tags and have even set up abuse filters for discouraging their use in edits. Hope this context is helpful. SSastry (WMF) (talk) 23:35, 12 November 2017 (UTC)
I'll retract my objection then. I still think a manual approach to fixing 8 million tags is not the best way to go about it, but I won't stop people from trying. Nihlus 23:16, 13 November 2017 (UTC)

Bots in a trial period

Pi bot 3

Operator: Mike Peel (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:56, 28 November 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python (pywikibot)

Source code available: on bitbucket

Function overview: Look through references to reports by Cochrane (organisation) to check for updates to them; when an update is found, tag the reference with {{update inline}} [2], and add it to the report at Wikipedia:WikiProject Medicine/Cochrane update/August 2017 for manual checking by editors [3]. Also archive report lines marked with {{done}} to the archive at Wikipedia:WikiProject Medicine/Cochrane update/August 2017/Archive [4] [5].

Links to relevant discussions (where appropriate): This was previously run by @Ladsgroup on an ad-hoc basis. I was asked to take over the running of it on a more regular basis by @JenOttawa:. See [6] and [7].

Edit period(s): Once per month

Estimated number of pages affected: Depends on the number of Cochrane updates each month, and the number of references to them. Likely to be a number in the tens rather than the hundreds.

Namespace(s): Mainspace and Wikipedia

Exclusion compliant (Yes/No): No, not relevant in this situation

Function details: The code searches for cases of "journal=Cochrane" in Wikipedia articles, extracts the Pubmed ID from the reference, then fetches the webpage from pubmed and looks for a "Update in" link. If an update is available, then it marks the reference as {{update inline}}, with a link to the updated document, and adds it to the report at Wikipedia:WikiProject Medicine/Cochrane update/August 2017 where users manually check to see if the article needs updating. If it does, then they can update the reference and mark it as {{done}} in the report, and the bot then archives the report when it next runs. If it does not, then it can be marked with <!-- No update needed: ID_HERE --> in the article code, and the bot won't re-report the outdated link in the future. I've made some test edits under my main user account to demonstrate how the bot works, links are in the function overview above. Mike Peel (talk) 20:56, 28 November 2017 (UTC)
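A rough sketch of the "Update in" check described above (not the bot's actual source, which is on bitbucket; the XML element and RefType names are my assumptions about PubMed's efetch output and should be verified against real records before use):

```python
import re

def find_update(pubmed_xml: str):
    """Given the (assumed) PubMed efetch XML for one article, return
    the PMID of the newer version if an "Update in" link is present,
    otherwise None.

    Hypothetical format assumption: updates appear as a
    <CommentsCorrections RefType="UpdateIn"> element containing the
    replacement article's <PMID>.
    """
    m = re.search(
        r'<CommentsCorrections RefType="UpdateIn">.*?<PMID[^>]*>(\d+)</PMID>',
        pubmed_xml, re.DOTALL)
    return m.group(1) if m else None
```

In the real bot this PMID would drive both the {{update inline}} tag and the line added to the report page; a review with no such element would simply be skipped.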

Discussion

  • Comment: Is text like "journal=The Cochrane database of systematic reviews" (as in Postpartum bleeding) or "journal = Cochrane Database of Systematic Reviews" (as in Common cold) or the presumably incorrect "title=Cochrane Database of Systematic Reviews" (as in Common cold) or "journal = Cochrane Database Syst Rev" (as in Common cold) relevant to this request? You might want to include those variations. – Jonesey95 (talk) 21:08, 28 November 2017 (UTC)
    • @Jonesey95: The code that's currently used to select articles is generator = pagegenerators.SearchPageGenerator('insource:/\| *journal *= *.+Cochrane/', site=site, namespaces=[0]). That was written by @Ladsgroup, and I'm not sure how to modify it to catch more cases. It also currently returns the message "WARNING: API warning (search): The regex search timed out, only partial results are available. Try simplifying your regular expression to get complete results. Retrieving 50 pages from wikipedia:en." Once the articles are selected, pmids = re.findall(r'\|\s*?pmid\s*?\=\s*?(\d+?)\s*?\|', text) is run on the article text to find the references to update, which will actually catch more than just the Cochrane reviews in the article, but only the references with updates are touched by the code. TBH, I'm not an expert in regexes, so any suggestions you have to improve these would be very welcome! Thanks. Mike Peel (talk) 21:17, 28 November 2017 (UTC)
      • Insource searches have a very low timeout value, so anything with a mildly complex regex will time out. See T106685 for some details. The only way I know of to get around it is to search for multiple regexes in succession, like this:
        • insource:/\| journal =.+Cochrane/
        • insource:/\| journal=.+Cochrane/
        • insource:/\|journal =.+Cochrane/
        • insource:/\|journal=.+Cochrane/
      It looks like the regex you have will catch all of the above cases except the junky "title" instance, which should be fixed manually by someone who knows the right way to fix it. – Jonesey95 (talk) 00:45, 29 November 2017 (UTC)
      • @Jonesey95: I've added a loop that runs each of those regexes in turn, and just for the fun of it I've also added the same set for 'title' as well as 'journal' so it'll try to catch those odd cases. It currently checks 6576 Wikipedia articles in total, which will include duplicates (since I don't currently filter them out - is there a good way to merge and de-duplicate the return values from SearchPageGenerator or PreloadingGenerator?). While 6 out of 8 of the regexes run without timeouts, the last two do still return the warning, but they're "insource:/\|title =.+Cochrane/" and "insource:/\|title=.+Cochrane/" - so if there's not a good way around that then maybe we just live with it (those two queries return 98 and 304 results respectively, which is a lot less than some of the others, so this is a bit odd).
      • I'd like to set this going for a full run soon, if that would be OK? Thanks. Mike Peel (talk) 21:36, 1 December 2017 (UTC)
        • @Mike Peel: Re "is there a good way to merge and de-duplicate the return values", you could maintain a list in-memory of the page IDs/titles that have been processed and skip anything that has shown up before. That may or may not be helpful depending on the amount of duplication. Anyway, I have a broader question. As you said above, the bot actually checks all PMIDs in a given page for updates, not just the Cochrane-related ones; this includes logging said non-Cochrane-related updates on the Cochrane updates page. Is there any potential for this to be a problem? Alternatively, would it be useful to potentially expand the task scope to all PMIDs? — Earwig talk 05:59, 12 December 2017 (UTC)
          • @The Earwig: De-duplicating: that's true, although I was hoping there might be a built-in option. :-) The numbers are fairly small here, and the code should cope fine with a second pass through a page (it'll see the messages left by any previous and not do anything). On checking PMIDs - @JenOttawa: can probably answer this better than me, but my understanding is that most PMIDs will never be updated since they're one-off articles rather than part of a series like the Cochrane ones are, so while we can check for updates to them they won’t be flagged by the bot. If there are any that aren’t Cochrane-related that do have an update, then they’ll be investigated by a human after being posted to the Cochrane page, and we can figure out how to deal with them then. Thanks. Mike Peel (talk) 14:24, 12 December 2017 (UTC)
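A minimal sketch of the in-memory seen-set approach discussed here, written generically so it works with any iterables; for pywikibot Page objects one would pass key=lambda p: p.title() (an assumption about the Page API that should be checked against the pywikibot docs):

```python
def dedupe_pages(*generators, key=lambda p: p):
    """Yield items from several generators in order, skipping any
    item whose key has already been seen."""
    seen = set()
    for gen in generators:
        for page in gen:
            k = key(page)
            if k not in seen:
                seen.add(k)
                yield page
```

For example, chaining the eight insource search generators through this wrapper would yield each matching page once, at the cost of keeping the set of seen titles in memory for the duration of the run.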
Thanks for helping here The Earwig and Mike Peel. In my experience, most other PMIDs are not updated like Cochrane Reviews are; however, I cannot speak for all journals/publishing companies. Other publications are certainly retracted/withdrawn, but I am also not sure what happens to their PMIDs. This bot ran for quite a few years and seemed to work very well and be accurate. I performed a large number of the updates (at least 100). This means that I manually went through the citation-needed tags + PMID list generated, and there were very few errors. I never saw an instance where a non-Cochrane Review was flagged with the citation-needed tag, for example. I hope this helps and somewhat answers the question. We have spent considerable time on this over the past 12 months, so we are now fairly caught up with the updates. In May 2017 we had about 300 updates to perform. I would expect that a full run of the bot would pull about 50-75 new updates needed (August-December updates that were published by Cochrane), and then if we run it monthly, it would pull about 15-20 a month. This means that the volunteers will be able to stay fairly up to date with the updates, and if there are errors (other reviews pulled, etc.) we will be able to correct them manually within a month or so. If you have any other questions, or if there is anything that I can help with, please let me know. I am still learning about this, but we greatly appreciate your assistance on this! JenOttawa (talk) 14:38, 12 December 2017 (UTC)
Thanks for the prompt replies, everyone. This sounds good to me, so let's move forward with a trial run. Since the plan is for monthly runs, let's have the bot complete a full round of updates for this month and we can evaluate it from there. Approved for trial. — Earwig talk 17:57, 12 December 2017 (UTC)
@The Earwig: Thanks, it is now running. Mike Peel (talk) 18:17, 12 December 2017 (UTC)
It's taking longer to run than I was expecting (due to the number of unique pubmed pages it's fetching), but the edits so far seem to be OK. I'm heading offline for the eve now, so if there are any issues then please abort it by blocking the bot. Otherwise, I'll check things in the morning. Thanks. Mike Peel (talk) 23:28, 12 December 2017 (UTC)
Thanks again to both of you. Looks good so far. JenOttawa (talk) 01:28, 13 December 2017 (UTC)

90% of what the bot is marking for updates are to "withdrawn" reviews. I have reverted most of them and updated the one or two that were newer and not withdrawn.

The bot needs to exclude withdrawn articles. It also needs to look for the newest version, not just the next newer version. Best Doc James (talk · contribs · email) 05:30, 13 December 2017 (UTC)

  • OK, I think this test run has shown two issues - the need to handle withdrawn articles better, and also an intermittent problem with fetching the webpages (which is why the bot stopped at ~0200UT without finishing the run). I'll work on improving those before requesting another test run. Thanks. Mike Peel (talk) 19:00, 13 December 2017 (UTC)
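The version-selection fix described above (skip withdrawn reviews, jump to the newest acceptable version rather than the next-newer one) can be sketched roughly as follows. This is an illustrative sketch only, not the bot's actual code; the field names and the example PMID chain are made up, and the real bot would read this information from PubMed records.

```python
def newest_valid_update(versions):
    """Pick the newest non-withdrawn version from a chain of
    (pmid, year, withdrawn) tuples, or None if none qualifies.

    Hypothetical data model: the real bot would derive these
    fields from PubMed "update of / updated in" links.
    """
    candidates = [v for v in versions if not v[2]]  # drop withdrawn reviews
    if not candidates:
        return None
    return max(candidates, key=lambda v: v[1])  # newest by year, not next-newer

# Hypothetical chain: original review, a withdrawn update, a current update.
chain = [
    ("PMID-original", 2013, False),
    ("PMID-withdrawn", 2015, True),   # must be skipped
    ("PMID-current", 2017, False),
]
```

With this logic the withdrawn 2015 update is never suggested, and the 2013 original is pointed directly at the 2017 version.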

InfoboxBot

Operator: Garzfoth (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 04:36, Tuesday, October 24, 2017 (UTC)

Automatic, Supervised, or Manual: Supervised

Programming language(s): Python and mwparserfromhell

Source code available: Yes (available at User:InfoboxBot/wikipedia_edit_pages_clean.py)

Function overview: This bot would assist me in fixing various widespread yet minor issues with non-standard infobox parameters in articles (primarily focused on issues with Template:Infobox power station and possibly Template:Infobox dam).

Links to relevant discussions (where appropriate): I do not believe that this bot would be controversial - any changes made by it are going to be minor and uncontroversial.

Edit period(s): As needed (it'll vary significantly). It will not be anywhere near continuous.

Estimated number of pages affected: There are ~2500 articles using infobox power station and ~3500 articles using infobox dam. The number of articles out of these that would be affected by my bot is unknown. For now, let's call it an absolute upper limit of ~6000 affected articles.

Namespace(s): Mainspace only.

Exclusion compliant (Yes/No): No, as in my experience articles with infobox power station or infobox dam on them never use the {{bots}} template in the first place. I am not averse to implementing detection for this template in the future, but I don't see the need for it unless I broaden the scope of the bot's work to different infoboxes.


Function details: I have already scraped all articles with infobox power station and infobox dam in them, placed the infobox data from said articles into a MySQL database, and am using analysis of that dataset/database to discover issues that can be fixed via this approach. Here is a good example of what kind of issues this bot can help me fix:

  • For infobox param "th_fuel_primary": There are 153 articles using the term "[[Coal]]", 90 articles using the term "Coal", 80 articles using the term "Coal-fired", and 14 articles using the term "[[Coal]]-fired". This bot can automatically change the value of "th_fuel_primary" to "[[Coal]]" for the 184 articles that use equivalent terms, resulting in 337 articles that all use the same correct homogeneous terminology and are all wikilinked correctly.
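The kind of normalization described in this example boils down to a lookup table from known non-standard spellings to one canonical form. A minimal sketch (the mapping reflects the variants listed above; the function name is made up for illustration):

```python
# Map the equivalent non-standard spellings onto the canonical wikilinked form.
CANONICAL_FUEL = {
    "Coal": "[[Coal]]",
    "Coal-fired": "[[Coal]]",
    "[[Coal]]-fired": "[[Coal]]",
}

def normalize_fuel(value):
    """Return the homogenized form of a th_fuel_primary value,
    leaving already-canonical or unrecognized values untouched."""
    return CANONICAL_FUEL.get(value, value)
```

Because unknown values pass through unchanged, a run over all 337 articles only touches the 184 that actually need editing.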

So yeah, this is essentially just a specialized high-speed-editing/assisted-editing tool. As far as I understand, it is still possibly classified as a bot, and thus I have to submit it to BRFA as I am doing now. I did run this on my personal account for a single run (on the infobox param "status", changing the non-standard value "Active" into "O", which expands to "Operational", for 185 articles) before realizing that it may be classifiable as a bot, and that I was also performing operations too fast if the bot action speed limits applied (I had quite a bit of trouble locating the actual documentation on this, so I had initially assumed the limits were the same as the API's and set a 1s + overhead delay between requests), and stopping. So if you want a demonstration of what this bot does in the real world, just look at the long string of edits in my history with the edit summary "Automated edit: fixing infobox parameter "status"".

Discussion

Could the bot implement some of User:Headbomb/sandbox (expand collapsed sections)? Headbomb {t · c · p · b} 11:07, 24 October 2017 (UTC)
1.a/1.c crash my scraping script, so I’ve already manually fixed those in all affected articles using either infobox dam or infobox power station. I can look into building a new script to locate and automatically fix those types of issues in other infoboxes, it would be an interesting problem to try to solve automatically, but no promises on that since it might not be doable automatically with high confidence.
For the rest, yes, the bot can do at least some of them if not most or all of them (and in fact I was already planning on implementing a number of those items), although it’s going to require additional work to implement them, and my first priority is still going to be fixing the more substantial issues. Garzfoth (talk) 17:36, 25 October 2017 (UTC)
I would greatly appreciate getting a response to at least the specific question of if this use is classified as a bot or not (i.e. does it actually need approval as a standalone bot through BRFA or can I just run it on my personal (or InfoboxBot?) account(s)?)... I have been waiting two and a half weeks for another response and it's getting a bit frustrating. I would prefer to have an account with the bot flag to run it on simply because of the expanded API limits available in that case (and being able to edit without unnecessarily cluttering up anyone's watchlist, since I could then flag my edits as bot-made which allows them to be easily hidden by users if desired), but I do not by any means need the bot flag to operate the program. Garzfoth (talk) 19:58, 11 November 2017 (UTC)

{{BAGAssistanceNeeded}}

It has been over a month since the last response. I would greatly appreciate a response to at least the question highlighted in bold above (is this use even classifiable as a bot or can I just run this as a script on my personal account without approval required?). Thanks! Garzfoth (talk) 21:17, 26 November 2017 (UTC)

From the BOTPOL definitions, the fact that you aren't personally approving each edit means that this is probably a bot, and would likely need to be approved here. It shouldn't be controversial, though. Going through the edits you made (convenience link!), the random sample that I picked all look good. It would be nice if you had some examples of the Coal change, as opposed to just the "Active" to "O" change, however. Even better would be if the code were somewhere BAG members and others could review it - you don't even have to put it on GitHub, as it's just as readable in the bot's userspace.
One important change you should make is the edit frequency: 1 second between edits is too low. For nonessential maintenance tasks, the usual delay is 10 seconds (source: WP:BOTREQUIRE). I'm not a BAG member myself, so I can't grant a trial; so I'll leave the tag here. You should probably fix the rate thing before the trial, though. Enterprisey (talk!) 13:40, 5 December 2017 (UTC)
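The 10-second spacing mentioned above is straightforward to enforce with a small throttle. A generic sketch (not InfoboxBot's actual code; names are illustrative):

```python
import time

EDIT_DELAY = 10.0  # seconds between edits for nonessential maintenance tasks

def seconds_to_wait(last_edit_time, now):
    """Return how long to sleep so edits stay at least EDIT_DELAY apart."""
    return max(0.0, EDIT_DELAY - (now - last_edit_time))

def throttled_edits(pages, do_edit):
    """Run do_edit over pages, sleeping as needed between edits."""
    last = 0.0
    for page in pages:
        time.sleep(seconds_to_wait(last, time.time()))
        do_edit(page)
        last = time.time()
```

Computing the remaining wait from a timestamp (rather than always sleeping a fixed 10 seconds) means time spent fetching and parsing the next page counts toward the delay.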
Thanks for the feedback! I am aware of the editing frequency issue (it's specifically mentioned in my BRFA if you missed it), I would of course change that to 10 seconds between edits for a production run, as I said I only operated that fast in the first place because I originally could not locate the correct documentation on bot policies and had assumed that the general API rate limits applied.
I can't exactly give more precise examples of changes since I apparently wasn't supposed to be running the bot without BRFA approval in the first place, but I suppose I could manually make some example edits to show what the bot would be capable of doing? My main goal originally was just to homogenize a lot of common simple stuff like the coal example, but then I got branched out and started thinking of wider applications, so my application is admittedly a bit open-ended.
As far as the code goes, I dislike open-sourcing anything I've written for personal use until it's been extensively polished because I keep a lot of debug stuff commented out and don't write my commented notes for a general audience, so it gets more than a bit sloppy/unprofessional and I prefer to only publish very clean code unless absolutely necessary. I guess I could strip the comments entirely and publish it more or less as-is though. I'll think about that.
I'll leave the tag up until someone from BRFA can drop by to discuss a trial. Garzfoth (talk) 03:35, 11 December 2017 (UTC)
I've cleaned up and posted the original code used for the Active => O change run: User:InfoboxBot/wikipedia_edit_pages_clean.py Garzfoth (talk) 03:48, 11 December 2017 (UTC)

@Garzfoth: This request has sat for a very long time. I would like to apologize for that.

Minor code review. This line:

	tpl = next(x for x in templates if x.startswith("{{Infobox power station") or x.startswith("{{infobox power station") or x.startswith("{{Infobox power plant") or x.startswith("{{infobox power plant") or x.startswith("{{Infobox wind farm") or x.startswith("{{infobox wind farm") or x.startswith("{{Infobox nuclear power station") or x.startswith("{{infobox nuclear power station"))

would look better as:

    tpl = next(x for x in templates if x.name.matches(["Infobox power station", "Infobox power plant", "Infobox wind farm", "Infobox nuclear power station"]))
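For readers without the library at hand: mwparserfromhell's matches() compares template names after normalizing them, including treating the first letter case-insensitively, which mirrors how MediaWiki resolves template names. A rough stdlib-only approximation of that comparison (simplified; the real method also handles things like underscore/space equivalence):

```python
def template_name_matches(name, candidates):
    """Approximate mwparserfromhell's Template.name.matches():
    strip surrounding whitespace and lowercase the first letter
    before comparing, as MediaWiki does for template names."""
    def norm(s):
        s = s.strip()
        return (s[:1].lower() + s[1:]) if s else s
    return any(norm(name) == norm(c) for c in candidates)
```

This is why the single matches() call above replaces the eight startswith() checks: "Infobox power station" and "infobox power station" normalize to the same name.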

Now, my only real concern here is that certain changes can seem uncontroversial on the surface but are actually not once you do them en-masse. The "Active" to "O" thing is surely fine, but whether or not to wikilink "Coal" is something I could see as contentious. How do you determine what the convention is when the most common option is used by only 45% of articles (153/337, per your numbers)? Arguments could exist either way, and it might depend on the article (maybe).

Anyway, let's do a fairly loose trial to get a sense of the kinds of changes you would like to make and how they pan out. If possible, please do a variety of types of fixes, but if you only have a couple in mind right now, that's fine too. Approved for trial (100 edits). — Earwig talk 06:23, 12 December 2017 (UTC)

Thank you for your comments. The code suggestion is extremely helpful, I tested it and subsequently refactored all of my code (including components that have not been published such as the scraping stuff) to incorporate it.
I have thought extensively about the issue of balancing too-minor/controversial changes with real action for a while now. For wikilinking stuff like that I think it's no contest — a wikilink is almost always going to be justified for stuff like that (especially as the infobox is a separate entity and the MOS makes the provision that repeating links in infoboxes is fine if helpful for the readers). For capitalization issues, it's a messier situation, but I think the best approach is to focus on choosing the option that makes the most grammatical sense (something I've tried to clarify with limited research), fits best within the generalized context of an infobox, adheres to the MOS, is the most visually consistent & pleasing with other infobox elements, and corresponds with the established consensus (I can see how popular each option is while analyzing the DB for variables to work on, so that lets me measure the rough level of consensus for existing options). I'm actually really curious if anyone will object to the capitalization standardization I'm using — if it triggers an objection, I'll of course discuss the issue, and if the discussion results are to use non-capitalization for the standard (or whatever else), I can then use the bot to put the articles in line with the outcome of the discussion instead.
I started on the trial run. Here are changes done so far:
IPS parameter (name/key/category) | Original value | Modified value | #
th_technology | steam | [[Steam turbine]] | 2
th_technology | Steam | [[Steam turbine]] | 17
th_technology | [[gas turbine]] | [[Gas turbine]] | 3
th_technology | [[Gas Turbine]] | [[Gas turbine]] | 3
country | United States | [[United States]] | 5[a]
country | England | [[England]] | 5[b]
ps_units_manu_model | Siemens | [[Siemens]] | 3
ps_units_manu_model | Vestas | [[Vestas]] | 2
status | Operating | O (expands to Operational) | 5[c]
status | operational | O (expands to Operational) | 17
status | Baseload | O (expands to Operational) | 6
status | Peak | O (expands to Operational) | 5
th_fuel_primary | Coal | [[Coal]] | 5[d]
th_fuel_primary | Coal-fired | [[Coal]] | 5[e]
th_fuel_primary | [[Natural Gas]] | [[Natural gas]] | 5[f]
th_fuel_primary | [[natural gas]] | [[Natural gas]] | 5[g]
th_fuel_primary | Natural gas | [[Natural gas]] | 5[h]
Total edits made during initial trial: 98
  a. ^ There were 257 instances to correct; due to the 100-edit trial limit, only 5 were edited as representative examples of this category.
  b. ^ There were 105 instances to correct; only 5 edited as representative examples (100-edit trial limit).
  c. ^ There were 38 instances to correct; only 5 edited as representative examples (100-edit trial limit).
  d. ^ There were 88 instances to correct; only 5 edited as representative examples (100-edit trial limit).
  e. ^ There were 72 instances to correct; only 5 edited as representative examples (100-edit trial limit).
  f. ^ There were 27 instances to correct; only 5 edited as representative examples (100-edit trial limit).
  g. ^ There were 24 instances to correct; only 5 edited as representative examples (100-edit trial limit).
  h. ^ There were 23 instances to correct; only 5 edited as representative examples (100-edit trial limit).
During the run only one edit was reverted (this one), with the reason being "editing tests". The editor in question subsequently thanked the bot's account for a different edit, and I'll be replying to their message on the bot's talk page to explain the matter and see what their views on the capitalization change really are (i.e. did they truly intend to revert or did they simply not notice that the edit actually changed something).
Here is the updated primary bot code, with various improvements made, functionality added, code cleaned up, and most code comments preserved (even the stupid ones): User:InfoboxBot/wikipedia_edit_pages_clean.py
Thanks again! Garzfoth (talk) 14:02, 15 December 2017 (UTC)
WP:OVERLINK applies. You should not be linking countries like the U.S. and England. — JJMC89(T·C) 19:40, 15 December 2017 (UTC)

Bots that have completed the trial period

Bot1058 4

Operator: Wbm1058 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:20, Monday, November 20, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): PHP

Source code available:

Function overview: Sync unsynchronized disambiguation talk page redirects

Links to relevant discussions (where appropriate): Template talk:R from incomplete disambiguation#Error checking revisited

Edit period(s): One time big run, followed by periodic smaller runs as needed

Estimated number of pages affected: ~1,800 pages on initial run

Namespace(s): Talk

Exclusion compliant (Yes/No): No

Function details: This task is the cousin of Bot1058's task 3. That was done with AWB, but for this one I wrote a simple PHP program. The typical scenario goes like this... a parenthetically disambiguated title is created for a film, book or album, etc.; for example, Homework (film) disambiguates from Homework. That works fine until a second film is made with the same title. Then the first film is moved to a fully disambiguated title (Homework (film) is moved to Homework (1989 film), and Talk:Homework (film) to Talk:Homework (1989 film)) to disambiguate against Homework (2011 film). Homework (film) is then redirected to Homework (disambiguation), but the talk redirect is often left unchanged, rendering it an "unsynchronized disambiguation talk page redirect". This bot's task is to re-sync these redirects by redirecting Talk:Homework (film) to Talk:Homework (disambiguation), if Talk:Homework (disambiguation) exists and is not blank. In rare cases (about three dozen on the first run), when the target talk page is a red link or has no content, I replace the redirect with {{Talk page of a redirect}} and {{WikiProject Disambiguation}} templates (like this).
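The retargeting decision this task performs can be sketched as follows. This is a toy model, not the operator's PHP program: a dict stands in for the wiki, and all names are illustrative.

```python
def sync_talk_redirect(article, article_target, talk_pages):
    """Decide the fix for an unsynchronized disambiguation talk redirect.

    article        -- the redirected article, e.g. "Homework (film)"
    article_target -- where it now points, e.g. "Homework (disambiguation)"
    talk_pages     -- dict of talk title -> page text (toy stand-in for the wiki)

    Returns ("retarget", new_target) when the target's talk page exists
    with content, else ("tag", templates) to place maintenance templates.
    """
    target_talk = "Talk:" + article_target
    if talk_pages.get(target_talk, "").strip():
        return ("retarget", target_talk)
    return ("tag", "{{Talk page of a redirect}} {{WikiProject Disambiguation}}")

# Toy wiki state for the common case described above.
EXAMPLE_TALK = {"Talk:Homework (disambiguation)": "{{WikiProject Disambiguation}}"}
```

The blank-or-red-link branch corresponds to the roughly three dozen template-placement edits the operator mentions from the first run.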

Now I must make a confession. In the course of testing my code, I accidentally made a de facto 370-edit trial run. I was intending to make a dry run through to completion, and commented out lines of code which I expected to un-comment for an approved trial. Unfortunately, I neglected to comment out one key line; sorry about that. I noticed my mistake about three minutes later – it's amazing how many edits an unrestricted bot can make in 3 minutes. One line I had commented out was the sleep command that put a governor on the bot's editing speed. I'll try to be more careful in the future.

The bot works through Category:Unsynchronized disambiguation talk page redirects in alphabetical order. So my trial has already cleared A–H, except for a few "red-link" cases, as I added the code to replace those with {{Talk page of a redirect}} and {{WikiProject Disambiguation}} after that bootleg trial. Those will be the first done on the next run. – wbm1058 (talk) 00:20, 20 November 2017 (UTC)

Discussion

  • Approved for trial (370 edits). (as this has already been done). I'm reviewing the accidental trial now. In the future, for dry runs, it's a good idea to keep the sleep command uncommented as a precautionary measure. ~ Rob13Talk 10:08, 27 November 2017 (UTC)
    • Trial complete. ~ Rob13Talk 10:12, 27 November 2017 (UTC)
  • Approved for extended trial (25 edits). I reviewed the latest 50 edits and everything looked good to me. The extended trial is mostly to make sure the code is now operating at an appropriate rate. ~ Rob13Talk 10:11, 27 November 2017 (UTC)
    • Trial complete. 25 edits, at a more pedestrian speed. 19 of these are redirects, as in the first 370-edit trial (edit summary Syncing unsynchronized disambiguation talk page redirect). The other 6 are template placements (edit summary Fixing unsynchronized disambiguation {{Talk page of a redirect}}). Talk:Get It (song) is one of these pages. – wbm1058 (talk) 00:45, 29 November 2017 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here (edit), while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Wikipedia:Bots/Requests_for_approval&oldid=815086368"