Wikipedia:Bots/Requests for approval

BAG member instructions

If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming, it may be a good idea to ask someone else to run a bot for you rather than running your own.

 Instructions for bot operators

Current requests for approval

Bots in a trial period

EranBot 3

Operator: ערן (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:07, Saturday, September 15, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: source on github

Function overview: This bot submits newly added text to the iThenticate API, which determines whether other sources are similar to it. Suspected copyvios (>50% similarity) can then be reviewed manually (copypatrol; top reviewers: Diannaa, Sphilbrick, L3X1). In this BRFA I would like to ask that the bot be added to the copyviobot group, so that it can access the pagetriagetagcopyvio API, which will be used by the PageCuration extension, aka Special:NewPagesFeed (see phab tasks).

Links to relevant discussions (where appropriate): prev BRFA, copyviobot, Epic task for copyvio in new pages feed (and subtasks)

Active period(s): Continuous

Estimated number of pages affected: N/A. The bot will tag suspected edits using the API. This may be used by the special page Special:NewPagesFeed.

Namespace(s): main namespace and drafts (the bot does not edit them, but may check them for copied text)

Exclusion compliant (Yes/No): N/A

Function details:

  • Any diff (except rollbacks) in the main and draft namespaces that adds a large chunk of text may be subject to a copyvio check
  • The copyvio check is done using the iThenticate service (WP:Turnitin, who kindly provided us access to their service)
  • Changes that are similar to existing text in an external source are reported (they can be reviewed at https://tools.wmflabs.org/copypatrol/en ) so users can further review them manually.
  • (new) By adding the bot to the copyviobot group, it will be possible to access suspected diffs more easily from Special:NewPagesFeed later

Eran (talk) 16:07, 15 September 2018 (UTC)
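
For illustration only, the selection rules described above can be sketched in Python. This is not EranBot's actual code: the `Diff` type and both function names are hypothetical, the 500-byte figure is the one given later in the discussion, and treating exactly 500 bytes as "large" is an assumption. Namespace numbers 0 and 118 are the main and Draft namespaces on the English Wikipedia.

```python
from dataclasses import dataclass

MIN_ADDED_BYTES = 500          # "large chunk" size quoted in the discussion
SIMILARITY_THRESHOLD = 0.50    # ">50% similarity" from the function overview
CHECKED_NAMESPACES = {0, 118}  # main and Draft namespaces


@dataclass
class Diff:
    """Hypothetical container for the parts of an edit the rules look at."""
    namespace: int
    added_text: str
    is_rollback: bool = False


def needs_check(diff: Diff) -> bool:
    """Return True if the diff should be sent for a similarity check."""
    if diff.is_rollback or diff.namespace not in CHECKED_NAMESPACES:
        return False
    # Assumption: size is measured in UTF-8 bytes and the limit is inclusive.
    return len(diff.added_text.encode("utf-8")) >= MIN_ADDED_BYTES


def should_report(similarity: float) -> bool:
    """Return True if a similarity score (0.0 to 1.0) is above the threshold."""
    return similarity > SIMILARITY_THRESHOLD
```

A diff that passes `needs_check` would be submitted to the similarity service, and only results for which `should_report` is true would be listed for human review.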

Discussion

48% of the edits reported as suspected copyvio required additional follow up ("page fixed"). Query on tools.labsdb: select status, count(*) from s51306__copyright_p.copyright_diffs group by status;

The full details of how it is going to be shown in Special:NewPagesFeed would probably need to be discussed with the community and with the Growth team (MMiller, Roan Kattouw) - however, it is already possible to see an example in the beta test wiki (search for "copyvio"). It is important to note that a tagged page just means an edit may contain copied text (such edits may be OK [CC-BY content from government institutions], a copyright violation [copy & paste from a commercial news service] or promotional content [may be legally OK sometimes, but violates WP:Promo]). Eran (talk) 16:07, 15 September 2018 (UTC)

It isn't sinking in how this fits in with the CopyPatrol activities. I'd like to discuss this further. Please let me know if this is a good place to have that discussion or if I should open up a discussion on your talk page or elsewhere.--S Philbrick(Talk) 18:03, 15 September 2018 (UTC)
Sphilbrick: I think it is relevant in this discussion, can you please elaborate? thanks, Eran (talk) 19:30, 15 September 2018 (UTC)
I start with a bit of a handicap. While I understand the new pages feed in a very broad sense, I haven't actually worked with it in years and even then had little involvement.
It appears to me that the goal is to give editors who work in the new page feed a heads up that there might be a copyvio issue. I've taken a glance at the beta test wiki — I see a few examples related to copyvios. I see that those entries have a link to CopyPatrol. Does this mean that the new page feed will not be directly testing for copyright issues but will be leaning on the copy patrol feed? I checked the links to copy patrol and found nothing in each case which may make sense because those contrived examples aren't really in that report, but I would be interested to know exactly how it works if there is an entry.
The timing is coincidental. I was literally working on a draft of a proposal to consider whether the copy patrol tools should be directly making reports to the editors. That's not exactly what's going on here but it's definitely related.
What training, if any, is being given to the editors who work on the new pages feed? Many reports are quite straightforward, but there are a few subtleties, and I wonder what steps have been taken to respond to false positives.--S Philbrick(Talk) 19:57, 15 September 2018 (UTC)
CopyPatrol is driven by EranBot with checks done by iThenticate/Turnitin. This BRFA is to send revision IDs with possible violations to the API, which will cause the CopyPatrol links to be shown in the new pages feed. — JJMC89(T·C) 04:53, 16 September 2018 (UTC)
Sphilbrick: thank you for the good points.
  • Regarding training for handling new pages feed and copyvios - I was about to suggest to document it, but actually it is already explained in Wikipedia:New pages patrol#Copyright violations (WP:COPYVIO) quite well (but we may want to update it later)
  • Directly making reports to the editors - This is a good idea, and actually it was already suggested but was never fully defined and implemented - phab:T135301. You are more than welcome to suggest how it should work there (or on my talk page and I will summarize the discussion on Phabricator).
Eran (talk) 18:40, 16 September 2018 (UTC)
Thanks for the link to the training material. I have clicked on the link to "school" thinking it would be there, but I now see the material in the tutorial link.
Regarding direct contacts, I'm in a discussion with Diannaa who has some good reasons why it may be a bad idea. I intend to follow up with that and see if some of the objections can be addressed. Discussion is here (User talk:Diannaa#Copyright and new page Patrol).--S Philbrick(Talk) 18:54, 16 September 2018 (UTC)
@Sphilbrick: thanks for the questions, and I'm sorry it's taken me a few days to respond. It looks like ערן has summarized the situation pretty well, but I'll also take a stab. One of the biggest challenges with both the NPP and AfC process is that there are so many pages that need to be reviewed, and there aren't good ways to prioritize which ones to review first. Adding copyvio detection to the New Pages Feed is one of three parts of this project meant to make it easier to find both the best and worst pages to review soonest. Parts 1 and 2 are to add AfC drafts to the New Pages Feed (being deployed this week), and to add ORES scores on predicted issues and predicted class to the feed for both NPP and AfC (being deployed in two weeks). The third part will add an indicator next to any pages that have a revision that shows up in CopyPatrol, and those will say, "Potential issues: Copyvio". Reviewers will then be able to click through to the CopyPatrol page for those revisions, investigate, and address them. The idea is that this way, reviewers will be able to prioritize pages that may have copyvio issues. Here are the full details on this plan. Xaosflux has brought up questions around using the specific term "copyvio", and I will discuss that with the NPP and AfC communities. Regarding training, yes, I think you are bringing up a good point. The two reviewing communities are good at assembling training material, and I expect that they will modify their material as the New Pages Feed changes. I'll also be continually reminding them about that. Does this help clear things up? -- MMiller (WMF) (talk) 20:32, 20 September 2018 (UTC)
Yes, it does, thanks.--S Philbrick(Talk) 21:37, 20 September 2018 (UTC)
  • User:ערן how will your bot's on-wiki actions be recorded (e.g. will they appear as 'edits', as 'logged actions' (which log?), etc?). Can you point to an example of where this get recorded on a test system? — xaosflux Talk 00:22, 16 September 2018 (UTC)
    Xaosflux: On the bot side it is logged to s51306__copyright_p on tools.labsdb, but this is clearly not an accessible place. It is not logged on wiki AFAIK - if we do want to log it, this should be done on the extension side. Eran (talk) 18:40, 16 September 2018 (UTC)
    phab:T204455 opened for lack of logging. — xaosflux Talk 18:48, 16 September 2018 (UTC)
Thanks, Xaosflux. We're working on this now. -- MMiller (WMF) (talk) 20:33, 20 September 2018 (UTC)
  • I've never commented on a B/RFA before, but I think that another bot doing copyvios would be great, esp if it had less false positives than the current bot. Thanks, L3X1 ◊distænt write◊ 01:12, 16 September 2018 (UTC)
    • L3X1: the Page Curation extension defines infrastructure for copyvio bots - so if there are other bots that can detect copyvios they may be added to this group later. AFAIK the automated tools for copyvio detection are Earwig's copyvio detector and EranBot/CopyPatrol and in the past there was also CorenSearchBot. The way they work is technically different (one is based on a general purpose search using Google search, one is based on the Turnitin copyvio service) and they complement each other, with various pros and cons for each. I think Eranbot works pretty well (can be compared to Wikipedia:Suspected copyright violations/2016-06-07 for example)
    • As for the false positives - it is possible to define different thresholds to get fewer false positives, at the cost of also missing true positives. I haven't done a full ROC analysis to tune all the parameters, but the arbitrary criterion actually works pretty well, somewhere in the middle ground. Eran (talk) 18:40, 16 September 2018 (UTC)
  • Follow up from BOTN discussion: from what has been reviewed so far, the vendor this bot will get results from can check for "copies" but not necessarily "violations of copyrights" (though some copies certainly are also copyvios). As such, I think all labels should be limited to descriptive (e.g. "copy detected"), as opposed to accusatory (humans should make the determination of whether the legal situation of violating a copyright has occurred). — xaosflux Talk 01:30, 16 September 2018 (UTC)
    That would be part of the new pages feed, which the bot doesn't control. Wikipedia talk:WikiProject Articles for creation/AfC Process Improvement May 2018 or Phabricator would be more appropriate venues for discussing the interface. — JJMC89(T·C) 04:53, 16 September 2018 (UTC)
    @JJMC89: what I'm looking for is where there is a log of what this bot does control. As this is editor-managed, it's not unreasonable to think another editor may want to run a similar or backup bot in the future. — xaosflux Talk 05:14, 16 September 2018 (UTC)
  • Would it be possible to assign a number of bytes to "large chunk of text"? SQLQuery me! 02:25, 16 September 2018 (UTC)
    500 bytes. — JJMC89(T·C) 04:53, 16 September 2018 (UTC)
  • Procedural note: The components for reading changes, sending data to the third party, and making off-wiki reports alone do not require this BRFA; making changes on the English Wikipedia (i.e. submitting new data to our new pages feed, etc.) is all we really need to be reviewing here. Some of this may have overlap (e.g. which namespaces, text size, etc.), however there is nothing here blocking the first 3 components alone. — xaosflux Talk 18:54, 16 September 2018 (UTC)
  • It looks like phab:T204455 has been closed regarding logging, can you show an example of an action and it making use of this new logging? — xaosflux Talk 11:02, 2 October 2018 (UTC)
  • @MMiller (WMF): any update on verbiage related to phab:T199359#4587185 ? — xaosflux Talk 18:35, 2 October 2018 (UTC)
    • @Xaosflux: If the verbiage issue is resolved, I was wondering if we could move ahead with a trial for this BRFA. The way that PageTriage works is that it won't allow bots to post copyvio data to it unless the bot belongs to the "Copyright violation bots" group. So for the trial, you'll need to add EranBot to the group with whatever expiration time you like. It would be good to have at least a couple days so that we can make sure everything is running properly on our end as well. Ryan Kaldari (WMF) (talk) 17:35, 4 October 2018 (UTC)
    • @Xaosflux: ping. Ryan Kaldari (WMF) (talk) 17:12, 10 October 2018 (UTC)
      • Thanks, I've prompted any other feedback at Wikipedia:Village_pump_(proposals)#Bot_to_add_external_party_originality_flag_to_new_pages_feed. In the meantime, @Ryan Kaldari (WMF): I'd like to see this demonstrated on testwiki or test2wiki prior to production trials here, so that our human reviewers can submit data easily and see how it responds. Any impediments to this? — xaosflux Talk 17:38, 10 October 2018 (UTC)
        • @Xaosflux: Unfortunately, setting up EranBot to monitor a new wiki isn't trivial and might take a while. You can see what the new logs will look like on Beta Labs. And you can see what sort of stuff EranBot flags by looking at CopyPatrol. What do you think about just doing a one day trial on English Wikipedia and having folks take a look at the results? That way it will be tested against more realistic edits anyway. Ryan Kaldari (WMF) (talk) 00:15, 11 October 2018 (UTC)
          • phab:T206731 created as we do not currently have community control over this access. — xaosflux Talk 02:14, 11 October 2018 (UTC)
  • Thank you, pending community closure at WP:VPP. As far as a trial goes, any specific day you would like to do the live run? — xaosflux Talk 03:54, 14 October 2018 (UTC)
  • Closed the VPP thread as successful. WBGconverse 06:23, 14 October 2018 (UTC)
{{OperatorAssistanceNeeded}} In prepping for a live trial, what day(s) would you like to do this? I want to make sure we send notices to Wikipedia:New pages patrol and perhaps a note at MediaWiki:Pagetriage-welcome. — xaosflux Talk 13:48, 15 October 2018 (UTC)
Xaosflux: Would it work to run with reports between 16 October ~16:00 UTC time - 17 October ~16:00 UTC ? Eran (talk) 15:23, 15 October 2018 (UTC)
That sounds good for us. What do you think Xaosflux? Ryan Kaldari (WMF) (talk) 17:24, 15 October 2018 (UTC)
  • Approved for trial (7 days). I've added the cvb flag for the trial and let the NPP/Reviewers know. Do you have a good one line text that could be added to MediaWiki:Pagetriage-welcome to help explain things and point anyone with errors here? — xaosflux Talk 18:41, 15 October 2018 (UTC)
User:Ryan Kaldari (WMF), I don't actually see an option for using this filter in Special:NewPagesFeed - is it hidden because there are none currently? — xaosflux Talk 19:30, 15 October 2018 (UTC)
I'm not seeing it on betalabs either - how is anyone going to actually make use of this? — xaosflux Talk 19:32, 15 October 2018 (UTC)
I was guessing it would show in the filters under "potential issues", but there's nothing there. FWIW, "attack" also has no articles, but is still shown there. I think I might be misunderstanding how this works altogether. Natureium (talk) 19:39, 15 October 2018 (UTC)
@Xaosflux and Natureium: During the trial period, they will need to add "copyvio=1" to the Special:NewPagesFeed URL to see the interface changes. So https://en.wikipedia.org/wiki/Special:NewPagesFeed?copyvio=1. Nothing has been tagged as a potential copyvio yet, so not much to see at the moment. Ryan Kaldari (WMF) (talk) 20:16, 15 October 2018 (UTC)
I added a note at Wikipedia talk:New pages patrol/Reviewers with the above info. Ryan Kaldari (WMF) (talk) 20:20, 15 October 2018 (UTC)
@Ryan Kaldari (WMF): thank you, I included a link to that in the header for Special:NewPagesFeed to help guide any testers. — xaosflux Talk 20:53, 15 October 2018 (UTC)

JJMC89 bot 16

Operator: JJMC89 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 04:39, Monday, October 15, 2018 (UTC)

Function overview: Reports inactive interface administrators

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: After approval

Links to relevant discussions (where appropriate): Request

Edit period(s): Monthly

Estimated number of pages affected: 1

Namespace(s): Wikipedia

Exclusion compliant: Yes

Function details: Reports inactive interface administrators to Wikipedia:Interface administrators' noticeboard.

Inactive is currently defined as

  1. not having any edits/logged actions for 2 months, or
  2. not editing a CSS/JS page, excluding the editor's own, for 6 months.

Message copy:

== Inactive interface administrators <date> ==

The following interface administrator(s) are inactive:
* {{admin|<user1>}}
* {{admin|<user2>}}
...
* {{admin|<usern>}}
~~~~
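
As a rough sketch of how the two inactivity rules and the message template above fit together, the following Python is illustrative only and is not the bot's source, which is to be published after approval. The function names are hypothetical, and approximating a month as 30 days is an assumption made to keep the example short.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Assumption: a "month" is approximated as 30 days.
NO_ACTIVITY_LIMIT = timedelta(days=2 * 30)     # rule 1: no edits/logged actions
NO_CSS_JS_EDIT_LIMIT = timedelta(days=6 * 30)  # rule 2: no CSS/JS page edits


def is_inactive(now: datetime,
                last_action: Optional[datetime],
                last_css_js_edit: Optional[datetime]) -> bool:
    """Apply both rules; a missing timestamp counts as 'never'."""
    def older_than(timestamp: Optional[datetime], limit: timedelta) -> bool:
        return timestamp is None or now - timestamp > limit

    return (older_than(last_action, NO_ACTIVITY_LIMIT)
            or older_than(last_css_js_edit, NO_CSS_JS_EDIT_LIMIT))


def build_report(date_label: str, users: List[str]) -> str:
    """Fill in the message template with one {{admin|...}} line per user."""
    lines = [
        f"== Inactive interface administrators {date_label} ==",
        "",
        "The following interface administrator(s) are inactive:",
    ]
    lines += [f"* {{{{admin|{user}}}}}" for user in users]
    lines.append("~~~~")
    return "\n".join(lines)
```

The caller is assumed to pass a `last_css_js_edit` timestamp that already excludes edits to the editor's own CSS/JS pages, as the second rule requires.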

Discussion

I'm the 'requester' for this, so would prefer if someone else from BAG can review for trial. Very low risk as it only will edit one page. For testing I suggest just tweaking the month thresholds to days to ensure it runs and produces output in a useful format. Perhaps @SQL: could approve the trial? — xaosflux Talk 13:00, 15 October 2018 (UTC)

It edits one page, one time a month. Seems extremely uncontroversial, and run by an experienced botop. I don't see a way that this could possibly go sideways and cause disruption to the project. Let's see how it works, with Xaosflux's suggestion above, if you'd like. Approved for trial. Let us know when you're happy with the results. SQLQuery me! 15:48, 15 October 2018 (UTC)

MusikBot II 2

Operator: MusikAnimal (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 03:23, Monday, September 24, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Typically I use Ruby, but here it may have to be PHP or possibly Node.js.

Source code available: GitHub

Function overview: Syncs Wikipedia:Geonotice/list.json (to be created, fully-protected) with MediaWiki:Gadget-geonotice-list.js.

Links to relevant discussions (where appropriate): Special:PermaLink/862124571#Geonotices closed discussion, support of usage at Wikipedia talk:Interface administrators (see also RFC for IAdmins at top of that page allowing bot access where bot operator is also an IAdmin)

Edit period(s): Continuous

Estimated number of pages affected: 1

Namespace(s): MediaWiki

Exclusion compliant (Yes/No): No, not applicable.

Adminbot (Yes/No): Yes

Function details: First, some background: With the advent of the interface administrator user group, sysops can no longer edit MediaWiki:Gadget-geonotice-list.js. Many of these users are not particularly tech-savvy, and have no use for editing site-wide JS outside configuring geonotices for outreach purposes, etc. The configuration is literally just a JavaScript object, with key/value pairs. Using a JSON page then makes much more sense, which they'd be able to edit. However, currently we cannot put JSON pages behind ResourceLoader (phab:T198758), so for performance reasons we need to continue to maintain the JS page. The proposed workaround is to have a bot sync a JSON page with the JS page. This is in our best interests for security reasons (fewer accounts with access to site JS), but also JSON is easier to work with and gives you nice formatting, hence it is less prone to mistakes.

Implementation details:

  1. Check the time of the last edit to Wikipedia:Geonotice/list.json.
  2. If it is after the time of the last sync by the bot (tracked by local caching), process the JSON.
  3. Perform validations, which include full JSON validation, validating the date formats, country code (going off of ISO 3166), and format of the corners.
  4. If validations fail, report them at User:MusikBot II/GeonoticeSync/Report (example) and do nothing more.
  5. If validations pass, build the JS and write to MediaWiki:Gadget-geonotice-list.js (example), and update the report stating there are no errors (example).

The comment block at the top of MediaWiki:Gadget-geonotice-list.js can be freely edited. The bot will retain this in full.
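
The five steps above can be sketched as a single decision function. This Python is a simplified illustration, not the bot's code (which is on GitHub and not written in Python): `sync_once`, the `validate` callback and the returned action names are all hypothetical, and `GENERATED_PREFIX` is a placeholder rather than the real wrapper used in the gadget page.

```python
import json
from datetime import datetime
from typing import Any, Callable, List, Tuple

# Placeholder only: the real JS page wraps the object differently.
GENERATED_PREFIX = "var notices = "


def sync_once(json_text: str,
              last_edit: datetime,
              last_sync: datetime,
              validate: Callable[[Any], List[str]]) -> Tuple[str, str]:
    """Return (action, payload) where action is 'skip', 'report' or 'write'."""
    # Steps 1-2: only process the JSON if it changed since the last sync.
    if last_edit <= last_sync:
        return ("skip", "")
    # Step 3: full JSON validation first, then the content checks.
    try:
        config = json.loads(json_text)
    except json.JSONDecodeError as exc:
        return ("report", f"Invalid JSON: {exc.msg}")
    errors = validate(config)
    # Step 4: on failure, report the errors and do nothing more.
    if errors:
        return ("report", "\n".join(errors))
    # Step 5: build the JS from the validated data.
    generated_js = GENERATED_PREFIX + json.dumps(config, indent=2) + ";"
    return ("write", generated_js)
```

Serializing the parsed object again, instead of copying the raw page text, means only validated data reaches the JS page. Retaining the comment block at the top of the JS page is left out of this sketch.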

Discussion

Per Xaosflux I've preemptively started this BRFA. I haven't done any coding but I think this is a good time to discuss implementation details. Concerns that come to mind:

  • What to do when there are syntax errors. The native JSON editor should mean admins won't introduce syntax errors, because it won't even let you save. But, it can happen -- say the admin ironically has JavaScript disabled. As a safeguard, the bot can validate the JSON, too (easy, thanks to existing libraries). Similar to User:MusikBot/PermClerk/Report, the bot could have a status report page, transcluded at Wikipedia talk:Geonotice/list.json. This way they can get some debugging info should something go wrong. If we want to get real fancy, the bot could also report when the configuration doesn't match the expected format, as described in the comments at MediaWiki:Gadget-geonotice-list.js. I think that would be a nice feature, but not a requirement.
  • After deployment, we'd need to update the JS page to clearly say it should not be edited directly. We could do a two-way syncing, but I'd prefer not to, just to keep it simple.
  • I can confirm MusikBot II's account is secure and 2FA is enabled (with some caveats). The bot solution still puts us on the winning end, as there will be fewer int-admin accounts than if we added it to all who manage geonotices.
  • Anything else? MusikAnimal talk 03:23, 24 September 2018 (UTC)
  • @MusikAnimal: for the directional sync concerns, a defined "section" (delineated by comments) should be the only area edited - this section should be marked "do not edit directly" - and the bot should only edit within the section. This way if other changes to the page are needed they won't interfere. — xaosflux Talk 04:25, 24 September 2018 (UTC)
    • This should work fine, just like her PERMclerking, right? Would be good if there are rush edits, last-minute-changes, etc. ~ Amory (utc) 16:22, 24 September 2018 (UTC)
    • Yeah, we can definitely reserve a part of the JS page for free editing, much like we do at WP:AWB/CP. MusikAnimal talk 16:41, 24 September 2018 (UTC)
  • I'd like to see some tests over at testwiki that can be used to demonstrate the edits. — xaosflux Talk 04:25, 24 September 2018 (UTC)
    • No problem. Though I don't think we need to test Geonotice itself (could be tedious), rather just that the JS was generated properly. MusikAnimal talk 16:41, 24 September 2018 (UTC)
      • Agree, don't need to actually implement the geonotice, just that things work as expected in the namespaces and content types. — xaosflux Talk 01:21, 25 September 2018 (UTC)
  • Syntax errors could still occur in the data - will you validate this as well? For example putting start/end dates in their own cells, validate that this is date data and not something else? Everything should be validated (e.g. this should not be a route to inject new javascript). — xaosflux Talk 04:25, 24 September 2018 (UTC)
    • Perhaps make the mock-up json page to demonstrate? — xaosflux Talk 04:25, 24 September 2018 (UTC)
    • JS injection shouldn't be possible, unless there are vulnerabilities in Geonotice itself. I would hope it doesn't use eval on the strings. Arbitrary code (e.g. begin: alert('foo!')) isn't valid JSON and hence would fail on the initial validation (and the MediaWiki JSON editor won't let you save it, either). We can still validate it ourselves, to be sure. As I said this would be a nice feature. I don't know that I want to validate things like the country, though. We could validate the 'begin'/'end' date format, in particular, but for everything else I think the bot will just look for permissible keys and the data type of the values ('country' is a string, 'corners' is an array of two arrays, each with two integers). MusikAnimal talk 16:41, 24 September 2018 (UTC)
      • Injection would be if you accepted arbitrary "text" and just made it js, where the text could contain characters that would terminate the text field and then continue along in javascript. — xaosflux Talk 17:11, 24 September 2018 (UTC)
  • For the JSON page, not the bot: we'll also have to move the normal explanation text into an editnotice or regular notice, since comments are stripped on save for pages with the JSON content model. Enterprisey (talk!) 23:32, 24 September 2018 (UTC)
  • Got a prototype working, see Special:Diff/861109475. There are quotation marks around all the keys, but JavaScript shouldn't care. Maybe we should test against testwiki's Geonotice to be sure. This does mean the rules have changed -- you don't need to escape single quotes ', but you do for double quotation marks ". This is just a consequence -- that's how JSON wants it. I think single quotes are probably more commonly used in the geonotice text anyway, so this might be a welcome change. The bot could find/replace all "'s to ', but this would be purely cosmetic and error-prone when it is not really needed. Other formatting has changed, mostly whitespace. Also in the edit summary we're linking to the combined diff of all edits made to the JSON page since the last sync. That way we can easily verify it was copied over correctly. We do lose attribution here (as opposed to linking to individual diffs), but I think that's okay? Source code (work in progress) is on GitHub. I've made this task translatable, should other wikis be interested in it. I'm going to stop here until the bot proposal discussion has closed. MusikAnimal talk 04:55, 25 September 2018 (UTC)
    I agree with the quoting change. You may want to specify the number of edits if it's more than one, but I don't know if that's required for attribution. (And it's displayed on the diff page anyway.) Enterprisey (talk!) 06:16, 25 September 2018 (UTC)
  • I started adding some 'directions' at Template:Editnotices/Page/User:MusikBot II/GeonoticeSync/Test config.json, please fill out with more directions, requirements, etc. As far as attribution, in the edit request at least pipe to the name of the source page to make it clear where the source is without having to follow the diff. — xaosflux Talk 15:21, 28 September 2018 (UTC)
    • Another option there is to put the whole attribution (source, diff, time, user of diff) into the comments of the .json, and only minimal in the edit summary (revid, sourcepage). --Dirk Beetstra T C 13:59, 2 October 2018 (UTC)
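
The quoting rules and the injection concern discussed in this thread can be checked with any standard JSON library. The short Python sketch below uses only the standard `json` module; the helper `is_valid_json` is a hypothetical name, and the example strings are made up for illustration.

```python
import json


def is_valid_json(raw: str) -> bool:
    """Return True if raw parses as JSON, False otherwise."""
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return False
    return True


# Single quotes are stored as-is; double quotes are escaped on output.
notice_text = 'It\'s a "test"'
encoded = json.dumps(notice_text)  # the characters: "It's a \"test\""

# A backslash-escaped single quote is not a legal JSON escape sequence.
escaped_single_quote = r'{"text": "It\'s here"}'
plain_single_quote = '{"text": "It\'s here"}'

# Bare code in place of a quoted value is rejected by the parser.
code_as_value = "{begin: alert('foo!')}"

results = (
    is_valid_json(escaped_single_quote),  # False
    is_valid_json(plain_single_quote),    # True
    is_valid_json(code_as_value),         # False
)
```

Because the notice text only ever passes through a JSON encoder, a double quote inside it is escaped rather than ending the string, which is the property that keeps free text from continuing as JavaScript.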

─────────────────────────

  • Update I've resumed work on this and am ready for more feedback. The current implementation is described in the "function details" above. I still need to work on filling out Template:Editnotices/Page/User:MusikBot II/GeonoticeSync/Test config.json, please feel free to help. That page will be moved to Template:Editnotices/Page/Wikipedia:Geonotice/list.json when we're ready to go live.

    For validations, see Special:PermaLink/863494086 for an example invalid config (with lots of errors!) and Special:PermaLink/863494234 for generated report. A few notes:

    • I'm using Ruby internal methods to tell if the date is valid. This works for "Invalid date" or "35 January 2018 00:00 UTC" but not for invalid month names as with "15 Foobar 2018 00:00 UTC". Going by some logic I don't understand it chooses some other valid month. I could use regular expressions to ensure the month names are valid, but I want this bot task to work in other languages where I assume they're able to put in localized month names, if not a different format entirely (which Ruby should still be able to validate). Anyway I think this is fine. There were no validations before, after all :)
    • Validating the country code actually works! It's going off of the ISO 3166 spec, which is what's advertised as the valid codes Geonotice accepts.
    • Coordinates are validated by ensuring there are two corners, and each with two values (lat and lng), and that the values are floats and not integers or strings.
    • The keys of each list item are also validated, ensuring they only include "begin", "end", "country", and either "corners" or "text".
    • I added code to check if they escaped single quotations (as with \'), since Geonotice admins probably are used to doing this. Amazingly, MediaWiki won't even let you save the JSON page if you try to do this, as indeed it is invalid JSON. So turns out no validation is needed for this, or any other JSON syntax errors for that matter. This should mean we don't need to worry about anyone injecting malicious code.
    • The comment block at the top of the JS page is retained and can be freely edited.
    • Back to the issue of attribution in the edit summary, I went with Xaosflux's recommendation and am using a combined diff link, piped with the title of the JSON page. I'm not sure it's worth the hassle of adding in comments in the generated JS code directly, but let me know if there are any strong feelings about that.

Let me know if there's anything else I should do, or if we're ready for a trial! MusikAnimal talk 04:35, 11 October 2018 (UTC)
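
To make the validation notes above concrete, here is a hedged Python sketch of comparable checks. It is not the bot's Ruby code, and it differs in known ways: the date format string is inferred from the examples quoted above, `strptime` with `%B` only accepts English month names in the default locale (so it also rejects "Foobar"), the country check tests only the two-letter shape instead of the full ISO 3166 list, and the key check tests membership in the permitted set without enforcing which combinations are required. All names are hypothetical.

```python
import re
from datetime import datetime
from typing import Any, Dict, List

PERMITTED_KEYS = {"begin", "end", "country", "corners", "text"}
DATE_FORMAT = "%d %B %Y %H:%M UTC"        # e.g. "15 January 2018 00:00 UTC"
COUNTRY_RE = re.compile(r"^[A-Z]{2}$")    # shape check only, not a code list


def valid_date(value: Any) -> bool:
    """True if value is a string matching DATE_FORMAT with a real date."""
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, DATE_FORMAT)
    except ValueError:
        return False
    return True


def valid_corners(value: Any) -> bool:
    """True for two corners, each a pair of floats (not ints or strings)."""
    return (
        isinstance(value, list) and len(value) == 2
        and all(
            isinstance(corner, list) and len(corner) == 2
            and all(isinstance(coord, float) for coord in corner)
            for corner in value
        )
    )


def validate_item(name: str, item: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable errors for one notice entry."""
    errors = []
    unknown = set(item) - PERMITTED_KEYS
    if unknown:
        errors.append(f"{name}: unknown keys {sorted(unknown)}")
    for key in ("begin", "end"):
        if not valid_date(item.get(key)):
            errors.append(f"{name}: invalid {key} date")
    country = item.get("country")
    if "country" in item and not (isinstance(country, str)
                                  and COUNTRY_RE.match(country)):
        errors.append(f"{name}: invalid country code")
    if "corners" in item and not valid_corners(item["corners"]):
        errors.append(f"{name}: invalid corners")
    return errors
```

Returning a list of messages rather than raising on the first problem matches the report-page approach described above, where every error in the configuration is listed at once.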

MediaWiki won't let you save the page with invalid JSON even if you turn off JS or use the API, right? Because if it does you may want to validate for that case. Enterprisey (talk!) 04:43, 11 October 2018 (UTC)
Luckily it's server side. It shows the error "Invalid content data", even if you have JS turned off. I haven't tested the API yet, but if it does work it's probably a bug in MediaWiki :) MusikAnimal talk 16:38, 11 October 2018 (UTC)
But I should clarify, the bot does validate JSON content, but I haven't tested to see if this works because I am unable to create invalid JSON :) At any rate, we would not end up in a situation where an invalid JS object is written to MediaWiki:Gadget-geonotice-list.js, because the core JSON methods that we're using would error out before this happens. MusikAnimal talk 20:15, 11 October 2018 (UTC)
This seems fine to me. My main concern (and it is fairly minor) is making sure the bot's role is clear in documentation/notices, and that people will know how to look for errors if something doesn't get updated because validation failed (because there won't be immediate user feedback as there is with the basic MW-side JSON validation). I'm approving this for a two-week trial, pending completion of the editnotice(s) and related pages and granting of the i-admin flag; based on history, that should allow for at least a handful of edits to test with, but feel free to extend if more time is required. Approved for trial (14 days). — Earwig talk 05:51, 14 October 2018 (UTC)
@The Earwig: will this be trialing on the actual pages or in userspace? Ping me if you need a flag assigned for trialing. — xaosflux Talk 00:06, 15 October 2018 (UTC)
@Xaosflux: My intention is for a full trial. I saw there were already reasonable tests done in the userspace, so given that MA feels comfortable working with the actual pages now, I'm fine with that too. As for the IA flag, it's not clear to me from the policy whether we can do that here or a request needs to be explicitly made to BN? I would prefer MA post something to BN to be safe, but I suppose one interpretation of the policy would let you grant it immediately without the waiting period. — Earwig talk 03:22, 15 October 2018 (UTC)
@MusikAnimal: what authentication options do you have configured for this bot account? (e.g. 2FA, BotPasswords, OAuth) — xaosflux Talk 11:57, 15 October 2018 (UTC)
@Xaosflux: 2FA is enabled. Historically I have not had a good solution for OAuth, but times have changed. I'll try to look into this today. For the record the current consumer for MusikBot II can only edit protected pages, all other admin rights are not permitted. We will use a different consumer here, and all related edits will be tagged with the application name. We could use "GeonoticeSync 1.0" (what I've dubbed the task, and then a version number), or is there a better name? For permissions, I believe the consumer only needs editsiteconfig.
So no need to grant int-admin just yet -- although it should be safe to do so, because we have 2FA enabled and the current consumer can't edit sitewide or user JS/CSS.
The outstanding to-dos:
  1. Create OAuth consumer and rework the bot to use it.
  2. Create Wikipedia:Geonotice/list.json to reflect current configuration, fully protect it, and move Template:Editnotices/Page/User:MusikBot II/GeonoticeSync/Test config.json to Template:Editnotices/Page/Wikipedia:Geonotice/list.json.
  3. Update documentation at Wikipedia:Geonotice and also describe the new process in the comment block at MediaWiki:Gadget-geonotice-list.js.
  4. Ping all the current Geonotice admins to make sure they know about the new system, and the new rules (don't escape single quotes, but do for double, etc.).
  5. Grant int-admin to MusikBot II, and enable the task.
I'll ping you when I'm done with steps 1-3, and once given the final okay we'll do 4-5. Sound like a plan? If we have to roll back or abandon the new system, I'll be sure to let everyone know that they can go back to editing MediaWiki:Gadget-geonotice-list.js directly. MusikAnimal talk 16:58, 15 October 2018 (UTC)
Sounds fine, let us know when you are ready. — xaosflux Talk 18:37, 15 October 2018 (UTC)
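
The quoting rule mentioned in step 4 (don't escape single quotes, but do escape double quotes) follows from standard JSON syntax. A minimal sketch in Python, not the bot's actual validation code: the helper name check_config and the key "text" are hypothetical, chosen only for illustration.

```python
import json


def check_config(raw_text):
    """Return (parsed, None) if raw_text is valid JSON, else (None, message)."""
    try:
        return json.loads(raw_text), None
    except json.JSONDecodeError as err:
        # Report where parsing failed so an editor can find the problem.
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


# A notice text containing both kinds of quote (hypothetical example).
text_value = 'Don\'t miss the "Edit-a-thon"'

# json.dumps escapes the double quotes and leaves the single quote alone.
encoded = json.dumps({"text": text_value})

# An escaped single quote (\') is not a valid JSON escape and is rejected.
bad = '{"text": "It\\\'s"}'
```

Passing encoded to check_config returns the parsed object, while passing bad returns an error message instead of a parsed value.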

ZackBot 10

Operator: Zackmann08 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:36, Friday, September 28, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Ruby

Source code available: User:ZackBot/Infobox-needed

Function overview: The goal is to scan pages that are in Category:Wikipedia articles with an infobox request and remove any pages that already have an infobox.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Bot_to_update_'Needs_infobox'

Edit period(s): One time run for now.

Estimated number of pages affected: Very difficult to say. Per PetScan there are currently 88,074 talk pages that fall in the category. I'd guess that somewhere between 3%-8% of those have infoboxes and thus would be affected by this script. So a guess would be somewhere around 7,000-8,000 pages? But that is a TOTAL guess. This will be greatly dependent on how many of these subcategories I will run the script against.

Namespace(s): Main

Exclusion compliant (Yes/No): yes

Function details:

The functionality is pretty straightforward:

  1. Take a list of pages from a PetScan search. These will be Talk pages that are marked as needing an infobox.
  2. Check the text of the page and search for the word infobox. My research thus far has indicated that just looking for the word infobox should be good enough as it is not a term used in any other context that I can find. However, if granted a trial run, this will be an area I will be focusing my attention on.
  3. If the page is found to contain the word then go back to the talk page and look for the param 'needs-infobox' and remove it from the templates.
  4. In the event that the needs-infobox parameter is not found, an error is raised and logged for manual inspection.

The ONLY change that this script will be making is to Talk pages, and it will be to remove text matching \|\s*needs-infobox\s*=\s*y(?:es){0,1}\s*
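
As an illustration of steps 3 and 4, here is a minimal Python sketch of the removal (the bot itself is written in Ruby, and this is not its source). The function name strip_needs_infobox and the sample banner text are hypothetical; the pattern is the one quoted above, with {0,1} written as the equivalent ?.

```python
import re

# Pattern quoted in the request; "{0,1}" is written here as "?".
NEEDS_INFOBOX = re.compile(r"\|\s*needs-infobox\s*=\s*y(?:es)?\s*")


def strip_needs_infobox(talk_text):
    """Remove the needs-infobox parameter from talk page wikitext.

    Raises LookupError when the parameter is not found, mirroring
    step 4 (log the page for manual inspection).
    """
    new_text, count = NEEDS_INFOBOX.subn("", talk_text)
    if count == 0:
        raise LookupError("needs-infobox parameter not found")
    return new_text


# Hypothetical talk page banner used only for illustration.
sample = "{{WikiProject Biography|class=Start|needs-infobox=yes}}"
cleaned = strip_needs_infobox(sample)
```

As written, the pattern is case-sensitive, so a value such as Yes would not match and would be reported for manual inspection.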

--Zackmann (Talk to me/What I been doing) 20:36, 28 September 2018 (UTC)

Discussion

  • Could it regex for something like [{][{][ \n\t]*[Ii]nfobox ? -- GreenC 21:34, 28 September 2018 (UTC)
  • @GreenC: so technically speaking it can search for any regex. I think you are on the right track, but that has a few problems. Not all infoboxes start with {{infobox.... But perhaps something like /\{\{[\s\w\n]*infobox/i. See my testcase: here --Zackmann (Talk to me/What I been doing) 21:41, 28 September 2018 (UTC)
Great. -- GreenC 23:35, 28 September 2018 (UTC)
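
The pattern suggested in this exchange can be sketched in Python as follows. This is only an illustration of the regex discussed above, translated from the Ruby literal with the /i flag; the function name has_infobox and the sample strings are hypothetical.

```python
import re

# Python equivalent of /\{\{[\s\w\n]*infobox/i from the discussion above.
INFOBOX_TEMPLATE = re.compile(r"\{\{[\s\w\n]*infobox", re.IGNORECASE)


def has_infobox(article_text):
    """Return True if the wikitext appears to open an infobox template."""
    return INFOBOX_TEMPLATE.search(article_text) is not None
```

Compared with searching for the bare word, this requires the word to appear in a template name directly after {{, so prose such as "this article needs an infobox" does not count, while a template whose name has other words before "infobox" still does.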

I'd like to see a short trial to see how it works. Approved for trial (100 edits). SQLQuery me! 03:30, 14 October 2018 (UTC)

Filedelinkerbot 3

Operator: Krd (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 06:09, Tuesday, October 2, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Perl

Source code available: No

Function overview: The bot has been active as a Commonsdelinker clone since 2014, removing links to files deleted at Commons. It was requested at Wikipedia:Bot requests#CAT:MISSFILE bot that the bot should also remove links to files deleted locally, which has been activated as a trial and appears to work without problems.

Links to relevant discussions (where appropriate): none

Edit period(s): Continuous

Estimated number of pages affected: 10 per day

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes.

Function details: n/a

Discussion

Does it check to see if the file exists locally before de-linking? Unlikely, I know, but possible all the same. SQLQuery me! 23:01, 10 October 2018 (UTC)

Yes, of course, but irrelevant for this request as this is about local deletions. --Krd 04:50, 11 October 2018 (UTC)

Good task for a bot. ImageRemovalBot used to remove red-linked images until it went AWOL last month -FASTILY 16:50, 13 October 2018 (UTC)

Approved for trial (7 days). SQLQuery me! 03:29, 14 October 2018 (UTC)

Bots that have completed the trial period

FRadical Bot

Operator: FR30799386 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:58, Thursday, September 20, 2018 (UTC)

Automatic, Supervised, or Manual: Manual

Programming language(s): AutoWikiBrowser

Source code available: AWB

Function overview: This bot-task will try to remove all instances of the use of MiszaBot, MiszaBot I, MiszaBot II, MiszaBot III from the parameters of the template {{Auto archiving notice}} and replace them with Lowercase sigmabot III.

Links to relevant discussions (where appropriate):

Edit period(s): (Irregular) As and when I get time to run the bot. I will try not to exceed a 15 edits/minute edit rate.

Estimated number of pages affected: ~4294 pages will be affected

Namespace(s): Talk: namespace

Exclusion compliant (Yes/No): No

Function details: In most of the article talk pages, the manually set |bot= parameters of the template {{Auto archiving notice}} point to the long-inactive set of MiszaBots, namely MiszaBot, MiszaBot I, MiszaBot II, MiszaBot III. Via this bot account (using AWB), I will try to make the notice point to the right bot, namely Lowercase sigmabot III. The logic used is outlined below:

  • First, all the pages transcluding the template Auto archiving notice are extracted using the Make List function of AWB.
  • These pages are then filtered to include only those in the Talk: namespace.
  • The pages are then pre-parsed to remove those with \|bot *\= *Lowercase\ sigmabot\ III

  • Finally, the pages are then checked for the strings = *MiszaBot (regex), MiszaBot I, MiszaBot II, MiszaBot III and then replaced with =Lowercase sigmabot III for the first and Lowercase sigmabot III for the rest.

  • Find instances of \{\{([Aa]utoarchivalnotice|[Aa]utoarchive|[Aa]utoArchivingNotice|[Aa]utoarchivingnotice|[Aa]uto[ _]+archiving[ _]+notice)(.*?)\|bot\=( *)MiszaBot *I* and replace it with {{$1$2|bot=$3Lowercase sigmabot III
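
The find-and-replace in the last step can be sketched in Python as below. This is an illustrative translation, not the AWB configuration itself: AWB's $1, $2, $3 become \1, \2, \3, and the function name retarget_archive_bot is hypothetical.

```python
import re

# The find pattern quoted above, split across lines for readability.
MISZA_NOTICE = re.compile(
    r"\{\{([Aa]utoarchivalnotice|[Aa]utoarchive|[Aa]utoArchivingNotice"
    r"|[Aa]utoarchivingnotice|[Aa]uto[ _]+archiving[ _]+notice)"
    r"(.*?)\|bot\=( *)MiszaBot *I*"
)

# AWB's "$1$2|bot=$3" written with Python backreferences.
REPLACEMENT = r"{{\1\2|bot=\3Lowercase sigmabot III"


def retarget_archive_bot(talk_text):
    """Point the |bot= parameter of the notice at Lowercase sigmabot III."""
    return MISZA_NOTICE.sub(REPLACEMENT, talk_text)
```

As quoted, the pattern expects |bot= with no whitespace between the pipe and the parameter name, so a notice written as | bot=MiszaBot I is left unchanged by this sketch.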

Additionally, each and every edit will be reviewed by the operator (me) via AWB. Regards — fr+ 17:58, 20 September 2018 (UTC)

Discussion

The bot is currently configured to run based on MiszaBot's template transclusions. I could, in theory, reconfigure it to use a new (as of yet nonexistent) template for lowercase sigmabot, but I intentionally did not do so to avoid making hundreds of thousands of needless edits to change a transclusion. I would not recommend proceeding further with this BRFA. Σσς(Sigma) 22:34, 22 September 2018 (UTC)
Whoops. I misread. As far as lowercase sigmabot's behaviour is concerned, this looks fine, I'll let the BAG decide what's best. Σσς(Sigma) 22:43, 22 September 2018 (UTC)
  • Not to ask the stupid question, but if you're doing it totally manually, why not just get an "AWB account"? Primefac (talk) 19:59, 23 September 2018 (UTC)
Primefac The bot will edit pages that have an extremely high number of watchers (for example, Talk:Mahabharata, which has 622 watchers, 69 of which watch recent changes regularly). Since the (bot) flag will allow edits to be hidden from the watchers, I would prefer to use a bot account over an AWB account, the edits of which cannot be hidden from the watchlist. Regards — fr+ 10:51, 24 September 2018 (UTC)
While not in scope of this task, if expanding to user_talk: in a future task a bot flag will be critical to prevent 'new messages' alerts. — xaosflux Talk 12:54, 24 September 2018 (UTC)
  • @FR30799386: is this solely in the "Talk:" namespace, or also in "talk namespaces" (e.g. user_talk, wikipedia_talk)? — xaosflux Talk 02:18, 24 September 2018 (UTC)
Xaosflux In this bot request, Talk: does mean only those pages with the Talk: prefix (i.e. only those in ns:1). However, I have plans to extend the bot functionality to encompass the rest of the talk namespace in a later BRFA. A full list of all pages this bot is expected to edit can be found here — fr+ 10:51, 24 September 2018 (UTC)

Seems like a good task for a bot. All these references to Misza I/II/III Bot are likely confusing for newbies. -FASTILY 05:20, 26 September 2018 (UTC)

  • xaosflux Would it be okay to add the bot to the AutoWikiBrowser check page, so that I can run some mock tests in my userspace? — fr+ 11:16, 27 September 2018 (UTC)
    @FR30799386: OK added to AWBCP, only own-user spaces should be used right now. — xaosflux Talk 12:07, 27 September 2018 (UTC)
  • Xaosflux I have made the mock test in my userspace [diff]. I have also posted the revised RegEx (developed as a result of the mock test) in the function details parameter of the request. Regards — fr+ 15:57, 29 September 2018 (UTC)
  • {{BAG assistance needed}} — fr+ 08:16, 5 October 2018 (UTC)
  • It has been around a week since the last BAG member edited this page. Are there any outstanding queries which I need to resolve? Will it be possible to have a trial this ensuing week? I am asking this because I will be chronically unavailable from 14th to 22nd October. It would be good if I can finish the trial before that. — fr+ 08:16, 5 October 2018 (UTC)
    I know that it's the 14th now, and I am sorry for the wait. I think this is an excellent task for a bot, and you've addressed everything brought forward. Let's see a good size trial to make sure everything works right, Approved for trial (250 edits). SQLQuery me! 03:35, 14 October 2018 (UTC)
  • @SQL: Trial complete. I missed yesterday's bus, so I got a little time off during which I finished the trial. I accidentally overshot the limit of 250 pages by ~seven pages as a result of my absentmindedness (I was looking at the diffs and forgot to look at the counter regularly). Additionally, there were two glitches while performing the trial, both of which I think were adequately resolved:
  • The edit summary was truncated at the start of the trial. I changed the edit summary.
  • The bot could not detect pages with | bot= (red spot indicates pattern which it failed to recognize). This occurred within the first five pages. I fixed the bot to recognize those particular patterns and have had no problems throughout the rest of the trial.

All pages edited can be found here. Thanks — fr+ 11:11, 15 October 2018 (UTC)

  • Comment: Aha, thanks for taking on my request! Or was that just a coincidence? :-) Anyhow, I noticed it on my watchlist at Talk:Apricot. In any case, try to keep in mind what I wrote there about other templates containing the term "MiszaBot" and the fact that the bot shouldn't edit beyond the first heading. Graham87 15:18, 15 October 2018 (UTC)
    Looking over the edits, it seems like the trial went pretty well. Looks like you've addressed any problems that came up during the trial run. SQLQuery me! 15:55, 15 October 2018 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here (edit), while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.


Retrieved from "https://en.wikipedia.org/w/index.php?title=Wikipedia:Bots/Requests_for_approval&oldid=864198103"
This page is based on the copyrighted Wikipedia article "Wikipedia:Bots/Requests for approval"; it is used under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA). You may redistribute it, verbatim or modified, providing that you comply with the terms of the CC-BY-SA