Wikipedia:Edit filter noticeboard

Welcome to the edit filter noticeboard
Filter 953 — Flags: disabled
Last changed at 02:10, 17 July 2019 (UTC)

Filter 664 — Pattern modified

Last changed at 16:47, 15 July 2019 (UTC)

Filter 997 (new) — Actions: none; Flags: enabled,private; Pattern modified

Last changed at 16:01, 12 July 2019 (UTC)

Filter 967 — Pattern modified

Last changed at 03:21, 12 July 2019 (UTC)

This is the edit filter noticeboard, for coordination and discussion of edit filter use and management.

If you wish to request an edit filter, please post at Wikipedia:Edit filter/Requested. If you would like to report a false positive, please post at Wikipedia:Edit filter/False positives.

Private filters should not be discussed in detail here; please email an edit filter manager if you have specific concerns or questions about the content of hidden filters.



AbuseFilter/384

Renamed this header btw from False Negative?. –MJLTalk 23:37, 7 June 2019 (UTC)

A recent vandal ([1]) has brought to my attention that this did not trip any of our edit filters. I'm not well versed in this process, but could we edit 384 with the following code to catch this in the future?

\b(horse|dog)?shit(|s|ti?er|t?y|t?ing)?\b

Hopefully, I am not making a terrible suggestion here. –MJLTalk 07:49, 17 May 2019 (UTC)

  • \b(bull|dog|horse)?shit(|s|ti?er|t?y|t?ing)?\b maybe? I agree with this change either way. --qedk (t c) 08:13, 17 May 2019 (UTC)
For quick reference, 384 currently catches \b(horse|dog)?shits?\b. I would say that I suspect some of the new things to catch would be prone to false positives. "Shitty" appears quite a bit in titles and quotes. What happened here was the vandal actually worked the edit filter until he found the sweet spot where his language wasn't bad enough to trip 384, but his deletion wasn't big enough to trip 12. Someguy1221 (talk) 08:19, 17 May 2019 (UTC)
Possibly set to log-only and see what we're catching first then? bull seems to be a no-brainer addition imo, given the other two are up there. --qedk (t c) 08:22, 17 May 2019 (UTC)
Great addition qedk! Also, I'm fine with a log-only if we want to play it safe. –MJLTalk 10:05, 17 May 2019 (UTC)
I don't think bull is a no-brainer addition - bullshit has way more legitimate uses than horseshit or dogshit, and so more scope for FPs. Galobtter (pingó mió) 14:56, 18 May 2019 (UTC)
What are the chances an un-confirmed editor is using any of these words constructively, hmmm. Can't say for sure but instincts tell me rarely. --qedk (t c) 10:23, 20 May 2019 (UTC)
Yeah, if someone is working around a filter, the solution is blocking them, really, because if someone tries enough they certainly can always get around the filter. Galobtter (pingó mió) 14:56, 18 May 2019 (UTC)
  • Question. @QEDK and Someguy1221: in the same vein as this, could we edit Filter 11 from !("suc?k to !("suc?k?s ? I just noticed that this did not get tagged ([[Category:Blue Peter Sucks| ]]). –MJLTalk 18:25, 17 May 2019 (UTC)
    Should probably be logged to check for FPs, suc?k?(er|ah)?s? is a suggestion I can offer. --qedk (t c) 09:58, 18 May 2019 (UTC)
    @MJL: suc?k?s breaks the "suck" match btw? Or am I just bad at regex? 🤔 --qedk (t c) 10:03, 18 May 2019 (UTC)
    @QEDK: lol yeah you're right. I'm still learning the ropes with RegexBuddy. –MJLTalk 14:12, 18 May 2019 (UTC)
    To be quite fair, I'm terrible at it myself, so no defense here. I can make some hacky stuff, because why not? --qedk (t c) 14:18, 18 May 2019 (UTC)
    There's no word boundary at the end of suc?k, so the filter will match "sucks", "sucker", "suckah" etc; the reason that edit didn't match is that the regex requires the word before "suck" to match \b((yo)?u( ?all)?|(s?h|w)e|they|it?|y'?al+). Galobtter (pingó mió) 15:05, 18 May 2019 (UTC)
    @Galobtter: Could you take a look at this edit for me? It was triggered once but not twice. –MJLTalk 23:35, 7 June 2019 (UTC)
    I can't imagine the word bass needs to be tripping the filter..? –MJLTalk 23:37, 7 June 2019 (UTC)
    Never mind, I'm dumb. It was the word lol. –MJLTalk 23:42, 7 June 2019 (UTC)
    @MJL: I sent you an explanation via email, not posting onwiki per beans --DannyS712 (talk) 23:47, 7 June 2019 (UTC)
    @DannyS712: Replied, thank you! :D –MJLTalk 23:52, 7 June 2019 (UTC)
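
For readers following the regex discussion above, here is a minimal Python sketch of how the quoted patterns behave. It assumes Python's re module is a close enough stand-in for the AbuseFilter regex engine, and that the filter input is already lowercased. The `\s+` joiner in WITH_SUBJECT is an assumption made only to glue together the two fragments quoted from filter 11; the real filter may join them differently.

```python
import re

# Patterns quoted in the thread for filter 384.
CURRENT_384 = re.compile(r"\b(horse|dog)?shits?\b")
PROPOSED_384 = re.compile(r"\b(bull|dog|horse)?shit(|s|ti?er|t?y|t?ing)?\b")

# Fragments quoted from filter 11 (not the full filter).
SUCK_ORIGINAL = re.compile(r"suc?k")    # no trailing word boundary
SUCK_EDITED = re.compile(r"suc?k?s")    # the suggested edit: now requires an "s"

# Assumption: subject fragment + whitespace + original fragment.
WITH_SUBJECT = re.compile(
    r"\b((yo)?u( ?all)?|(s?h|w)e|they|it?|y'?al+)\s+suc?k"
)


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


for word in ["shit", "shits", "shitty", "shitting", "bullshit"]:
    print(f"{word!r}: current={hits(CURRENT_384, word)} "
          f"proposed={hits(PROPOSED_384, word)}")

for text in ["suck", "sucks", "sucker"]:
    print(f"{text!r}: original={hits(SUCK_ORIGINAL, text)} "
          f"edited={hits(SUCK_EDITED, text)}")

for text in ["you suck", "it sucks", "blue peter sucks"]:
    print(f"{text!r}: with_subject={hits(WITH_SUBJECT, text)}")
```

Under these assumptions the edited fragment no longer matches plain "suck", and the category name goes unmatched because "peter" is not one of the listed subject words, not because of the plural.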

New tool for testing filter changes

I've always been bothered by the fact that Special:AbuseFilter/test doesn't allow batch testing against old filter hits, so it's easy to break a filter without knowing it. So, I created User:Suffusion of Yellow/batchtest-plus (source). This script allows you to test a pattern against 100 old hits, with a single click. Comments, and suggestions for improvement, are welcome. Suffusion of Yellow (talk) 00:44, 11 June 2019 (UTC)

Suffusion of Yellow, very very nice! I generally manually test on a few past filter hits using examine when making most filter edits, but this is certainly much more convenient, and should save time. Also, testing against 100 hits would allow seeing how much a change to reduce false positives reduces the amount of abuse caught. Galobtter (pingó mió)
@Galobtter: Thanks! Glad to know at least one person found it useful. I'd still like to make it faster, but don't like the idea of flooding the servers with requests as fast as the browser will send them. MusikAnimal or Daimona Eaytoy, would either of you know what an acceptable rate limit on abusefiltercheckmatch requests would be? Right now I'm limiting it to five requests per second, and never making concurrent requests. Am I being overly cautious? If you don't know, who do I ask? Suffusion of Yellow (talk) 18:30, 13 June 2019 (UTC)
Suffusion of Yellow, per the general mw:API:Etiquette, you should be perfectly fine if you wait for one request to finish before sending the next, so that you never have more than one request in flight. On the scale of the WMF servers, though, I'd think being sent 100 requests at once (or over a few seconds) vs over 20 seconds isn't going to make much of a difference; the issue would more be about making a lot of requests over minutes and hours. (Also, you should set an Api-User-Agent header, so that you can be contacted if issues are caused - see m:User-Agent_policy.) Galobtter (pingó mió) 19:41, 13 June 2019 (UTC)
I also agree with Galobtter, although TBH I don't know how heavy that API module can be (I guess it strongly depends on the input). I'd also love to bring this feature inside AbuseFilter, but I think I found it to be non-trivial the last time I looked. --Daimona Eaytoy (Talk) 08:28, 14 June 2019 (UTC)
Thank you Suffusion of Yellow! I think this will be very useful. I also concur with Galobtter. In general, I would not worry too much about flooding the servers. Twinkle for instance can fire off hundreds and hundreds of POST requests at a single time (batch delete, unlink, etc.), and the worst that happens is the servers reject a few of them. In this case you're not even doing POSTs, and 100 requests isn't that many, so I'd say make it as fast as you can :) MusikAnimal talk 18:18, 14 June 2019 (UTC)
@Galobtter, Daimona Eaytoy, and MusikAnimal: Thanks for all your advice! I've removed the rate limit. I tried sending all the requests at once, but some of them would time out without even an error response. So, instead I've limited it to 10 parallel requests, which seems to work, and finishes in a few seconds, usually. Let me know if you have any problems, and I can tweak the setting, or make it user-configurable. Suffusion of Yellow (talk) 19:41, 20 June 2019 (UTC)
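
The "at most 10 parallel requests" approach described above can be sketched roughly as follows. This is only an illustration in Python, not the actual script (which is a JavaScript user script). The check_match function is a hypothetical stand-in that sleeps instead of calling the API, and the constant names are invented for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 10   # cap on simultaneous requests, as described above
TOTAL_HITS = 100    # number of old filter hits tested per batch

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0  # highest number of simultaneous "requests" observed


def check_match(hit_id):
    """Hypothetical stand-in for one abusefiltercheckmatch request."""
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(0.01)  # simulated network latency; no real request is made
    with _lock:
        _in_flight -= 1
    return hit_id, hit_id % 2 == 0  # fake (hit id, matched?) result


def run_batch(hit_ids, max_parallel=MAX_PARALLEL):
    """Run all checks, never allowing more than max_parallel at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in input order, whatever the finish order.
        return list(pool.map(check_match, hit_ids))


results = run_batch(range(TOTAL_HITS))
print(len(results), "checks done; peak concurrency:", peak_in_flight)
```

A fixed-size worker pool keeps the batch fast while putting a hard ceiling on concurrent requests, which is the trade-off the thread settles on.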

894 - equal to any using regex?

Hi. I just came across 894 (hist · log). The first line is page_namespace rlike "^(0|118)$" & (. Is there a reason that this comparison is done using a regex string comparison, rather than equals_to_any(page_namespace, 0, 118) & (? Just wondering, since it seems less efficient to convert an integer into a string and do this regex rather than using the built in equals_to_any. Thanks, --DannyS712 (talk) 03:38, 19 June 2019 (UTC)

My guess is that the regex-matching means there's always one condition, while an equals_to_any(page_namespace, 0, 118) would be a worst-case of two conditions (don't quote me on that). --qedk (tc) 05:34, 19 June 2019 (UTC)
@QEDK: but doesn't the regex or (|) mean that the cases are the same? Equals to any is the same as doing a = b or a = c, but it is numerical comparison instead of strings --DannyS712 (talk) 05:46, 19 June 2019 (UTC)
Regex-matching would just mean matching the entire string to the regex, so the entire string is compared at the same time, instead of two different times, page_namespace == 0 and then if that fails, page_namespace == 118. --qedk (tc) 05:51, 19 June 2019 (UTC)
Oh, that makes sense. Can anyone (maybe requires access to logstash?/something else?) see what is more efficient? --DannyS712 (talk) 05:54, 19 June 2019 (UTC)
I know @Daimona Eaytoy: has access, I don't know if there's anyone else. --qedk (tc) 05:59, 19 June 2019 (UTC)
@QEDK and DannyS712:, no, the reason is that equals_to_any was only relatively recently added as a function, so before that rlike was used to save conditions. Definitely, rlike should be replaced with equals_to_any, which does only use one condition, but it is not particularly urgent. Galobtter (pingó mió) 07:47, 19 June 2019 (UTC)
Woop, that's good to know. Good I wasn't quoted after all. --qedk (tc) 07:53, 19 June 2019 (UTC)
Just to absolve myself of this crime, I will state that I saw the examples on condition limits and observed that each lookup in the argument lists (or better called, invocations) counts as one condition, so this essentially looked the same to me. Noting that, I personally deal only with seeing edit filter logs and not making them. --qedk (tc) 08:00, 19 June 2019 (UTC)
@Galobtter: Well, given that it isn't urgent, I'm not going to file a bunch of "edit requests", but I've started keeping track (at User:DannyS712/sandbox2) of filters that should have their syntax updated. Thanks for explaining --DannyS712 (talk) 08:10, 19 June 2019 (UTC)
Galobtter is right. I added equals_to_any last year, and its main use case is, in fact, to check the namespace against a given list. It avoids the overhead of using regexps, and equals_to_any(val, a, b, c) is the same as val === a | val === b | val === c, but takes one condition instead of three and is a bit more readable. Note that, in this case, the performance is roughly identical. This is just one of the many cases where more conditions don't mean worse performance. Using rlike, instead, could be very slightly slower. --Daimona Eaytoy (Talk) 09:24, 19 June 2019 (UTC)
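
To make the equivalence discussed in this thread concrete, here is a small Python sketch. Both functions are stand-ins written for illustration: equals_to_any mimics the strict (===) comparison described above, and rlike_namespace mimics converting the namespace to a string and applying the regex from filter 894. Neither is the AbuseFilter implementation, and condition counting is not modelled.

```python
import re


def equals_to_any(value, *candidates):
    """Stand-in for AbuseFilter's equals_to_any, using strict comparison."""
    return any(type(value) is type(c) and value == c for c in candidates)


NAMESPACE_RE = re.compile(r"^(0|118)$")


def rlike_namespace(page_namespace):
    """Stand-in for: page_namespace rlike "^(0|118)$"."""
    return NAMESPACE_RE.search(str(page_namespace)) is not None


# The two checks agree for every integer namespace number tried here.
for ns in [0, 1, 10, 11, 18, 118, 119, 1180]:
    print(ns, equals_to_any(ns, 0, 118), rlike_namespace(ns))
```

The integer comparison skips the string conversion and the regex engine entirely, which matches the readability argument made above.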

False negatives

384 (hist · log)

Can I suggest that \bass\s?holes?\b be replaced with \b(ass|butt)\s?holes?\b? See Special:Diff/902637086. I don't know if this would cause any false positives, since I can't check, but I doubt that it would. Thanks, --DannyS712 (talk) 06:51, 20 June 2019 (UTC)

I'm very tired right now and so not gonna trust myself with regex, but just noting that "asshole" is used in twice as many articles as "butthole", so not anticipating a problematic number of false positives. Someguy1221 (talk) 07:17, 20 June 2019 (UTC)
@DannyS712 and Someguy1221: Testing in 839 (hist · log). Since that's a rather expensive filter for just one word, now would be a good time to test other possible additions to 384. Any suggestions? I'd like to avoid memes or anything else that will go "stale"; 614 is better fit for those. Suffusion of Yellow (talk) 20:54, 25 June 2019 (UTC)
@Suffusion of Yellow: Maybe expand the ass to include when it doesn't have \s?holes\s?? "__BLP__ is a pain in the ass" doesn't trigger it. --DannyS712 (talk) 20:58, 25 June 2019 (UTC)
@DannyS712: Done. Checking for (dumb)?ass. With >10K uses in mainspace, I'm fairly sure that there will be way too many FPs for plain "ass", but I'm curious anyway. Suffusion of Yellow (talk) 21:19, 25 June 2019 (UTC)
@Suffusion of Yellow: (?:dumb)?ass matches dumbass (but not dumb ass) as well as just plain ass, which I don't think you wanted. Can I suggest dumb\s?ass? --DannyS712 (talk) 21:21, 25 June 2019 (UTC)
@DannyS712: I meant to check for both "dumbass" and plain "ass", for a little while, in case I'm wrong about the FPs for plain "ass". If there are too many FPs, I'll change it to your suggestion. Suffusion of Yellow (talk) 21:29, 25 June 2019 (UTC)
@MJL and QEDK: Also added some "shit"-related tests, as you suggested in the other thread. I modified the regex so as not to overlap with 384 during testing. Suffusion of Yellow (talk) 23:40, 25 June 2019 (UTC)
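
The difference between the patterns discussed in this thread can be checked with a short Python sketch. Python's re module is used as a stand-in for the filter's regex engine, and the \b word boundaries around the "ass" patterns are an assumption, since the thread quotes only the inner part of the test expression.

```python
import re

OLD_HOLE = re.compile(r"\bass\s?holes?\b")         # currently in 384
NEW_HOLE = re.compile(r"\b(ass|butt)\s?holes?\b")  # proposed replacement

# Assumed word boundaries; the thread quotes only the inner part.
OPTIONAL_DUMB = re.compile(r"\b(?:dumb)?ass\b")    # "dumbass" or plain "ass"
DUMB_ONLY = re.compile(r"\bdumb\s?ass\b")          # "dumbass" or "dumb ass"


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


samples = ["asshole", "butthole", "butt holes",
           "dumbass", "dumb ass", "pain in the ass", "class"]
for text in samples:
    print(f"{text!r}: old_hole={first_match(OLD_HOLE, text)} "
          f"new_hole={first_match(NEW_HOLE, text)} "
          f"optional_dumb={first_match(OPTIONAL_DUMB, text)} "
          f"dumb_only={first_match(DUMB_ONLY, text)}")
```

With the optional group, "dumb ass" is caught only through the plain "ass" match, which is why the broader test pattern is expected to produce more false positives than dumb\s?ass.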
680 (hist · log)

Can I suggest that [🄀-🇿🌀-🙏🚀-🛳☀-☄☇-☿♃-♬♰-✒✙-✯✱-➿] be replaced with [🄀-🇿🌀-🙏🚀-🛳☀-☄☇-☿♃-♬♰-✒✙-✯✱-➿🤪🤙]? See Special:Diff/903247214. I don't know if this would cause any false positives, since I can't check, but I doubt that it would. Thanks, --DannyS712 (talk) 16:50, 24 June 2019 (UTC)

DannyS712 - thanks for spotting! Rather than adding individual emojis, I've added the range U+1F90D to U+1F9FF (see Unicode blocks) per Template:Emoji (Unicode block) to update things till Unicode 12. Galobtter (pingó mió) 17:42, 24 June 2019 (UTC)
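
As a rough check of the range-based fix described above, the Python sketch below compares the character class quoted in the request with a version extended by U+1F90D to U+1F9FF. How the range was actually spliced into filter 680 is not shown in this thread, so appending it at the end of the class is an assumption made for illustration.

```python
import re

# Character class exactly as quoted in the request above.
ORIGINAL_CLASS = "[🄀-🇿🌀-🙏🚀-🛳☀-☄☇-☿♃-♬♰-✒✙-✯✱-➿]"

# Assumption: the new range is appended just before the closing bracket.
EXTENDED_CLASS = ORIGINAL_CLASS[:-1] + "\U0001F90D-\U0001F9FF]"

original = re.compile(ORIGINAL_CLASS)
extended = re.compile(EXTENDED_CLASS)

# The two emoji from the request that slipped past the original class.
for ch in "🤪🤙":
    in_new_range = 0x1F90D <= ord(ch) <= 0x1F9FF
    print(f"U+{ord(ch):04X}: original={bool(original.search(ch))} "
          f"extended={bool(extended.search(ch))} in_new_range={in_new_range}")
```

Adding a whole range rather than individual characters means other emoji in the same block are covered without further edits.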

Set filter 993 to disallow

I was about to turn this off, thinking they had gone away. Seems not. I've already set it to disallow, as the LTA was just active, and there have been no FPs in over a month. If anyone objects, please revert. Suffusion of Yellow (talk) 21:08, 29 June 2019 (UTC)

Suffusion of Yellow, the reason it seemed they had gone away was because I had blocked Special:Contributions/181.21.128.0/17 for a month - probably should have told you. You can probably merge to 906. Galobtter (pingó mió) 08:55, 30 June 2019 (UTC)
@Galobtter: LOL, that would do it. They've edited from a few other ranges in the past, but if they don't show up again in a few days, I'll merge. Suffusion of Yellow (talk) 17:17, 30 June 2019 (UTC)

Expanding 664

664 (hist · log)

Can I suggest that it be expanded to include the draft namespace, not just the main namespace? Since non-confirmed editors can't create pages in mainspace, their test pages are generally drafts. See this page creation (Special:AbuseFilter/examine/1162056102) with the word "Hello". Thanks, --DannyS712 (talk) 13:25, 10 July 2019 (UTC)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Wikipedia:Edit_filter_noticeboard&oldid=905653066"
This page is based on the copyrighted Wikipedia article "Wikipedia:Edit filter noticeboard"; it is used under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA). You may redistribute it, verbatim or modified, providing that you comply with the terms of the CC-BY-SA