Wikipedia:Edit filter/Documentation

The extension defines a domain-specific language used solely for writing filter rules. Because the language is not Turing complete, it cannot replace bots for more complex tasks.

Significant content taken from mw:Extension:AbuseFilter/Rules format; see page history for attribution.

Variables

The edit filter captures data from each edit and stores it in the variables listed below, which can be manipulated and analyzed with various functions and operators. The data types are int (signed integer values), string (sequences of Unicode characters), bool (true and false), and float (signed rational numbers).

Note that some numerical variables may be defined as a string; to act based on these variables, you may need to cast them to an int. For example, the variable user_editcount is a string because it is empty for unregistered users; to perform a comparison, you must cast it to an int first (e.g. int(user_editcount) <= 500).
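
The cast described above can be mimicked in Python to show why it matters. This is a minimal sketch, not filter syntax: the helper names cast_int and is_low_editcount are hypothetical, and the sketch assumes that casting an empty string yields 0.

```python
def cast_int(value: str) -> int:
    """Emulate the filter's int() cast.

    Assumption: empty or unparsable strings become 0 (a simplification).
    """
    try:
        return int(value.strip())
    except ValueError:
        return 0


def is_low_editcount(user_editcount: str, limit: int = 500) -> bool:
    """Python analogue of the filter condition: int(user_editcount) <= 500."""
    return cast_int(user_editcount) <= limit


if __name__ == "__main__":
    print(is_low_editcount(""))      # unregistered user: empty string
    print(is_low_editcount("12"))    # new registered user
    print(is_low_editcount("5000"))  # experienced user
```

Under that assumption, an unregistered user (empty user_editcount) is treated like a user with zero edits, so the comparison still matches.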

"Pre-save transformed" (PST) refers to the wikitext after MediaWiki's pre-save transform has been applied, i.e. with template substitution (subst:) already carried out. The variables without the _pst suffix hold the wikitext as submitted. For example, if an edit adds {{subst:mbox}}, added_lines contains {{subst:mbox}} verbatim, while added_lines_pst contains the substituted result, {{#invoke:Message box|mbox}}.

Variables available
Description | Name | Data type | Values
Edit count of the user | user_editcount | string | Empty for unregistered users.
Name of the user account | user_name | string |
Time email address was confirmed | user_emailconfirm | string | YYYYMMDDHHMMSS
Age of the user account | user_age | | In seconds; 0 for IP
Whether the user is blocked | user_blocked | bool | 1 for blocked users.
Whether or not a user is editing through the mobile interface | user_mobile | bool | 1 for mobile users.
Groups (including implicit) the user is in | user_groups | string |
Rights that the user has | user_rights | string |
Page ID (found in the page's HTML source - search for wgArticleId) | article_articleid | integer | In theory this is 0 for new pages, but this is unreliable. Instead, use "old_size==0" to identify new page creation.
Page namespace | article_namespace | integer | Refers to the namespace index
Page title (without namespace) | article_text | string |
Full page title | article_prefixedtext | string |
Edit protection level of the page | article_restrictions_edit | string |
Move protection level of the page | article_restrictions_move | string |
Upload protection of the file | article_restrictions_upload | string |
Create protection of the page | article_restrictions_create | string |
Last ten users to contribute to the page | article_recent_contributors | string | Empty if the user is the only contributor to the page(?); only scans the last 100 revisions
First user to contribute to the page | article_first_contributor | string |
Action | action | string | edit, move, createaccount, autocreateaccount, delete, upload, gatheredit
Edit summary/reason | summary | string |
Whether or not the edit is marked as minor | minor_edit | bool |
Old page wikitext, before the edit (no longer in use) | old_wikitext | string |
New page wikitext, after the edit | new_wikitext | string |
Unified diff of changes made by edit | edit_diff | string |
Unified diff of changes made by edit, pre-save transformed | edit_diff_pst | string |
New page size | new_size | integer |
Old page size | old_size | integer |
Size change in edit | edit_delta | integer |
Lines added in edit, pre-save transformed | added_lines_pst | string |
Lines added in edit | added_lines | string |
Lines removed in edit | removed_lines | string |
All external links in the new text | all_links | string |
Links in the page, before the edit | old_links | string |
All external links added in the edit | added_links | string |
All external links removed in the edit | removed_links | string |
New page wikitext, pre-save transformed | new_pst | string |
Parsed HTML source of the new revision | new_html | string |
New page text, stripped of any markup | new_text | string |
Disabled | old_html | |
Disabled | old_text | |
Whether or not the change was made through a tor exit node | tor_exit_node | bool | 0, 1
Unix timestamp of change | timestamp | string | int(timestamp) gives you a number with which you can calculate the date, time, day of week, etc.
SHA1 hash of file contents in hexadecimal | file_sha1 | string |
Size of the file in bytes | file_size | integer | The file size in bytes
Page ID of move destination page | moved_to_articleid | |
Full title of move destination page | moved_to_prefixedtext | |
Namespace of move destination page | moved_to_namespace | |
Namespace of move source page | moved_from_namespace | |
Full title of move source page | moved_from_prefixedtext | |
Page ID of move source page | moved_from_articleid | |
Account name (on account creation) | accountname | string |
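
The timestamp entry above notes that int(timestamp) yields a number from which the date, time, and day of week can be calculated. The arithmetic involved can be sketched in Python as follows; the function name utc_parts is hypothetical, and the sketch only uses integer division and modulo, which correspond to operators the filter language also provides.

```python
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def utc_parts(timestamp: str) -> dict:
    """Derive the UTC day of week, hour and minute from a Unix timestamp string."""
    ts = int(timestamp)             # the variable is a string, so cast it first
    days_since_epoch = ts // 86400  # whole days since 1970-01-01
    seconds_today = ts % 86400      # seconds elapsed in the current UTC day
    return {
        # 1970-01-01 was a Thursday, which is index 4 when Sunday is 0
        "weekday": DAY_NAMES[(days_since_epoch + 4) % 7],
        "hour": seconds_today // 3600,
        "minute": (seconds_today % 3600) // 60,
    }


if __name__ == "__main__":
    print(utc_parts("1529736180"))
```

All values are in UTC; a filter that needs local time would have to add an offset before dividing.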

Comparison

Operator True when ...
< the left operand is less than the right.
> the left operand is more than the right.
<= the left operand is less than or equal to the right.
>= the left operand is more than or equal to the right.
= the left operand is equal to the right.
!= the left operand is not equal to the right.
== the left operand is equal to the right (same as =).
=== the left operand is equal to the right and they are of the same data type.
!== the left operand is not equal to the right, or they are not of the same data type.

Arithmetic

Operator Operation
+ Addition
- Subtraction
* Multiplication
/ Division
** Exponentiation
% Modulo (remainder)

The + operator also concatenates a string with another string or with a value of another data type.

Keywords

  • like returns true if the left string matches the right string; this is distinct from = as the right string can include wildcard characters.
  • in returns true if the right string contains the left string.
  • rlike returns true if the left string matches the regular expression pattern in the right string. irlike is the case-insensitive form of rlike. The regex engine is PCRE with support for Unicode characters. Regex matching is expensive: the cost of a regex with quantifiers can grow exponentially with the size of an edit, so a filter should use as few regex searches as possible.
  • if .. then .. else .. end evaluates a condition and returns one of two values; the ternary conditional operator condition ? value_if_true : value_if_false does the same.
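
The behavior of these keywords can be approximated in Python. The sketch below is an emulation for illustration only, not filter syntax: the function names are hypothetical, Python's re module stands in for PCRE (the two differ in some details), and it assumes that like is case-sensitive and that rlike searches anywhere in the string rather than requiring a full match.

```python
import fnmatch
import re


def like(text: str, pattern: str) -> bool:
    """Glob-style match of the whole string; * and ? act as wildcards."""
    return fnmatch.fnmatchcase(text, pattern)


def contained_in(needle: str, haystack: str) -> bool:
    """Analogue of `needle in haystack`: the right string contains the left."""
    return needle in haystack


def rlike(text: str, pattern: str) -> bool:
    """Case-sensitive regular expression search."""
    return re.search(pattern, text) is not None


def irlike(text: str, pattern: str) -> bool:
    """Case-insensitive regular expression search."""
    return re.search(pattern, text, re.IGNORECASE) is not None


def if_then_else(condition: bool, if_true, if_false):
    """Analogue of both conditional forms: pick one of two values."""
    return if_true if condition else if_false


if __name__ == "__main__":
    print(like("Main Page", "Main*"))
    print(contained_in("foo", "a foo b"))
    print(irlike("VANDAL", "vandal"))
    print(if_then_else(rlike("abc123", r"\d+"), "has digits", "no digits"))
```

Note the operand order: for the in keyword, the needle is on the left and the haystack on the right, whereas for like, rlike and irlike the text is on the left and the pattern on the right.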


Declaring new variables

You can declare new variables within a condition; they are lexically scoped to the condition in which they appear.

Functions

name description
lcase Returns the argument converted to lower case.
ucase Returns the argument converted to upper case.
length Returns the length of the string given as the argument.
string Casts to string data type.
int Casts to integer data type.
float Casts to floating-point data type.
bool Casts to boolean data type.
norm Equivalent to rmwhitespace(rmspecials(rmdoubles(ccnorm(arg1)))).
ccnorm Normalises confusable/similar characters in the argument, and returns a canonical form. The list of characters and their replacements is defined by the AntiSpoof extension, e.g. ccnorm( "Eeèéëēĕėęě3ƐƷ" ) == "EEEEEEEEEEEEE".[1][2] Note that the AntiSpoof extension is required for this function to have an effect; without it, the string is simply left unchanged.
specialratio Returns the number of non-alphanumeric characters divided by the total number of characters in the argument.
rmspecials Removes any special characters in the argument, and returns the result. (Equivalent to s/[^\p{L}\p{N}]//g.)
rmdoubles Removes repeated characters in the argument, and returns the result.
rmwhitespace Removes whitespace (spaces, tabs, and newlines).
count Returns the number of times the needle (first string) appears in the haystack (second string). If only one argument is given, splits it by commas and returns the number of segments.
rcount Similar to count but the needle uses a regular expression instead. Can be made case-insensitive by letting the regular expression start with "(?i)".
ip_in_range Returns true if user's IP (first string) matches specified IP ranges (second string). Only works for anonymous users. Supports IPv4 and IPv6 addresses.
contains_any Returns true if the first string contains any strings from the following arguments (unlimited number of arguments).
substr Returns the portion of the first string, by offset from the second argument (starts at 0) and maximum length from the third argument (optional).
strlen Same as length.
strpos Returns the numeric position of the first occurrence of the needle (second string) in the haystack (first string). The function returns 0 when the needle is found at the beginning of the haystack, which a loose comparison may misinterpret as a false value. To test whether the needle was found, use a strict comparison (=== or !==).
str_replace Replaces all occurrences of the search string with the replacement string. The function takes 3 arguments in the following order: text to perform the search, text to find, replacement text.
rescape Returns the argument with some characters preceded with the escape character "\", so that the string can be used in a regular expression without those characters having a special meaning.
set Sets a variable (first string) with a given value (second argument) for further use in the filter. Another syntax: name := value.
set_var Same as set.
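
Several of the string functions above can be approximated in a few lines of Python, which may help when reasoning about what a filter will see. This is a hedged sketch rather than the extension's implementation: str.isalnum() only approximates the character classes \p{L} and \p{N}, specialratio is assumed to return 0 for an empty string, and strpos is assumed to return a false value when the needle is absent, as the description above implies.

```python
import re


def rmspecials(text: str) -> str:
    """Drop every character that is not a letter or a number (approximation)."""
    return "".join(ch for ch in text if ch.isalnum())


def rmdoubles(text: str) -> str:
    """Collapse runs of the same character into a single character."""
    return re.sub(r"(.)\1+", r"\1", text, flags=re.DOTALL)


def rmwhitespace(text: str) -> str:
    """Remove spaces, tabs, and newlines."""
    return re.sub(r"\s+", "", text)


def specialratio(text: str) -> float:
    """Share of non-alphanumeric characters (assumed 0 for an empty string)."""
    if not text:
        return 0.0
    return sum(1 for ch in text if not ch.isalnum()) / len(text)


def count(needle: str, haystack=None) -> int:
    """Two-argument form counts occurrences; one-argument form counts comma-separated segments."""
    if haystack is None:
        return len(needle.split(","))
    return haystack.count(needle)


def contains_any(haystack: str, *needles: str) -> bool:
    """True if the first string contains any of the following strings."""
    return any(needle in haystack for needle in needles)


def strpos(haystack: str, needle: str):
    """Position of the first occurrence, or False if absent (assumption)."""
    position = haystack.find(needle)
    return False if position == -1 else position


if __name__ == "__main__":
    print(rmdoubles("heeellooo"))
    print(rmspecials("f*o*o 1!"))
    # Pitfall: position 0 compares equal to False under a loose comparison,
    # so test with a strict (identity) check instead.
    print(strpos("abc", "a") == False, strpos("abc", "a") is False)
```

The last line shows the strpos pitfall in Python terms: a match at position 0 looks like "not found" under a loose comparison, but not under a strict one.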

Actions which can be assigned in response to filtered edits

If a user triggers a filter, the edit filter can apply any of the following sanctions based on the severity of the offense:

  • All actions triggering a filter are logged at a special page.
  • The user's action can be tagged for further review.
  • The user can be warned that their actions may be unconstructive.
  • The user's action may be disallowed.

The following actions are currently not available on this wiki:

  • The user's account may be blocked from editing, along with all IP addresses used in the last 7 days.
  • The user's account may be removed from all privileged groups (such as sysop, bot, rollbacker).

Note: Individual sanctions can be disabled selectively. Any edit filter manager can restore autoconfirmed status in case of an error.

Condition limit

The condition limit is a cap imposed by the software on the total number of conditions that the filters can evaluate. For performance reasons, it is fixed at 1,000, an arbitrary value. See mw:Extension:AbuseFilter/Conditions for more details.
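
One way to picture the effect of such a limit is a shared budget that each filter draws from. The Python sketch below is a hypothetical model, not the extension's accounting: the function run_filters, the per-filter condition counts, and the rule that all remaining filters are skipped once the budget would be exceeded are assumptions made for illustration.

```python
CONDITION_LIMIT = 1000  # the value stated above


def run_filters(filters, limit=CONDITION_LIMIT):
    """Evaluate filters in order against a shared condition budget.

    filters: list of (name, conditions_used, matched) tuples (hypothetical data).
    Returns (matched_names, skipped_names, conditions_consumed).
    """
    used = 0
    matched, skipped = [], []
    budget_exhausted = False
    for name, cost, is_match in filters:
        if budget_exhausted or used + cost > limit:
            # Assumption: once the limit would be passed, nothing else runs.
            budget_exhausted = True
            skipped.append(name)
            continue
        used += cost
        if is_match:
            matched.append(name)
    return matched, skipped, used


if __name__ == "__main__":
    example = [("a", 400, True), ("b", 500, False), ("c", 200, True), ("d", 50, True)]
    print(run_filters(example))
```

In this model, filters that appear late in the order never run if earlier filters are expensive, which is why keeping each filter's condition count low benefits every other filter.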

Monitoring

All edits triggering an action will produce a report at Special:AbuseLog. On this page, a brief log entry is entered. Users with the appropriate permissions may view the log summary. Users with certain higher permissions may view details on the log entry. This includes all information available to the filter when it ran, and may be useful for debugging purposes. Users with the highest level of log-viewing permissions may view private data about the action which caused the log event, such as the user's IP address. See the AbuseFilter documentation for more details on the permissions structure.

Sample abuse log entries

  • 06:43, 23 June 2008: Andrew (talk | contribs | block) triggered an abuse filter, making an edit on Main Page. Actions taken: warn,disallow; Filter description: Test Filter
  • 06:43, 23 June 2008: Andrew (talk | contribs | block) triggered an abuse filter, making an edit on Main Page. Actions taken: none; Filter description: Test Filter

Sample detailed abuse log entries

Figure: A sample detailed log entry
  • 06:43, 23 June 2008: Andrew (talk | contribs | block) triggered filter 1, making an edit on Main Page. Actions taken: warn,disallow; Filter description: Test Filter (details)
  • 06:43, 23 June 2008: Andrew (talk | contribs | block) triggered filter 2, making an edit on Main Page. Actions taken: none; Filter description: Test Filter (details)
  • 06:42, 23 June 2008: Andrew (talk | contribs | block) triggered filter 1, making an edit on Main Page. Actions taken: warn; Filter description: Test Filter (details)
  • 06:42, 23 June 2008: Andrew (talk | contribs | block) triggered filter 2, making an edit on Main Page. Actions taken: none; Filter description: Test Filter (details)
  • 06:22, 23 June 2008: Andrew (talk | contribs | block) triggered filter 1, making an edit on Main Page. Actions taken: warn,disallow; Filter description: Test Filter (details)
  • 06:22, 23 June 2008: Andrew (talk | contribs | block) triggered filter 2, making an edit on Main Page. Actions taken: none; Filter description: Test Filter (details)

The details link brings up a screen like that on the right.

Safeguards

To protect the wiki against poorly configured filters, a technical limit is imposed on the maximum percentage of actions that will trigger a given filter. Other technical limits are in the process of being written.
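
The percentage-based safeguard can be sketched as a simple ratio check. The Python below is an illustrative assumption, not the extension's code: the function name exceeds_trigger_limit and the threshold values (5% of actions, with a minimum of 20 actions before the check applies) are hypothetical.

```python
def exceeds_trigger_limit(matches: int, total_actions: int,
                          max_fraction: float = 0.05,
                          min_actions: int = 20) -> bool:
    """Return True if a filter matches too large a share of recent actions.

    max_fraction and min_actions are hypothetical values for illustration.
    """
    if total_actions < min_actions:
        # Too little data to judge; avoids tripping on the first few actions.
        return False
    return matches / total_actions > max_fraction


if __name__ == "__main__":
    print(exceeds_trigger_limit(1, 10))    # too few actions to judge
    print(exceeds_trigger_limit(10, 100))  # 10% of actions matched
    print(exceeds_trigger_limit(4, 100))   # 4% of actions matched
```

A minimum sample size is included in the sketch because a ratio computed over very few actions would otherwise flag a filter after a single match.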

Notification

All notifications are based on the template {{edit filter warning}}.

Standard notifications shown to a user triggering a filter action:

Message name Message text
abusefilter-disallowed
abusefilter-degrouped
abusefilter-autopromote-blocked

Generic warning message is below. Admins are advised to use custom warnings.

Message name Message text
abusefilter-warning

Some existing filters and their warnings:

Filter and message Message text
3: blanking articles

blanking

30: large deletions

removal

If a filter is set to both warn and disallow, a user clicking "Save page" will alternately see the warning and the standard disallowed message.

  1. ^ Be aware of phab:T29987
  2. ^ Be aware of phab:T27619
Retrieved from "https://en.wikipedia.org/w/index.php?title=Wikipedia:Edit_filter/Documentation&oldid=844333807"
This page is based on the copyrighted Wikipedia article "Wikipedia:Edit filter/Documentation"; it is used under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA). You may redistribute it, verbatim or modified, providing that you comply with the terms of the CC-BY-SA