M-estimator

In statistics, M-estimators are a broad class of estimators, which are obtained as the minima of sums of functions of the data. Least-squares estimators are a special case of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation.
More generally, an M-estimator may be defined to be a zero of an estimating function.^{[1]}^{[2]}^{[3]}^{[4]}^{[5]}^{[6]} This estimating function is often the derivative of another statistical function. For example, a maximum-likelihood estimate is the point where the derivative of the likelihood function with respect to the parameter is zero. A maximum-likelihood estimator is therefore a root of the score function, which is the derivative of the log-likelihood.^{[7]} In many applications, such M-estimators can be thought of as estimating characteristics of the population.
Historical motivation
The method of least squares is a prototypical M-estimator, since the estimator is defined as a minimum of the sum of squares of the residuals.
Another popular M-estimator is maximum-likelihood estimation. For a family of probability density functions f parameterized by θ, a maximum likelihood estimator of θ is computed for each set of data by maximizing the likelihood function over the parameter space Θ. When the observations are independent and identically distributed, an ML estimate $\hat\theta$ satisfies

$$\hat\theta = \arg\max_{\theta} \left( \prod_{i=1}^n f(x_i, \theta) \right)$$

or, equivalently,

$$\hat\theta = \arg\min_{\theta} \left( -\sum_{i=1}^n \log f(x_i, \theta) \right).$$
Maximum-likelihood estimators have optimal properties in the limit of infinitely many observations under rather general conditions, but may be biased and not the most efficient estimators for finite samples.
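The equivalence between maximizing the likelihood and minimizing a sum of $-\log f(x_i, \theta)$ can be checked numerically. The following is a minimal sketch under assumptions that are not in the text. It uses an exponential model $f(x, \lambda) = \lambda e^{-\lambda x}$, a made-up sample, and the hypothetical helpers `neg_log_likelihood` and `golden_section_min`. The numerical minimizer can be compared with the closed-form MLE $\hat\lambda = n / \sum_i x_i$.

```python
import math


def neg_log_likelihood(lam, xs):
    """Sum of rho(x, lam) = -log f(x, lam) for the exponential density lam * exp(-lam * x)."""
    return sum(-math.log(lam) + lam * x for x in xs)


def golden_section_min(fun, lo, hi, tol=1e-10):
    """Minimize a unimodal scalar function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fun(c) < fun(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return 0.5 * (a + b)


# Made-up sample whose sum is 5.0, so the closed-form MLE is 5 / 5.0 = 1.0.
xs = [0.5, 1.2, 0.3, 2.0, 1.0]
lam_hat = golden_section_min(lambda lam: neg_log_likelihood(lam, xs), 1e-6, 10.0)
closed_form = len(xs) / sum(xs)
print(lam_hat, closed_form)
```

Swapping in a different density changes only `neg_log_likelihood`. That is the sense in which maximum likelihood is an M-estimator with $\rho = -\log f$.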
Definition
In 1964, Peter J. Huber proposed generalizing maximum likelihood estimation to the minimization of

$$\sum_{i=1}^n \rho(x_i, \theta),$$

where ρ is a function with certain properties (see below). The solutions

$$\hat\theta = \arg\min_{\theta} \left( \sum_{i=1}^n \rho(x_i, \theta) \right)$$

are called M-estimators ("M" for "maximum likelihood-type" (Huber, 1981, page 43)); other types of robust estimators include L-estimators, R-estimators and S-estimators. Maximum likelihood estimators (MLE) are thus a special case of M-estimators, obtained with $\rho(x, \theta) = -\log f(x, \theta)$. With suitable rescaling, M-estimators are special cases of extremum estimators (in which more general functions of the observations can be used).
The function ρ, or its derivative ψ, can be chosen so as to give the estimator desirable properties (in terms of bias and efficiency) when the data truly come from the assumed distribution. It can also be chosen to give 'not bad' behaviour when the data are generated from a model that is, in some sense, close to the assumed distribution.
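One widely used choice is Huber's own proposal, which this section does not spell out. It makes ρ quadratic near zero and linear in the tails, so ψ is bounded. This is a minimal sketch written in terms of a residual u = x − θ, with the scale taken as 1. The tuning constant k = 1.345 is an assumed default, often quoted as giving roughly 95% efficiency under the normal distribution. With ρ(x, θ) = `huber_rho(x - θ)`, the derivative ∂ρ/∂θ equals −`huber_psi(x - θ)`. The sign does not change where the estimating equation is zero.

```python
def huber_rho(u, k=1.345):
    """Huber's rho: u^2 / 2 for |u| <= k, and k*|u| - k^2/2 beyond (continuous at |u| = k)."""
    if abs(u) <= k:
        return 0.5 * u * u
    return k * abs(u) - 0.5 * k * k


def huber_psi(u, k=1.345):
    """Derivative of huber_rho with respect to u: u clipped to the interval [-k, k]."""
    return max(-k, min(k, u))


# A gross outlier contributes at most k to the estimating equation,
# whereas under least squares (psi(u) = u) it would contribute u itself.
print(huber_psi(0.5), huber_psi(50.0), huber_psi(-50.0))  # prints: 0.5 1.345 -1.345
```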
Types of M-estimators
M-estimators are solutions, θ, which minimize

$$\sum_{i=1}^n \rho(x_i, \theta).$$

This minimization can always be done directly. Often it is simpler to differentiate with respect to θ and solve for the root of the derivative. When this differentiation is possible, the M-estimator is said to be of ψ-type. Otherwise, the M-estimator is said to be of ρ-type.

In most practical cases, the M-estimators are of ψ-type.
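The two routes can be compared on a small example. The sketch below assumes Huber's ρ with k = 1.345 and a scale fixed at 1, both illustrative choices. It estimates a location θ in two ways. The ρ-type route minimizes $\sum_i \rho(x_i - \theta)$ directly by golden-section search. The ψ-type route bisects for the root of $\sum_i \psi(x_i - \theta)$. The helper names are hypothetical. Huber's ρ is convex and differentiable, so both routes should reach the same value.

```python
import math


def huber_rho(u, k=1.345):
    # Quadratic centre, linear tails.
    return 0.5 * u * u if abs(u) <= k else k * abs(u) - 0.5 * k * k


def huber_psi(u, k=1.345):
    # Derivative of huber_rho with respect to u (u clipped to [-k, k]).
    return max(-k, min(k, u))


def golden_section_min(fun, lo, hi, tol=1e-10):
    """rho-type route: minimize a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fun(c) < fun(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def bisect_root(fun, lo, hi, tol=1e-12):
    """psi-type route: root of a nonincreasing function with fun(lo) > 0 > fun(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fun(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


xs = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one gross outlier


def objective(theta):
    # Sum of rho over the sample: minimized directly.
    return sum(huber_rho(x - theta) for x in xs)


def estimating_equation(theta):
    # Sum of psi over the sample: nonincreasing in theta, so its root can be bracketed.
    return sum(huber_psi(x - theta) for x in xs)


theta_rho = golden_section_min(objective, min(xs), max(xs))
theta_psi = bisect_root(estimating_equation, min(xs), max(xs))
print(theta_rho, theta_psi, sum(xs) / len(xs))  # both estimates near 3.0; the mean is 22.0
```

In this example the root-finding route is more accurate for the same effort. The objective is flat near its minimum, while the estimating equation crosses zero with a nonzero slope.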
ρ-type
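For positive integer r, let $(\mathcal{X}, \Sigma)$ and $(\Theta \subset \mathbb{R}^r, S)$ be measure spaces. $\theta \in \Theta$ is a vector of parameters. An M-estimator of ρ-type T is defined through a measurable function $\rho : \mathcal{X} \times \Theta \rightarrow \mathbb{R}$. It maps a probability distribution F on $\mathcal{X}$ to the value $T(F) \in \Theta$ (if it exists) that minimizes $\int_{\mathcal{X}} \rho(x, \theta) \, dF(x)$:

$$T(F) := \arg\min_{\theta \in \Theta} \int_{\mathcal{X}} \rho(x, \theta) \, dF(x)$$

For example, for the maximum likelihood estimator, $\rho(x, \theta) = -\log f(x, \theta)$, where $f(x, \theta) = \frac{\partial F(x, \theta)}{\partial x}$.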
ψ-type
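If ρ is differentiable, the computation of $\hat\theta$ is usually much easier. An M-estimator of ψ-type T is defined through a measurable function $\psi : \mathcal{X} \times \Theta \rightarrow \mathbb{R}^r$. It maps a probability distribution F on $\mathcal{X}$ to the value $T(F) \in \Theta$ (if it exists) that solves the vector equation:

$$\int_{\mathcal{X}} \psi(x, T(F)) \, dF(x) = 0,$$

that is, $\int_{\mathcal{X}} \psi_j(x, T(F)) \, dF(x) = 0$ for $j = 1, \ldots, r$.

For example, for the maximum likelihood estimator, $\psi(x, \theta) = -\left( \frac{\partial \log f(x, \theta)}{\partial \theta_1}, \ldots, \frac{\partial \log f(x, \theta)}{\partial \theta_r} \right)^{\mathsf{T}}$, where $u^{\mathsf{T}}$ denotes the transpose of vector u and $f(x, \theta) = \frac{\partial F(x, \theta)}{\partial x}$.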
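Such an estimator is not necessarily an M-estimator of ρ-type. However, if ρ has a continuous first derivative with respect to θ, then a necessary condition for an M-estimator of ψ-type to be an M-estimator of ρ-type is $\psi(x, \theta) = \nabla_\theta \rho(x, \theta)$. The previous definitions extend directly to finite samples. Replacing F with the empirical distribution function $F_n$ of $x_1, \ldots, x_n$ turns each integral into a sample average, for example $\frac{1}{n} \sum_{i=1}^n \psi(x_i, \hat\theta) = 0$.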
If the function ψ decreases to zero as $|x| \rightarrow \infty$, the estimator is called redescending. Such estimators have some additional desirable properties, such as complete rejection of gross outliers.
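Tukey's biweight (bisquare) function is one common redescending choice. It is not mentioned in the text above and appears here only as an example. The sketch is written in terms of a residual u. The tuning constant c = 4.685 is an assumed, frequently quoted default. The biweight ψ rises, peaks, and returns to exactly zero for |u| ≥ c, so observations that far out receive no weight. The monotone Huber ψ is included for contrast; it never redescends.

```python
def tukey_biweight_psi(u, c=4.685):
    """Tukey's biweight psi: u * (1 - (u/c)^2)^2 for |u| <= c, and 0 beyond c."""
    if abs(u) <= c:
        t = 1.0 - (u / c) ** 2
        return u * t * t
    return 0.0


def huber_psi(u, k=1.345):
    """Monotone (non-redescending) Huber psi, for comparison: u clipped to [-k, k]."""
    return max(-k, min(k, u))


for u in (1.0, 2.0, 4.0, 10.0):
    # The biweight value falls back to 0 for large |u|; the Huber value stays at k.
    print(u, round(tukey_biweight_psi(u), 4), huber_psi(u))
```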
Computation
For many choices of ρ or ψ, no closed-form solution exists and an iterative approach to computation is required. It is possible to use standard function optimization algorithms, such as Newton-Raphson. However, in most cases an iteratively reweighted least squares fitting algorithm can be performed; this is typically the preferred method.
For some choices of ψ, specifically redescending functions, the solution may not be unique. The issue is particularly relevant in multivariate and regression problems. Thus, some care is needed to ensure that good starting points are chosen. Robust starting points are common, such as the median as an estimate of location and the median absolute deviation as a univariate estimate of scale.
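The following is a minimal iteratively reweighted least squares (IRLS) sketch for a Huber location estimate, using the robust starting points mentioned above. The median is the initial θ. The median absolute deviation serves as a fixed scale s, rescaled by 1.4826, the usual constant that makes it consistent for the standard deviation under normality. The function names, the tuning constant k = 1.345, and the stopping rule are assumptions for illustration. Each pass gives every observation the weight w(u) = ψ(u)/u and takes a weighted mean. A fixed point of the iteration is therefore a root of $\sum_i \psi((x_i - \theta)/s) = 0$.

```python
import statistics


def mad_scale(xs):
    """Median absolute deviation about the median, times 1.4826 (normal-consistency factor)."""
    m = statistics.median(xs)
    return 1.4826 * statistics.median([abs(x - m) for x in xs])


def huber_location_irls(xs, k=1.345, tol=1e-10, max_iter=200):
    """Huber M-estimate of location by iteratively reweighted least squares."""
    theta = statistics.median(xs)  # robust starting point
    s = mad_scale(xs)              # robust scale, held fixed during the iterations
    if s == 0.0:
        return theta  # MAD is zero (heavily tied data); fall back to the median
    for _ in range(max_iter):
        weights = []
        for x in xs:
            u = abs(x - theta) / s
            # Huber weight psi(u)/u: 1 inside [-k, k], k/|u| outside.
            weights.append(1.0 if u <= k else k / u)
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


xs = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one gross outlier
print(huber_location_irls(xs))     # close to 3, far from the mean of 22
```

Holding the scale fixed keeps the sketch short. Practical implementations often estimate location and scale jointly or iterate on both.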
Concentrating parameters
In the computation of M-estimators, it is sometimes useful to rewrite the objective function so that the number of parameters is reduced. The procedure is called “concentrating” or “profiling”. Examples in which concentrating parameters increases computation speed include seemingly unrelated regression (SUR) models.^{[8]} Consider the following M-estimation problem, written here as a maximization with q playing the role of −ρ:

$$(\hat\beta, \hat\gamma) = \arg\max_{\beta, \gamma} \sum_{i=1}^N q(w_i, \beta, \gamma)$$

Assuming differentiability of the function q, the M-estimator solves the first-order conditions:

$$\sum_{i=1}^N \nabla_\beta \, q(w_i, \beta, \gamma) = 0$$
$$\sum_{i=1}^N \nabla_\gamma \, q(w_i, \beta, \gamma) = 0$$

Now, if we can solve the second equation for γ in terms of $W = (w_1, \ldots, w_N)$ and β, the second equation becomes:

$$\sum_{i=1}^N \nabla_\gamma \, q(w_i, \beta, g(W, \beta)) = 0$$

where g is some function to be found. Now, we can rewrite the original objective function solely in terms of β by inserting the function g into the place of γ. As a result, there is a reduction in the number of parameters.
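Whether this procedure can be done depends on the particular problem at hand. However, when it is possible, concentrating parameters can greatly simplify computation. For example, consider estimating an SUR model of 6 equations, with 5 explanatory variables in each equation, by maximum likelihood. The number of parameters declines from 51 to 30.^{[8]} The 51 parameters are 30 regression coefficients plus the 21 distinct elements of the 6 × 6 error covariance matrix, and the covariance matrix is what gets concentrated out.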
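A toy version of concentrating is far smaller than the SUR case. Take an i.i.d. normal sample $W = (w_1, \ldots, w_N)$ with mean β = μ and variance γ = σ². The first-order condition for σ² solves to $g(W, \mu) = \frac{1}{N} \sum_i (w_i - \mu)^2$. Substituting it back leaves the profile log-likelihood $-\frac{N}{2} \log(2\pi g(W, \mu)) - \frac{N}{2}$, which depends on μ alone. The sketch below maximizes that one-parameter function numerically. The normal model, the data, and the function names are illustrative assumptions. The result should match the sample mean, with σ̂² equal to the divide-by-N variance.

```python
import math


def sigma2_hat(mu, ws):
    """g(W, mu): the variance that solves the first-order condition for sigma^2, given mu."""
    return sum((w - mu) ** 2 for w in ws) / len(ws)


def profile_loglik(mu, ws):
    """Normal log-likelihood with sigma^2 concentrated out: -(N/2) log(2*pi*g) - N/2."""
    n = len(ws)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2_hat(mu, ws)) - 0.5 * n


def golden_section_max(fun, lo, hi, tol=1e-10):
    """Maximize a unimodal scalar function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fun(c) > fun(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return 0.5 * (a + b)


ws = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample: mean 5, divide-by-N variance 4
mu_hat = golden_section_max(lambda m: profile_loglik(m, ws), min(ws), max(ws))
print(mu_hat, sigma2_hat(mu_hat, ws))  # approximately 5.0 and 4.0
```

The search runs over one parameter instead of two. This mirrors, in miniature, the 51-to-30 reduction described above.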
Despite its computational appeal, concentrating parameters is of limited use in deriving the asymptotic properties of the M-estimator.^{[9]} The presence of W in each summand of the objective function makes it difficult to apply the law of large numbers and the central limit theorem.
Properties
Distribution
Under regularity conditions, M-estimators can be shown to be asymptotically normally distributed. As such, Wald-type approaches to constructing confidence intervals and hypothesis tests can be used. However, since the theory is asymptotic, it will frequently be sensible to check the distribution, perhaps by examining the permutation or bootstrap distribution.
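As a check of the kind suggested above, a percentile bootstrap interval for the median can be computed with the Python standard library alone. The median is itself an M-estimator (see Examples). The function name, the 2,000 resamples, the 95% level, and the fixed seed are illustrative assumptions. The percentile method is only one of several bootstrap interval constructions.

```python
import random
import statistics


def percentile_bootstrap_interval(xs, estimator, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample with replacement, recompute, take quantiles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(xs)
    replicates = sorted(estimator(rng.choices(xs, k=n)) for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * n_boot)
    hi_idx = min(n_boot - 1, int((1.0 - alpha) * n_boot))
    return replicates[lo_idx], replicates[hi_idx]


xs = [float(v) for v in range(1, 20)] + [100.0]  # made-up data: 1..19 plus one outlier
lo, hi = percentile_bootstrap_interval(xs, statistics.median)
print(statistics.median(xs), (lo, hi))
```

Comparing such an interval with a Wald interval built from the asymptotic variance is a quick way to see whether the normal approximation is adequate at the sample size at hand.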
Influence function
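The influence function of an M-estimator of ψ-type is proportional to its defining function ψ.

Let T be an M-estimator of ψ-type, and G be a probability distribution for which T(G) is defined. Its influence function IF is

$$\operatorname{IF}(x; T, G) = -\frac{\psi(x, T(G))}{\int \left[ \frac{\partial \psi(y, \theta)}{\partial \theta} \right]_{\theta = T(G)} f(y) \, dy}$$

assuming the density function f(y) of G exists. A proof of this property of M-estimators can be found in Huber (1981, Section 3.2).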
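In a finite sample, the influence function can be approximated by the sensitivity curve $(n + 1)\,[T(x_1, \ldots, x_n, x_0) - T(x_1, \ldots, x_n)]$. This measures how far adding one point $x_0$ moves the estimate. The sketch below uses a hypothetical helper and made-up data to contrast two estimators. For the mean, ψ(x, θ) = θ − x is unbounded, and its influence function x − T(G) is unbounded too. For the median, ψ is sign-valued and bounded, so a single far-away point has only a limited effect.

```python
import statistics


def sensitivity_curve(estimator, xs, x0):
    """Finite-sample influence of adding x0: (n + 1) * (T(xs + [x0]) - T(xs))."""
    n = len(xs)
    return (n + 1) * (estimator(xs + [x0]) - estimator(xs))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up sample with mean 3 and median 3
for x0 in (10.0, 1000.0, -1000.0):
    sc_mean = sensitivity_curve(statistics.mean, xs, x0)      # x0 - 3 up to rounding: unbounded
    sc_median = sensitivity_curve(statistics.median, xs, x0)  # stays at +3 or -3: bounded
    print(x0, sc_mean, sc_median)
```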
Applications
M-estimators can be constructed for location parameters and scale parameters in univariate and multivariate settings, as well as being used in robust regression.
Examples
Mean
Let $(X_1, \ldots, X_n)$ be a set of independent, identically distributed random variables, with distribution F.
If we define

$$\rho(x, \theta) = \frac{(x - \theta)^2}{2},$$

we note that $\sum_{i=1}^n \rho(X_i, \theta)$ is minimized when θ is the mean of the $X_i$. Thus the mean is an M-estimator of ρ-type, with this ρ function.

As this ρ function is continuously differentiable in θ, the mean is thus also an M-estimator of ψ-type for ψ(x, θ) = θ − x.
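A small numerical check of both characterizations on made-up data follows. The sum of ψ(x_i, θ) = θ − x_i vanishes at the sample mean. The identity $\sum_i (x_i - \theta)^2 = \sum_i (x_i - \bar{x})^2 + n(\bar{x} - \theta)^2$ shows why any other θ gives a larger objective. The helper names are illustrative.

```python
def rho_sum(xs, theta):
    """Objective: sum of rho(x, theta) = (x - theta)^2 / 2."""
    return sum((x - theta) ** 2 for x in xs) / 2


def psi_sum(xs, theta):
    """Estimating function: sum of psi(x, theta) = theta - x."""
    return sum(theta - x for x in xs)


xs = [3, 7, 8, 10, 12]  # made-up data; the mean is 8
theta_bar = sum(xs) / len(xs)
print(psi_sum(xs, theta_bar))                 # 0.0: the mean solves the psi-type equation
print(rho_sum(xs, theta_bar), rho_sum(xs, 9))  # 23.0 25.5: moving away by 1 adds n/2 = 2.5
```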
Median
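For the median of $(X_1, \ldots, X_n)$, we can instead define the ρ function as

$$\rho(x, \theta) = |x - \theta|,$$

and similarly, $\sum_{i=1}^n \rho(X_i, \theta)$ is minimized when θ is the median of the $X_i$.

This ρ function is not differentiable in θ. The median is nevertheless an M-estimator of ψ-type, with ψ taken to be the subgradient of ρ. It can be expressed as

$$\psi(x, \theta) = \operatorname{sgn}(\theta - x)$$

or, more precisely, as the set-valued subdifferential

$$\psi(x, \theta) = \begin{cases} \{-1\}, & \text{if } \theta - x < 0, \\ \{1\}, & \text{if } \theta - x > 0, \\ [-1, 1], & \text{if } \theta - x = 0. \end{cases}$$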
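The subgradient of |x − θ| is set-valued when θ = x. The ψ-type condition for the median therefore reads $0 \in \sum_i \psi(x_i, \theta)$, where a sum of intervals is again an interval. The sketch below uses hypothetical helper names and made-up data. It computes that interval and checks that it contains zero exactly at the median when n is odd. When n is even, the interval contains zero anywhere between the two middle values.

```python
def psi_sum_interval(xs, theta):
    """Sum over i of the subgradient set of |x_i - theta| in theta, as an interval (lo, hi)."""
    below = sum(1 for x in xs if x < theta)  # theta - x > 0: contributes {+1}
    above = sum(1 for x in xs if x > theta)  # theta - x < 0: contributes {-1}
    ties = len(xs) - below - above           # theta == x: contributes [-1, 1]
    centre = below - above
    return centre - ties, centre + ties


def solves_psi_equation(xs, theta):
    """True when 0 lies in the summed subgradient, i.e. theta is a median of xs."""
    lo, hi = psi_sum_interval(xs, theta)
    return lo <= 0 <= hi


odd = [1, 4, 6, 9, 50]  # median 6
even = [1, 4, 6, 9]     # any theta in [4, 6] is a median
print([t for t in range(0, 11) if solves_psi_equation(odd, t)])   # [6]
print([t for t in range(0, 11) if solves_psi_equation(even, t)])  # [4, 5, 6]
```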
References
 [1] Godambe, V. P., ed. (1991). Estimating functions. Oxford Statistical Science Series. 7. New York: Clarendon Press, Oxford University Press.
 [2] Heyde, Christopher C. (1997). Quasi-likelihood and its application: A general approach to optimal parameter estimation. Springer Series in Statistics. New York: Springer-Verlag.
 [3] McLeish, D. L.; Small, Christopher G. (1988). The theory and applications of statistical inference functions. Lecture Notes in Statistics. 44. New York: Springer-Verlag.
 [4] Mukhopadhyay, Parimal (2004). An Introduction to Estimating Functions. Alpha Science International, Ltd.
 [5] Small, Christopher G.; Wang, Jinfang (2003). Numerical methods for nonlinear estimating equations. Oxford Statistical Science Series. 29. New York: Clarendon Press, Oxford University Press.
 [6] van de Geer, Sara A. (2000). Empirical Processes in M-estimation: Applications of empirical process theory. Cambridge Series in Statistical and Probabilistic Mathematics. 6. Cambridge: Cambridge University Press.
 [7] Ferguson, Thomas S. (1982). "An inconsistent maximum likelihood estimate". Journal of the American Statistical Association. 77 (380): 831–834. doi:10.1080/01621459.1982.10477894. JSTOR 2287314.
 [8] Giles, D. E. (2012, July 10). "Concentrating, or Profiling, the Likelihood Function" [Web log post]. Retrieved from http://davegiles.blogspot.com/2012/07/concentratingorprofilinglikelihood.html
 [9] Wooldridge, J. M. Econometric Analysis of Cross Section and Panel Data. Cambridge, Mass.: MIT Press.
Further reading
 Andersen, Robert (2008). Modern Methods for Robust Regression. Quantitative Applications in the Social Sciences. 152. Los Angeles, CA: Sage Publications. ISBN 9781412940726.
 Godambe, V. P. (1991). Estimating functions. Oxford Statistical Science Series. 7. New York: Clarendon Press. ISBN 9780198522287.
 Heyde, Christopher C. (1997). Quasi-likelihood and its application: A general approach to optimal parameter estimation. Springer Series in Statistics. New York: Springer. doi:10.1007/b98823. ISBN 9780387982250.
 Huber, Peter J. (2009). Robust Statistics (2nd ed.). Hoboken, NJ: John Wiley & Sons Inc. ISBN 9780470129906.
 Hoaglin, David C.; Frederick Mosteller; John W. Tukey (1983). Understanding Robust and Exploratory Data Analysis. Hoboken, NJ: John Wiley & Sons Inc. ISBN 0471097772.
 McLeish, D.L.; Christopher G. Small (1989). The theory and applications of statistical inference functions. Lecture Notes in Statistics. 44. New York: Springer. ISBN 9780387967202.
 Mukhopadhyay, Parimal (2004). An Introduction to Estimating Functions. Harrow, UK: Alpha Science International, Ltd. ISBN 9781842651636.
 Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007), "Section 15.7. Robust Estimation", Numerical Recipes: The Art of Scientific Computing (3rd ed.), New York: Cambridge University Press, ISBN 9780521880688
 Serfling, Robert J. (2002). Approximation theorems of mathematical statistics. Wiley Series in Probability and Mathematical Statistics. Hoboken, NJ: John Wiley & Sons Inc. ISBN 9780471219279.
 Shapiro, Alexander (2000). "On the asymptotics of constrained local M-estimators". Annals of Statistics. 28 (3): 948–960. CiteSeerX 10.1.1.69.2288. doi:10.1214/aos/1015952006. JSTOR 2674061. MR 1792795.
 Small, Christopher G.; Jinfang Wang (2003). Numerical methods for nonlinear estimating equations. Oxford Statistical Science Series. 29. New York: Oxford University Press. ISBN 9780198506881.
 van de Geer, Sara A. (2000). Empirical Processes in M-estimation: Applications of empirical process theory. Cambridge Series in Statistical and Probabilistic Mathematics. 6. Cambridge, UK: Cambridge University Press. doi:10.2277/052165002X. ISBN 9780521650021.
 Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press. pp. 55–79.
 Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing, 3rd Ed. San Diego, CA: Academic Press.
External links
 M-estimators: an introduction to the subject by Zhengyou Zhang