In statistics, the projection matrix \( (\mathbf{P}) \),[1] sometimes also called the influence matrix[2] or hat matrix \( (\mathbf{H}) \), maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value.[3] The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation; a basic use of the hat matrix is therefore to identify outliers in \( \mathbf{X} \). In the language of linear algebra, the projection matrix is the orthogonal projection onto the column space of the design matrix \( \mathbf{X} \).[5][6] The present article derives and discusses the hat matrix and illustrates its usefulness.

Suppose the linear model is \( \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \), where the design matrix \( \mathbf{X} \) is \( n \times p \), containing observations on \( p \) explanatory variables for each of \( n \) observations, \( \boldsymbol{\beta} \) is a vector of unknown parameters to be estimated, and \( \boldsymbol{\varepsilon} \) is the error vector. Since the model will usually contain a constant term, one of the columns of \( \mathbf{X} \) will contain only ones; this column should be treated exactly the same as any other column. The least-squares estimate is

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}, \]

and the vector of fitted values is

\[ \hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y} =: \mathbf{H}\mathbf{y}, \]

where \( \mathbf{H} := \mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}} \) is an \( n \times n \) matrix that "puts the hat on \( \mathbf{y} \)"; \( \hat{\mathbf{y}} \) is usually pronounced "y-hat". (The term "hat matrix" is due to John W. Tukey.) Note that \( (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}} \) is the pseudoinverse of \( \mathbf{X} \).[4] If \( \mathbf{y} \) is normal, these estimates are normal as well, and \( \hat{\boldsymbol{\beta}} \) has a multivariate normal distribution; in general the estimates are approximately normal.
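These definitions are easy to exercise numerically. Below is a minimal NumPy sketch on a small synthetic data set (the numbers are assumed purely for illustration), cross-checked against NumPy's own least-squares solver:

```python
import numpy as np

# Synthetic data, assumed only for illustration: intercept plus one predictor.
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y   # least-squares estimate
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix H = X(X'X)^{-1}X'
y_hat = H @ y                                 # H "puts the hat on y"

# Cross-check against NumPy's least-squares solver.
beta_ref = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_hat, beta_ref))        # True
print(np.allclose(y_hat, X @ beta_hat))       # True
```

(In production code one would use np.linalg.solve or lstsq directly rather than forming the explicit inverse; the inverse is kept here only to mirror the formula.)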
For the case of linear models with independent and identically distributed errors, \( \boldsymbol{\Sigma} = \sigma^{2}\mathbf{I} \), and the element \( h_{ij} \) of the hat matrix is equal to the covariance between the \( j \)th response value and the \( i \)th fitted value, divided by the variance of the former. Each point of the data set tries to pull the ordinary least squares (OLS) line towards itself; the leverage \( h_{ii} \), the \( i \)th diagonal element of \( \mathbf{H} \), measures how strongly it succeeds. Written out, \( h_{ii} = \mathbf{x}_{i}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{x}_{i}^{\mathsf{T}} \), where \( \mathbf{x}_{i} \) is the \( i \)th row of \( \mathbf{X} \); it is a measure of the distance between the \( x \) values of the \( i \)th observation and the \( x \) values of all the observations, so in a simple regression the points at the extremes of \( x \) have the largest leverages and points near the mean of \( x \) the smallest.

The hat matrix has a number of useful algebraic properties:

1. It is symmetric: \( \mathbf{H}^{\mathsf{T}} = \mathbf{H} \). (The matrix \( \mathbf{X}^{\mathsf{T}}\mathbf{X} \) is symmetric, and so therefore is \( (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1} \).)
2. It is idempotent: \( \mathbf{H}\mathbf{H} = \mathbf{H} \).
3. It has \( p = \operatorname{rank}(\mathbf{X}) \) eigenvalues equal to 1, and all other eigenvalues equal to 0; hence \( \operatorname{tr}(\mathbf{H}) = \sum_{i=1}^{n} h_{ii} = p \), and the mean leverage is \( \bar{h} = p/n \).
4. The leverages satisfy \( 0 \le h_{ii} \le 1 \); for a model with a constant term the minimum value of \( h_{ii} \) is \( 1/n \).
5. For a model with a constant term, \( \mathbf{H}\mathbf{1} = \mathbf{1} \): writing \( \mathbf{H} = [\mathbf{r}_{1}\ \mathbf{r}_{2}\ \cdots\ \mathbf{r}_{n}]^{\mathsf{T}} \) in terms of its rows, each row satisfies \( \mathbf{r}_{i}^{\mathsf{T}}\mathbf{1} = 1 \) (a scalar), and this holds in the multiple linear regression case \( p - 1 > 1 \) as well.

Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e. observations which have a large effect on the results of a regression (see, e.g., Kutner et al.).
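The properties above can be verified directly. A sketch with a synthetic random design (an intercept plus two predictors; the data are assumed, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 predictors

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverages = np.diag(H)

print(np.allclose(H, H.T))                         # 1. symmetric
print(np.allclose(H @ H, H))                       # 2. idempotent
print(np.round(np.linalg.eigvalsh(H), 8))          # 3. eigenvalues: p ones, n-p zeros
print(np.isclose(leverages.sum(), p))              # 3. trace = sum of leverages = p
print(np.all((leverages >= 1 / n) & (leverages <= 1)))  # 4. bounds with an intercept
print(np.allclose(H @ np.ones(n), np.ones(n)))     # 5. H1 = 1
```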
The residuals, like the fitted values, can be expressed compactly using the projection matrix. Let \( \mathbf{M} = \mathbf{I} - \mathbf{H} \), where \( \mathbf{H} \) is the projection onto the linear space spanned by the columns of \( \mathbf{X} \). Then

\[ \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{H})\mathbf{y} = \mathbf{M}\mathbf{y}, \]

so each residual is a linear combination of the observed values \( y_{j} \). Like \( \mathbf{H} \), the matrix \( \mathbf{M} \) is symmetric (\( \mathbf{M}^{\mathsf{T}} = \mathbf{M} \)) and idempotent (\( \mathbf{M}^{2} = \mathbf{M} \)). Since \( \mathbf{M}\mathbf{X} = \mathbf{0} \), it follows that \( \mathbf{X}^{\mathsf{T}}\mathbf{e} = \mathbf{0} \): the residuals are orthogonal to every column of \( \mathbf{X} \). In some derivations we may need different projection matrices that depend on different sets of variables, so it is convenient to write \( P\{\mathbf{A}\} = \mathbf{A}(\mathbf{A}^{\mathsf{T}}\mathbf{A})^{-1}\mathbf{A}^{\mathsf{T}} \) and \( M\{\mathbf{A}\} = \mathbf{I} - P\{\mathbf{A}\} \); taking \( \mathbf{A} \) to be a column of all ones, for example, allows one to analyze the effects of adding an intercept term to a regression.

Some facts about symmetric idempotent matrices are worth collecting. An idempotent matrix \( \mathbf{M} \) is a matrix such that \( \mathbf{M}^{2} = \mathbf{M} \); a symmetric idempotent matrix is called a perpendicular projection matrix. For a symmetric and idempotent matrix \( \mathbf{A} \), \( \operatorname{rank}(\mathbf{A}) = \operatorname{tr}(\mathbf{A}) \), the number of non-zero eigenvalues of \( \mathbf{A} \). The eigenvalues of an idempotent matrix are all either 0 or 1: if \( \mathbf{A}\mathbf{x} = \lambda\mathbf{x} \), then idempotence gives \( \mathbf{A}^{2}\mathbf{x} = \lambda\mathbf{A}\mathbf{x} \), i.e. \( \mathbf{A}\mathbf{x} = \lambda^{2}\mathbf{x} \), so \( \lambda^{2} = \lambda \) and hence \( \lambda \in \{0, 1\} \). Since for every \( n \times n \) matrix the determinant equals the product of the eigenvalues, it follows that if \( \mathbf{A} \) is idempotent, \( \det(\mathbf{A}) \) is equal to either 0 or 1.

The aim of regression analysis is to explain \( Y \) in terms of \( X \) through a functional relationship like \( Y_{i} = f(X_{i}, \boldsymbol{\beta}) + \varepsilon_{i} \). The estimator \( \hat{\boldsymbol{\beta}} \) is an unbiased estimator of \( \boldsymbol{\beta} \), and since \( \mathbf{b} = \hat{\boldsymbol{\beta}} \) is a linear combination of the elements of \( \mathbf{y} \), its covariance matrix follows directly: \( \operatorname{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^{2}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1} \). The covariance matrix of the residuals is, likewise, \( \operatorname{Cov}(\mathbf{e}) = \sigma^{2}(\mathbf{I} - \mathbf{H}) \).
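The residual-maker identities can be checked in a few lines; the data and coefficients below are synthetic, assumed only for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + 1 predictor
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)       # arbitrary true coefficients

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H                  # residual maker M = I - H

e = M @ y                          # residuals e = (I - H) y
print(np.allclose(M, M.T) and np.allclose(M @ M, M))  # symmetric and idempotent
print(np.allclose(M @ X, 0))       # MX = 0
print(np.allclose(X.T @ e, 0))     # X'e = 0: residuals orthogonal to columns of X
```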
For linear models, the trace of the projection matrix is equal to the rank of \( \mathbf{X} \): the trace of a matrix equals the sum of its characteristic values (eigenvalues), and since \( \mathbf{H} \) has \( p \) eigenvalues equal to 1 and the rest 0, \( \operatorname{tr}(\mathbf{H}) = p \), the number of coefficients in the model.

Suppose instead that the covariance matrix of the errors (and by extension, of the response vector) is \( \boldsymbol{\Psi} \) rather than \( \sigma^{2}\mathbf{I} \); in the classical application the error covariance is proportional to the identity matrix. The generalized least squares estimate is \( \hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}}\boldsymbol{\Psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\boldsymbol{\Psi}^{-1}\mathbf{y} \), and the corresponding hat matrix is \( \mathbf{H} = \mathbf{X}(\mathbf{X}^{\mathsf{T}}\boldsymbol{\Psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\boldsymbol{\Psi}^{-1} \), which is still idempotent but no longer symmetric in general.

Matrix operations on block matrices can be carried out by treating the blocks as matrix entries, and this yields a blockwise formula for the hat matrix. Suppose the design matrix can be decomposed by columns as \( \mathbf{X} = [\mathbf{A}\ \mathbf{B}] \). Then the projection matrix can be decomposed as follows:[9]

\[ P\{\mathbf{X}\} = P\{\mathbf{A}\} + P\{M\{\mathbf{A}\}\mathbf{B}\}. \]

One can use this partition to compute the hat matrix of \( \mathbf{X} \) without explicitly forming the full matrix, which is useful when part of \( \mathbf{X} \) is, say, a large sparse matrix of dummy variables for fixed-effect terms and the full hat matrix would be too large to fit into computer memory.

The singular value decomposition also unveils the properties of the hat matrix. Writing \( \mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathsf{T}} \) (reduced form, with \( \mathbf{X} \) of full column rank), the hat matrix is simply \( \mathbf{H} = \mathbf{U}\mathbf{U}^{\mathsf{T}} \). Here \( \mathbf{U} \) is a set of eigenvectors for \( \mathbf{X}\mathbf{X}^{\mathsf{T}} \), and \( \mathbf{V} \) is a set of eigenvectors for \( \mathbf{X}^{\mathsf{T}}\mathbf{X} \); the non-zero singular values of \( \mathbf{X} \) are the square roots of the eigenvalues of both \( \mathbf{X}\mathbf{X}^{\mathsf{T}} \) and \( \mathbf{X}^{\mathsf{T}}\mathbf{X} \).

Relatedly, the generalized degrees of freedom (GDF) of a fit is defined to be the sum of the sensitivities of each fitted value \( \hat{Y}_{i} \) to perturbations in its corresponding output \( Y_{i} \); for a linear fit this sum is exactly \( \operatorname{tr}(\mathbf{H}) \).
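A sketch of the SVD route, assuming a synthetic full-column-rank design; it confirms that \( \mathbf{U}\mathbf{U}^{\mathsf{T}} \) reproduces the normal-equations formula:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))                      # synthetic design, full column rank

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # reduced SVD: X = U diag(s) V'

H_svd = U @ U.T                                   # hat matrix via SVD
H_ne = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix via normal equations

print(np.allclose(H_svd, H_ne))                   # True: same projection
print(np.allclose(s**2, np.linalg.eigvalsh(X.T @ X)[::-1]))  # squared singular values
```

Computing \( \mathbf{H} \) as \( \mathbf{U}\mathbf{U}^{\mathsf{T}} \) avoids forming \( (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1} \) and is numerically more stable when \( \mathbf{X} \) is ill-conditioned.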
The projection matrix framework is not limited to ordinary least squares. For other models that are still linear in the observations, such as smoothing splines, regression splines, local regression (LOESS), kernel regression, and linear filtering, the fitted values can be written \( \hat{\mathbf{y}} = \mathbf{S}\mathbf{y} \) for a smoother matrix \( \mathbf{S} \) that plays the role of the hat matrix.[8] In this setting the projection matrix can be used to define the effective degrees of freedom of the model: for example, the effective degrees of freedom of a spline model is estimated by the trace of \( \mathbf{S} \). Such a smoother shares many of the properties of the hat matrix, although it is generally not a projection matrix, since it need not be symmetric or idempotent.

The least-squares estimate itself comes from minimizing the sum of squared errors \( \mathbf{u}^{\mathsf{T}}\mathbf{u} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\mathsf{T}}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \); notice that \( \mathbf{u}^{\mathsf{T}}\mathbf{u} \) is a scalar, because \( \mathbf{u}^{\mathsf{T}} \) is \( 1 \times n \), \( \mathbf{u} \) is \( n \times 1 \), and their product is a \( 1 \times 1 \) matrix. Setting the first derivative of this objective function in matrix form to zero yields the normal equations \( \mathbf{X}^{\mathsf{T}}\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^{\mathsf{T}}\mathbf{y} \), whose solution is the estimator \( \hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y} \) used throughout. Because it ties together the fitted values, the residuals, and the leverages \( 0 \le h_{ii} \le 1 \), the hat matrix plays an important role in regression diagnostics.
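To illustrate effective degrees of freedom for a linear smoother that is not a projection, here is a sketch with a ridge-type smoother; the penalty value is hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 5
X = rng.normal(size=(n, p))

lam = 2.0                                               # hypothetical ridge penalty
S = X @ np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T  # linear smoother: y_hat = S y

print(np.allclose(S @ S, S))   # False: S is not idempotent, hence not a projection
print(np.trace(S))             # effective degrees of freedom, strictly less than p
```

As the penalty tends to zero, the smoother approaches the projection \( \mathbf{H} \) and its trace approaches \( p \); as the penalty grows, the effective degrees of freedom shrink toward zero.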