Contents

Preface

PART ONE  SIMPLE LINEAR REGRESSION

Chapter 1  Linear Regression with One Predictor Variable
  1.1 Relations between Variables
      Functional Relation between Two Variables
      Statistical Relation between Two Variables
  1.2 Regression Models and Their Uses
      Historical Origins
      Basic Concepts
      Construction of Regression Models
      Uses of Regression Analysis
      Regression and Causality
      Use of Computers
  1.3 Simple Linear Regression Model with Distribution of Error Terms Unspecified
      Formal Statement of Model
      Important Features of Model
      Meaning of Regression Parameters
      Alternative Versions of Regression Model
  1.4 Data for Regression Analysis
      Observational Data
      Experimental Data
      Completely Randomized Design
  1.5 Overview of Steps in Regression Analysis
  1.6 Estimation of Regression Function
      Method of Least Squares
      Point Estimation of Mean Response
      Residuals
      Properties of Fitted Regression Line
  1.7 Estimation of Error Terms Variance σ²
      Point Estimator of σ²
  1.8 Normal Error Regression Model
      Model
      Estimation of Parameters by Method of Maximum Likelihood
  Cited References
  Problems
  Exercises
  Projects

Chapter 2  Inferences in Regression and Correlation Analysis
  2.1 Inferences Concerning β1
      Sampling Distribution of b1
      Sampling Distribution of (b1 - β1)/s{b1}
      Confidence Interval for β1
      Tests Concerning β1
  2.2 Inferences Concerning β0
      Sampling Distribution of b0
      Sampling Distribution of (b0 - β0)/s{b0}
      Confidence Interval for β0
  2.3 Some Considerations on Making Inferences Concerning β0 and β1
      Effects of Departures from Normality
      Interpretation of Confidence Coefficient and Risks of Errors
      Spacing of the X Levels
      Power of Tests
  2.4 Interval Estimation of E{Yh}
      Sampling Distribution of Ŷh
      Sampling Distribution of (Ŷh - E{Yh})/s{Ŷh}
      Confidence Interval for E{Yh}
  2.5 Prediction of New Observation
      Prediction Interval for Yh(new) when Parameters Known
      Prediction Interval for Yh(new) when Parameters Unknown
      Prediction of Mean of m New Observations for Given Xh
  2.6 Confidence Band for Regression Line
  2.7 Analysis of Variance Approach to Regression Analysis
      Partitioning of Total Sum of Squares
      Breakdown of Degrees of Freedom
      Mean Squares
      Analysis of Variance Table
      Expected Mean Squares
      F Test of β1 = 0 versus β1 ≠ 0
  2.8 General Linear Test Approach
      Full Model
      Reduced Model
      Test Statistic
      Summary
  2.9 Descriptive Measures of Linear Association between X and Y
      Coefficient of Determination
      Limitations of R²
      Coefficient of Correlation
  2.10 Considerations in Applying Regression Analysis
  2.11 Normal Correlation Models
      Distinction between Regression and Correlation Model
      Bivariate Normal Distribution
      Conditional Inferences
      Inferences on Correlation Coefficients
      Spearman Rank Correlation Coefficient
  Cited References
  Problems
  Exercises
  Projects

Chapter 3  Diagnostics and Remedial Measures
  3.1 Diagnostics for Predictor Variable
  3.2 Residuals
      Properties of Residuals
      Semistudentized Residuals
      Departures from Model to Be Studied by Residuals
  3.3 Diagnostics for Residuals
      Nonlinearity of Regression Function
      Nonconstancy of Error Variance
      Presence of Outliers
      Nonindependence of Error Terms
      Nonnormality of Error Terms
      Omission of Important Predictor Variables
      Some Final Comments
  3.4 Overview of Tests Involving Residuals
      Tests for Randomness
      Tests for Constancy of Variance
      Tests for Outliers
      Tests for Normality
  3.5 Correlation Test for Normality
  3.6 Tests for Constancy of Error Variance
      Brown-Forsythe Test
      Breusch-Pagan Test
  3.7 F Test for Lack of Fit
      Assumptions
      Notation
      Full Model
      Reduced Model
      Test Statistic
      ANOVA Table
  3.8 Overview of Remedial Measures
      Nonlinearity of Regression Function
      Nonconstancy of Error Variance
      Nonindependence of Error Terms
      Nonnormality of Error Terms
      Omission of Important Predictor Variables
      Outlying Observations
  3.9 Transformations
      Transformations for Nonlinear Relation Only
      Transformations for Nonnormality and Unequal Error Variances
      Box-Cox Transformations
  3.10 Exploration of Shape of Regression Function
      Lowess Method
      Use of Smoothed Curves to Confirm Fitted Regression Function
  3.11 Case Example—Plutonium Measurement
  Cited References
  Problems
  Exercises
  Projects
  Case Studies

Chapter 4  Simultaneous Inferences and Other Topics in Regression Analysis
  4.1 Joint Estimation of β0 and β1
      Need for Joint Estimation
      Bonferroni Joint Confidence Intervals
  4.2 Simultaneous Estimation of Mean Responses
      Working-Hotelling Procedure
      Bonferroni Procedure
  4.3 Simultaneous Prediction Intervals for New Observations
  4.4 Regression through Origin
      Model
      Inferences
      Important Cautions for Using Regression through Origin
  4.5 Effects of Measurement Errors
      Measurement Errors in Y
      Measurement Errors in X
      Berkson Model
  4.6 Inverse Predictions
  4.7 Choice of X Levels
  Cited References
  Problems
  Exercises
  Projects

Chapter 5  Matrix Approach to Simple Linear Regression Analysis
  5.1 Matrices
      Definition of Matrix
      Square Matrix
      Vector
      Transpose
      Equality of Matrices
  5.2 Matrix Addition and Subtraction
  5.3 Matrix Multiplication
      Multiplication of a Matrix by a Scalar
      Multiplication of a Matrix by a Matrix
  5.4 Special Types of Matrices
      Symmetric Matrix
      Diagonal Matrix
      Vector and Matrix with All Elements Unity
      Zero Vector
  5.5 Linear Dependence and Rank of Matrix
      Linear Dependence
      Rank of Matrix
  5.6 Inverse of a Matrix
      Finding the Inverse
      Uses of Inverse Matrix
  5.7 Some Basic Results for Matrices
  5.8 Random Vectors and Matrices
      Expectation of Random Vector or Matrix
      Variance-Covariance Matrix of Random Vector
      Some Basic Results
      Multivariate Normal Distribution
  5.9 Simple Linear Regression Model in Matrix Terms
  5.10 Least Squares Estimation of Regression Parameters
      Normal Equations
      Estimated Regression Coefficients
  5.11 Fitted Values and Residuals
      Fitted Values
      Residuals
  5.12 Analysis of Variance Results
      Sums of Squares
      Sums of Squares as Quadratic Forms
  5.13 Inferences in Regression Analysis
      Regression Coefficients
      Mean Response
      Prediction of New Observation
  Cited Reference
  Problems
  Exercises

PART TWO  MULTIPLE LINEAR REGRESSION

Chapter 6  Multiple Regression I
  6.1 Multiple Regression Models
      Need for Several Predictor Variables
      First-Order Model with Two Predictor Variables
      First-Order Model with More than Two Predictor Variables
      General Linear Regression Model
  6.2 General Linear Regression Model in Matrix Terms
  6.3 Estimation of Regression Coefficients
  6.4 Fitted Values and Residuals
  6.5 Analysis of Variance Results
      Sums of Squares and Mean Squares
      F Test for Regression Relation
      Coefficient of Multiple Determination
      Coefficient of Multiple Correlation
  6.6 Inferences about Regression Parameters
      Interval Estimation of βk
      Tests for βk
      Joint Inferences
  6.7 Estimation of Mean Response and Prediction of New Observation
      Interval Estimation of E{Yh}
      Confidence Region for Regression Surface
      Simultaneous Confidence Intervals for Several Mean Responses
      Prediction of New Observation Yh(new)
      Prediction of Mean of m New Observations at Xh
      Predictions of g New Observations
      Caution about Hidden Extrapolations
  6.8 Diagnostics and Remedial Measures
      Scatter Plot Matrix
      Three-Dimensional Scatter Plots
      Residual Plots
      Correlation Test for Normality
      Brown-Forsythe Test for Constancy of Error Variance
      Breusch-Pagan Test for Constancy of Error Variance
      F Test for Lack of Fit
      Remedial Measures
  6.9 An Example—Multiple Regression with Two Predictor Variables
      Setting
      Basic Calculations
      Estimated Regression Function
      Fitted Values and Residuals
      Analysis of Appropriateness of Model
      Analysis of Variance
      Estimation of Regression Parameters
      Estimation of Mean Response
      Prediction Limits for New Observations
  Cited Reference
  Problems
  Exercises
  Projects

Chapter 7  Multiple Regression II
  7.1 Extra Sums of Squares
      Basic Ideas
      Definitions
      Decomposition of SSR into Extra Sums of Squares
      ANOVA Table Containing Decomposition of SSR
  7.2 Uses of Extra Sums of Squares in Tests for Regression Coefficients
      Test whether a Single βk = 0
      Test whether Several βk = 0
  7.3 Summary of Tests Concerning Regression Coefficients
      Test whether All βk = 0
      Test whether a Single βk = 0
      Test whether Some βk = 0
      Other Tests
  7.4 Coefficients of Partial Determination
      Two Predictor Variables
      General Case
      Coefficients of Partial Correlation
  7.5 Standardized Multiple Regression Model
      Roundoff Errors in Normal Equations Calculations
      Lack of Comparability in Regression Coefficients
      Correlation Transformation
      Standardized Regression Model
      X′X Matrix for Transformed Variables
      Estimated Standardized Regression Coefficients
  7.6 Multicollinearity and Its Effects
      Uncorrelated Predictor Variables
      Nature of Problem when Predictor Variables Are Perfectly Correlated
      Effects of Multicollinearity
      Need for More Powerful Diagnostics for Multicollinearity
  Cited Reference
  Problems
  Exercise
  Projects

Chapter 8  Regression Models for Quantitative and Qualitative Predictors
  8.1 Polynomial Regression Models
      Uses of Polynomial Models
      One Predictor Variable—Second Order
      One Predictor Variable—Third Order
      One Predictor Variable—Higher Orders
      Two Predictor Variables—Second Order
      Three Predictor Variables—Second Order
      Implementation of Polynomial Regression Models
      Case Example
      Some Further Comments on Polynomial Regression
  8.2 Interaction Regression Models
      Interaction Effects
      Interpretation of Interaction Regression Models with Linear Effects
      Interpretation of Interaction Regression Models with Curvilinear Effects
      Implementation of Interaction Regression Models
  8.3 Qualitative Predictors
      Qualitative Predictor with Two Classes
      Interpretation of Regression Coefficients
      Qualitative Predictor with More than Two Classes
      Time Series Applications
  8.4 Some Considerations in Using Indicator Variables
      Indicator Variables versus Allocated Codes
      Indicator Variables versus Quantitative Variables
      Other Codings for Indicator Variables
  8.5 Modeling Interactions between Quantitative and Qualitative Predictors
      Meaning of Regression Coefficients
  8.6 More Complex Models
      More than One Qualitative Predictor Variable
      Qualitative Predictor Variables Only
  8.7 Comparison of Two or More Regression Functions
      Soap Production Lines Example
      Instrument Calibration Study Example
  Cited Reference
  Problems
  Exercises
  Projects
  Case Study

Chapter 9  Building the Regression Model I: Model Selection and Validation
  9.1 Overview of Model-Building Process
      Data Collection
      Data Preparation
      Preliminary Model Investigation
      Reduction of Explanatory Variables
      Model Refinement and Selection
      Model Validation
  9.2 Surgical Unit Example
  9.3 Criteria for Model Selection
      R²p or SSEp Criterion
      R²a,p or MSEp Criterion
      Mallows’ Cp Criterion
      AICp and SBCp Criteria
      PRESSp Criterion
  9.4 Automatic Search Procedures for Model Selection
      “Best” Subsets Algorithm
      Stepwise Regression Methods
      Forward Stepwise Regression
      Other Stepwise Procedures
  9.5 Some Final Comments on Automatic Model Selection Procedures
  9.6 Model Validation
      Collection of New Data to Check Model
      Comparison with Theory, Empirical Evidence, or Simulation Results
      Data Splitting
  Cited References
  Problems
  Exercise
  Projects
  Case Studies

Chapter 10  Building the Regression Model II: Diagnostics
  10.1 Model Adequacy for a Predictor Variable—Added-Variable Plots
  10.2 Identifying Outlying Y Observations—Studentized Deleted Residuals
      Outlying Cases
      Residuals and Semistudentized Residuals
      Hat Matrix
      Studentized Residuals
      Deleted Residuals
      Studentized Deleted Residuals
  10.3 Identifying Outlying X Observations—Hat Matrix Leverage Values
      Use of Hat Matrix for Identifying Outlying X Observations
      Use of Hat Matrix to Identify Hidden Extrapolation
  10.4 Identifying Influential Cases—DFFITS, Cook’s Distance, and DFBETAS Measures
      Influence on Single Fitted Value—DFFITS
      Influence on All Fitted Values—Cook’s Distance
      Influence on the Regression Coefficients—DFBETAS
      Influence on Inferences
      Some Final Comments
  10.5 Multicollinearity Diagnostics—Variance Inflation Factor
      Informal Diagnostics
      Variance Inflation Factor
  10.6 Surgical Unit Example—Continued
  Cited References
  Problems
  Exercises
  Projects
  Case Studies

Chapter 11  Building the Regression Model III: Remedial Measures
  11.1 Unequal Error Variances Remedial Measures—Weighted Least Squares
      Error Variances Known
      Error Variances Known up to Proportionality Constant
      Error Variances Unknown
  11.2 Multicollinearity Remedial Measures—Ridge Regression
      Some Remedial Measures
      Ridge Regression
  11.3 Remedial Measures for Influential Cases—Robust Regression
      Robust Regression
      IRLS Robust Regression
  11.4 Nonparametric Regression: Lowess Method and Regression Trees
      Lowess Method
      Regression Trees
  11.5 Remedial Measures for Evaluating Precision in Nonstandard Situations—Bootstrapping
      General Procedure
      Bootstrap Sampling
      Bootstrap Confidence Intervals
  11.6 Case Example—MNDOT Traffic Estimation
      The AADT Database
      Model Development
      Weighted Least Squares Estimation
  Cited References
  Problems
  Exercises
  Projects
  Case Studies

Chapter 12  Autocorrelation in Time Series Data
  12.1 Problems of Autocorrelation
  12.2 First-Order Autoregressive Error Model
      Simple Linear Regression
      Multiple Regression
      Properties of Error Terms
  12.3 Durbin-Watson Test for Autocorrelation
  12.4 Remedial Measures for Autocorrelation
      Addition of Predictor Variables
      Use of Transformed Variables
      Cochrane-Orcutt Procedure
      Hildreth-Lu Procedure
      First Differences Procedure
      Comparison of Three Methods
  12.5 Forecasting with Autocorrelated Error Terms
  Cited References
  Problems
  Exercises
  Projects
  Case Studies

PART THREE  NONLINEAR REGRESSION

Chapter 13  Introduction to Nonlinear Regression and Neural Networks
  13.1 Linear and Nonlinear Regression Models
      Linear Regression Models
      Nonlinear Regression Models
      Estimation of Regression Parameters
  13.2 Least Squares Estimation in Nonlinear Regression
      Solution of Normal Equations
      Direct Numerical Search—Gauss-Newton Method
      Other Direct Search Procedures
  13.3 Model Building and Diagnostics
  13.4 Inferences about Nonlinear Regression Parameters
      Estimate of Error Term Variance
      Large-Sample Theory
      When Is Large-Sample Theory Applicable?
      Interval Estimation of a Single βk
      Simultaneous Interval Estimation of Several βk
      Test Concerning a Single βk
      Test Concerning Several βk
  13.5 Learning Curve Example
  13.6 Introduction to Neural Network Modeling
      Neural Network Model
      Network Representation
      Neural Network as Generalization of Linear Regression
      Parameter Estimation: Penalized Least Squares
      Example: Ischemic Heart Disease
      Model Interpretation and Prediction
      Some Final Comments on Neural Network Modeling
  Cited References
  Problems
  Exercises
  Projects
  Case Studies

Chapter 14  Logistic Regression, Poisson Regression, and Generalized Linear Models
  14.1 Regression Models with Binary Response Variable
      Meaning of Response Function when Outcome Variable Is Binary
      Special Problems when Response Variable Is Binary
  14.2 Sigmoidal Response Functions for Binary Responses
      Probit Mean Response Function
      Logistic Mean Response Function
      Complementary Log-Log Response Function
  14.3 Simple Logistic Regression
      Simple Logistic Regression Model
      Likelihood Function
      Maximum Likelihood Estimation
      Interpretation of b1
      Use of Probit and Complementary Log-Log Response Functions
      Repeat Observations—Binomial Outcomes
  14.4 Multiple Logistic Regression
      Multiple Logistic Regression Model
      Fitting of Model
      Polynomial Logistic Regression
  14.5 Inferences about Regression Parameters
      Test Concerning a Single βk: Wald Test
      Interval Estimation of a Single βk
      Test whether Several βk = 0: Likelihood Ratio Test
  14.6 Automatic Model Selection Methods
      Model Selection Criteria
      Best Subsets Procedures
      Stepwise Model Selection
  14.7 Tests for Goodness of Fit
      Pearson Chi-Square Goodness of Fit Test
      Deviance Goodness of Fit Test
      Hosmer-Lemeshow Goodness of Fit Test
  14.8 Logistic Regression Diagnostics
      Logistic Regression Residuals
      Diagnostic Residual Plots
      Detection of Influential Observations
  14.9 Inferences about Mean Response
      Point Estimator
      Interval Estimation
      Simultaneous Confidence Intervals for Several Mean Responses
  14.10 Prediction of a New Observation
      Choice of Prediction Rule
      Validation of Prediction Error Rate
  14.11 Polytomous Logistic Regression for Nominal Response
      Pregnancy Duration Data with Polytomous Response
      J - 1 Baseline-Category Logits for Nominal Response
      Maximum Likelihood Estimation
  14.12 Polytomous Logistic Regression for Ordinal Response
  14.13 Poisson Regression
      Poisson Distribution
      Poisson Regression Model
      Maximum Likelihood Estimation
      Model Development
      Inferences
  14.14 Generalized Linear Models
  Cited References
  Problems
  Exercises
  Projects
  Case Studies

Appendix A  Some Basic Results in Probability and Statistics
Appendix B  Tables
Appendix C  Data Sets
Appendix D  Selected Bibliography
Index