Statistics - Linear regression
Once the degree of relationship between variables has been established using correlation analysis, it is natural to examine the nature of that relationship. Regression analysis helps in determining the cause-and-effect relationship between variables: given values of the independent variable, the value of the dependent variable can be predicted using either a graphical method or an algebraic method.
Graphical Method
It involves drawing a scatter diagram with the independent variable on the X-axis and the dependent variable on the Y-axis. A line is then drawn through the cloud of points in such a manner that it passes through most of the distribution, with the remaining points distributed almost evenly on either side of the line.
This regression line, known as the line of best fit, summarizes the general movement of the data: it shows the best mean values of one variable corresponding to the mean values of the other. The regression line is the straight line that minimizes the sum of squared deviations between the predicted and observed values of the dependent variable.
Algebraic Method
The algebraic method develops two regression equations: Y on X, and X on Y.
Regression equation of Y on X
${Y = a+bX}$
Where −
${Y}$ = Dependent variable
${X}$ = Independent variable
${a}$ = Constant showing Y-intercept
${b}$ = Constant showing slope of the line
The values of a and b are obtained from the following normal equations:
${\sum Y = Na + b\sum X \\[7pt] \sum XY = a\sum X + b\sum X^2}$
Where −
${N}$ = Number of observations
Regression equation of X on Y
${X = a+bY}$
Where −
${X}$ = Dependent variable
${Y}$ = Independent variable
${a}$ = Constant showing the intercept of the line
${b}$ = Constant showing the slope of the line
The values of a and b are obtained from the following normal equations:
${\sum X = Na + b\sum Y \\[7pt] \sum XY = a\sum Y + b\sum Y^2}$
Where −
${N}$ = Number of observations
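Each pair of normal equations is a 2×2 linear system in a and b and can be solved in closed form by elimination. A minimal Python sketch for the Y-on-X case (the function name is illustrative, not from the source):

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations  ΣY = Na + bΣX,  ΣXY = aΣX + bΣX²  for (a, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Eliminate a: multiply the first equation by ΣX, the second by N, subtract.
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Sanity check on perfectly linear data y = 2x:
print(fit_y_on_x([1, 2, 3], [2, 4, 6]))  # → (0.0, 2.0)
```

The X-on-Y equation follows from the same function with the argument roles swapped.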
Example
Problem Statement:
A researcher has found that there is a correlation between the weight tendencies of fathers and sons. He is now interested in developing the regression equations for the two variables from the given data:
Weight of father (in Kg) | 69 | 63 | 66 | 64 | 67 | 64 | 70 | 66 | 68 | 67 | 65 | 71 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Weight of Son (in Kg) | 70 | 65 | 68 | 65 | 69 | 66 | 68 | 65 | 71 | 67 | 64 | 72 |
Develop
Regression equation of Y on X.
Regression equation of X on Y.
Solution:
${X}$ | ${X^2}$ | ${Y}$ | ${Y^2}$ | ${XY}$ |
---|---|---|---|---|
69 | 4761 | 70 | 4900 | 4830 |
63 | 3969 | 65 | 4225 | 4095 |
66 | 4356 | 68 | 4624 | 4488 |
64 | 4096 | 65 | 4225 | 4160 |
67 | 4489 | 69 | 4761 | 4623 |
64 | 4096 | 66 | 4356 | 4224 |
70 | 4900 | 68 | 4624 | 4760 |
66 | 4356 | 65 | 4225 | 4290 |
68 | 4624 | 71 | 5041 | 4828 |
67 | 4489 | 67 | 4489 | 4489 |
65 | 4225 | 64 | 4096 | 4160 |
71 | 5041 | 72 | 5184 | 5112 |
${\sum X = 800}$ | ${\sum X^2 = 53,402}$ | ${\sum Y = 810}$ | ${\sum Y^2 = 54,750}$ | ${\sum XY = 54,059}$ |
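The column totals above can be verified with a few lines of Python (a quick sketch; variable names are illustrative):

```python
X = [69, 63, 66, 64, 67, 64, 70, 66, 68, 67, 65, 71]  # father's weight (kg)
Y = [70, 65, 68, 65, 69, 66, 68, 65, 71, 67, 64, 72]  # son's weight (kg)

print(sum(X), sum(x * x for x in X))     # ΣX, ΣX²  → 800 53402
print(sum(Y), sum(y * y for y in Y))     # ΣY, ΣY²  → 810 54750
print(sum(x * y for x, y in zip(X, Y)))  # ΣXY      → 54059
```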
Regression equation of Y on X
Y = a + bX
where a and b are obtained from the normal equations:
${\sum Y = Na + b\sum X \\[7pt] \sum XY = a\sum X + b\sum X^2}$
Here ${\sum Y = 810}$, ${\sum X = 800}$, ${\sum X^2 = 53,402}$, ${\sum XY = 54,059}$, ${N = 12}$
${\Rightarrow}$ 810 = 12a + 800b ... (i)
${\Rightarrow}$ 54,059 = 800a + 53,402b ... (ii)
Multiplying equation (i) by 800 and equation (ii) by 12, we get:
9600a + 640000b = 648000 ... (iii)
9600a + 640824b = 648708 ... (iv)
Subtracting equation (iii) from (iv):
824b = 708
${\Rightarrow}$ b = 0.8592
Substituting the value of b in equation (i):
810 = 12a + 800(0.8592)
810 = 12a + 687.38
12a = 122.62
${\Rightarrow}$ a = 10.22
Hence the regression equation of Y on X can be written as
${Y = 10.22 + 0.859X}$
Regression equation of X on Y
X = a + bY
where a and b are obtained from the normal equations:
${\sum X = Na + b\sum Y \\[7pt] \sum XY = a\sum Y + b\sum Y^2}$
Here ${\sum X = 800}$, ${\sum Y = 810}$, ${\sum Y^2 = 54,750}$, ${\sum XY = 54,059}$, ${N = 12}$
${\Rightarrow}$ 800 = 12a + 810b ... (v)
${\Rightarrow}$ 54,059 = 810a + 54,750b ... (vi)
Multiplying equation (v) by 810 and equation (vi) by 12, we get:
9720a + 656100b = 648000 ... (vii)
9720a + 657000b = 648708 ... (viii)
Subtracting equation (vii) from (viii):
900b = 708
${\Rightarrow}$ b = 0.7867
Substituting the value of b in equation (v):
800 = 12a + 810(0.7867)
800 = 12a + 637.2
12a = 162.8
${\Rightarrow}$ a = 13.57
Hence the regression equation of X on Y is
${X = 13.57 + 0.787Y}$
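As a numerical cross-check, both fitted lines can be recomputed directly from the raw data by solving the normal equations in closed form (a quick sketch using only the standard library):

```python
X = [69, 63, 66, 64, 67, 64, 70, 66, 68, 67, 65, 71]  # father's weight (kg)
Y = [70, 65, 68, 65, 69, 66, 68, 65, 71, 67, 64, 72]  # son's weight (kg)
n = len(X)
Sx, Sy = sum(X), sum(Y)
Sxx = sum(x * x for x in X)
Syy = sum(y * y for y in Y)
Sxy = sum(x * y for x, y in zip(X, Y))

# Regression of Y on X: b = (NΣXY − ΣXΣY) / (NΣX² − (ΣX)²), a = (ΣY − bΣX)/N
b_yx = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)
a_yx = (Sy - b_yx * Sx) / n

# Regression of X on Y: same formulas with the roles of X and Y swapped
b_xy = (n * Sxy - Sx * Sy) / (n * Syy - Sy ** 2)
a_xy = (Sx - b_xy * Sy) / n

print(round(a_yx, 2), round(b_yx, 3))  # Y on X → 10.22 0.859
print(round(a_xy, 2), round(b_xy, 3))  # X on Y → 13.57 0.787
```

As a further sanity check, both lines pass through the point of means (${\bar{X} = 66.67}$, ${\bar{Y} = 67.5}$), as every least-squares regression line must.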