Computer Assignment
Sociology 166-461B
Todd Ferguson (9833181)
Dr. Kara Joyner
Choice of Sample
I selected only respondents between the ages of 18 and 65, to control for the effects of both retirement for older respondents and after-school jobs for younger respondents. Since only one respondent in the sample was 18 and none were younger, in practice this meant including only those under 65 in the regression analysis.
The Dependent Variable
I have selected Respondent's Income as the dependent variable to be examined. I am interested in exploring the relationship between income and various other social factors. In addition, Respondent's Income is a variable whose distribution is suitable for the purposes of OLS regression analysis.
The Independent Variables
The independent variables I examined were age of respondent; highest degree completed (replacing highest year of school completed); sex of respondent; average number of hours spent watching TV each day; number of sex partners respondent had in the previous year; frequency of sex during the previous year; whether the respondent was married or not; and the number of children reported by the respondent. Relationships between some of the above variables and income may not be readily grasped. Income should rise toward an individual's prime working years and fall afterwards, so the relationship between age and income should resemble a bell curve. Education is believed to be positively correlated with income. I assumed that those watching more hours of television would spend less time, for example, improving upon their human capital, and that their income would subsequently be reduced. If a respondent had sex often and with many different partners, I assumed that this would affect the amount of time and energy left to earn an income (unless having sex with different partners was done to earn income!). I hoped to establish that people choose to marry after they assess their income to be high enough to support a family; the same rationale applies to the number of children reported.
Coding of Independent Variables
Several of the independent variables I selected needed to be collapsed into broader categories. The variable age was collapsed into ten five-year aggregates, from 18 to 64; the variable hours spent watching television had the top categories collapsed into a "7 hours or more" grouping; the variable number of children had 5, 6, 7, or 8 or more children collapsed into one category; and the variable number of sex partners had the upper categories collapsed into a single category for 4 or more partners. I replaced years of schooling with the variable highest degree completed, as this variable corresponded to the groupings that the former would have needed to be collapsed into. The variables sex of respondent, frequency of sex and married did not require re-coding, as the number of cases in each category was sufficiently large.
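The collapsing described above can be sketched in Python. This is only an illustration: the function names are hypothetical, the caps (7, 5 and 4) follow the groupings named in the text, and the exact age cut points are an assumption, since the text gives only the number of groups and the overall age range.

```python
def top_code(value, cap):
    """Collapse every value at or above `cap` into the single category `cap`."""
    return min(value, cap)


def age_group(age, start=18, width=5, groups=10):
    """Assign an age to one of `groups` consecutive bands of `width` years.

    The cut points (18-22, 23-27, ...) are assumed for illustration only.
    """
    band = (age - start) // width + 1
    return max(1, min(band, groups))


# Hypothetical respondent: 9 hours of TV, 6 children, 12 partners, age 41.
newtv = top_code(9, 7)      # falls into "7 hours or more"
newkids = top_code(6, 5)    # falls into the 5-or-more category
newpart = top_code(12, 4)   # falls into the 4-or-more category
newage = age_group(41)      # fifth band under the assumed cut points
```

Top-coding keeps the sparse upper categories from being estimated on a handful of cases, at the cost of treating everyone above the cap as identical.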
Mean Values
The mean values for all variables I examined were as follows:
| Variable | N | Minimum | Maximum | Mean | Std. Deviation |
| --- | --- | --- | --- | --- | --- |
| Respondent's Income | 994 | 1 | 22 | 12.80 | 5.62 |
| Respondent's Sex | 1500 | 1 | 2 | 1.57 | .49 |
| RS Highest Degree | 1496 | 0 | 4 | 1.41 | 1.18 |
| NEWPART | 1367 | .00 | 4.00 | .9993 | .8396 |
| NEWTV | 1489 | .00 | 7.00 | 2.7676 | 1.7242 |
| NEWKIDS | 1495 | .00 | 5.00 | 1.7900 | 1.5126 |
| NEWAGE | 1210 | 1.00 | 10.00 | 4.9479 | 2.3702 |
| Married ? | 1499 | 1.00 | 2.00 | 1.4696 | .4992 |
| Frequency of Sex During Last Year | 1330 | 0 | 6 | 2.88 | 1.98 |
| Valid N (listwise) | 849 | | | | |
The mean values for the best fit, that between income and highest degree completed (which correlated with a Pearson's r of .353), were:
Respondent's Income
| RS Highest Degree | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| Less than HS | 8.97 | 106 | 5.64 |
| High school | 12.11 | 528 | 5.44 |
| Junior college | 12.81 | 73 | 5.14 |
| Bachelor | 14.75 | 191 | 4.85 |
| Graduate | 16.92 | 95 | 4.43 |
| Total | 12.80 | 993 | 5.62 |
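As a quick consistency check on the table above, the overall mean should equal the group means weighted by group size. The sketch below uses only the figures reported in the table; the variable names are illustrative.

```python
# (mean income category, N) for each degree group, copied from the table.
groups = {
    "Less than HS": (8.97, 106),
    "High school": (12.11, 528),
    "Junior college": (12.81, 73),
    "Bachelor": (14.75, 191),
    "Graduate": (16.92, 95),
}

total_n = sum(n for _, n in groups.values())
weighted_mean = sum(mean * n for mean, n in groups.values()) / total_n

print(total_n)                  # number of cases across all groups
print(round(weighted_mean, 2))  # should sit close to the reported total mean
```

The group sizes sum to the reported 993, and the weighted mean comes out within about .01 of the reported 12.80, a gap consistent with the group means having been rounded to two decimals.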
Effects of Individual Variables
| Variable | Effect on Income | Significance |
| --- | --- | --- |
| Age | .596 | 0 |
| Degree | 1.178 | 0 |
| Sex | -2.889 | 0 |
| Hours of TV | -.656 | 0 |
| # of Sex Partners | .232 | .287 |
| Frequency Of Sex | 0.130 | 0.242 |
| Married? | -.904 | .018 |
| # of Children | -.128 | .373 |
Here we can immediately see that the variables number of children, number of sex partners and frequency of sex are not statistically significant predictors of income: their p-values (.373, .287 and .242) are well above the .10 cutoff that corresponds to even a 90% level of confidence.
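The screening step can be written out explicitly. The sketch below simply applies a .10 cutoff to the significance values reported in the table of individual effects; the dictionary and variable names are illustrative, and the values reported as 0 are entered as 0.0.

```python
# Significance values copied from the table of individual effects.
p_values = {
    "Age": 0.0,
    "Degree": 0.0,
    "Sex": 0.0,
    "Hours of TV": 0.0,
    "# of Sex Partners": 0.287,
    "Frequency of Sex": 0.242,
    "Married?": 0.018,
    "# of Children": 0.373,
}

ALPHA = 0.10  # cutoff matching a 90% level of confidence

retained = [name for name, p in p_values.items() if p < ALPHA]
dropped = [name for name, p in p_values.items() if p >= ALPHA]

print("Retained:", retained)
print("Dropped:", dropped)
```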
Of the remaining variables, the strongest effects appear to be due to the sex of the respondent, whether or not the respondent was married, the highest education degree they had received, their age, and how much TV they watched. Therefore, these five variables will be combined in my final model.
FINAL MODEL
Descriptives
| Variable | N | Minimum | Maximum | Mean | Std. Deviation |
| --- | --- | --- | --- | --- | --- |
| Respondent's Income | 994 | 1 | 22 | 12.80 | 5.62 |
| Respondent's Sex | 1500 | 1 | 2 | 1.57 | .49 |
| RS Highest Degree | 1496 | 0 | 4 | 1.41 | 1.18 |
| NEWTV | 1489 | .00 | 7.00 | 2.7676 | 1.7242 |
| NEWAGE | 1210 | 1.00 | 10.00 | 4.9479 | 2.3702 |
| Married ? | 1499 | 1.00 | 2.00 | 1.4696 | .4992 |
| Valid N (listwise) | 940 | | | | |
Regression
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| --- | --- | --- | --- | --- |
| 1 | .521 | .271 | .267 | 4.73 |
a Predictors: (Constant), Respondent's Sex, NEWAGE, RS Highest Degree, Married ?, NEWTV
ANOVA
| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Regression | 7780.780 | 5 | 1556.156 | 69.483 | .000 |
| | Residual | 20917.955 | 934 | 22.396 | | |
| | Total | 28698.735 | 939 | | | |
a Predictors: (Constant), Respondent's Sex, NEWAGE, RS Highest Degree, Married ?, NEWTV
b Dependent Variable: Respondent's Income
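The Model Summary figures can be recovered from the ANOVA table alone, which is a useful check on how the two relate. The sketch below uses the standard OLS definitions with the sums of squares and degrees of freedom reported above; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA table.
ss_regression = 7780.780
ss_residual = 20917.955
df_regression = 5
df_residual = 934

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

# Share of the total variation accounted for by the model.
r_square = ss_regression / ss_total

# Adjusted R Square penalises the number of predictors.
adjusted_r_square = 1 - (1 - r_square) * df_total / df_residual

# F is the ratio of the two mean squares.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# The standard error of the estimate is the root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(round(r_square, 3), round(adjusted_r_square, 3))
print(round(f_statistic, 3), round(std_error_estimate, 2))
```

These agree with the reported R Square (.271), Adjusted R Square (.267), F (69.483) and standard error of the estimate (4.73) up to rounding.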
Coefficients
| Model | Variable | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (Constant) | 14.949 | .882 | | 16.946 | .000 |
| | NEWTV | -.680 | .104 | -.189 | -6.524 | .000 |
| | NEWAGE | .542 | .072 | .215 | 7.525 | .000 |
| | Married ? | -.725 | .319 | -.065 | -2.270 | .023 |
| | RS Highest Degree | 1.263 | .136 | .270 | 9.312 | .000 |
| | Respondent's Sex | -2.783 | .311 | -.252 | -8.950 | .000 |
a Dependent Variable: Respondent's Income
From the above, we can see that these five variables explain 27.1% of the variance in respondents' incomes. Using the formula 1-R Square, we can also see that 72.9% of the variance is left unexplained by this model and must be due to other variables not accounted for here. Those variables could be anything, from type of education to regional employment rates. For example, the combined R Squares of the other independent variables account for 20.5% of the remaining differences. Still, explaining 27% of the variance with these five variables is somewhat convincing, bearing in mind that R Square measures association rather than causality. Where all five independent variables equal zero, the predicted value of y would be the constant, 14.949, which translates to an annual income of approximately $25,000.
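The fitted equation can be made concrete with a short Python sketch built from the unstandardized coefficients in the Coefficients table. The function name and the example respondent are hypothetical, and the prediction is an income category code rather than a dollar amount. The text does not state which of the codes 1 and 2 stands for which sex or marital status, so the example values are placeholders only.

```python
# Unstandardized B coefficients copied from the Coefficients table.
COEFFICIENTS = {
    "constant": 14.949,
    "newtv": -0.680,
    "newage": 0.542,
    "married": -0.725,
    "degree": 1.263,
    "sex": -2.783,
}


def predict_income(newtv, newage, married, degree, sex):
    """Return the predicted income category for one respondent."""
    return (
        COEFFICIENTS["constant"]
        + COEFFICIENTS["newtv"] * newtv
        + COEFFICIENTS["newage"] * newage
        + COEFFICIENTS["married"] * married
        + COEFFICIENTS["degree"] * degree
        + COEFFICIENTS["sex"] * sex
    )


# With every predictor at zero the prediction is just the constant.
baseline = predict_income(0, 0, 0, 0, 0)

# Hypothetical respondent: TV code 2, age group 5, married code 1,
# degree code 3, sex code 1.
example = predict_income(newtv=2, newage=5, married=1, degree=3, sex=1)

# Share of variance the model leaves unexplained (1 - R Square).
unexplained = 1 - 0.271

print(baseline, round(example, 2), round(unexplained, 3))
```

For the hypothetical respondent the prediction is about 16.58. Because sex and marital status are coded 1 and 2 in this data, the all-zero baseline lies outside the observed range of those two variables, so the constant is best read as an anchor for the equation rather than as a typical respondent.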