Q1. Read the provided raw data carefully to check whether all respondents have provided information for each variable. Explain what you have done to manage the missing data. Clearly indicate the final number of observations (respondents) you will use in the following analysis. Submit an electronic copy of the Excel spreadsheet of the final dataset together with your assignment. All your following analysis should be based on this final dataset. [10 marks]

Q2. Pick two numerical variables and two categorical variables and describe each of them one by one. Use appropriate tables/graphs and numerical measures to help you describe the distribution of each variable. [12 marks]

Q3. It’s often asked what factors relate to IQ score and KW score. Look through your data and first pick one numerical variable that you think may relate to IQ score.

Explain why you picked this variable. Then use an appropriate graph and an appropriate numerical measure to discuss the empirical relationship between IQ score and this numerical variable. Repeat the same exercise for the relationship between KW score and a numerical variable to which you think KW score may relate. [16 marks]

Q4. You want to look at the relationship between gender and wages. However, you notice that gender is a categorical variable and wage is a numerical variable. One way to work on two different types of variables is to transform one variable to the type of the other. You decide to generate a categorical variable based on the level of wage, and this categorical variable has two values, “high” and “low”. For example, you choose a threshold value for wage, and if a respondent’s wage is no less than the threshold value, you enter “high” and enter “low” otherwise.
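
Illustration only: the coding rule described above can be sketched in a few lines of Python. The wage values below are invented for the example and are not the assignment data, and the choice of the sample median as the threshold is an assumption; the mean or another measure may be equally defensible.

```python
from statistics import median

# Hypothetical monthly wages, invented purely for illustration
# (these are NOT values from the assignment dataset).
wages = [650, 720, 880, 910, 1040, 1200]

# Assumption: the sample median serves as the threshold.
threshold = median(wages)

# Enter "high" if the wage is no less than the threshold, "low" otherwise.
wage_level = ["high" if w >= threshold else "low" for w in wages]

for w, level in zip(wages, wage_level):
    print(w, level)
```

In Excel, the same rule might be written as =IF(B2>=MEDIAN(B:B),"high","low"), assuming (hypothetically) that wages are stored in column B.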

i. Describe in detail how you decided on the threshold value used to generate the new categorical variable for the level of wage. Then use an appropriate graph to present this variable. (Hint: you may choose to use an appropriate numerical measure of wage as the threshold value.) [6 marks]

ii. Present these two categorical variables together using an appropriate graph, and then discuss what the graph shows. [4 marks]

iii. Produce a contingency table to present these two categorical variables. Based on the contingency table, calculate the related (empirical) joint and marginal probabilities. You may find it helpful to produce another contingency table to show your calculated probabilities. (Hint: you may need Excel skills to count the relevant frequencies, e.g. the “sort” or “countif” commands or the PivotTable function.) [8 marks]

iv. Based on the sample information, calculate the probability of either being a female or getting a low wage level and calculate the probability of being a female conditional on getting a low wage level. [5 marks]

v. Examine whether the statement “Males tend to receive higher wages than females” is true, false or inconclusive based on the sample information. Explain your response. [5 marks]

[Total marks: 28]
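
Illustration only: a minimal sketch of how joint, marginal, union and conditional probabilities follow from the cell counts of a contingency table. The eight (gender, wage level) records are invented for the example and are not the assignment data; all variable names are hypothetical.

```python
from collections import Counter

# Hypothetical (gender, wage level) records, invented for illustration only.
records = [
    ("female", "low"), ("female", "low"), ("female", "low"), ("female", "high"),
    ("male", "low"), ("male", "high"), ("male", "high"), ("male", "high"),
]
n = len(records)

# Cell frequencies of the contingency table.
counts = Counter(records)

# Joint probabilities: cell frequency divided by the number of respondents.
joint = {cell: c / n for cell, c in counts.items()}

# Marginal probabilities: sum the joint probabilities across a row or column.
p_female = sum(p for (g, _), p in joint.items() if g == "female")
p_low = sum(p for (_, w), p in joint.items() if w == "low")

# Addition rule: P(female or low) = P(female) + P(low) - P(female and low).
p_female_and_low = joint[("female", "low")]
p_female_or_low = p_female + p_low - p_female_and_low

# Conditional probability: P(female | low) = P(female and low) / P(low).
p_female_given_low = p_female_and_low / p_low

print(p_female_or_low, p_female_given_low)
```

The union subtracts the joint probability once so that respondents who are both female and in the low wage group are not counted twice.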

Q5. Suppose that the population average of the (monthly) wages of young employees in Tasmania in the year before this survey was conducted was $900.

i. Conduct a hypothesis test of whether the population average wage of young employees in Tasmania during the year of the survey remains the same as in the previous year. [7 marks]

ii. Construct a 95% confidence interval estimate for the population average wage, and comment on whether the population average wage in the year of the survey remains the same as in the previous year. [5 marks]

[Total marks: 12]
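
Illustration only: a minimal sketch of a two-sided one-sample t test of H0: mu = 900 and the matching 95% confidence interval. The nine wage values are invented for the example and are not the assignment data. The critical value 2.306 is the t-table entry for 8 degrees of freedom at the 5% level (two-sided); a different sample size needs a different critical value, and using a t test rather than a z test is an assumption about the setting.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of monthly wages, invented for illustration only.
sample = [820, 950, 1010, 880, 990, 1060, 930, 870, 1040]
mu_0 = 900  # hypothesised population mean from the previous year

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)          # sample standard deviation (n - 1 divisor)
std_error = s / sqrt(n)

# Test statistic for H0: mu = mu_0 against H1: mu != mu_0.
t_stat = (x_bar - mu_0) / std_error

# Assumption: critical value read from a t table for df = n - 1 = 8, alpha = 0.05.
t_crit = 2.306
reject_h0 = abs(t_stat) > t_crit

# 95% confidence interval for the population mean.
ci_low = x_bar - t_crit * std_error
ci_high = x_bar + t_crit * std_error

print(round(t_stat, 3), reject_h0, round(ci_low, 2), round(ci_high, 2))
```

The two parts agree by construction: H0 is rejected at the 5% level exactly when 900 falls outside the 95% interval.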

Q6. You want to use the collected data to study which factor most strongly affects young employees' wages in Tasmania. Use simple regression analysis to answer the following questions. (For each regression you run, show the Excel regression output and report the regression equation. Part of the marks for each of the following questions is allocated to your regression results.)

i. Do the years of education have a significant impact on wages? (You need to explain the choice of the null and the alternative hypotheses.) [6 marks]

ii. Do the IQ scores have a significant impact on wages? (You need to explain the choice of the null and the alternative hypotheses.) [6 marks]

iii. Which of the two variables is a better predictor for the wage, years of education or IQ scores? Explain why. [3 marks]

iv. Do the years of work experience have a significant impact on wages? (You need to explain the choice of the null and the alternative hypotheses.) [6 marks]

v. Do the KW scores have a significant impact on wages? (You need to explain the choice of the null and the alternative hypotheses.) [6 marks]

vi. Which of the two variables is a better predictor for the wages, years of work experience or KW scores? Explain why. [3 marks]

vii. Newspapers often criticize the link between wages and education as weak compared with the link between wages and work experience. Discuss whether this criticism is consistent with our data. [2 marks]

[Total marks: 32]
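
Illustration only: a minimal sketch of the quantities a simple regression output reports (slope, intercept, R-squared, and the t statistic for H0: slope = 0). The five (education, wage) pairs are invented for the example and are not the assignment data; the variable names are hypothetical. Comparing two single-predictor models by R-squared is one common way to judge which variable is the better predictor, not the only one.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
educ = [10, 12, 14, 16, 18]            # years of education
wage = [700, 820, 900, 1010, 1120]     # monthly wage

n = len(educ)
x_bar = sum(educ) / n
y_bar = sum(wage) / n

# Sums of squares and cross-products around the means.
s_xx = sum((x - x_bar) ** 2 for x in educ)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(educ, wage))

# Least-squares estimates for wage = intercept + slope * educ.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# R-squared: share of the variation in wage explained by the regression.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(educ, wage))
sst = sum((y - y_bar) ** 2 for y in wage)
r_squared = 1 - sse / sst

# t statistic for H0: slope = 0, with n - 2 degrees of freedom.
se_slope = sqrt(sse / (n - 2) / s_xx)
t_slope = slope / se_slope

print(round(intercept, 2), round(slope, 2), round(r_squared, 4), round(t_slope, 2))
```

These values correspond to the coefficient, R Square and t Stat entries in a regression output table.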