|
|||||||||||||||||
![]() |
![]() |
![]() |
|||||||||||||||
| Contingency Table
In the previous section, you have learn how to perform computation of descriptive statistics and frequency analysis from our questionnaire data. In this section, you will learn how to analyze relationship between several variables in the questionnaire data. In particular, cross tabulation (pivot table) and congtingency table (chi-square independent test).
Using the data that we have formatted, we want to answer the next research questions: 4. Is the existence of children playground in the park may gain higher visitors satisfaction ? 5. Is there any relationship between people activity and their activity time of the family in the park? 6. Is there any relationship between activity time and their mode to go to park?
Because those research problems are about relationship between two variables, we need to do what is called Cross Tabulation. Cross Tabulation (or CrossTab for short) is a frequency table between two or more variables. For readability, it is normally involve less than 4 variables. Cross Tabulation table has many name for different people. Actually they refer to the same thing. Some statistician called it Contingency Table while MS excel call it Pivot Table .
In Microsoft Excel, CrossTabs can be automated using Pivot Table. You may use either Pivot Table icon in the toolbar or using MS Excel Menu Data – Pivot Table and Pivot Chart Report .
When you click the toolbar or menu, Pivot Table wizard will pop up, click Next
In the step 2 of the wizard, you highlight the data including the label of the data in the top as shown in the following figure
In step 3 of the Pivot Table Wizard, select Layout button
To answer the relationship between variable Playground and Satisfaction , drag and drop the name of the variables on the right into the diagram. Put Satisfaction button in the row and Playground button in the column and make another drop to put Satisfaction once again to the Data . It will appear as Sum of Satisfaction . After that, double click the last button ( Sum of Satisfaction ) and Pivot Table Field dialog will appear. Select summarized by Count and then click the OK button twice.
When you go back to the Step 3 of Pivot table wizard, click Finish button.
MS excel will automatically create the Cross Tabulation table. Personally, I don't like to use it directly because it may contain very long formula. Thus, I prefer to highlight this Pivot Table and use Menu Edit Copy (CTRL-C).
Then select another cell, and use menu Edit - Paste Special . Click Values options and click OK button
After we reformat, we need to count the Independent values of the table. We need to do so because we want to know whether variable Playground has relationship with variable Satisfaction , or not. We will perform some simple test, called Chi-square test . If the result of the chi-square test shows that, variable Playground is independent from variable Satisfaction, and then we cannot conclude any relation between the two variables. Otherwise, we can conclude the relationship. Sound easy, isn't it?
To get the independent values, we to compute
Don't bother with the formula if you don't understand notation
Example, for Satisfaction = 1 and Playground = 1, we have data of 2 respondents. The total of rows is 7 and total of columns is 4, the total of all data is 12. Then, the independent value for that cell is 7*4/12 = 2.333. Do the same thing for all cells in the table we get the table of independent values. This table has meaning that if variable Playground is 100% independent from variable Satisfaction , then the contents of the cells must be equal to these values.
To make sure that variable Playground has relationship (not independent) from variable Satisfaction , we need to put the degree of independent as small as possible, say less than 5%. The number of 5% is called the error or mistake that may happen by chance (who can avoid that we get it out of luck). Some people called it significant level . The problem now is how to get the index to indicate degree of independent? The mathematician are very smart, they invented probability to represent that “degree of independent”. To determine that probability, we need to compute the difference between the observe values (from the Pivot table) and the expected values (from the Independent table), square this difference, and then divide with the expected value (to get the same unit back) and we sum for all entries of the table. In short,
To get the probability, we need to compute degree of freedom (df), that is
The probability can be obtained using MS Excel function = CHIDIST( If probability is lower than 0.05, the two variables have relationship, otherwise we cannot conclude any relationship between the two variables in the contingency table. As shown in the figure above, the value of probability is 0.048, thus we may conclude that there is a relationship between existences of playground with satisfaction level of parks' visitors. This answers the research question of “Is the existence of children playground in the park may gain higher visitors satisfaction ?” positively. Note that if follow old statistical book, it will ask you to compare the Chi-square value (
Another note that MS excel also provide function =CHITEST (actual-range, expected range) to simplify the computation of probability. In this case, you don't need to create the third table to compute the value of Chi-square. The result is directly probability that you can compare with the value of significant level (i.e. 0.05). For the next research questions, we can do the same steps as above in the new worksheet. To get the Pivot table of Activity and activity time , drag and drop the time button in the column and all activities (1 to 6) button in the data . Let it be in the sum field because our activity data is binary.
Following the same steps as above, we gain the probability that variable Activity is independent from Time is 46.2%. Since it is larger than 5% as required in the Chi-square test, we conclude that there is no relationship between people activity and their activity time of the family in the park. This answers the research questions of “Is there any relationship between people activity and their activity time of the family in the park?”
For the last research question, we use variable Time in the Column of Pivot table and count of Mode in the data diagram. The results is shown in the figure below.
Since the chi-square probability is larger than 0.05, we conclude that there is no relationship between mode to go to parks with the activity time in the park. This answers negatively to the sixth research questions of “Is there any relationship between activity time and their mode to go to park?” You may download the spreadsheet example of this tutorial here. You have seen simple data analysis using MS excel Data Analysis – Descriptive Statistics and Pivot Table can be used to analyze your data from questionnaire survey. In the next section you will learn different method to analyze your data using conditional probability, and Bayes Theorem. Send your comments, questions and suggestions
Preferable reference for this tutorial is Teknomo, Kardi. Data Analysis from Questionnaires. http:\\people.revoledu.com\kardi\ tutorial\Questionnaire\
|
||||||||||||||
|
|||||||||||||||
|
|||||||||||||||