ISTA 311
Programming assignment 3: Information Theory
Due: Sunday, October 21, 11:59 PM

Submission instructions: Write a Python script to the following specification.

Description

In this assignment, you'll compute entropy and mutual information for a number of categorical variables and show how mutual information relates to conditional probability.

Instructions

Download the data file and the Python script from D2L.

Recall that we saw that party affiliation and the vote on the immigration bill were nearly independent. For this assignment, identify three pairs of variables (X, Y) in the data set that are strongly associated (which we define as a mutual information of at least 0.5) and three pairs that are nearly independent (mutual information less than 0.1). Once you have identified these, calculate the conditional probability distributions P(X = x | Y = y). (Hint: use the pandas function crosstab to create contingency tables from the data frame.) You may ignore "?" votes for this purpose.

For example, for the immigration and party affiliation pair, we would calculate:

P(party = republican | immigration = y)
P(party = democrat | immigration = y)
P(party = republican | immigration = n)
P(party = democrat | immigration = n)

Then, examine these conditional probabilities. Do you notice a difference between the conditional probability distributions when the mutual information is high versus when it is low?

Your submission should include:
- A Python script that performs the above calculations for the six pairs of variables you identified.
- In a comment in your script or in a separate document, a brief summary of the results, containing:
  - The names of the variables
  - The mutual information and conditional probabilities you calculated
  - A brief answer to the following question: What difference do you notice about the conditional probabilities for the pairs with high mutual information, versus those with low mutual information?
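
The calculations described above can be sketched in Python roughly as follows. This is a minimal sketch, not the provided votes script: it assumes mutual information is measured in bits (log base 2) and computed as I(X; Y) = H(X) + H(Y) - H(X, Y), and it drops rows where either column holds a "?" vote. The helper names entropy, drop_unknown, mutual_information, and conditional are hypothetical, and the small toy data frame is made up purely for illustration. Because the 0.5 and 0.1 thresholds depend on the logarithm base, it is worth comparing these values against votes.mi before relying on them.

```python
import numpy as np
import pandas as pd


def entropy(p):
    """Entropy in bits of an array of probabilities; zero entries contribute 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def drop_unknown(df, x, y):
    """Keep only rows where neither column holds a '?' vote (an assumption)."""
    return df[(df[x] != "?") & (df[y] != "?")]


def mutual_information(df, x, y):
    """I(X; Y) in bits, computed as H(X) + H(Y) - H(X, Y)."""
    sub = drop_unknown(df, x, y)
    joint = pd.crosstab(sub[x], sub[y], normalize="all")  # P(X = x, Y = y)
    h_x = entropy(joint.sum(axis=1))  # marginal distribution of X
    h_y = entropy(joint.sum(axis=0))  # marginal distribution of Y
    return h_x + h_y - entropy(joint.to_numpy())


def conditional(df, x, y):
    """Table whose entry (x_val, y_val) is P(X = x_val | Y = y_val)."""
    sub = drop_unknown(df, x, y)
    # normalize="columns" divides each column by its total, so columns sum to 1
    return pd.crosstab(sub[x], sub[y], normalize="columns")


# Made-up toy data, only to show the calls (not the real votes data set).
toy = pd.DataFrame({
    "party": ["democrat", "democrat", "republican", "republican", "democrat"],
    "billA": ["y", "y", "n", "n", "?"],
    "billB": ["y", "n", "y", "n", "y"],
})
print(mutual_information(toy, "party", "billA"))  # 1.0: billA determines party here
print(mutual_information(toy, "party", "billB"))  # roughly 0.02: nearly independent
print(conditional(toy, "party", "billA"))
```

With the course data, a call such as conditional(votes.votedf, "party", "immigration") should give the four probabilities listed in the example, assuming those are the column names used in votes.votedf. In a strongly associated pair, the columns of this table tend to be lopsided (close to 0 and 1); in a nearly independent pair, the columns look alike.
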
The script

The script contains several functions and data structures that will help. When you import votes, the following variables will be initialized:
- votes.votedf: a pandas data frame containing 17 variables (party affiliation and 16 votes). Party affiliation has values democrat or republican; votes have values y, n, or ?.
- names: a list of strings giving the names of the variables in votes.votedf.

The following functions will be defined:
- votes.mi(name1, name2, df = votedf) computes the mutual information of variables name1 and name2 in the data frame df (for this assignment, you can leave df as the default, votedf).
- mi_with(name, df = votedf) plots a bar chart of the mutual information between name and every other variable in the data frame df. This is useful for identifying the pairs of variables with high or low mutual information. This function requires matplotlib; if you get errors, you may have an older version of matplotlib, so try updating it.

Requirements

For this assignment, you will need to install pandas if you do not already have it.
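
Because mi_with looks at one variable at a time, it may be convenient to scan every pair at once. The sketch below is a hypothetical helper, not part of the provided script: classify_pairs scores each pair of columns and keeps the pairs at or above 0.5 (strongly associated) and below 0.1 (nearly independent). By default it uses its own mi_bits function, which assumes bits (log base 2) and drops "?" votes; since mi_bits takes (name1, name2, df) in the same order as votes.mi, passing mi_fn=votes.mi should let the script's own measure decide instead.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def mi_bits(a, b, df):
    """Mutual information in bits between columns a and b, skipping '?' votes.

    Uses I(A; B) = sum over cells of p(a, b) * log2(p(a, b) / (p(a) * p(b))).
    Argument order matches votes.mi(name1, name2, df) so the two can be swapped.
    """
    sub = df[(df[a] != "?") & (df[b] != "?")]
    joint = pd.crosstab(sub[a], sub[b], normalize="all").to_numpy()
    pa = joint.sum(axis=1, keepdims=True)  # marginal of a, shape (rows, 1)
    pb = joint.sum(axis=0, keepdims=True)  # marginal of b, shape (1, cols)
    nz = joint > 0  # empty cells contribute 0 to the sum
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())


def classify_pairs(df, mi_fn=mi_bits, high=0.5, low=0.1):
    """Return (strong, weak) lists of (mi, name1, name2) tuples.

    strong holds pairs with mi >= high, largest first;
    weak holds pairs with mi < low, smallest first.
    """
    scores = [(mi_fn(a, b, df), a, b) for a, b in combinations(df.columns, 2)]
    strong = sorted((s for s in scores if s[0] >= high), reverse=True)
    weak = sorted(s for s in scores if s[0] < low)
    return strong, weak
```

For example, strong, weak = classify_pairs(votes.votedf, mi_fn=votes.mi) would score all 136 pairs among the 17 variables; three entries from each list are enough for the assignment, and mi_with remains handy for a visual check.
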