Mistakes are made
First, I think I've been using the wrong NELS data for the past however many months. More than I care to think about. I just ordered the correct data. Maybe it doesn't matter; I had only ordered a later version than I should have, and I imagine the variable names are the same. And in the end, I am always using the wrong data until I am in Madison, where I can use the restricted data. I was just freaking out over missing values, etc., and I might not have to do that so much with the almost-right data. Whatever. I am putting that on hold for now. So I decided to work on my other data set...
Um, why is it that when I compare a distribution printed in a book with the one I get from the same data myself (data the authors in fact collected and used themselves), the two don't match? Did that make sense? What I mean is this:
Their distribution:
Category 1: 2507 cases
Category 2: 1426
Category 3: 1329
missings: 0
My distribution using their data:
Category 1: 2420 cases
Category 2: 1387
Category 3: 1453
missings: 2
What? That seems like kind of a whoop to me. I am trying to sleuth it out of the rest of their book (and everyone else's work using this data). Weird! I feel like I have just detected some big scam, but maybe I am being overly dramatic. Maybe. Though now I should watch my back just in case I'm getting too close to the truth...
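For what it's worth, lining the two tallies up side by side makes the pattern easier to see. Here is a minimal sketch in Python; the counts are the ones listed above, while the dictionary names and labels are made up for illustration and have nothing to do with how the authors stored anything.

```python
# Counts copied from the two tallies in this post.
# The names `book` and `mine` and the key labels are illustrative only.
book = {"Category 1": 2507, "Category 2": 1426, "Category 3": 1329, "missing": 0}
mine = {"Category 1": 2420, "Category 2": 1387, "Category 3": 1453, "missing": 2}

# Per-category difference: my count minus the published count.
diff = {label: mine[label] - book[label] for label in book}

book_total = sum(book.values())
mine_total = sum(mine.values())

for label, delta in diff.items():
    print(f"{label}: {delta:+d}")
print("totals:", book_total, mine_total)
```

Both tallies add up to 5262, so no cases were dropped or added. Categories 1 and 2 come up 87 and 39 short in my version, while category 3 and the missings gain 124 and 2. That would fit cases being recoded between categories rather than a different sample, though the numbers alone can't say which.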
2 Comments:
My guess is that something else was done to the variable you are looking at before it was reported as the distribution in the book.
Well, I am definitely jumping on your bandwagon about full disclosure of all data/programs. At first, I wasn't sure, but that's b/c my programs are so embarrassingly bad (or, they work, but a monkey could probably do better), that I didn't want anyone to ever see them. But now I'm right with you, b/c of this weird discrepancy (and I can't imagine what other info they used to get the distribution they got, and, of course, they don't say anything about it or missing information at all in their book). Grrr.