I'd love it if someone could take a look and give me a second opinion, but I think this whole thing is flawed. I tried to plot the data myself, and the problem is that he's deriving the vote counts for each batch from each candidate's percentage of the overall (cumulative) vote. That's fine at first, but once the overall vote gets large, the percentage barely changes from one update to the next, since it's rounded to only 1 decimal place.
So the mail-in ballot batches that he assumed looked the same because they were "mixed together like cards" only look that way because of how the data is formatted.
But like I've said elsewhere in this thread, I'm running on empty, so my own analysis could be off the mark too.
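Here's a rough sketch of the rounding effect, in case it helps anyone checking this. It assumes each update in the feed gives the cumulative total and each candidate's cumulative share as a fraction rounded to 3 decimals (i.e. 0.1 percentage points), and that batches are reconstructed by differencing consecutive updates. The vote counts are made up, and `Snapshot`, `publish` and `implied_batch_share` are hypothetical helpers, not the original analyst's code.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    total: int        # cumulative votes counted so far
    dem_share: float  # cumulative Dem share as a fraction, rounded to 3 decimals


def publish(dem_votes: int, total: int) -> Snapshot:
    """Mimic the (assumed) feed: keep the total, round the share to 0.1 pp."""
    return Snapshot(total, round(dem_votes / total, 3))


def implied_batch_share(prev: Snapshot, curr: Snapshot) -> float:
    """Dem share of a batch implied by differencing two cumulative snapshots."""
    batch_total = curr.total - prev.total
    batch_dem = curr.dem_share * curr.total - prev.dem_share * prev.total
    return batch_dem / batch_total


# Made-up counts: 5,000,000 votes at 50% Dem, then a 10,000-vote batch at 60% Dem.
prev = publish(2_500_000, 5_000_000)
curr = publish(2_500_000 + 6_000, 5_010_000)

print(curr.dem_share)                             # 0.5, the ~0.02 pp shift is rounded away
print(round(implied_batch_share(prev, curr), 3))  # 0.5, not the true 0.6
```

Whenever the rounded share doesn't change between two updates, the implied batch share comes out exactly equal to that rounded share, no matter what the batch actually contained. That would make lots of late batches look identical.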
This guy did a thread on why the NYT "Edison" scraped dataset is no good. If you are interested, please read it. We may be barking up the wrong tree here.
Yes! Glad I'm not going crazy. The tl;dr from his tweets:
This dataset doesn't even have the actual vote counts for each candidate. It only has:
total votes cast
percentage dem
percentage rep
Lots of room for error here. Once we get into the millions, we are going to miss thousands of votes and attribute them incorrectly due to rounding. So we have to be super careful about what conclusions we can reach using this data.
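To put a number on "thousands of votes": if a share is rounded to the nearest 0.1 percentage points, the true share can be off by up to half of that (0.05 pp), so a candidate's count rebuilt as share × total can be off by up to 0.0005 × total. A quick sketch, assuming round-to-nearest; `max_rounding_error` is a hypothetical helper:

```python
def max_rounding_error(total_votes: int, decimals: int = 3) -> float:
    """Worst-case error, in votes, when a share stored as a fraction with
    `decimals` decimal places (3 -> 0.1 percentage points) is multiplied
    back by the total. Round-to-nearest means the error is at most half a step."""
    half_step = 0.5 * 10 ** -decimals
    return half_step * total_votes


for total in (100_000, 1_000_000, 5_000_000):
    print(f"{total:>9,} votes -> up to ±{max_rounding_error(total):,.0f} per candidate")
```

At 5 million votes that's up to about 2,500 votes per candidate. Since a batch is reconstructed by subtracting two rounded counts, its error can be as large as the sum of both bounds, which can swamp a small batch entirely.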
You can find my Twitter if you want, @CosmoDiGirolamo. I helped share this original time series analysis. If it's incorrect, I want to help share the correction. I don't want pedes wasting time on a flawed dataset.
Well there goes my free time this week
Link to the thread on the Edison dataset: https://twitter.com/hyonschu/status/1325627295181103104
If you're interested in a bigger / better scraped dataset, check out this one from Decision Desk HQ.
https://gofile.io/d/eUNz6r