posted by PLcoder +12 / -0

Greetings! My first post!

I'm a coder and I'd like to look for more clues in the SCTYL data that Edward Solomon has, but I'm trying to find the original format.

It seems it was in over a hundred different files or json objects, but I can't figure out where to get a copy of that.

Any tips or suggestions would be greatly appreciated!

Comments (10)
PLcoder [S] 2 points +2 / -0

Thanks! I did however already get the NYT data. But it's not very good because it doesn't actually list vote counts in the timeseries, just total votes and %share to the nearest tenth of a percent.

My understanding is that Edison Research got their data from SCTYL, a Spanish company. (Why are our votes being counted by a Spanish company? Nothing against Spain, but you'd think we would run our own elections.)

As to the NYT data:

For example, if you look in data -> races -> 0 -> timeseries -> 714 -> voteshares

You see the following:

Total votes: 6917583
bidenj: 0.5 (which would be 50.0%)
trumpd: 0.488 (which would be 48.8%)
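For anyone who wants to follow that path in code, here is a minimal sketch. It assumes the NYT file has already been loaded with `json.load` into a nested structure of dicts and lists; `walk` is a hypothetical helper name, not something from the file itself.

```python
from typing import Any, Sequence, Union

def walk(doc: Any, path: Sequence[Union[str, int]]) -> Any:
    """Follow a path of dict keys and list indices, e.g.
    ["data", "races", 0, "timeseries", 714], and return what is there."""
    node = doc
    for step in path:
        # A str step looks up a dict key; an int step indexes a list.
        node = node[step]
    return node
```

With the parsed file in a variable such as `doc`, `walk(doc, ["data", "races", 0, "timeseries", 714])` would return the timeseries entry described above, assuming the layout matches the path in this post.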

The difficulty is that the percentage is only shown to the nearest tenth of a percent. So you can "calculate" the votes, but not exactly. Anything from 48.75% to 48.8499999999999% would round to 48.8%. That's a 0.1% range of ambiguity, which translates to about 6,918 votes if the total vote count is 6917583.
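The rounding ambiguity can be worked out directly. This is a rough sketch, assuming the published share was rounded half-up to three decimal places, so the true share lies in the half-open interval [share - 0.0005, share + 0.0005). The function name `vote_bounds` is made up for illustration.

```python
from decimal import Decimal
import math

def vote_bounds(total_votes: int, share: str, step: str = "0.001"):
    """Return (low, high): the smallest and largest integer vote counts
    whose share of total_votes would round to the published share.

    Assumption: the true share lies in [share - step/2, share + step/2).
    The share is passed as a string so Decimal keeps it exact.
    """
    s = Decimal(share)
    half = Decimal(step) / 2
    low = math.ceil((s - half) * total_votes)        # first count inside the interval
    high = math.ceil((s + half) * total_votes) - 1   # last count below the upper edge
    return low, high

low, high = vote_bounds(6917583, "0.488")
print(low, high, high - low + 1)
```

For the trumpd entry above this gives 3372322 to 3379239, which is 6918 possible vote counts.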

So a big change in votes, one greater than a tenth of a percent, is obviously a problem and easy to spot, but it becomes harder to figure out what's going on with changes that are less than a tenth of a percent.
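To make that concrete, here is a sketch that bounds the change in one candidate's votes between two timeseries entries. It uses the same assumed rounding rule (half-up to three decimals), and both `bounds` and `delta_range` are hypothetical helper names.

```python
from decimal import Decimal
import math

def bounds(total_votes: int, share: str, step: str = "0.001"):
    """Integer vote counts consistent with a share rounded to `step`
    (assumes the true share is in [share - step/2, share + step/2))."""
    s, half = Decimal(share), Decimal(step) / 2
    return math.ceil((s - half) * total_votes), math.ceil((s + half) * total_votes) - 1

def delta_range(before, after):
    """Given two (total_votes, share) snapshots, return the smallest and
    largest change in the candidate's votes that the rounded data allows."""
    lo1, hi1 = bounds(*before)
    lo2, hi2 = bounds(*after)
    return lo2 - hi1, hi2 - lo1

# Two entries that look identical can still hide a sizable change.
print(delta_range((6917583, "0.488"), (6917583, "0.488")))
```

With the total and share from the example above, two identical-looking entries are compatible with any change between -6917 and +6917 votes, so shifts smaller than that cannot be pinned down from the shares alone.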

There is a field earlier in the json as follows:

data -> races -> 0 -> candidates -> 0,1,2 or 3 -> votes

and these are for bidenj, trumpd, jorgensenj, and "write-ins."

They show actual vote counts, but they are not a timeseries, just the final "result."
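Pulling those final counts out could look like the sketch below. The `votes` field and the data -> races -> 0 -> candidates path come from the description above; the field holding each candidate's name is not given in this post, so `name_key="candidate_key"` is an assumption to check against the real file.

```python
from typing import Any, Dict

def final_votes(doc: Dict[str, Any], race: int = 0,
                name_key: str = "candidate_key") -> Dict[str, int]:
    """Map each candidate to the final vote count stored under
    data -> races -> <race> -> candidates -> * -> votes.

    name_key is an ASSUMED field name; if an entry lacks it, the
    candidate's position in the list is used as the label instead.
    """
    candidates = doc["data"]["races"][race]["candidates"]
    result = {}
    for index, cand in enumerate(candidates):
        label = str(cand.get(name_key, index))
        result[label] = int(cand["votes"])
    return result
```

These are single end-of-count numbers, so they can anchor the last timeseries entry but say nothing about the steps in between.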

I'm under the impression that Edward Solomon did get a hold of the SCTYL data, and that it actually had time series integer vote counts in it. Or perhaps he just downloaded the NYT data 160 times over the period of several days, and constructed the CSV file from the data->races->0->candidates->*->votes fields. (But that doesn't make sense, because the vote counts change when the file and time stamp didn't change in his CSV.)

I did download the CSV file so generously shared, but I'm confused about exactly how the data was processed to be put into the CSV file.

Now don't get me wrong, the NYT data shows huge discrepancies too. It's just harder to look for specific patterns relating to exact ratios because the vote shares are rounded, which near the end works out to rounding each candidate's count to the nearest few thousand votes.