posted by PLcoder +12 / -0

Greetings! My first post!

I'm a coder and I'd like to look for more clues in the SCTYL data that Edward Solomon has, but I'm trying to find the original format.

It seems it was in over a hundred different files or json objects, but I can't figure out where to get a copy of that.

Any tips or suggestions would be greatly appreciated!

Comments (10)
PLcoder [S] 2 points ago +2 / -0

Thanks! I did however already get the NYT data. But it's not very good because it doesn't actually list vote counts in the timeseries, just total votes and %share to the nearest tenth of a percent.

My understanding is that Edison Research got their data from SCTYL (a Spanish company....? Why are our votes being counted by a Spanish company? Nothing against Spain, but you'd think we should run our own elections.)

As to the NYT data:

For example, if you look in data -> races -> 0 -> timeseries -> 714 -> voteshares

You see the following:

Total votes: 6917583
bidenj: 0.5 (which would be 50.0%)
trumpd: 0.488 (which would be 48.8%)

The difficulty is that the percentage is only shown to the nearest tenth of a percent. So you can "calculate" the votes, but not exactly. Anything from 48.75% to 48.8499999999999% would round to 48.8%. That's a 0.1 percentage point range of ambiguity, which translates to about 6,918 votes if the total vote count is 6917583.
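To make the ambiguity concrete, here is a minimal Python sketch that lists the lowest and highest integer vote counts that would all display as the same rounded share. The function name `vote_bounds` is made up for illustration, and the sketch assumes shares are rounded half-up to three decimal places, which is an assumption about the feed and not something confirmed.

```python
def vote_bounds(total_votes, share_thousandths):
    """Return (lowest, highest) integer vote counts that round to the given share.

    share_thousandths is the published share times 1000 (0.488 -> 488).
    ASSUMPTION: shares are rounded half-up to three decimal places.
    Integer arithmetic is used so the boundaries carry no floating-point error.
    """
    # votes / total >= share - 0.0005  <=>  votes >= total * (2s - 1) / 2000
    low = -(-total_votes * (2 * share_thousandths - 1) // 2000)  # ceiling division
    # votes / total <  share + 0.0005  <=>  votes <  total * (2s + 1) / 2000
    high = -(-total_votes * (2 * share_thousandths + 1) // 2000) - 1
    return max(low, 0), min(high, total_votes)


low, high = vote_bounds(6917583, 488)
print("lowest:", low, "highest:", high, "possible counts:", high - low + 1)
```

The width of that window grows in proportion to the total, so the same rounded share hides more votes late in the count than early on.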

So a big change in votes, one greater than a tenth of a percent, obviously stands out, but it becomes harder to figure out what's going on with changes that are less than a tenth of a percent.

There is a field earlier in the json as follows:

data -> races -> 0 -> candidates -> 0,1,2 or 3 -> votes

and these are for bidenj, trumpd, jorgensenj, and "write-ins."

They show actual vote counts, but they are not a timeseries, just the final "result."
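For anyone following along, the two paths described above can be walked with a few lines of Python. This is only a sketch: `SAMPLE` is a tiny stand-in document, the key spellings (`voteshares`, `name`, `votes`) follow the description in this comment and may differ in the real file, and the candidate vote numbers are placeholders, not real data.

```python
import json

# Tiny stand-in for the downloaded file. The nesting follows the paths
# described above. Key spellings are assumptions, and the candidate vote
# numbers are placeholders, NOT real results.
SAMPLE = json.loads("""
{"data": {"races": [{
  "candidates": [
    {"name": "bidenj", "votes": 120},
    {"name": "trumpd", "votes": 110},
    {"name": "jorgensenj", "votes": 5},
    {"name": "write-ins", "votes": 1}
  ],
  "timeseries": [
    {"votes": 6917583, "voteshares": {"bidenj": 0.5, "trumpd": 0.488}}
  ]
}]}}
""")


def final_counts(doc):
    """data -> races -> 0 -> candidates -> * -> votes (final result only)."""
    race = doc["data"]["races"][0]
    return {c["name"]: c["votes"] for c in race["candidates"]}


def share_series(doc):
    """data -> races -> 0 -> timeseries -> * as (total votes, rounded shares)."""
    race = doc["data"]["races"][0]
    return [(entry["votes"], entry["voteshares"]) for entry in race["timeseries"]]


print(final_counts(SAMPLE))
print(share_series(SAMPLE))
```

The first function returns exact integers but only one snapshot. The second returns every update but only totals and rounded shares, which is the limitation being discussed.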

I'm under the impression that Edward Solomon did get a hold of the SCTYL data, and that it actually had time series integer vote counts in it. Or perhaps he just downloaded the NYT data 160 times over the period of several days, and constructed the CSV file from the data->races->0->candidates->*->votes fields. (But that doesn't make sense, because the vote counts change when the file and time stamp didn't change in his CSV.)

I did download the CSV file so generously shared, but I'm confused about exactly how the data was processed to be put into the CSV file.

Now don't get me wrong, the NYT data shows huge discrepancies too. It's just harder to look for specific patterns relating to exact ratios because, well, the vote shares are rounded, so toward the end each count is only known to within several thousand votes.

deleted 1 point ago +1 / -0
deleted 1 point ago +1 / -0
PLcoder [S] 1 point ago +1 / -0

Thank you! Much appreciated!

I have all of the NYT timeseries data for all of the states, but it's based on that "percentage of total votes" model, which makes it hard to look for exact ratios and stuff.
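A rough sketch of what the "percentage of total votes" model allows: multiply each update's total by the rounded share, then difference consecutive estimates. The function name `estimated_deltas` and the sample numbers are invented for illustration, and the input shape (a time-ordered list of total and share pairs) is an assumption.

```python
def estimated_deltas(series, candidate):
    """Per-update change in a candidate's ESTIMATED votes.

    series: list of (total_votes, {candidate: rounded_share}) in time order.
    Each estimate is total * share rounded to an integer, so it is only
    approximate: with shares rounded to three decimals it can be off by
    roughly total / 2000 votes in either direction.
    """
    estimates = [round(total * shares[candidate]) for total, shares in series]
    return [later - earlier for earlier, later in zip(estimates, estimates[1:])]


# Placeholder series, not real data: two updates for a candidate "a".
demo = [(1000, {"a": 0.5}), (2000, {"a": 0.45})]
print(estimated_deltas(demo, "a"))
```

Any delta smaller than the rounding error of the two estimates it comes from cannot be distinguished from noise, which is why exact ratios are hard to test with this data.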

I also have Edward Solomon's CSV which appears to be made from 175 separate files (but I'm not sure) and it seems to actually have exact vote count numbers. But some things in that file are confusing to me.

In writing this, I did examine it more carefully and noticed some patterns to the structure of the data.

There are 175 timestamps, almost as if it was created from 175 updates. This must be what Edward calls a "Global update."

Each and every update has exactly 4 entries for each precinct.

But why? I don't know. It could be they had 4 ballot scanners at each precinct, but that doesn't make sense, because the ballot scanners sent everything to the main image cast server (I'm guessing?) or the adjudication server (?) regardless of which scanner it came from.

But there is a column called Locality and any number of precincts can live at any given locality.

But I don't know if that means the locality is where the voter lives, or where the ballot was counted.

There is a column in the CSV file which has a unique number for each entry in a given update, but I don't know if that was just a line number in the update or if it relates to the other data. It's the only column not labeled in the CSV file.

So I guess my questions are "Why are there four entries per precinct per global update?" and other questions like "What is the meaning of life?" ha ha!
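The structure check described above (how many distinct timestamps, and how many rows each precinct has per update) can be sketched in Python like this. The column names `timestamp` and `precinct` are guesses, not the real headers, and the sample rows are generated placeholders so the snippet runs on its own.

```python
import csv
import io
from collections import Counter


def rows_per_precinct_per_update(csv_file, time_col="timestamp", precinct_col="precinct"):
    """Count rows for each (timestamp, precinct) pair.

    time_col and precinct_col are ASSUMED header names; adjust to the real file.
    """
    reader = csv.DictReader(csv_file)
    return Counter((row[time_col], row[precinct_col]) for row in reader)


def summarize(counts):
    """Return (number of distinct timestamps, {rows_per_pair: how_many_pairs})."""
    timestamps = {ts for ts, _ in counts}
    return len(timestamps), dict(Counter(counts.values()))


# Placeholder data: 2 updates x 2 precincts x 4 rows each.
lines = ["timestamp,precinct,votes"]
for ts in ("t1", "t2"):
    for precinct in ("P1", "P2"):
        for i in range(4):
            lines.append(f"{ts},{precinct},{i}")
sample = io.StringIO("\n".join(lines))

counts = rows_per_precinct_per_update(sample)
print(summarize(counts))  # (2, {4: 4})
```

On the real file, pass `open(path, newline="")` in place of `sample`. If every precinct really has 4 rows in every update, the second value will contain only the key 4.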

deleted 1 point ago +1 / -0
PLcoder [S] 2 points ago +2 / -0

Thanks! I've actually been watching quite a few of Edward's videos - but it'd be a lie to say I was watching all of his daily 12 hour live streams :D (Gotta work and sleep too.)

I did reply to one of Edward's comments on this website, and also in his livestream chat and as comments to his videos, but I'm sure he's exceedingly busy.

That's why I signed up and asked here :D

TheCandorist 1 point ago +1 / -0

Ask the engineers on the front page. They will tell you where to pull it down.

PLcoder [S] 1 point ago +1 / -0

Thanks! How do I do that? I'm new to this amazing site :D I only saw one option for posting, and I didn't see an option to direct it to engineers or to the front page. Thanks!

TheCandorist 1 point ago +1 / -0

There is a thread stickied from one of the people in the data integrity group. Or there was yesterday.