Comments (10)
spacewave 8 points ago +8 / -0

As a web developer, I'd say there's a plausible explanation for oscillating counts like this. If the election sites are frequently hitting a few different caching servers to fetch data, and one of them returns an old count, then depending on how the servers are set up the page could temporarily show old data until the databases become consistent. To prove this you'd have to examine the frontend and backend code. I'd put the odds of this being the explanation at extremely small, but there's a possibility that something like this, or some other technical issue, could be the reason.
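
To make the caching idea concrete, here is a toy sketch of how a stale cache could make a count appear to oscillate. Everything in it is hypothetical (the `CacheNode` class, the node names, the round-robin polling, the counts); it says nothing about how the actual election sites or their data providers are set up.

```python
from itertools import cycle


class CacheNode:
    """A hypothetical cache server holding its last fetched copy of the count."""

    def __init__(self, name, cached_count):
        self.name = name
        self.cached_count = cached_count

    def refresh(self, origin_count):
        # Pull the latest value from the origin database.
        self.cached_count = origin_count


def poll(nodes, n_requests):
    """Send requests to the nodes in round-robin order and record what each returns."""
    rotation = cycle(nodes)
    return [next(rotation).cached_count for _ in range(n_requests)]


# Both caches start out in sync with the origin (made-up numbers).
origin_count = 1000
nodes = [CacheNode("cache-a", origin_count), CacheNode("cache-b", origin_count)]

# The origin receives new results, but only cache-a has refreshed so far.
origin_count = 1400
nodes[0].refresh(origin_count)

observed = poll(nodes, 4)
print(observed)  # [1400, 1000, 1400, 1000]
```

A page that polls through such a setup would see the count go up, down, and up again until `cache-b` refreshes, even though the origin count never decreased.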

Orbital_Meme_Command [S] 3 points ago +3 / -0

When you say “extremely small,” are you talking about the caching issue as an alternate cause? In a normal world I’d give your explanation a dramatically higher probability...

Always great to get other ideas/explanations.

When you think about the implementation of what NYT is showing (the client side is still accessible), is there any way to get a sense of how they might have generated the graph? That might offer clues if the graph generation happens in client-side JS. I’m not a web developer. What do you think?

spacewave 2 points ago +2 / -0

I have no real knowledge of how media outlets get their data from states, what technology is involved and how this industry works, what the server setup is, etc.

All I can really contribute is this. Here is the page that graph appears on. https://www.nytimes.com/interactive/2020/11/03/us/elections/results-virginia-president.html

The source cited on the website is Edison Research, which appears to be this company: https://www.edisonresearch.com/ I have no idea where they get their data, where and how it's transferred to the NYT, etc.

This appears to be the API endpoint for NYT where that page populates the data from. https://static01.nyt.com/elections-assets/2020/data/api/2020-11-03/race-page/virginia/president.json

You can look at the timeseries object in the API output to examine the data. Look at timestamp 2020-11-04T05:07:23Z: Biden had a 52.4% to 46% lead with 3,572,807 votes in, and an estimated 76% of the votes processed (I assume that's what eevp stands for).

**The next timestamp at 2020-11-04T05:12:38Z shows Biden at 48.2% and Trump at 50.2% with 3,199,165 votes in (fewer than at the last timestamp) and an estimated 76% of the votes processed.

The next timestamp at 2020-11-04T05:26:21Z shows 3,390,813 votes in (also fewer than at the 2020-11-04T05:07:23Z timestamp) and the estimated votes processed at 76%.**

The next timestamp at 2020-11-04T05:26:48Z has the estimated share of votes processed go to 80% with 3,782,386 votes.
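
If you want to scan the whole timeseries for this kind of thing rather than eyeballing it, a minimal sketch might look like the following. The four snapshots are the ones quoted above; the dictionary keys (`timestamp`, `votes`, `eevp`) and both helper functions are my own assumptions for illustration, so check them against the real JSON before relying on them.

```python
# The four snapshots quoted above. Key names are assumed, not confirmed.
snapshots = [
    {"timestamp": "2020-11-04T05:07:23Z", "votes": 3572807, "eevp": 76},
    {"timestamp": "2020-11-04T05:12:38Z", "votes": 3199165, "eevp": 76},
    {"timestamp": "2020-11-04T05:26:21Z", "votes": 3390813, "eevp": 76},
    {"timestamp": "2020-11-04T05:26:48Z", "votes": 3782386, "eevp": 80},
]


def find_vote_drops(series):
    """Return (timestamp, delta) for every step where the total vote count fell."""
    drops = []
    for prev, cur in zip(series, series[1:]):
        delta = cur["votes"] - prev["votes"]
        if delta < 0:
            drops.append((cur["timestamp"], delta))
    return drops


def below_running_peak(series):
    """Return timestamps whose total is lower than any earlier total."""
    peak = series[0]["votes"]
    flagged = []
    for snap in series[1:]:
        if snap["votes"] < peak:
            flagged.append(snap["timestamp"])
        peak = max(peak, snap["votes"])
    return flagged


print(find_vote_drops(snapshots))     # [('2020-11-04T05:12:38Z', -373642)]
print(below_running_peak(snapshots))  # ['2020-11-04T05:12:38Z', '2020-11-04T05:26:21Z']
```

The first helper only catches step-to-step decreases; the second also flags 05:26:21Z, which went up from the previous step but is still below the 05:07:23Z total.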

The bolded data in the middle is odd. Why did the NYT API give out old data while the estimated vote total stayed at 76%? Was it the New York Times or Edison that might have screwed up? There's more than one explanation; here are a few possibilities.

  1. It was some attempt at rigging votes, and somehow they screwed up the process. It's impossible to know if it was Edison, NYT, or the state that did this. A mathematician diving into the timestamps here would probably be able to better tell if the numbers make sense.
  2. Some data entry somewhere in the process was wrong, and some intern entered old data by accident in their election reporting system.
  3. Software bugs. Running a big live event like this is really hard. You can run tests and hope everything is working, but you're under a lot of pressure, with (I assume) a lot of moving parts, while your website is under extreme traffic load.
  4. Somewhere in the process from state to Edison to NYT, some caching server served up old data somehow. I don't know election technology setups at all so it's tough to comment further.
Orbital_Meme_Command [S] 2 points ago +2 / -0

Thank you; this is a very detailed and useful analysis, and your sorting out the JSON file is very helpful. I will see if I can come up with anything from this. It's fantastic that NYT has gifted us with this timestamped record, even if we can't know everything about where the values came from.

For a lot of this I'm thinking "out loud" I guess.... since this is a long post, don't feel like you have to read it (although you are certainly welcome to).

eevp_source fields are provided, and all are listed as "edison"; I notice that an Edison polls page is provided next to an AP polls page. I wonder if either AP or Edison could be used for the eevp data (i.e., AP provides redundancy should Edison ever go down). As you suggested, other than eevp it is not clear where the data comes from. If the vote values are from another source, they wouldn't necessarily update in tandem (I realize you're probably already aware of this possibility). So that's another explanation for how the eevp could stay at 76 while the vote values change. I guess that makes sense; maybe the eevp source is monotonic in nature (or the data is sanitized and only larger numbers trigger updates), because if there were a risk that the x position could decrease, the graph would look super sketchy (the x-axis is a scaled absolute value, whereas the y-axis is a ratio where decreases are OK).
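
To illustrate what "only larger numbers trigger updates" could look like, here is a tiny sketch of a running-maximum filter. The raw values are made up, and this is only a guess at one possible kind of sanitizing; I have no evidence that NYT or Edison actually does this.

```python
from itertools import accumulate

# Hypothetical raw eevp readings, including one that dips back down.
raw_eevp = [70, 76, 74, 76, 80]

# A running maximum never decreases: each displayed value is the largest
# reading seen so far, so the dip to 74 is never shown.
displayed_eevp = list(accumulate(raw_eevp, max))

print(displayed_eevp)  # [70, 76, 76, 76, 80]
```

A filter like this on the x-axis value would explain a flat stretch at 76 even while the underlying vote totals move around.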

I agree; the decrease in total votes you've identified is strange/interesting.

I looked a little bit more and here's what I've noticed. Check out timestamps 2020-11-04T04:42:34Z and 2020-11-04T05:12:38Z. The first of these timestamps shows a 373642 vote increase in the total. The second shows a 373642 vote decrease (the Biden/Trump ratio for this chunk of votes is ca. 8.5 on both the increase and the decrease). Now see 2020-11-04T05:26:48Z and 2020-11-04T05:30:56Z for a 391573 vote increase and decrease (the Biden/Trump ratio of these chunks is ca. 3.84). Finally, see 2020-11-04T07:17:06Z for a 395771 vote increase to the total (the Biden/Trump ratio of this chunk is ca. 3.84). It looks like this is the same reporting entity withdrawing and then re-adding its results.... EXCEPT that none of the precincts is larger than 200,000 (combined Fairfax Co absentee is ca. 200,000). It actually looks like this ca. 400,000 vote block (which appears and disappears) is all Fairfax Co precincts combined. Furthermore, it looks like they maybe decided that an almost 9:1 ratio was a little too absurd, and they decided to make it less extremely pro-Biden (a 4:1 ratio instead). It's almost as though they use the stop-points of 76 and 80 eevp to make adjustments. It's also weird that every time they add the Fairfax chunk back in they do another eevp shift (the Fairfax chunk is about 9% of the total end vote, yet this pattern of removal and reinsertion means that it was used to shift eevp by 15 units). What would be the motive behind doing that?
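
For anyone who wants to reproduce the chunk ratios, here is a rough sketch of the calculation: estimate each candidate's votes as share times total at two timestamps, then divide the Biden change by the Trump change. The inputs are the rounded percentages and totals quoted earlier in the thread for 05:07:23Z and 05:12:38Z; the key names and helper functions are hypothetical, and the rounding means the result is only approximate.

```python
def candidate_votes(total_votes, share_percent):
    """Estimate a candidate's vote count from the total and a percentage share."""
    return total_votes * share_percent / 100.0


def chunk_ratio(before, after):
    """Biden/Trump ratio of the votes added (or removed) between two snapshots."""
    biden_delta = (candidate_votes(after["votes"], after["biden_pct"])
                   - candidate_votes(before["votes"], before["biden_pct"]))
    trump_delta = (candidate_votes(after["votes"], after["trump_pct"])
                   - candidate_votes(before["votes"], before["trump_pct"]))
    # Assumes the Trump count actually changed; a zero delta would divide by zero.
    return biden_delta / trump_delta


# Values as quoted in the thread (percentages are rounded to one decimal).
before = {"votes": 3572807, "biden_pct": 52.4, "trump_pct": 46.0}
after = {"votes": 3199165, "biden_pct": 48.2, "trump_pct": 50.2}

ratio = chunk_ratio(before, after)
print(round(ratio, 1))
```

With these rounded inputs the ratio comes out near 8.8, in the same ballpark as the ca. 8.5 above; the unrounded shares in the JSON should give a tighter number.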

I also checked for other decreases in total votes. There are two: timestamp 2020-11-06T17:24:48Z (a 7732 vote decrease almost exclusively of Trump ballots) and timestamp 2020-11-05T16:24:41Z (a decrease of 16066 in the total vote count, about 60% of which are Trump votes). The sum of these two decreases is 23,798.

Another weird detail: the last total vote count value reported by NYT is 4422764, but the VA state website says 4399221. That's a difference of 23,543. This number is quite close to the last number in the preceding paragraph, but maybe this is just a coincidence.
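
The arithmetic behind those last two paragraphs, as a quick check (the variable names are just mine; the numbers are the ones quoted above):

```python
# The two late decreases in the total vote count.
decreases = [7732, 16066]

# Final totals: NYT timeseries versus the VA state website.
nyt_final_total = 4422764
state_total = 4399221

sum_of_decreases = sum(decreases)              # 23798
nyt_minus_state = nyt_final_total - state_total  # 23543
gap = sum_of_decreases - nyt_minus_state       # 255

print(sum_of_decreases, nyt_minus_state, gap)
```

So the two figures are close but not identical; they differ by 255 votes.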

troferar 4 points ago +4 / -0

Send it to the campaign.

Orbital_Meme_Command [S] 3 points ago +3 / -0

Where/how exactly?

Krig2 4 points ago +4 / -0

Fraud line. Check Eric's or DTjr's Twitter account.
