The problem is that they fitted a curve, then used the curve on their training data, and found a correlation of 0.997. There was no attempt at generalization.
Can you please expand on what you're saying a bit?
I read all the documents and understand the allegations, but something seems "too good to be true" about it, because I don't understand the modeling they used.
It sort of seems like they created their "key" from the data they were analyzing, then just ran it "backwards" in a sense, and lo and behold it matched up... as in, why wouldn't it?
Is that what you are basically saying?
They took data like 1, 2, and 3, added them up (1+2+3=6), and then they went 6-1-2-3=0 and were like: Holy shit -- look, it matches up!
(I'm just making a bad analogy.)
Can you explain in more detail what you see as a potential issue here?
Something feels off about it -- but I'm not wise enough to know what (if anything!).
Yes, you are on the right track. Here's the methodology they used:
First they plotted registered voter turnout (number of actual votes / number of registered voters) against age.
Then, they fitted a 6th-order polynomial to this plot. This function lets you predict voter turnout at each age.
Then they multiplied these predictions by the number of registered voters at each age. This gives them the predicted number of votes at each age.
Finally, they compared the predicted number of votes at each age against the number of actual votes at each age. This gives them a correlation of 0.997.
The issue is they are just making predictions on the data they trained on to begin with, so of course the fit is near perfect.
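Here is a minimal sketch of that circularity in Python. Everything below is made-up synthetic data (the age range, registration counts, and turnout curve are all invented for illustration, not the real election data), but it follows the same recipe: fit a 6th-order polynomial to turnout vs. age, multiply by registered voters, then correlate against the very data the curve was fit to.

```python
import numpy as np

# Synthetic stand-in data (NOT the real data): turnout rate by age.
rng = np.random.default_rng(0)
ages = np.arange(18, 95)
registered = rng.integers(500, 5000, size=ages.size)      # registered voters per age
turnout = (0.35 + 0.4 * np.sin((ages - 18) / 40)          # smooth age-turnout shape
           + rng.normal(0, 0.02, size=ages.size))         # plus a little noise
actual_votes = registered * turnout

# Step 1-2: fit a 6th-order polynomial to turnout vs. age.
coeffs = np.polyfit(ages, turnout, deg=6)
predicted_turnout = np.polyval(coeffs, ages)

# Step 3: multiply predicted turnout by registered voters.
predicted_votes = registered * predicted_turnout

# Step 4: correlate predictions against the same data the curve was fit to.
r = np.corrcoef(predicted_votes, actual_votes)[0, 1]
print(f"in-sample correlation: {r:.3f}")  # very close to 1, by construction
```

The near-perfect correlation here says nothing about fraud or about predictive power; it mostly reflects that a flexible curve was evaluated on its own training data, and that both sides of the comparison share the same registered-voter counts as a common factor.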
What is interesting is that they trained on data aggregated across counties, but the correlation is still very high when making single-county predictions. This seems to speak to correlations between counties.
Would a good test be to leave out a county or two, then test the model on the counties left out?
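That hold-out test could be sketched like this, again with made-up county data (the counts and the shared turnout curve are invented for illustration). Note that in this toy setup all counties share one underlying age-turnout shape, so even the out-of-sample correlation stays high, which echoes the point above about correlations between counties: a high correlation on held-out counties would show the age-turnout curve generalizes, not that anything nefarious happened.

```python
import numpy as np

# Hypothetical per-county turnout-by-age data (NOT real data).
rng = np.random.default_rng(1)
ages = np.arange(18, 95)
n_counties = 10
base = 0.35 + 0.4 * np.sin((ages - 18) / 40)              # shared turnout shape
turnout = base + rng.normal(0, 0.03, size=(n_counties, ages.size))

# Leave two counties out; fit the 6th-order polynomial on the rest.
holdout = [0, 1]
train = [c for c in range(n_counties) if c not in holdout]
coeffs = np.polyfit(np.tile(ages, len(train)),
                    turnout[train].ravel(), deg=6)

# Evaluate only on the held-out counties.
pred = np.polyval(coeffs, ages)
for c in holdout:
    r = np.corrcoef(pred, turnout[c])[0, 1]
    print(f"county {c}: out-of-sample correlation {r:.3f}")
```

This is the standard cross-validation fix for the circularity problem: the model never sees the held-out counties during fitting, so any remaining correlation reflects genuine shared structure rather than the curve memorizing its own training data.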