I just ran a search across 3.6 million PA residents, grouping by first name, middle initial, last name, and date of birth, and got 30K pairs. The two records in each pair seem to have different addresses. This dataset (unlike ones I used to work with) is supposed to be heavily processed to remove duplicates.
Only about 0.8%, but a big number because the denominator is so large. Dunno the cause; could still be bad data, I suppose.
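
For anyone who wants to try the same kind of check, here is a rough sketch of the grouping in plain Python. It is not the query I actually ran; the field names (`first`, `mi`, `last`, `dob`, `address`) and the sample records are made up for illustration, and the normalization (trim and lowercase) is just an assumption about what counts as "the same" name.

```python
from collections import defaultdict


def find_duplicate_groups(records):
    """Group records by (first, mi, last, dob); keep groups with 2+ records."""
    groups = defaultdict(list)
    for rec in records:
        # Hypothetical field names; trim/lowercase so trivial differences still match.
        key = (
            rec["first"].strip().lower(),
            rec["mi"].strip().lower(),
            rec["last"].strip().lower(),
            rec["dob"],
        )
        groups[key].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


def has_distinct_addresses(group):
    """True if the records in a group do not all share one address."""
    return len({rec["address"].strip().lower() for rec in group}) > 1


# Made-up sample data, only to show the shape of the input.
sample = [
    {"first": "Jane", "mi": "Q", "last": "Doe", "dob": "1980-01-02", "address": "1 Elm St"},
    {"first": "jane", "mi": "q", "last": "doe", "dob": "1980-01-02", "address": "9 Oak Ave"},
    {"first": "Jane", "mi": "Q", "last": "Doe", "dob": "1975-06-30", "address": "1 Elm St"},
    {"first": "John", "mi": "A", "last": "Doe", "dob": "1980-01-02", "address": "4 Pine Rd"},
]

dupes = find_duplicate_groups(sample)
moved = {key: recs for key, recs in dupes.items() if has_distinct_addresses(recs)}

# The rate quoted above: 30K pairs out of 3.6 million residents.
pair_rate = 30_000 / 3_600_000
print(len(dupes), len(moved), f"{pair_rate:.2%}")
```

Note the rate counts pairs against residents; counting the individual records in those pairs instead would double it.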