Find Duplicates from Arrays with Substring

I have two arrays that contain the same addresses written in different ways. For example:

let array1 = [
    '12345 Baker Street Lexington, KY 12345',
    '20385 Money Road New York, NY 12035'
];

let array2 = [
    '12345 Baker St. Lexington, Kentucky 12345',
    '96969 Smithfield Parkway. Boise, Idaho 56845'
];

Because of the way the addresses are structured, I figured I could take a substring of each item in the array and then filter on it. However, the matches don't seem to be stored, even though there should be about 100 addresses that match on the first 12 characters.

for (let i = 0; i < array1.length; i++) {
        let array1 = array1[i];
        let arr1Substring = array1.substring(0, 12);
        console.log(arr1Substring);

        let intersection = array1.filter(arr1Substring => array2.includes(arr1Substring));
        console.log(intersection);
    };


Answer

Fixing the original code

Names should help you write code, not fight you. Let’s try your example, using better names:

let addresses1 = [
  '12345 Baker Street Lexington, KY 12345',
  '20385 Money Road New York, NY 12035'
];

let addresses2 = [
  '12345 Baker St. Lexington, Kentucky 12345',
  '96969 Smithfield Parkway. Boise, Idaho 56845'
];

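for (let i = 0; i < addresses1.length; i++) {
  const address = addresses1[i];
  const first12LettersOfAddress = address.substring(0, 12);
  console.log(first12LettersOfAddress);

  // Array.prototype.includes() only matches whole elements, so compare
  // each address in addresses2 against the 12-character prefix instead.
  const commonAddresses = addresses2.filter(other => other.startsWith(first12LettersOfAddress));
  console.log(commonAddresses);
}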

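I’ve renamed the variables to make the code easier to follow. Avoid reusing one name for several variables: once an inner declaration shadows the outer one, the original is no longer accessible from that scope. In your loop, let array1 = array1[i] does exactly that, and it throws a ReferenceError because the new array1 is read before it has been initialized.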
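
The shadowing problem can be reproduced in isolation. This is a minimal sketch; the names items and readFirstShadowed are made up for illustration and are not part of the original code:

```javascript
const items = ['a', 'b'];

function readFirstShadowed() {
  // The inner `items` shadows the outer array. Its initializer reads the
  // inner variable before it has been initialized, so this line throws a
  // ReferenceError instead of reading from the outer array.
  let items = items[0];
  return items;
}

function readFirstRenamed() {
  // A distinct name leaves the outer array accessible.
  const firstItem = items[0];
  return firstItem;
}

try {
  readFirstShadowed();
} catch (error) {
  console.log(error.name); // ReferenceError
}
console.log(readFirstRenamed()); // a
```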

A better approach – Geocoding

That said, I'd recommend a different approach altogether. Comparing fragments of strings is fragile: for example, “123 Stack Ave” and “123 Stack Avenue” are the same address, yet a plain string comparison treats them as different. Geocoding every address first gives you a consistently formatted result for each one, and you can then compare those results instead.
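
To make the fragility concrete, here is a small sketch of how a fixed-length prefix comparison can go wrong in both directions. The samePrefix helper and the sample addresses are invented for illustration:

```javascript
// Compare two addresses by their first `length` characters only.
function samePrefix(a, b, length = 12) {
  return a.substring(0, length) === b.substring(0, length);
}

// False positive: different streets in different cities share a prefix.
console.log(samePrefix(
  '12345 Baker Street Lexington, KY',
  '12345 Baker Road Louisville, KY'
)); // true

// False negative: the same street written with an abbreviated direction.
console.log(samePrefix(
  '12345 N Baker St',
  '12345 North Baker St'
)); // false
```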

You can do this using the Mapbox Geocoding API or the Google Geocoding API.
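
One way to structure the comparison is sketched below. It deliberately does not call either API: geocode is a hypothetical function that you would supply yourself, wrapping whichever service you choose and resolving to some canonical string for an address (a formatted address or a place identifier, for instance). The name findDuplicateAddresses is also an assumption made for this example.

```javascript
// `geocode` is a caller-supplied async function (hypothetical here) that
// takes a raw address string and resolves to a canonical string for it.
async function findDuplicateAddresses(list1, list2, geocode) {
  // Geocode both lists in parallel; results stay aligned with the inputs.
  const [keys1, keys2] = await Promise.all([
    Promise.all(list1.map(address => geocode(address))),
    Promise.all(list2.map(address => geocode(address))),
  ]);

  // A Set gives fast lookups of the canonical keys from the second list.
  const keysInList2 = new Set(keys2);

  // Keep the original addresses from list1 whose canonical key also
  // appears in list2.
  return list1.filter((_, index) => keysInList2.has(keys1[index]));
}
```

Passing the geocoder in as a parameter keeps the comparison logic independent of the provider, and lets you test it with a stub before spending API quota. With about 100 addresses per list you may also want to cache results or throttle requests, depending on the rate limits of the service you pick.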

User contributions licensed under: CC BY-SA