Regex for Extracting the Country Name

What regular expression would extract the country name when used with any of the lines below?

I’ve got a dropdown with all of these as choices and I’m trying to extract the country only, but I’m failing miserably since JavaScript doesn’t seem to support lookbehinds and I have no idea how to exclude the emoji part otherwise. (Not to mention that special characters such as the Å in Åland Islands don’t make it any easier.)

Thanks!

🇦🇫 Afghanistan +93
🇦🇽 Åland Islands +358
🇦🇱 Albania +355
🇩🇿 Algeria +213
🇦🇸 American Samoa +1684
🇦🇩 Andorra +376
🇦🇴 Angola +244
🇦🇮 Anguilla +1264
🇦🇬 Antigua & Barbuda +1268
🇦🇷 Argentina +54
🇦🇲 Armenia +374
🇦🇼 Aruba +297
🇦🇺 Australia +61
🇦🇹 Austria +43
🇦🇿 Azerbaijan +994
🇧🇸 Bahamas +1242
🇧🇭 Bahrain +973
🇧🇩 Bangladesh +880
🇧🇧 Barbados +1246
🇧🇾 Belarus +375
🇧🇪 Belgium +32
🇧🇿 Belize +501
🇧🇯 Benin +229
🇧🇲 Bermuda +1441
🇧🇹 Bhutan +975
🇧🇴 Bolivia +591
🇧🇦 Bosnia & Herzegovina +387
🇧🇼 Botswana +267
🇧🇷 Brazil +55
🇮🇴 British Indian Ocean Territory +246
🇻🇬 British Virgin Islands +1284
🇧🇳 Brunei +673
🇧🇬 Bulgaria +359
🇧🇫 Burkina Faso +226
🇧🇮 Burundi +257
🇰🇭 Cambodia +855
🇨🇲 Cameroon +237
🇨🇦 Canada +1
🇨🇻 Cape Verde +238
🇳🇱 Carribbean Netherlands +599
🇰🇾 Cayman Islands +1345
🇨🇫 Central African Republic +236
🇹🇩 Chad +235
🇨🇱 Chile +56
🇨🇳 China +86
🇨🇽 Christmas Islands +61
🇨🇨 Cocos Islands +61
🇨🇴 Colombia +57
🇰🇲 Comoros +269
🇨🇩 Congo-Kinshasa +243
🇨🇬 Congo-Brazzaville +242
🇨🇰 Cook Islands +682
🇨🇷 Costa Rica +506
🇭🇷 Croatia +385
🇨🇺 Cuba +53
🇨🇼 Curaçao +599
🇨🇾 Cyprus +357
🇨🇿 Czechia +420
🇩🇰 Denmark +45
🇩🇯 Djibouti +253
🇩🇲 Dominica +1767
🇩🇴 Dominican Republic +1
🇪🇨 Ecuador +593
🇪🇬 Egypt +20
🇸🇻 El Salvador +503
🇬🇶 Equatorial Guinea +240
🇪🇷 Eritrea +291
🇪🇪 Estonia +372
🇪🇹 Ethiopia +251
🇫🇰 Falkland Islands +500
🇫🇴 Faroe Islands +298
🇫🇯 Fiji +679
🇫🇮 Finland +358
🇫🇷 France +33
🇬🇫 French Guiana +594
🇵🇫 French Polynesia +689
🇬🇦 Gabon +241
🇬🇲 Gambia +220
🇬🇪 Georgia +995
🇩🇪 Germany +49
🇬🇭 Ghana +233
🇬🇮 Gibraltar +350
🇬🇷 Greece +30
🇬🇱 Greenland +299
🇬🇩 Grenada +1473
🇬🇵 Guadeloupe +590
🇬🇺 Guam +1671
🇬🇹 Guatemala +502
🇬🇬 Guernsey +44
🇬🇳 Guinea +224
🇬🇼 Guinea-Bissau +245
🇬🇾 Guyana +592
🇭🇹 Haiti +509
🇭🇳 Honduras +504
🇭🇰 Hong Kong +852
🇭🇺 Hungary +36
🇮🇸 Iceland +354
🇮🇳 India +91
🇮🇩 Indonesia +62
🇮🇷 Iran +98
🇮🇶 Iraq +964
🇮🇪 Ireland +353
🇮🇲 Isle of Man +44
🇮🇱 Israel +972
🇮🇹 Italy +39
🇨🇮 Ivory Coast +225
🇯🇲 Jamaica +1
🇯🇵 Japan +81
🇯🇪 Jersey +44
🇯🇴 Jordan +962
🇰🇿 Kazakhstan +7
🇰🇪 Kenya +254
🇰🇮 Kiribati +686
🇽🇰 Kosovo +383
🇰🇼 Kuwait +965
🇰🇬 Kyrgyzstan +996
🇱🇦 Laos +856
🇱🇻 Latvia +371
🇱🇧 Lebanon +961
🇱🇸 Lesotho +266
🇱🇷 Liberia +231
🇱🇾 Libya +218
🇱🇮 Liechtenstein +423
🇱🇹 Lithuania +370
🇱🇺 Luxembourg +352
🇲🇴 Macau +853
🇲🇬 Madagascar +261
🇲🇼 Malawi +265
🇲🇾 Malaysia +60
🇲🇻 Maldives +960
🇲🇱 Mali +223
🇲🇹 Malta +356
🇲🇭 Marshall Islands +692
🇲🇶 Martinique +596
🇲🇷 Mauritania +222
🇲🇺 Mauritius +230
🇾🇹 Mayotte +262
🇲🇽 Mexico +52
🇫🇲 Micronesia +691
🇲🇩 Moldova +373
🇲🇨 Monaco +377
🇲🇳 Mongolia +976
🇲🇪 Montenegro +382
🇲🇸 Montserrat +1664
🇲🇦 Morocco +212
🇲🇿 Mozambique +258
🇲🇲 Myanmar +95
🇳🇦 Namibia +264
🇳🇷 Nauru +674
🇳🇵 Nepal +977
🇳🇱 Netherlands +31
🇳🇨 New Caledonia +687
🇳🇿 New Zealand +64
🇳🇮 Nicaragua +505
🇳🇪 Niger +227
🇳🇬 Nigeria +234
🇳🇺 Niue +683
🇳🇫 Norfolk Island +6723
🇰🇵 North Korea +850
🇲🇰 North Macedonia +389
🇲🇵 Northern Mariana Islands +1670
🇳🇴 Norway +47
🇴🇲 Oman +968
🇵🇰 Pakistan +92
🇵🇼 Palau +680
🇵🇦 Panama +507
🇵🇬 Papua New Guinea +675
🇵🇾 Paraguay +595
🇵🇪 Peru +51
🇵🇭 Philippines +63
🇵🇱 Poland +48
🇵🇹 Portugal +351
🇵🇷 Puerto Rico +1
🇶🇦 Qatar +974
🇫🇷 Réunion +262
🇷🇴 Romania +40
🇷🇺 Russia +7
🇷🇼 Rwanda +250
🇧🇱 Saint-Barthélemy +590
🇸🇭 Saint Helena +290
🇰🇳 Saint Kitts & Nevis +1869
🇱🇨 Saint Lucia +1758
🇫🇷 Saint Martin +590
🇵🇲 Saint Pierre & Miquelon +508
🇻🇨 Saint Vincent & Grenadines +1784
🇼🇸 Samoa +685
🇸🇲 San Marino +378
🇸🇹 São Tomé & Príncipe +239
🇸🇦 Saudi Arabia +966
🇸🇳 Senegal +221
🇷🇸 Serbia +381
🇸🇨 Seychelles +248
🇸🇱 Sierra Leone +232
🇸🇬 Singapore +65
🇸🇽 Sint Maarten +1721
🇸🇰 Slovakia +421
🇸🇮 Slovenia +386
🇸🇧 Solomon Islands +677
🇸🇴 Somalia +252
🇿🇦 South Africa +27
🇰🇷 South Korea +82
🇸🇸 South Sudan +211
🇪🇸 Spain +34
🇱🇰 Sri Lanka +94
🇸🇩 Sudan +249
🇸🇷 Suriname +597
🇳🇴 Svalbard & Jan Mayen +47
🇸🇿 Swaziland +268
🇸🇪 Sweden +46
🇨🇭 Switzerland +41
🇸🇾 Syria +963
🇹🇼 Taiwan +886
🇹🇯 Tajikistan +992
🇹🇿 Tanzania +255
🇹🇭 Thailand +66
🇹🇱 Timor-Leste +670
🇹🇬 Togo +228
🇹🇰 Tokelau +690
🇹🇴 Tonga +676
🇹🇹 Trinidad & Tobago +1868
🇹🇳 Tunisia +216
🇹🇷 Turkey +90
🇹🇲 Turkmenistan +993
🇹🇨 Turks & Caicos Islands +1649
🇹🇻 Tuvalu +688
🇻🇮 U.S. Virgin Islands +1340
🇺🇬 Uganda +256
🇺🇦 Ukraine +380
🇦🇪 United Arab Emirates +971
🇬🇧 United Kingdom +44
🇺🇸 United States +1
🇺🇾 Uruguay +598
🇺🇿 Uzbekistan +998
🇻🇺 Vanuatu +678
🇻🇦 Vatican City +39
🇻🇪 Venezuela +58
🇻🇳 Vietnam +84
🇼🇫 Wallis & Futuna +681
🇪🇭 Western Sahara +212
🇾🇪 Yemen +967
🇿🇲 Zambia +260
🇿🇼 Zimbabwe +263

Answer

Maybe,

\S\s([^\r\n]*?)\s*\+[0-9]+$

used with the multiline flag (m), might return the country names in the capturing group $1.


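Using a lookahead, we can likely write an expression similar to:

\S[A-Za-zéã.].*(?=\s\+[0-9])

in which the character class

[A-Za-zéã.]

matches the second character of the country name. The \S before it matches the first character, and since no part of the emoji is followed by one of those characters, the match bypasses the emojis.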
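
As a rough sketch of the lookahead approach, the snippet below applies the expression (written with its backslashes, as \S[A-Za-zéã.].*(?=\s\+[0-9])) to three lines from the question. The names lookaheadRegex, sampleLines and lookaheadNames are only illustrative, and the call to normalize("NFC") is an added assumption: it makes sure that characters such as Å are single precomposed code points, which the expression silently relies on.

```javascript
// Lookahead variant: the whole match (index 0) is the country name.
const lookaheadRegex = /\S[A-Za-zéã.].*(?=\s\+[0-9])/;

const sampleLines = [
  "🇦🇽 Åland Islands +358",
  "🇸🇹 São Tomé & Príncipe +239",
  "🇻🇮 U.S. Virgin Islands +1340",
];

const lookaheadNames = sampleLines.map((line) => {
  // Assumption: normalise to NFC so accented letters are single code points.
  const found = line.normalize("NFC").match(lookaheadRegex);
  return found ? found[0] : null;
});

console.log(lookaheadNames);
```

Note that the character class has to list every character that can appear in second position, so a name whose second character is some other accented letter would need the class extended.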

Demo 2

const regex = /\S\s([^\r\n]*?)\s*\+[0-9]+$/gm;
const str = `🇦🇫 Afghanistan +93
🇦🇽 Åland Islands +358
🇦🇱 Albania +355
🇩🇿 Algeria +213
🇦🇸 American Samoa +1684
🇦🇩 Andorra +376
🇦🇴 Angola +244
🇦🇮 Anguilla +1264`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
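
Since the question is about a single dropdown choice rather than a block of text, the capturing-group expression could also be wrapped in a small helper. This is only a sketch: extractCountry is a hypothetical name, and it assumes the label always has the shape "flag, space, name, space, +digits".

```javascript
// Hypothetical helper: returns the country name from one dropdown label,
// or null when the label does not end in a "+digits" dialing code.
function extractCountry(label) {
  // No g or m flag is needed for a single line; trim() guards against
  // stray whitespace around the label.
  const match = /\S\s([^\r\n]*?)\s*\+[0-9]+$/.exec(label.trim());
  return match ? match[1] : null;
}

console.log(extractCountry("🇦🇬 Antigua & Barbuda +1268")); // Antigua & Barbuda
console.log(extractCountry("🇨🇩 Congo-Kinshasa +243"));     // Congo-Kinshasa
console.log(extractCountry("no dialing code here"));        // null
```

Returning null instead of throwing keeps the helper easy to use on labels that might not follow the expected shape.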

If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against some sample inputs.




Source: stackoverflow