I am trying to write a regex for New Zealand address validation. The address must start with a number, and the match should be case-insensitive. The valid character set I want to capture includes the letters A to Z, the digits 0 to 9, the hyphen "-" and the forward slash "/", as well as the Maori vowels with macrons ā, ē, ī, ō, ū. It works in JavaScript to display an invalid-address error message, just not with the HTML5 form validation.
```javascript
// JavaScript regex
var regex = /^\d[\/a-zĀ-ū0-9\s,'-]*$/i;
```
Because I am doing this in BigCommerce and don't have access to edit the input, I am adding the HTML `pattern` input attribute with JavaScript. I really did think it was as simple as stripping `/^` from the start of the regex and `$/` from the end of the regex when applying it to the HTML pattern attribute:
```javascript
/** @start JavaScript code for HTML5 form validation **/
let fulladdress = document.getElementById('addressLine1Input');
fulladdress.setAttribute("pattern", "\\d[/a-zĀ-ū0-9\\s,'-]*");
fulladdress.addEventListener('input', () => {
  fulladdress.setCustomValidity('');
  fulladdress.checkValidity();
});
fulladdress.addEventListener('invalid', () => {
  fulladdress.setCustomValidity('No PO Box or Private Bag address must start with a number, e.g. 1/311 Canaveral Drive');
});
/** @end JavaScript code for HTML5 form validation **/
```
HTML snippet:
```html
<input id="addressLine1Input" name="shippingAddress.address1" placeholder="Enter your address"
       onFocus="geolocate()" type="text" class="form-control" onblur="validateAddress()" required>
```
I created a JSFiddle example; the lines of interest are 13 to 26 in the JavaScript pane.
This is an invalid address string:
Flat 1 311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand
This is a valid address string:
1/311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand
The form validation pops up once you enter an address and click the Submit button.
Thank you, I really appreciate the input from the community.
This code works perfectly for all the validation examples I want. If there is a way to use it with HTML5 tooltips and form validation, that would be a very viable workaround:
```javascript
var regex = /^.*(po\s*box|private\s*bag).*$|^\d[\/a-zĀ-ū0-9\s,'-]*$/i;
// ...
function validateAddress() {
  var str = getValue();
  var match = str.match(regex);
  var tooltip = document.getElementById("notification");
  var msg = document.getElementById("msg");
  if (match && !match[1]) {
    // valid address
    msg.innerHTML = "<p>Address looks to be valid</p>";
    tooltip.style.display = 'none';
  } else {
    // invalid address
    msg.innerHTML = "<p>Invalid address (No PO Box or Private Bag address must start with a number, e.g. 1/311 Canaveral Drive)</p>";
    tooltip.style.display = 'block';
  }
}
```
Answer
After the comments we have exchanged, I think there are several points to discuss.
A regular expression to validate an address may get complex
I am not really convinced that a regular expression can be used for the field in which the user types their address with the Google autocomplete feature. Indeed, there are many cases to consider.
Let's take into consideration the fact that you would like to use the pattern attribute on the field itself and also use a JavaScript regular expression.
The pattern attribute already matches the full input value. In fact, it's automatically wrapped between `^(?:` and `)$`. The parentheses are there to avoid changing the behaviour of the `|` operator. It's transparent for us, and we cannot use the `/pattern/modifiers` syntax as in `/[a-z]/i`.
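To make the wrapping concrete, here is a small sketch that imitates it in plain JavaScript. `matchesPatternAttribute` is a hypothetical helper, and compiling with the `u` flag is an assumption based on the HTML specification (current browsers use the stricter `v` flag instead). Because no `i` flag can be added, the question's pattern rejects the capitalised example address:

```javascript
// Hypothetical helper imitating how a browser applies a pattern attribute:
// the value must match in full, and there is no way to add the i flag.
function matchesPatternAttribute(pattern, value) {
  // Same wrapping as the browser: ^(?: ... )$.
  // Assumption: u flag here; current browsers compile with the stricter v flag.
  const re = new RegExp("^(?:" + pattern + ")$", "u");
  return re.test(value);
}

// The pattern from the question, as a JS string (backslashes doubled).
const askerPattern = "\\d[/a-zĀ-ū0-9\\s,'-]*";

console.log(matchesPatternAttribute(askerPattern, "1/311 point chevalier road")); // true
console.log(matchesPatternAttribute(askerPattern, "1/311 Point Chevalier Road")); // false (capital P, C, R)

// The wrapping also keeps | from escaping the anchors: "a|b" must match the whole value.
console.log(matchesPatternAttribute("a|b", "ab")); // false
```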
So, as explained above, unfortunately the pattern attribute doesn't accept regex flags. And the stupid thing is that JS still doesn't accept a leading inline modifier such as `(?i)` to turn on the case-insensitive flag. This means that we cannot turn on the `u` (Unicode) flag ourselves either. That would have been great, since the Unicode flag lets you use `\p{L}` to match any letter of any language, such as `à`, `ã` or `é`. The fact is that `\w` is equivalent to `[a-zA-Z0-9_]`, so it will be OK for English letters but not for the `Ā` you mentioned.

Now, if we use `[\wĀ-ū]`, then we will actually match a bunch of letters between code points 256 and 363, including some like `ŁĦŘ` that I think you don't want. This is where the Unicode flag and `\p{...}` classes would help in writing the regex; in pure JS you can add the `u` flag yourself. (For the pattern attribute, the HTML specification actually compiles the pattern with the `u` flag, and with the stricter `v` flag in current browsers, so `\p{L}` should work there too in browsers that support Unicode property escapes; only the `i` flag is out of reach.)
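The difference is easy to check in plain JavaScript, where you can add the `u` flag yourself. In this sketch, the non-letter characters in the second class (digits, `/`, space, dot, comma, apostrophe, hyphen) are an assumption loosely based on the question's character set:

```javascript
// The Ā-ū range (U+0100 to U+016B) contains the Maori macron vowels...
const macronRange = /^[\wĀ-ū]+$/;
console.log(macronRange.test("Tāmaki")); // true
// ...but also letters such as Ł, Ħ and Ř.
console.log(macronRange.test("ŁĦŘ")); // true
// â and ä (U+00E2, U+00E4) sit below U+0100, so they are rejected.
console.log(macronRange.test("Totârä")); // false

// With the u flag, \p{L} matches a letter from any script and \p{N} any number.
const anyLetter = /^[\p{L}\p{N}\/ .,'-]+$/u;
console.log(anyLetter.test("Totârä Farm, 2/12543 Farm Road")); // true
console.log(anyLetter.test("1/311 Point Chevalier Road")); // true
console.log(anyLetter.test("PO Box @ 12")); // false: @ is not in the class
```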
An address can contain a building name, such as:
Totârä Farm, 2/12543 Farm Road, RD 1, Outram 9073
The user could be staying with someone and thus prefix the address with c/o:
c/o James Bond, 007 Agent Street, London, Greater London, SW1A 2AA, United Kingdom.
I found these example addresses from New Zealand Post, and they have to be valid too.
Let’s have a try with your pattern: https://regex101.com/r/R8Bjy4/1
We see that it's not really bulletproof, and it won't be as easy as we might think. That's why I think using Google's autocomplete and then validating the individual address components would probably be easier.
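For instance, running the question's regex (with its backslashes) against the examples above shows the problem: the building-name and c/o addresses are valid but rejected because they don't start with a digit. A minimal sketch:

```javascript
// The question's regex.
const askerRegex = /^\d[\/a-zĀ-ū0-9\s,'-]*$/i;

const examples = [
  "1/311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand",
  "Totârä Farm, 2/12543 Farm Road, RD 1, Outram 9073",
  "c/o James Bond, 007 Agent Street, London, Greater London, SW1A 2AA, United Kingdom.",
];

for (const address of examples) {
  console.log(askerRegex.test(address), address);
}
// true for the first address, false for the other two
```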
But for the exercise, let's try with a regex…
Matching any letter is more complicated without the Unicode flag. But if we look at the Unicode tables, we see that we can add some ranges:
- `[À-ɏ]` covering Latin-1 Supplement, Latin Extended-A and Latin Extended-B
- `[Ḁ-ỹ]` covering Latin Extended Additional, without the medievalist end.
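A quick check of the two ranges (a sketch; the sample strings are only illustrations). One caveat worth knowing: `[À-ɏ]` also includes the two non-letters × (U+00D7) and ÷ (U+00F7) from Latin-1 Supplement:

```javascript
// À-ɏ is U+00C0 to U+024F (Latin-1 Supplement letters, Latin Extended-A and -B);
// Ḁ-ỹ is U+1E00 to U+1EF9 (Latin Extended Additional minus its last few code points).
const latinRanges = /^[À-ɏḀ-ỹ]+$/;

console.log(latinRanges.test("âäāēīōū")); // true
console.log(latinRanges.test("\u1EC7\u1ED1")); // true: ệ and ố from Latin Extended Additional
console.log(latinRanges.test("×÷")); // true, although × and ÷ are not letters
console.log(latinRanges.test("abc")); // false: ASCII letters come from \w instead
```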
This leads to `[\/\wÀ-ɏḀ-ỹ .,'#-]` in order to accept the slash, any letter or digit, almost all Latin letters, a simple space (not the same as `\s`, which also includes new lines and tabs), dot, comma, single quote, hash sign and the hyphen.

I saw that you often wrote `[\,\'\-]` in your pattern. In fact, it's not necessary to escape chars inside a class of chars `[...]`, except for the `]` and `\` chars and for ones that have a meaning, like `\s`, `\n` or `\d`. The `.` or `|` chars outside a class of chars should indeed be escaped as `\.` and `\|` respectively, but inside a class of chars you can write them directly. Example: to match any char of `.-,?|]` you'd use `[.,?|\]-]` as the pattern. The hyphen is used for ranges, so if you want to match it you have to put it at the beginning or end of the class: `[-\w]` and `[\w-]` are equivalent and both match any letter, digit, underscore and hyphen. In a JS regex literal, the slash separates the pattern from the flags, so you have to escape it as `\/` outside a character class; inside a class an unescaped slash is technically allowed, but escaping it there as well does no harm.

The PO Box or Private Bag:
- Make it case-insensitive without the `i` flag by spelling out both cases of each letter, with ` +` (one or more spaces) between the words:
  PO Box => `[pP][oO] +[bB][oO][xX]`
  Private Bag => `[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]`
- Add a mandatory number (which we capture for debugging purposes) and then match anything else after it:
  `^([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*`
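A quick check of that regex as a JS literal (a sketch; "private  bag 4901, Hamilton" and "POBox 2365" are made-up inputs for illustration):

```javascript
// PO Box or Private Bag, case-insensitive without the i flag, followed by a
// mandatory number (captured in group 2) and anything after it.
const poBoxOrPrivateBag =
  /^([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*/;

const m = "PO Box 2365, Auckland".match(poBoxOrPrivateBag);
console.log(m[1], m[2]); // prints: PO Box 2365

console.log(poBoxOrPrivateBag.test("private  bag 4901, Hamilton")); // true: any case, several spaces
console.log(poBoxOrPrivateBag.test("POBox 2365")); // false: at least one space is required
console.log(poBoxOrPrivateBag.test("1/311 Point Chevalier Road")); // false
```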
Some street numbers contain letters or slashes, but they must start with a digit:
`(\d[\/\w]*)`
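To see what that group captures, here it is anchored to the start of the string for this sketch (in the full regex it comes after the optional building prefix); "12A Queen Street" is an illustrative input:

```javascript
// Street number: a digit first, then any mix of "/" and word characters.
const streetNumber = /^(\d[\/\w]*)/;

console.log("1/311 Point Chevalier Road".match(streetNumber)[1]); // "1/311"
console.log("12A Queen Street".match(streetNumber)[1]); // "12A"
console.log(streetNumber.test("Flat 1 311 Point Chevalier Road")); // false: starts with a letter
```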
In front of the street number there may be a building name:
- We'll assume that the building name itself doesn't contain a comma, so it could be almost any char but a comma:
  `[\/\wÀ-ɏḀ-ỹ .'-]*`
- It's then followed by a comma and probably some spaces. All of it is optional, and it must be at the beginning of the address:
  `^([\/\wÀ-ɏḀ-ỹ .'-]*, *)?`
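A small sketch of the optional prefix on its own, showing that it captures the building name together with its comma and trailing space, and captures nothing when there is no comma:

```javascript
// Optional building-name prefix: anything except a comma, then a comma and spaces.
const buildingPrefix = /^([\/\wÀ-ɏḀ-ỹ .'-]*, *)?/;

console.log(JSON.stringify("Totârä Farm, 2/12543 Farm Road".match(buildingPrefix)[1]));
// "Totârä Farm, "
console.log("1/311 Point Chevalier Road".match(buildingPrefix)[1]); // undefined: no prefix
```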
Putting it together:
The PO Box or Private Bag:
`^([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*`
The standard address with at least a number and an optional building prefix:
`^([\/\wÀ-ɏḀ-ỹ .'-]*, *)?(\d[\/\w]*)[\/\wÀ-ɏḀ-ỹ .,'#-]*$`
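To try both regexes outside regex101, here is a small sketch. `isAcceptedStreetAddress` is a hypothetical helper; following the question (which wants PO Box and Private Bag addresses rejected), it checks the PO Box regex first and then requires the standard form:

```javascript
// The two regexes above, as JS regex literals.
const poBoxOrPrivateBag =
  /^([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*/;
const standardAddress =
  /^([\/\wÀ-ɏḀ-ỹ .'-]*, *)?(\d[\/\w]*)[\/\wÀ-ɏḀ-ỹ .,'#-]*$/;

// Hypothetical helper: reject PO Box / Private Bag, otherwise require a standard address.
function isAcceptedStreetAddress(address) {
  if (poBoxOrPrivateBag.test(address)) return false;
  return standardAddress.test(address);
}

const samples = [
  "1/311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand",
  "Flat 1 311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand",
  "Totârä Farm, 2/12543 Farm Road, RD 1, Outram 9073",
  "PO Box 2365, Auckland",
  "c/o James Bond, 007 Agent Street, London, Greater London, SW1A 2AA, United Kingdom.",
];
for (const s of samples) console.log(isAcceptedStreetAddress(s), s);
// true, false, true, false, true
```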
Testing it: https://regex101.com/r/P4bEVf/6
OK, it's working, but it's accepting too many invalid entries. As you can see, it's difficult to get something bulletproof… Yes, we can improve the regex, but I don't think it will be easy!
And trying the same regex in the pattern attribute. Inside the character classes, `/` and `-` are escaped as `\/` and `\-` here: current browsers compile `pattern` with the `v` flag, where an unescaped `/` or a trailing `-` in a class is a syntax error, and a pattern that fails to compile is ignored, so the field would accept anything.

```javascript
let address = document.getElementById('address');
let log_ul = document.getElementById('log');
let submit = document.getElementById('submit');
document.getElementById('demo-form').addEventListener('submit', (e) => {
  // Only reached when the pattern check passes: log the accepted value.
  let li = document.createElement('li');
  li.textContent = address.value;
  log_ul.appendChild(li);
  e.preventDefault();
});
```

```css
input[type="text"] { min-width: 30em; }
```

```html
<form id="demo-form" action="">
  <input type="text" id="address" name="address" placeholder="Type your full address here"
         pattern="([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*|([\/\wÀ-ɏḀ-ỹ .'\-]*, *)?(\d[\/\w]*)[\/\wÀ-ɏḀ-ỹ .,'#\-]*"
         title="Address format: '45 Street Name, 2000 City, Country' or 'PO Box 2365, City'" />
  <input type="submit" value="submit" id="submit">
</form>
<ul id="log">
</ul>
```
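Since the question's own regex already does the job in JavaScript, another option is to keep it there (where the `i` flag is allowed) and pass its verdict to the browser through the Constraint Validation API with `setCustomValidity`, as the question's code already starts to do; the browser then shows its native tooltip when the form is submitted. A sketch, assuming the `addressLine1Input` id from the question; `addressValidityMessage` is a hypothetical name, and listening to both `input` and `change` is a design choice, not something the question requires:

```javascript
// Reuses the question's logic: reject PO Box / Private Bag anywhere in the value,
// and require a leading digit followed by the allowed characters.
const poBoxOrPrivateBag = /po\s*box|private\s*bag/i;
const streetAddress = /^\d[\/a-zĀ-ū0-9\s,'-]*$/i;

const MESSAGE =
  "No PO Box or Private Bag address must start with a number, e.g. 1/311 Canaveral Drive";

// Hypothetical helper: returns "" when the value is acceptable, otherwise the message.
function addressValidityMessage(value) {
  if (poBoxOrPrivateBag.test(value) || !streetAddress.test(value)) return MESSAGE;
  return "";
}

// Browser-only part; skipped when there is no DOM (e.g. under Node).
if (typeof document !== "undefined") {
  const input = document.getElementById("addressLine1Input");
  if (input) {
    for (const type of ["input", "change"]) {
      input.addEventListener(type, () => {
        // "" clears the error; any other string marks the field invalid.
        input.setCustomValidity(addressValidityMessage(input.value));
      });
    }
  }
}
```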