Regex validation working with JavaScript but not with HTML5 input validation pattern

I am trying to write a regex for New Zealand address validation. The address must start with a number and is matched case-insensitively. The valid character set includes letters A to Z, numbers (0-9), hyphen “-” and forward slash “/”, as well as the Maori accented vowels ā, ē, ī, ō, ū. The regex works in JavaScript to display an invalid error message, just not with the HTML5 form validation.

...

// JavaScript regex
var regex = /^\d[\/a-zĀ-ū0-9\s\,\'\-]*$/i;

...

Because I am attempting to do this in BigCommerce and don’t have access to edit the input, I am applying the “pattern” HTML input attribute with JavaScript. I really did think it was as simple as stripping “/^” from the start of the regex and “$/” from the end of the regex when applying it to the HTML pattern attribute:

...

/** @start JavaScript code for HTML5 form validation **/ 
let fulladdress = document.getElementById('addressLine1Input');

fulladdress.setAttribute("pattern", "\d[\/a-zĀ-ū0-9\s\,\'\-]*");

fulladdress.addEventListener('input', () => {
  fulladdress.setCustomValidity('');
  fulladdress.checkValidity();
});

fulladdress.addEventListener('invalid', () => {
  fulladdress.setCustomValidity('No PO Box or Private Bag address must start with a number, e.g. 1/311 Canaveral Drive');
});

/** @end JavaScript code for HTML5 form validation **/

...

HTML snippet:

...

<input id="addressLine1Input" name="shippingAddress.address1" placeholder="Enter your address" onFocus="geolocate()" type="text" class="form-control" onblur="validateAddress()" required>

...

I created a JSFiddle example; the lines of interest are 13 to 26 in the JavaScript area.

This is an invalid address string:

Flat 1 311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand

This is a valid address string:

1/311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand

The form validation pops up once you enter an address and click the Submit button.

Thank you, I really appreciate the input from the community.

This code works perfectly for all the validation examples I want, if there is a way to use it with HTML5 tool tips and form validation that would serve as a very viable workaround:

var regex = /^.*(po\s*box|private\s*bag).*$|^\d[\/a-zĀ-ū0-9\s\,\'\-]*$/i;

...

function validateAddress() {
  var str = getValue();
  var match = str.match(regex);
  var tooltip = document.getElementById("notification");
  var msg = document.getElementById("msg");

  if (match && !match[1]) {

    // valid address
    msg.innerHTML = "<p>Address looks to be valid</p>";
    tooltip.style.display = 'none';

  } else {

    // invalid address
    msg.innerHTML = "<p>Invalid address (No PO Box or Private Bag address must start with a number, e.g. 1/311 Canaveral Drive)</p>";
    tooltip.style.display = 'block';

  }
}

...

Answer

After the comments we have exchanged, I think there are several points to discuss.

A regular expression to validate an address may get complex

I am not really convinced that a regular expression can be used for the field in which the user will type their address with the Google autocomplete feature. Indeed, there are many cases to consider.

Let’s take into consideration the fact that you would like to use the pattern attribute on the field itself and also use a JavaScript regular expression.

The pattern attribute already matches against the full input value. In fact, it is automatically wrapped between ^(?: and )$. The parentheses are there to avoid changing the behaviour of the | operator. This is transparent for us, and we cannot use the /pattern/modifiers syntax as in /[a-z]/i.
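To make the anchoring concrete, here is a minimal sketch that approximates the wrapping in plain JavaScript. The helper name `matchesLikePatternAttribute` is hypothetical, and the sketch assumes no flags at all; a real browser may compile the attribute differently, so treat it as an illustration only.

```javascript
// Hypothetical helper: approximates how a pattern attribute value is anchored.
// Assumption: no flags are passed; a real browser may compile it differently.
function matchesLikePatternAttribute(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$').test(value);
}

// The non-capturing group keeps "|" confined to the pattern itself.
console.log(matchesLikePatternAttribute('a|b', 'a'));  // true
console.log(matchesLikePatternAttribute('a|b', 'ab')); // false

// Without the group, "^" would bind only to the first alternative.
console.log(new RegExp('^a|b$').test('ab')); // true

// No i flag is available, so the match is case-sensitive.
console.log(matchesLikePatternAttribute('[a-z]+', 'ABC')); // false
```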

So, as explained above, the pattern attribute unfortunately doesn’t accept regex flags. And the stupid thing is that JS still doesn’t accept inline modifiers such as (?i) to turn on the case-insensitive flag. This means that we cannot turn on the u (unicode) flag either. This would have been great, since the unicode flag lets you use \p{L} to match any letter of any language, such as à, ã or é. The fact is that \w is equivalent to [a-zA-Z0-9_], so it is fine for English letters but not for the Ā you mentioned.

Now, if we use [\wĀ-ū] then we will actually match every character between code points 256 and 363, including some like ŁĦŘ that I think you don’t want. This is where the unicode flag and \p{...} classes would help in writing the regex, but this would only work in pure JS and not in the pattern attribute.
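The width of that range can be checked in a few lines. This is only a sketch: the explicit vowel class `maoriVowels` is an assumed alternative that is not part of the patterns discussed here, and all characters are written as \u escapes so the result does not depend on file encoding.

```javascript
// Same range as [Ā-ū], written with \u escapes.
const macronRange = /^[\u0100-\u016B]$/;
// Assumed alternative: list only ā ē ī ō ū and their capitals.
const maoriVowels = /^[\u0101\u0113\u012B\u014D\u016B\u0100\u0112\u012A\u014C\u016A]$/;

console.log('\u0100'.codePointAt(0), '\u016B'.codePointAt(0)); // 256 363

console.log(macronRange.test('\u0141')); // Ł is inside the range: true
console.log(maoriVowels.test('\u0141')); // Ł is not a listed vowel: false
console.log(maoriVowels.test('\u0101')); // ā: true

console.log(/^\w$/.test('\u0101'));      // \w does not cover ā: false
console.log(/^\p{L}$/u.test('\u0101'));  // \p{L} with the u flag: true
```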

An address can contain a building name such as:
Totârä Farm, 2/12543 Farm Road, RD 1, Outram 9073

The user could be staying with someone, thus prefixing the address with c/o:
c/o James Bond, 007 Agent Street, London, Greater London, SW1A 2AA, United Kingdom.

I found these example addresses on the New Zealand Post website, and they have to be valid too.

Let’s have a try with your pattern: https://regex101.com/r/R8Bjy4/1

We see that it’s not really bulletproof, and this is why I don’t think it will be as easy as one might think. I think that using Google’s autocomplete and then validating the exploded address components would probably be easier.
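As a rough idea of what validating the exploded components could look like, here is a sketch that checks for a street number in an array shaped like the `address_components` of a Google Places result. The function name `hasStreetNumber` and the sample data are made up for illustration; the actual components returned for a given address should be inspected before relying on this.

```javascript
// Hypothetical check: does the place contain a street number starting with a digit?
function hasStreetNumber(addressComponents) {
  return addressComponents.some(
    (c) => c.types.includes('street_number') && /^\d/.test(c.long_name)
  );
}

// Made-up sample shaped like a Places address_components array.
const sampleComponents = [
  { long_name: '1', short_name: '1', types: ['subpremise'] },
  { long_name: '311', short_name: '311', types: ['street_number'] },
  { long_name: 'Point Chevalier Road', short_name: 'Point Chevalier Rd', types: ['route'] },
];

console.log(hasStreetNumber(sampleComponents));          // true
console.log(hasStreetNumber(sampleComponents.slice(2))); // false, only the route is left
```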

But for the exercise, let’s try with a regex…

  • Matching any letter is more complicated without the unicode flag. But if we look at the unicode tables we see that we can add some ranges:

    This leads to [/\wÀ-ɏḀ-ỹ .,'#-] in order to accept the slash, any letter or digit, almost all Latin letters, a simple space (not the same as \s, which also includes new lines and tabs), dot, comma, single quote, hash sign and the hyphen.

    I saw that you often wrote [\,\'\-] in your pattern. In fact, it’s not necessary to escape chars inside a character class [...] except for the “]” char and for the ones that have a meaning, like \s, \n or \d. The “.” or “|” chars outside a character class do need to be escaped, as \. and \| respectively, but inside a character class you can write them directly. Example: to match any char of “.-,?|]” you’ll use [.,?|\]-] as the pattern. The hyphen is used for ranges, so if you want to match it literally then you have to put it at the beginning or end of the class: [-\w] and [\w-] are equivalent and both match any letter, digit, underscore and hyphen. In JS, the slash is used to delimit the pattern from the flags, so you have to escape it anyway, whether it is outside or inside a character class.
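The escaping rules above can be tried out quickly in plain JavaScript. This is a small sketch with no flags, and the variable names are only for illustration.

```javascript
// Only "]" needs a backslash here; the hyphen is literal because it comes last.
const punctuation = /^[.,?|\]-]$/;
for (const ch of '.-,?|]') {
  console.log(ch, punctuation.test(ch)); // true for each of the six chars
}
console.log(punctuation.test('a')); // false

// A hyphen at the start or at the end of the class is equivalent.
const hyphenFirst = /^[-\w]+$/;
const hyphenLast = /^[\w-]+$/;
console.log(hyphenFirst.test('a-b_1'), hyphenLast.test('a-b_1')); // true true
```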

  • The PO Box or Private Bag:

    1. Make it case-insensitive without the i flag and replace \s by a plain space, since we don’t want to match a newline or a tab:
      PO Box => [pP][oO] +[bB][oO][xX]
      Private Bag => [pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]
    2. Add a mandatory number (which we capture for debugging purposes) and then match anything else after it:
      ^([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*
  • Some street numbers contain letters or slashes but they must start with a digit: (\d[/\w]*)

  • In front of the street number there may be a building name:

    1. We’ll assume that the building name itself doesn’t contain a comma. So it could be almost any char but not a comma: [/\wÀ-ɏḀ-ỹ .'-]*
    2. It’s then followed by a comma and probably some spaces. All of it is optional and it must be at the beginning of the address: ^([/\wÀ-ɏḀ-ỹ .'-]*, *)?
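Writing the [pP][oO] style classes by hand is error-prone, so they could be generated instead. The sketch below uses a hypothetical helper, `caseInsensitive`, and assumes the input contains only letters and single spaces (no regex metacharacters).

```javascript
// Hypothetical helper: turns each letter into a two-char class, e.g. "b" -> "[bB]".
// Assumption: the word contains letters only.
function caseInsensitive(word) {
  return Array.from(word, (ch) => {
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    return lower === upper ? ch : '[' + lower + upper + ']';
  }).join('');
}

// Words are joined with " +" so that one or more plain spaces are accepted.
function caseInsensitivePhrase(phrase) {
  return phrase.split(' ').map((word) => caseInsensitive(word)).join(' +');
}

console.log(caseInsensitivePhrase('PO Box'));
// [pP][oO] +[bB][oO][xX]
console.log(caseInsensitivePhrase('Private Bag'));
// [pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]
```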

Putting it together:

  • The PO Box or Private Bag:

    ^([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*
    
  • The standard address with at least a number and an optional building prefix:

    ^([/\wÀ-ɏḀ-ỹ .'-]*, *)?(\d[/\w]*)[/\wÀ-ɏḀ-ỹ .,'#-]*$
    

Testing it: https://regex101.com/r/P4bEVf/6

OK, it’s working, but it’s accepting too many invalid entries. As you see, it’s difficult to get something bulletproof… Yes, we can improve the regex, but I don’t think it will be easy!
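For a quick check outside regex101, the two patterns can be run against the sample addresses in plain JavaScript. This is a sketch under a few assumptions: the names `poBox`, `standard` and `isAccepted` are made up, the ranges À-ɏ and Ḁ-ỹ are written as \u escapes, and the slash is escaped because these are regex literals.

```javascript
// \u00C0-\u024F is the range À-ɏ, \u1E00-\u1EF9 is the range Ḁ-ỹ.
const poBox = /^([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*/;
const standard = /^([\/\w\u00C0-\u024F\u1E00-\u1EF9 .'-]*, *)?(\d[\/\w]*)[\/\w\u00C0-\u024F\u1E00-\u1EF9 .,'#-]*$/;

// Hypothetical helper: an address is accepted if either pattern matches.
function isAccepted(address) {
  return poBox.test(address) || standard.test(address);
}

const samples = [
  '1/311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand', // true
  'Flat 1 311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand', // false
  'Tot\u00E2r\u00E4 Farm, 2/12543 Farm Road, RD 1, Outram 9073', // true
  'c/o James Bond, 007 Agent Street, London, Greater London, SW1A 2AA, United Kingdom.', // true
  'PO Box 2365, City', // true
  '1 ,,,,', // true, although it is clearly not a real address
];

for (const address of samples) {
  console.log(isAccepted(address), address);
}
```

The last sample shows the kind of invalid entry that still gets through.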

And trying the same regex in the pattern attribute:

let address = document.getElementById('address');
let log_ul = document.getElementById('log');
let submit = document.getElementById('submit');  

document.getElementById('demo-form').addEventListener('submit', (e) => {
  let li = document.createElement('li');
  li.textContent = address.value;
  log_ul.appendChild(li);
  e.preventDefault();
});
input[type="text"] {
  min-width: 30em;
}
<form id="demo-form" action="">
  <input type="text" id="address" name="address"
         placeholder="Type your full address here"
         pattern="([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*|([/\wÀ-ɏḀ-ỹ .'-]*, *)?(\d[/\w]*)[/\wÀ-ɏḀ-ỹ .,'#-]*"
         title="Address format: '45 Street Name, 2000 City, Country' or 'PO Box 2365, City'" />
  <input type="submit" value="submit" id="submit">
</form>
<ul id="log">
</ul>
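If the attribute has to be set from JavaScript instead of in the markup, as in the question, one detail worth checking is how backslashes behave in a string literal. The sketch below uses a short fragment of the pattern only; the element id comes from the question, and whether a given pattern is accepted by the browser should still be verified there.

```javascript
// In an ordinary string literal, "\d" and "\w" silently lose their backslash.
const lost = "\d[/\w]*";
console.log(lost); // d[/w]*

// String.raw (or doubled backslashes) keeps them.
const kept = String.raw`\d[/\w]*`;
const doubled = "\\d[/\\w]*";
console.log(kept);             // \d[/\w]*
console.log(kept === doubled); // true

// Only runs in a browser; the id is the one used in the question.
if (typeof document !== 'undefined') {
  const input = document.getElementById('addressLine1Input');
  if (input) {
    input.setAttribute('pattern', kept);
  }
}
```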
User contributions licensed under: CC BY-SA