Regex windows path validator



I’ve tried to find a Windows file path validator for JavaScript, but none seemed to fulfill my requirements, so I decided to build one myself.

The requirements are the following:

  • the path should not be empty
  • may begin with x:\, x:\\, \\ or // and be followed by a filename (no file extension required)
  • filenames cannot include the following special characters: <>:"|?*
  • filenames cannot end with dot or space

Here is the regex I came up with: /^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i

But there are some issues:

  • it also accepts filenames that include the special characters listed in the rules
  • it doesn’t include the last rule (cannot end with: . or space)

// The pattern from the question; slashes and backslashes are escaped so the literal parses.
var reg = /^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i;
var startList = [
  'C://test',
  'C://te?st.html',
  'C:/test',
  'C://test.html',
  'C://test/hello.html',
  'C:/test/hello.html',
  '//test',
  '/test',
  '//test.html',
  '//10.1.1.107',
  '//10.1.1.107/test.html',
  '//10.1.1.107/test/hello.html',
  '//10.1.1.107/test/hello',
  '//test/hello.txt',
  '/test/html',
  '/tes?t/html',
  '/test.html',
  'test.html',
  '//',
  '/',
  '\\\\', // two backslashes
  '\\',   // one backslash
  '/t!esrtr',
  'C:/hel**o'
];

// Print the test result next to each candidate path.
startList.forEach(item => {
  document.write(reg.test(item) + '  >>>   ' + item);
  document.write("<br>");
});

Answer

Lookbehinds were only added to JavaScript regular expressions in ES2018 and are missing from older engines, but lookaheads are supported everywhere, and they are the key to constructing this regex.

Let’s start from some observations:

  1. After a dot, slash, backslash or space, there cannot be another dot, slash or backslash. The set of “forbidden” characters also includes \n, because none of these characters can be the last character of the file name or of one of its segments (between dots or (back)slashes).

  2. The other characters allowed in the path are the ones you mentioned (other than …), but the “exclusion list” must also include a dot, slash, backslash, space and \n (the characters mentioned in point 1).

  3. After the “initial part” (C:\) there can be multiple instances of the characters mentioned in point 1 or 2.

Taking these points into account, I built the regex from 3 parts:

  • “Starting” part, matching the drive letter, a colon and up to 2 slashes (forward or backward).
  • The first alternative – either a dot, slash, backslash or a space, with negative lookahead – a list of “forbidden” chars after each of the above chars (see point 1).
  • The second alternative – chars mentioned in point 2.
  • Both the above alternatives can occur multiple times (+ quantifier).

So the regex is as follows:

  • ^ – Start of the string.
  • (?:[a-z]:)? – Drive letter and a colon, optional.
  • [\/\\]{0,2} – Either a backslash or a slash, between 0 and 2 times.
  • (?: – Start of the non-capturing group, needed due to the + quantifier after it.
    • [.\/\\ ] – The first alternative.
    • (?![.\/\\\n]) – Negative lookahead – “forbidden” chars.
  • | – Or.
    • [^<>:"|?*.\/\\ \n] – The second alternative.
  • )+ – End of the non-capturing group, may occur multiple times.
  • $ – End of the string.
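
Put together, the parts listed above give a single pattern. The following is a minimal sketch of how it might be used; the names pathRe and isValidWindowsPath are illustrative and not part of the original answer, and console.log stands in for document.write so the snippet also runs outside a browser.

```javascript
// Pattern assembled from the parts in the breakdown (case-insensitive).
const pathRe = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*.\/\\ \n])+$/i;

// Hypothetical helper name, used only for this illustration.
function isValidWindowsPath(path) {
  return pathRe.test(path);
}

const samples = [
  'C:/test/hello.html',     // true
  '//10.1.1.107/test.html', // true
  'test.html',              // true
  'C://te?st.html',         // false: "?" is excluded by the second alternative
  'C:/hel**o',              // false: "*" is excluded by the second alternative
  'C:/test//hello.html',    // false: a slash may not follow a slash (point 1)
];

samples.forEach(p => console.log(isValidWindowsPath(p) + '  >>>   ' + p));
```

Because the pattern is anchored with both ^ and $, a forbidden character anywhere in the string makes the whole test fail.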

If you test each path as a separate string, use only the i flag.
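
One caveat is worth checking when each path is tested as a separate string. A negative lookahead succeeds at the very end of the input because no character follows, so the pattern as listed accepts a dot, space or separator in the last position unless a newline comes after it. The sketch below compares it with a variant that adds |$ inside the lookahead. That variant, and the name strictPathRe, are assumptions for illustration and not part of the original answer.

```javascript
// Pattern as listed in the breakdown.
const pathRe = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*.\/\\ \n])+$/i;

// Hypothetical stricter variant: "|$" makes the lookahead fail at the end of the string too.
const strictPathRe = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n]|$)|[^<>:"|?*.\/\\ \n])+$/i;

const edgeCases = [
  'C:/test.',     // as listed: true,  stricter: false
  'C:/test ',     // as listed: true,  stricter: false
  '//',           // as listed: true,  stricter: false
  'C:/test.html', // as listed: true,  stricter: true
];

edgeCases.forEach(p => {
  console.log(JSON.stringify(p), pathRe.test(p), strictPathRe.test(p));
});
```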

If you have multiple paths on separate lines and want to match them all in one go, also add the g and m flags.
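
As a rough sketch of the multi-line case, the same pattern with the g, i and m flags can be passed to String.prototype.match, which then returns every line that is a valid path. The variable names and sample lines are made up for illustration.

```javascript
// Same pattern, with g (all matches) and m (^ and $ match at line boundaries) added.
const pathLineRe = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*.\/\\ \n])+$/gim;

// Sample input: one candidate path per line.
const text = [
  'C:/test/hello.html',
  'C:/te?st.html',      // contains a forbidden "?"
  '//10.1.1.107/test',
  'C:/test.',           // ends with a dot, and a newline follows it
  'notes.txt',
].join('\n');

// match() returns null when nothing matches, hence the fallback to an empty array.
const validPaths = text.match(pathLineRe) || [];

console.log(validPaths);
// [ 'C:/test/hello.html', '//10.1.1.107/test', 'notes.txt' ]
```

Since ^ only matches at the start of a line here, a line with a forbidden character is skipped entirely and does not produce a partial match.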

For a working example see https://regex101.com/r/4JY31I/1

Note: I suppose that ! should also be treated as a forbidden character. If you agree, add it to the second alternative, e.g. after *.
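
A small sketch of that change, with ! added after * in the second alternative. Windows itself allows ! in file names, so treat this as a policy choice, not a file system rule; the name noBangPathRe is illustrative.

```javascript
// Second alternative extended with "!" (placed right after "*").
const noBangPathRe = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*!.\/\\ \n])+$/i;

console.log(noBangPathRe.test('/t!esrtr')); // false
console.log(noBangPathRe.test('/tesrtr'));  // true
```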



Source: stackoverflow