Use just regexp to split a string into a ‘tuple’ of filename and extension?

I know there are easier ways to get file extensions with JavaScript, but partly to practice my regexp skills I wanted to try and use a regular expression to split a filename into two strings, before and after the final dot (. character).

Here’s what I have so far

const myRegex = /^((?:[^.]+(?:\.)*)+?)(\w+)?$/;
// match() returns [fullMatch, group1, group2], so index 0 is skipped when destructuring
const [, filename1, extension1] = 'foo.baz.bing.bong'.match(myRegex);
// filename1 = 'foo.baz.bing.'
// extension1 = 'bong'
const [, filename2, extension2] = 'one.two'.match(myRegex);
// filename2 = 'one.'
// extension2 = 'two'
const [, filename3, extension3] = 'noextension'.match(myRegex);
// filename3 = 'noextension'
// extension3 = undefined

I’ve tried to use a positive lookahead to say ‘only match a literal . if it’s followed by a word that ends in a .’, by changing (?:\.)* to (?:\.(?=\w+\.))*:

/^((?:[^.]+(?:\.(?=(\w+\.))))*)(\w+)$/gm

But I want to exclude that final period using just the regexp, and preferably have ‘noextension’ be matched in the initial group. How can I do that with just regexp?

Here is my regexp scratch file: https://regex101.com/r/RTPRNU/1

Answer

For the first capture group, you could start the match with 1 or more word characters. Then optionally repeat a . and again 1 or more word characters.

Then you can use an optional non capture group matching a . and capturing 1 or more word characters in group 2.

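As the second non capture group is optional, the repetition in the first group should be non greedy; otherwise it would consume the final . and word itself and leave nothing for group 2.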

^(\w+(?:\.\w+)*?)(?:\.(\w+))?$

The pattern matches

  • ^ Start of string
  • ( Capture group 1
    • \w+(?:\.\w+)*? Match 1+ word characters, and optionally repeat . and 1+ word characters, as few times as possible
  • ) Close group 1
  • (?: Non capture group to match as a whole
    • \.(\w+) Match a . and capture 1+ word chars in capture group 2
  • )? Close non capture group and make it optional
  • $ End of string
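
To see why the non greedy quantifier matters, here is a small sketch comparing the pattern above with a variant that differs only in using a greedy `*`. The names `lazy`, `greedy`, `lazyMatch` and `greedyMatch` are made up for this illustration.

```javascript
// Same pattern twice; only the quantifier on the repeated group differs.
const lazy = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;
const greedy = /^(\w+(?:\.\w+)*)(?:\.(\w+))?$/;

const lazyMatch = "one.two".match(lazy);
const greedyMatch = "one.two".match(greedy);

// Lazy: group 1 stops as early as it can, so ".two" is left for group 2.
console.log(lazyMatch[1], lazyMatch[2]);     // one two

// Greedy: group 1 swallows ".two", and the optional group matches nothing.
console.log(greedyMatch[1], greedyMatch[2]); // one.two undefined
```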


const regex = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;
[
  "foo.baz.bing.bong",
  "one.two",
  "noextension"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1]);
    console.log(m[2]);
    console.log("----");
  }
});
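
If you want the ‘tuple’ from the question directly, one possible way is to wrap the pattern in a small helper and destructure the match. This is only a sketch: `splitFilename` and `FILENAME_RE` are hypothetical names, and returning '' for a missing extension is an assumption based on the question's expected output.

```javascript
// Hypothetical helper (not part of the original answer): returns [name, extension].
const FILENAME_RE = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;

function splitFilename(input) {
  const m = input.match(FILENAME_RE);
  if (!m) return null; // input contains characters the pattern does not allow

  // m[0] is the whole match, so it is skipped; the groups start at index 1.
  // Group 2 is undefined when there is no extension, so it defaults to "".
  const [, name, extension = ""] = m;
  return [name, extension];
}

console.log(splitFilename("foo.baz.bing.bong")); // name "foo.baz.bing", extension "bong"
console.log(splitFilename("one.two"));           // name "one", extension "two"
console.log(splitFilename("noextension"));       // name "noextension", extension ""
```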

Another option, as @Wiktor Stribiżew posted in the comments, is to use a non greedy dot to match any character for the filename:

^(.*?)(?:\.(\w+))?$
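
A practical difference between the two patterns is that the dot version also accepts filenames containing non-word characters. A rough sketch, where the sample input `my-file.tar.gz` and the variable names are assumptions made for illustration:

```javascript
const wordOnly = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;
const anyChar = /^(.*?)(?:\.(\w+))?$/;

const sample = "my-file.tar.gz"; // hypothetical input containing a hyphen

// The word-character pattern rejects the input because "-" is not matched by \w.
console.log(wordOnly.test(sample)); // false

// The non greedy dot accepts it and still splits at the final dot.
const m = sample.match(anyChar);
console.log(m[1], m[2]); // my-file.tar gz

// Without a dot, everything lands in group 1 and group 2 is undefined.
const plain = "noextension".match(anyChar);
console.log(plain[1], plain[2]); // noextension undefined
```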


User contributions licensed under: CC BY-SA