Match text between single quotes, double quotes, or no quotes at all

I’m trying to parse CLI-like arguments that could be enclosed in single quotes, double quotes, or no quotes at all.
Here’s an example of what I’m trying to get:

// --message "This is a 'quoted' message" --other 'This uses the "other" quotes'
const str = "--message \"This is a 'quoted' message\" --other 'This uses the \"other\" quotes'"

matchGitArgs(str) // ['--message', 'This is a \'quoted\' message', '--other', 'This uses the "other" quotes']

I’ve found a lot of similar questions, so here is what makes this one different:

  • It’s important that it matches the arguments not in quotes too, and keeps the original order
  • It should be able to parse single and double quote arguments in the same string
  • It should not match the quotes themselves:
matchGitArgs('This is "quoted"')
// Correct: ['This', 'is', 'quoted']
// Wrong: ['This', 'is', '"quoted"']
  • It should allow escaped quotes and the other kind of quotes inside an argument:
matchGitArgs('It is "ok" to use \'these\'')
// ["It", "is", "ok", "to", "use", "these"]

I’ve tried a lot of different regex patterns I’ve found here, but each one failed at least one of these conditions. I’ve also tried libraries meant to parse CLI arguments, but they all seem to rely on process.argv (in Node.js), which is already split correctly based on the quotes, so they don’t help me.
What I essentially need to do is generate an array like process.argv.

It doesn’t need to be a single regex; a JS/TS function that does the same thing is fine too.

Answer

“Verbose” expressions and named groups work especially well for tokenizing problems:

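function* parseArgs(cmdLine) {

    // JavaScript has no "verbose" regex flag, so the pattern is laid out with
    // whitespace here and compacted by the .replace() call at the end.
    const re = String.raw`
        (
            -- (?<longOpt> \w+)
            (\s+ | =)
        )

        | (
            - (?<shortOpt> \w+)
            \s+
        )

        | (
            ('
                (?<sq> (\\. | [^'])* )
            ')
            \s+
        )

        | (
            ("
                (?<dq> (\\. | [^"])* )
            ")
            \s+
        )

        | (
            (?<raw> [^\s"'-]+)
            \s+
        )

        | (?<error> \S)

    `.replace(/\s+/g, '');

    // The trailing space lets the last token satisfy the \s+ that ends each alternative.
    // matchAll() converts a string pattern to a RegExp with the "g" flag.
    for (let m of (cmdLine + ' ').matchAll(re)) {
        // Keep only the named group that took part in this match
        // (an empty quoted string '' is still a valid value).
        let g = Object.entries(m.groups).filter(p => p[1] !== undefined);

        let [type, val] = g[0];

        switch (type) {
            case 'error':
                throw new Error(`Unexpected character at index ${m.index}`);
            case 'sq':
            case 'dq':
                // Drop the backslashes used to escape characters inside quotes.
                yield ['value', val.replace(/\\/g, '')];
                break;
            case 'raw':
                yield ['value', val];
                break;
            case 'longOpt':
            case 'shortOpt':
                yield ['option', val];
        }
    }
}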
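
The “verbose pattern plus named groups” technique is easier to see in isolation on a smaller grammar. The following is only an illustrative sketch: the `tokenize` function and the group names `number`, `word`, and `punct` are made up for this example and are not part of `parseArgs`. It shows the two moving parts: compacting the laid-out pattern by removing whitespace, and finding which named group matched.

```javascript
// Hypothetical mini-tokenizer, used only to illustrate the technique.
const verbose = String.raw`
      (?<number> \d+ )
    | (?<word>   [A-Za-z_]\w* )
    | (?<punct>  [=+*/-] )
`;

// JavaScript has no "x" (extended) flag, so strip the layout whitespace by hand.
// Consequence: the pattern cannot contain a literal space; use \s or \x20 instead.
const compact = verbose.replace(/\s+/g, '');

function* tokenize(input) {
    for (const m of input.matchAll(new RegExp(compact, 'g'))) {
        // Only the alternative that matched has a defined named group; find it.
        const [type, text] = Object.entries(m.groups).find(([, v]) => v !== undefined);
        yield [type, text];
    }
}

console.log([...tokenize('total = price * 2')]);
// [['word','total'], ['punct','='], ['word','price'], ['punct','*'], ['number','2']]
```

Because the alternatives are tried left to right, their order decides which group wins when more than one could match at the same position.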

// Usage:

const args = String.raw`
    --message "This is \"a\" 'quoted' message"
    -s
    --longOption 'This uses the "other" quotes'
    --foo 1234
    --file=message.txt
    --file2="Application Support/message.txt"
`

for (let [type, s] of parseArgs(args))
    console.log(type, ':', s)
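
If a flat array shaped like process.argv is all that is needed, as in the question, a small hand-written scanner is an alternative to the regex. This is a minimal sketch, not the approach used above. The name `matchGitArgs` is taken from the question and is not a library function. It assumes that a backslash escapes the next character only inside quotes, which is simpler than real shell rules. Unlike `parseArgs`, it does not separate option names from values, so `--file="a b"` comes back as the single string `--file=a b`.

```javascript
// Hypothetical helper (name borrowed from the question): splits a command
// line into an argv-like array of strings, dropping the surrounding quotes.
function matchGitArgs(cmdLine) {
    const args = [];
    let current = '';   // text of the argument being built
    let inArg = false;  // true once an argument has started (so "" yields '')
    let quote = null;   // the quote character we are inside, or null

    for (let i = 0; i < cmdLine.length; i++) {
        const ch = cmdLine[i];

        if (quote) {
            if (ch === '\\' && i + 1 < cmdLine.length) {
                // Assumption: inside quotes, a backslash makes the next character literal.
                current += cmdLine[++i];
            } else if (ch === quote) {
                quote = null;           // closing quote: drop it
            } else {
                current += ch;          // includes the other kind of quote
            }
        } else if (ch === '"' || ch === "'") {
            quote = ch;                 // opening quote: drop it
            inArg = true;
        } else if (/\s/.test(ch)) {
            if (inArg) {                // whitespace outside quotes ends the argument
                args.push(current);
                current = '';
                inArg = false;
            }
        } else {
            current += ch;
            inArg = true;
        }
    }

    if (quote) throw new Error(`Unterminated ${quote} quote`);
    if (inArg) args.push(current);
    return args;
}

console.log(matchGitArgs('This is "quoted"'));
// ['This', 'is', 'quoted']
console.log(matchGitArgs(`--message "This is a 'quoted' message" --other 'This uses the "other" quotes'`));
// ['--message', "This is a 'quoted' message", '--other', 'This uses the "other" quotes']
```

Tracking `inArg` separately from `current` is what lets an empty quoted argument such as `""` survive as an empty string instead of being dropped.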


Source: stackoverflow