Regular expression to capture PDF data in Node.js

Question

I have this code to get specific data from a PDF that has already been converted to a string. Basically, this is the string I have after that. I need a regular expression that captures the numbers only. I expect something like this: [1308906.95, 230942.51]. This is my Node.js code. This is the code I have so far, I wou…

Accepted Answer

You can useconst text = 'Valor del Fondo (Mill COP)n1,308,906.95nValor fondo deninversión  (Mill COP)nn                          230,942.51 Inversión inicial mínima (COP)\';console.log(  Array.from(text.matchAll(    /valor(?:s+del)?s+fondo(?:s+des+inversi[óo]n)?D*(d(?:[.,d]*d)?)/gi),    x=>x[1])  .map(x => x.replace(/,/g, '')));See the regex demo. Regex details:valor &#8211; a valor string(?:s+del)? &#8211; an optional sequence of one or more whitespaces and then dels+ &#8211; one or more whitespacesfondo &#8211; a fixed string(?:s+des+inversi[óo]n)? &#8211; an optional sequence of one or more whitespaces, de, one or more whitespaces, inversionD* &#8211; zero or more non-digit chars(d(?:[.,d]*d)?) &#8211; Group 1: a digit and then an optional sequence of zero or more digits, commas or dots and then a digit.String#matchAll finds all non-overlapping occurrences, Array.from(..., x=>x[1]) gets Group 1 values and .map(x => x.replace(/,/g, '') removes commas from the values obtained.

Answer