How to grab a specific text value from a big text or html file [closed]




I want to extract only the path values from the text/HTML below. The actual file contains about 10k lines, so collecting all the path values manually would be very difficult. Is it possible to get only the path values with a regex, with Excel, or in some other way?

I want to grab just the path value from each href attribute.

<table>
   <tbody>
      <tr>
         <th>account</th>
         <th>size</th>
         <th>nodes</th>
         <th>props</th>
         <th></th>
      </tr>
      <tr>
         <td><a href=" /reports/?path=/root/en/products-services/course-products">course-products</a></td>
         <td class="number">955MB</td>
         <td class="number">80607</td>
         <td class="number">549393</td>
         <td width="100%">
            <table style="border: none;" width="100%">
               <tbody>
                  <tr>
                     <td style="border-width:1;width:58%" class="bar"></td>
                     <td style="border: none; width:42%"><b>58%</b></td>
                  </tr>
               </tbody>
            </table>
         </td>
      </tr>
      <tr>
         <td><a href="/reports/?path=/root/products-services/silverthorn-7e-info">silverthorn-7e-info</a></td>
         <td class="number">83.5MB</td>
         <td class="number">149</td>
         <td class="number">778</td>
         <td width="100%">
            <table style="border: none;" width="100%">
               <tbody>
                  <tr>
                     <td style="border-width:1;width:5%" class="bar"></td>
                     <td style="border: none; width:95%"><b>5%</b></td>
                  </tr>
               </tbody>
            </table>
         </td>
      </tr>
      <tr>
         <td><a href="/reports/?path =/root/products-services/sanders-2e-info">sanders-2e-info</a></td>
         <td class="number">45.5MB</td>
         <td class="number">9609</td>
         <td class="number">67184</td>
         <td width="100%">
            <table style="border: none;" width="100%">
               <tbody>
                  <tr>
                     <td style="border-width:1;width:3%" class="bar"></td>
                     <td style="border: none; width:97%"><b>3%</b></td>
                  </tr>
               </tbody>
            </table>
         </td>
      </tr>
      <tr>
         <td><a href="/reports/?path=/root/products-services/davidson-10e-info">davidson-10e-info</a></td>
         <td class="number">39MB</td>
         <td class="number">53</td>
         <td class="number">288</td>
         <td width="100%">
            <table style="border: none;" width="100%">
               <tbody>
                  <tr>
                     <td style="border-width:1;width:2%" class="bar"></td>
                     <td style="border: none; width:98%"><b>2%</b></td>
                  </tr>
               </tbody>
            </table>
         </td>
      </tr>
      <tr>

Answer

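In JavaScript, with jQuery's .each, you can do something like this. It logs the full href of the link in each row, which contains the path value after "path=":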

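// Requires jQuery, run in the page that contains the table.
$( "tr" ).each(function( index ) {
    let ahref = $(this).find('a').attr('href');
    // Header rows have no link, so attr() returns undefined there; skip them.
    if (ahref !== undefined) {
        console.log(ahref);
    }
});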
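
If only the part after "path=" is needed and the file is saved on disk rather than open in a browser, a regex can do the job without jQuery. The following is a minimal sketch, not a definitive solution: the function name `extractPaths` and the `sample` string are made up for illustration, it assumes every href is double-quoted as in the question's sample, and it tolerates the stray spaces around "=" that appear in some of the sample links. A regex is adequate for a uniform report like this one, but it is not a general HTML parser.

```javascript
// Hypothetical helper: returns the `path` query value of every href in an HTML string.
function extractPaths(html) {
  const paths = [];
  // Assumption: href values are double-quoted, as in the question's sample.
  const hrefPattern = /href="([^"]*)"/g;
  for (const match of html.matchAll(hrefPattern)) {
    // Capture what follows "path=", allowing spaces around "=" and stopping at "&" or "#".
    const pathMatch = match[1].match(/[?&]path\s*=\s*([^&#]*)/);
    if (pathMatch) {
      paths.push(pathMatch[1].trim());
    }
  }
  return paths;
}

// Shortened sample built from two of the rows in the question.
const sample = `
<tr><th>account</th></tr>
<tr><td><a href=" /reports/?path=/root/en/products-services/course-products">course-products</a></td></tr>
<tr><td><a href="/reports/?path =/root/products-services/sanders-2e-info">sanders-2e-info</a></td></tr>
`;

// Prints one path per line:
// /root/en/products-services/course-products
// /root/products-services/sanders-2e-info
console.log(extractPaths(sample).join("\n"));
```

For the real 10k-line file, the same function could be fed the file contents (for example, read with Node's `fs.readFileSync(fileName, "utf8")`), and the joined output pasted into Excel as a single column.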


Source: stackoverflow