Puppeteer: compare return result with a text file

I have this code, which scrapes a web page and returns the result (message and username):

function containsWords(words, message) {
    return words.some(w => message.includes(w));
}

async function grabResult(page) {
    const message = await page.$eval(
        'div > div:nth-child(2)',
        (el) => el.innerText
    );
    
    const username = await page.$eval(
        'child(15) .username',
        (el) => el.innerText
    );

    return {
        message: containsWords(['http', 'https'], message) ? '' : message,
        username: username
    };
};


module.exports = grabResult;

Since the code above scrapes a website that changes dynamically, I want to avoid returning duplicate messages.

One way I thought this could be done is by creating a .txt file that stores the previous result.

Then, any time fresh data is retrieved, the code would compare the new ‘message’ with the ‘message’ stored in the .txt file before returning. If they are the same, it would return an empty message:

{ message: '', username: 'John' }

If the message is new, however, it would return the data as it normally would:

{ message: 'message text', username: 'John' }

It would then update the .txt file with that data (so that it can be compared with the fresh data next time).

So basically, using a .txt file for comparison, before returning (logging) the data in the terminal.

My question is: is this process even possible?

If yes, any clues or help would be greatly appreciated.

I’m not a coder, I hope I made it clear.

thanks.

Answer

I advise you to use JSON instead of plain text: JSON.parse turns the file contents directly into an array you can search, which makes the check easier.

  1. Create a file data.json in the same folder as your script. The file must contain just two square brackets: []. This represents an empty array to start with.

  2. Your script will read the file using the fs module and parse it into a JS array. Then it will check whether the array already contains the current message. If so, the message will be replaced with an empty string. If not, the message will be added to the array and the file will be rewritten.
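
To see why the file has to start as [], here is a minimal sketch of the parse/check/stringify round trip on plain strings, with no file involved (the variable names are made up for illustration):

```javascript
// Illustration only: these strings stand in for the contents of data.json.
const initialFileContents = '[]';                  // what the file holds at the start

const seen = JSON.parse(initialFileContents);      // an empty JS array
const isDuplicate = seen.includes('hello');        // nothing stored yet, so not a duplicate

seen.push('hello');                                // remember the new message
const updatedFileContents = JSON.stringify(seen);  // the text that would be written back
```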

Here is a script example:

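const { readFileSync, writeFileSync } = require('fs');
const path = require('path');

// Resolve data.json next to this script rather than the current working directory.
const DATA_FILE = path.join(__dirname, 'data.json');

function containsWords(words, message) {
    return words.some(w => message.includes(w));
}

async function grabResult(page) {
    const username = await page.$eval(
        'child(15) .username',
        (el) => el.innerText
    );

    let message = await page.$eval(
        'div > div:nth-child(2)',
        (el) => el.innerText
    );

    // Blank out messages that contain links.
    if (containsWords(['http', 'https'], message)) message = '';

    // Load the messages that were returned on earlier runs.
    const dataArray = JSON.parse(readFileSync(DATA_FILE, 'utf8'));

    if (dataArray.includes(message)) {
        // Already seen: return an empty message instead of a duplicate.
        message = '';
    } else if (message !== '') {
        // New, non-empty message: remember it for the next run.
        dataArray.push(message);
        writeFileSync(DATA_FILE, JSON.stringify(dataArray));
    }

    return { message, username };
}

module.exports = grabResult;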
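
The duplicate check does not depend on Puppeteer, so it can be tried on its own. Below is a minimal sketch, assuming a hypothetical helper named filterDuplicate and a throwaway file in the system temp directory. Unlike the script above, it treats a missing file as an empty list instead of throwing:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: returns '' for a message that was already stored,
// otherwise stores the message and returns it unchanged.
function filterDuplicate(message, file) {
    // Treat a missing file as "nothing seen yet".
    const seen = fs.existsSync(file)
        ? JSON.parse(fs.readFileSync(file, 'utf8'))
        : [];

    // Empty messages are never stored; known messages are blanked.
    if (message === '' || seen.includes(message)) return '';

    seen.push(message);
    fs.writeFileSync(file, JSON.stringify(seen));
    return message;
}

// Try it against a throwaway file in a fresh temp directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'dedupe-'));
const demoFile = path.join(demoDir, 'data.json');

const first = filterDuplicate('message text', demoFile);    // new message, returned as-is
const second = filterDuplicate('message text', demoFile);   // same text again, blanked
const third = filterDuplicate('another message', demoFile); // different text, returned as-is
```

Inside grabResult, a call such as filterDuplicate(message, './data.json') could stand in for the read/check/write lines, if you prefer to keep that logic in one place.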
User contributions licensed under: CC BY-SA