Playwright auto-scroll to bottom of infinite-scroll page

I am trying to automate the scraping of a site with “infinite scroll” with Python and Playwright.

The issue is that Playwright doesn’t include, as of yet, a scroll functionality, let alone an infinite auto-scroll functionality.

From what I found on the net and my personal testing, I can automate an infinite or finite scroll using the page.evaluate() function and some JavaScript code.

For example, this works:

for i in range(20):
    page.evaluate('var div = document.getElementsByClassName("comment-container")[0];div.scrollTop = div.scrollHeight')
    page.wait_for_timeout(500)

The problem with this approach is that it only works either with a fixed number of scrolls or by letting it run forever in a while True loop.

I need to find a way to tell it to keep scrolling until the final content loads.
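
One way to express "keep scrolling until the final content loads" is to compare the container's scrollHeight between scrolls and stop once it has stayed the same for a few rounds. Below is a minimal sketch of that idea, reusing the comment-container class from the loop above. The function name scroll_container_to_end and the parameters pause_ms, max_stalls and max_scrolls are made up for illustration, and the right pause depends on how quickly the site loads new items:

```python
# JavaScript run in the page: scroll the element to its bottom and report
# its current scrollHeight (null if the selector matches nothing).
SCROLL_JS = """(selector) => {
    const el = document.querySelector(selector);
    if (!el) return null;
    el.scrollTop = el.scrollHeight;
    return el.scrollHeight;
}"""


def scroll_container_to_end(page, selector, pause_ms=500, max_stalls=3, max_scrolls=200):
    """Scroll `selector` until its height stops growing; return the scroll count.

    `page` is a Playwright sync-API Page. All parameter names are illustrative.
    """
    last_height = None
    stalls = 0   # consecutive scrolls that did not change the height
    scrolls = 0
    while stalls < max_stalls and scrolls < max_scrolls:
        height = page.evaluate(SCROLL_JS, selector)
        if height is None:
            raise ValueError(f"no element matches {selector!r}")
        scrolls += 1
        if height == last_height:
            stalls += 1
        else:
            stalls = 0
            last_height = height
        # Give the site time to fetch and render the next batch.
        page.wait_for_timeout(pause_ms)
    return scrolls
```

It would be called as scroll_container_to_end(page, ".comment-container"). Requiring several unchanged rounds (max_stalls) rather than one is a guess at making the check tolerant of slow responses, and max_scrolls is a safety cap in case the content never ends.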

This is the JavaScript that I am currently trying in page.evaluate():

var intervalID = setInterval(function() {
    var scrollingElement = (document.scrollingElement || document.body);
    scrollingElement.scrollTop = scrollingElement.scrollHeight;
    console.log('fail')
}, 1000);
var anotherID = setInterval(function() {
    if ((window.innerHeight + window.scrollY) >= document.body.offsetHeight) {
        clearInterval(intervalID);
    }
}, 1000);

This works neither in my Firefox browser nor in the Playwright Firefox browser. It returns immediately and doesn’t execute the code at intervals.
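
A likely reason for the immediate return is that setInterval() only schedules the callback and hands back a timer ID, so page.evaluate() has nothing to wait for. In addition, right after a scroll to the bottom the stop condition is already true, before new content has had time to load, so the first check may cancel the scrolling. page.evaluate() does wait when the evaluated function returns a Promise, so the same loop can be written as an async function. This is only a sketch under those assumptions; the names AUTO_SCROLL_JS and auto_scroll_in_browser and the option names are hypothetical:

```python
# An async function returns a Promise, which page.evaluate() waits for.
AUTO_SCROLL_JS = """async ({ pauseMs, maxStalls, maxScrolls }) => {
    const el = document.scrollingElement || document.body;
    let last = el.scrollHeight;
    let stalls = 0;
    for (let i = 0; i < maxScrolls && stalls < maxStalls; i++) {
        el.scrollTop = el.scrollHeight;
        // Pause so the page can load more content before we re-check.
        await new Promise((resolve) => setTimeout(resolve, pauseMs));
        if (el.scrollHeight === last) {
            stalls += 1;
        } else {
            stalls = 0;
            last = el.scrollHeight;
        }
    }
    return last;
}"""


def auto_scroll_in_browser(page, pause_ms=1000, max_stalls=3, max_scrolls=200):
    """Run the scroll loop inside the page; return the final scrollHeight.

    `page` is a Playwright sync-API Page. The call blocks until the
    Promise returned by the JavaScript function resolves.
    """
    options = {"pauseMs": pause_ms, "maxStalls": max_stalls, "maxScrolls": max_scrolls}
    return page.evaluate(AUTO_SCROLL_JS, options)
```

Unlike the two-interval version, this stops only after the height has stayed unchanged for maxStalls consecutive pauses, not the moment the viewport touches the bottom.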

I would be grateful if someone could tell me how I can, using Playwright, create an auto-scroll function that will detect and stop when it reaches the bottom of a dynamically loading webpage.

Answer

Newer versions of Playwright have a scroll function called mouse.wheel(delta_x, delta_y). The code below scrolls through youtube.com, which has an “infinite scroll”:

from playwright.sync_api import Playwright, sync_playwright
import time


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()

    # Open new page
    page = context.new_page()

    page.goto('https://www.youtube.com/')

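    # page.mouse.wheel(delta_x, delta_y): delta_x scrolls horizontally,
    # delta_y scrolls vertically (positive is down, negative is up).
    for _ in range(5):  # make the range as long as needed
        page.mouse.wheel(0, 15000)
        time.sleep(2)  # give newly requested content time to load

    time.sleep(15)  # keep the browser open for a while to inspect the result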
    # ---------------------
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)
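
The loop above still scrolls a fixed number of times. To stop at the bottom instead, mouse.wheel() can be combined with a check of the document height after each scroll. The following is a rough sketch, not a tested recipe for any particular site. The helper name wheel_to_bottom and its parameters are hypothetical, and it assumes the page itself (not an inner element) is what scrolls:

```python
def wheel_to_bottom(page, step=15000, pause_ms=2000, max_stalls=3, max_scrolls=100):
    """Wheel-scroll until the document height stops growing; return the scroll count.

    `page` is a Playwright sync-API Page. Assumes the document itself scrolls.
    """
    height_js = "() => document.documentElement.scrollHeight"
    last_height = page.evaluate(height_js)
    stalls = 0   # consecutive scrolls that loaded nothing new
    scrolls = 0
    while stalls < max_stalls and scrolls < max_scrolls:
        page.mouse.wheel(0, step)        # positive delta_y scrolls down
        page.wait_for_timeout(pause_ms)  # let new content load
        scrolls += 1
        height = page.evaluate(height_js)
        if height == last_height:
            stalls += 1
        else:
            stalls = 0
            last_height = height
    return scrolls
```

Inside run(), the for loop could be replaced with wheel_to_bottom(page). On a feed that never really ends, the max_scrolls cap is what stops the loop.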
User contributions licensed under: CC BY-SA