I am trying to automate the scraping of a site with “infinite scroll” with Python and Playwright.
The issue is that Playwright doesn’t include, as of yet, a scroll functionnality let alone an infinite auto-scroll functionnality.
From what I found on the net and my personnal testing, I can automate an infinite or finite scroll using the page.evaluate()
function and some Javascript code.
For example, this works:
for i in range(20): page.evaluate('var div = document.getElementsByClassName("comment-container")[0];div.scrollTop = div.scrollHeight') page.wait_for_timeout(500)
The problem with this approach is that it will either work by specifying a number of scrolls or by telling it to keep going forever with a while True
loop.
I need to find a way to tell it to keep scrolling until the final content loads.
This is the Javascript that I am currently trying in page.evaluate()
:
var intervalID = setInterval(function() { var scrollingElement = (document.scrollingElement || document.body); scrollingElement.scrollTop = scrollingElement.scrollHeight; console.log('fail') }, 1000); var anotherID = setInterval(function() { if ((window.innerHeight + window.scrollY) >= document.body.offsetHeight) { clearInterval(intervalID); }}, 1000)
This does not work either in my firefox browser or in the Playwright firefox browser. It returns immediately and doesn’t execute the code in intervals.
I would be grateful if someone could tell me how I can, using Playwright, create an auto-scroll function that will detect and stop when it reaches the bottom of a dynamically loading webpage.
Advertisement
Answer
The new Playwright version has a scroll function. it’s called mouse.wheel(x, y)
. In the below code, we’ll be attempting to scroll through youtube.com which has an “infinite scroll”:
from playwright.sync_api import Playwright, sync_playwright import time def run(playwright: Playwright) -> None: browser = playwright.chromium.launch(headless=False) context = browser.new_context() # Open new page page = context.new_page() page.goto('https://www.youtube.com/') # page.mouse.wheel(horizontally, vertically(positive is # scrolling down, negative is scrolling up) for i in range(5): #make the range as long as needed page.mouse.wheel(0, 15000) time.sleep(2) i += 1 time.sleep(15) # --------------------- context.close() browser.close() with sync_playwright() as playwright: run(playwright)