This article shows how to remove elements from the HTML DOM (Document Object Model) when using Selenium, the browser automation framework. If you want to get rid of unwanted page elements during web scraping, this Selenium tutorial is for you.
Let’s start with a practical example of removing elements from the DOM. Imagine scraping a website with Selenium and taking a screenshot of the page. However, a timed modal window pops up unannounced once Selenium’s driver has spun up the browser window. In such a case, it is a good idea to remove the modal before taking the screenshot.
When scraping sites, removing excessive advertisements and other distracting elements is hard enough. On top of that, pop-up windows appearing at the wrong moment are a common part of the user experience on many sites and blogs nowadays.
For example, say we want to scrape stock time series from a well-known site like finviz.com. Unfortunately, a few seconds after the page loads, the cookie modal window pops up and blocks the stock graph. Therefore, I need to remove the cookie consent modal to take a clear screenshot.
Let me show you two different ways to remove elements from the HTML DOM in Selenium. The first removes elements that load with the page and can be removed immediately. The second uses a WebDriverWait instance and is meant for elements that pop up or load into the page later.
Note, however, that both options only change the copy of the page loaded in the Selenium-driven browser. The site itself is untouched, and the removed element comes back when the page is reloaded.
Remove elements from Selenium’s DOM without waiting
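import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxOptions;

// Set the browser window dimensions before starting Firefox.
FirefoxOptions options = new FirefoxOptions();
options.addArguments("-width=1920");
options.addArguments("-height=1080");
FirefoxDriver driver = new FirefoxDriver(options);

String stock = "T"; // AT&T Inc.
driver.get("https://finviz.com/quote.ashx?t=" + stock + "&ty=c&ta=1&p=d");

// Remove the first (0-th) element with the class name 'snapshot-table2'.
driver.executeScript("return document.getElementsByClassName('snapshot-table2')[0].remove();");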
At first glance the code looks straightforward, but let me go through it briefly. First, we set the options for our Selenium driver. Then we pick Firefox as the Selenium driver and pass the options (browser window dimensions) to the driver itself.
After setting the driver options and instantiating the Firefox driver, we are ready to open the page in Firefox. We load the finviz.com stock page for AT&T Inc. Once loaded, the page is immediately available for changes. To remove an element we do not like, we execute a script against the loaded HTML DOM. The script removes the first (0-th) element with the given class name, which is the first data table under the stock time-series graph.
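The script string passed to executeScript can also be built by a small helper, which makes the element index explicit and skips the removal when nothing matches (calling remove() on a missing element would otherwise raise a JavaScript error). The following is only a minimal sketch: the class DomScripts and its method removeByClassName are hypothetical names made up for illustration and are not part of Selenium.

```java
// Hypothetical helper (not part of Selenium): builds the JavaScript for executeScript.
final class DomScripts {
    private DomScripts() {}

    /**
     * Returns a script that removes the index-th element with the given class name
     * and does nothing when no such element exists.
     */
    static String removeByClassName(String className, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        // Escape backslashes and single quotes so the name stays a valid JS string literal.
        String escaped = className.replace("\\", "\\\\").replace("'", "\\'");
        return "var el = document.getElementsByClassName('" + escaped + "')[" + index + "];"
                + " if (el) { el.remove(); }";
    }
}
```

With the driver from the snippet above, the call would then read driver.executeScript(DomScripts.removeByClassName("snapshot-table2", 0));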
Note: If you have a better idea for removing elements from Selenium’s HTML DOM, let me know in the comments below.
Remove elements from Selenium’s DOM with waiting
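import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

FirefoxOptions options = new FirefoxOptions();
options.addArguments("-width=1920");
options.addArguments("-height=1080");
FirefoxDriver driver = new FirefoxDriver(options);

String stock = "T"; // AT&T Inc.
driver.get("https://finviz.com/quote.ashx?t=" + stock + "&ty=c&ta=1&p=d");

// Wait up to 10 seconds. In Selenium 4, pass a Duration instead: new WebDriverWait(driver, Duration.ofSeconds(10)).
WebDriverWait wait = new WebDriverWait(driver, 10);
WebElement cookieWindow = wait.until(ExpectedConditions.visibilityOfElementLocated(By.className("ConsentManager__Overlay-np32r2-0")));
// The found element is passed to the script as arguments[0].
driver.executeScript("return arguments[0].remove();", cookieWindow);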
The code above also looks straightforward. As before, we set the options for our Selenium driver and pick Firefox as the driver.
After setting the driver options and instantiating the Firefox driver, we are ready to open the page in Firefox. We load the finviz.com stock page for AT&T Inc. and wait up to 10 seconds for an element with the specific class name to become visible in the HTML DOM. Once it is visible, we execute a script that removes the found element, which is passed to the script as its first (0-th) argument. This removes the whole pop-up modal window, provided it appears within the first 10 seconds after the finviz.com page is opened. If it does not appear in time, wait.until throws a TimeoutException.
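To make the waiting step less of a black box: WebDriverWait repeatedly checks a condition (by default every 500 milliseconds) until it yields a result or the timeout runs out. The sketch below illustrates that polling idea with the Java standard library only. It is not Selenium's implementation, and the class Polling and its method waitUntil are hypothetical names chosen for this example.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical illustration of the polling idea behind WebDriverWait (not Selenium code).
final class Polling {
    private Polling() {}

    /**
     * Calls the condition until it returns a non-null value or the timeout elapses.
     * Returns the value, or an empty Optional on timeout or interruption.
     */
    static <T> Optional<T> waitUntil(Supplier<T> condition, Duration timeout, Duration interval) {
        long start = System.nanoTime();
        while (true) {
            T value = condition.get();
            if (value != null) {
                return Optional.of(value);
            }
            if (System.nanoTime() - start >= timeout.toNanos()) {
                return Optional.empty();
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }
}
```

One difference worth keeping in mind: this sketch returns an empty Optional on timeout, whereas wait.until signals a timeout by throwing a TimeoutException.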
The class name “ConsentManager__Overlay-np32r2-0” belongs to the pop-up cookie modal window. However, this class name may well change in the future, so I would not rely on it completely. For now, it serves our purpose.
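One way to soften the dependency on the exact class name is to match only a part of it with a CSS attribute selector. The sketch below assumes that the prefix “ConsentManager__Overlay” stays stable while the suffix “-np32r2-0” is generated and may change. That is a guess about the site, not something verified here. The class Selectors and its method classContains are hypothetical names for illustration.

```java
// Hypothetical helper (not part of Selenium): builds a CSS selector for By.cssSelector.
final class Selectors {
    private Selectors() {}

    /** Returns a selector matching elements whose class attribute contains the fragment. */
    static String classContains(String fragment) {
        if (fragment == null || fragment.isEmpty()) {
            throw new IllegalArgumentException("fragment must not be empty");
        }
        // Reject characters that would break out of the quoted attribute value.
        if (fragment.contains("'") || fragment.contains("\\")) {
            throw new IllegalArgumentException("fragment must not contain quotes or backslashes");
        }
        return "[class*='" + fragment + "']";
    }
}
```

In the waiting example, By.className("ConsentManager__Overlay-np32r2-0") could then be swapped for By.cssSelector(Selectors.classContains("ConsentManager__Overlay")).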
This article has shown how to remove elements from the HTML DOM when using the browser automation framework Selenium. It provides two different ways to remove an element from the page. Elements that load with the page can be removed immediately, without a WebDriverWait instance. Elements that pop up or load into the page later need to be removed with the help of a WebDriverWait instance.
Beyond that, the code snippets in the article can help you build your own Selenium web scraper 😉
Did you find element removal easy? Do you have your own trick or know another way to remove elements from the HTML DOM in Selenium? Let us know in the comments below the article. We would like to hear your ideas and stories, and you might help others as well.