Puppeteer is a Node.js library that provides an API to automate Chrome or Chromium browsers. It can be used to get the inner text of any element on the page; however, the approach differs slightly depending on the type of element.
Let’s explore how we can scrape the inner text of heading, link, paragraph, list, table, button, input, and text area elements using Puppeteer.
We will be using the following test page which contains all types of HTML elements.
I will be using the code below as a starting point.
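A minimal sketch of such a starting point could look like the following. The withPage helper and the TEST_PAGE_URL placeholder are my own assumptions for illustration, not names from Puppeteer; only launch(), newPage(), goto() and close() are Puppeteer calls.

```javascript
// index.js
// Assumption: placeholder address; replace it with the URL of the test page.
const TEST_PAGE_URL = 'https://example.com/';

// Opens `url` in a new tab, hands the page to `scrape`, and always closes
// the browser afterwards. `launcher` is any object with a Puppeteer-style
// launch() method, normally the puppeteer module itself.
async function withPage(launcher, url, scrape) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await scrape(page);
  } finally {
    await browser.close();
  }
}

async function main() {
  const puppeteer = require('puppeteer'); // installed with: npm install puppeteer
  await withPage(puppeteer, TEST_PAGE_URL, async (page) => {
    // rest of code goes below
  });
}

// Uncomment the next line, then run: node index.js
// main().catch(console.error);
```

Closing the browser in a finally block means the Chromium process does not linger if one of the scraping steps throws.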
First we will get the text of the h1 heading which is displayed inside
The selector of the h1 element is "body > div > div > h1".
To get element of heading we need to use
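One way to do this, sketched here with Puppeteer's page.$eval, is shown below. The getInnerText helper name is hypothetical; page.$eval finds the first element matching the selector and runs the callback on it inside the browser.

```javascript
// Reads the visible text of the first element matching `selector`.
// page.$eval runs the callback in the page and returns its result to Node.js.
// It rejects if no element matches the selector.
async function getInnerText(page, selector) {
  return page.$eval(selector, (el) => el.innerText);
}

// Usage, placed below the "// rest of code goes below" comment:
// const heading = await getInnerText(page, 'body > div > div > h1');
// console.log(heading);
```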
After you put the above code just below the comment "// rest of code goes below" and execute the file using the "node index.js" command, it should output the text "Hey 👋, I'm Gulshan Saini" on the terminal console.
Getting link text is very similar to getting the text of the heading. We will be getting the text of the first item in the navigation list, i.e. "Portfolio" at the time of writing.
The CSS selector of the element is "#nav-menu > li:nth-child(1)".
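A short sketch for the navigation item, again assuming page.$eval is used, might be the following. The constant and function names are made up for this example.

```javascript
// Selector of the first navigation item, taken from the text above.
const FIRST_NAV_ITEM = '#nav-menu > li:nth-child(1)';

// Returns the visible text of the first navigation item.
async function getFirstNavItemText(page) {
  return page.$eval(FIRST_NAV_ITEM, (el) => el.innerText);
}

// Usage inside the async block that owns `page`:
// console.log(await getFirstNavItemText(page));
```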
Once you save the above code and run the script again, you should see the text "Portfolio" on the console.
Next, we will be targeting the paragraph element displayed inside
The CSS selector of the paragraph is "body > div > div > p".
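For the paragraph, here is an equivalent two-step sketch: page.$ returns an element handle (or null when nothing matches), and the handle's evaluate method reads the text. This is only one possible variant, and the names below are hypothetical.

```javascript
// Selector of the paragraph, taken from the text above.
const PARAGRAPH = 'body > div > div > p';

// Returns the paragraph text, or null when nothing matches the selector.
async function getParagraphText(page) {
  const handle = await page.$(PARAGRAPH); // ElementHandle, or null if absent
  if (handle === null) return null;
  return handle.evaluate((el) => el.innerText);
}

// Usage inside the async block that owns `page`:
// console.log(await getParagraphText(page));
```

Compared with page.$eval, this form lets the script handle a missing element without catching an error.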
You should get the below output after saving the above code and running the script.
So far everything was simple, and the same technique worked for each element. Let’s now see how we can iterate over list elements and print the text of each item.
We will be selecting the list items in the Services section, which have the selector "#services ul li".
Getting their innerText looks like the following:
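Here is a minimal sketch of that approach using page.evaluate. The helper names collectInnerTexts and getListTexts are assumptions for illustration; the browser-side calls are the standard document.querySelectorAll and Array.from.

```javascript
// Runs inside the browser: collects the text of every element matching `sel`.
// Array.from is needed because querySelectorAll returns a NodeList,
// which has no map() method.
function collectInnerTexts(sel) {
  return Array.from(document.querySelectorAll(sel)).map((el) => el.innerText);
}

// Returns an array with the inner text of each matching element, in page order.
async function getListTexts(page, selector) {
  // The selector is passed as an argument because the function is serialized
  // and executed in the page, so it cannot see Node.js variables.
  return page.evaluate(collectInnerTexts, selector);
}

// Usage inside the async block that owns `page`:
// const services = await getListTexts(page, '#services ul li');
// console.log(services);
```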
Let’s understand what is happening here:
document.querySelectorAll('#services ul li') selects all the matching nodes.
Array.from converts those nodes to an array, because querySelectorAll returns a NodeList instead of an array.
We first get all the elements inside the page.evaluate method, which captures the NodeList of nodes matching the selector, i.e. document.querySelectorAll("#services ul li").
The services variable holds the inner text of all the list elements in the order they appear on the page. After saving the above code you should see the below output on the console.
Getting the text of an input element, or of an input element of type submit (i.e. a button), works differently because the text is contained inside the value attribute.
To get the text of the input element we need to use element.value instead of element.innerText.
We will be using the below code to get the text of the input button that is of type submit.
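A sketch of reading the value is shown below. The selector is a generic assumption, since the test page's actual selector for the button is not given here, and getInputValue is a hypothetical helper name.

```javascript
// Assumption: generic selector for a submit button; the test page's
// actual selector may differ.
const SUBMIT_BUTTON = 'input[type="submit"]';

// Returns the text shown on an input element.
async function getInputValue(page, selector) {
  // Inputs keep their label in `value`; `innerText` is empty for them.
  return page.$eval(selector, (el) => el.value);
}

// Usage inside the async block that owns `page`:
// console.log(await getInputValue(page, SUBMIT_BUTTON));
```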
Once you save the above code and run
node index.js, this should return the inner text of the button i.e.
Below is the final code, which contains all the scenarios for getting the inner text of an element.
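Putting the pieces together, a combined sketch could look like this. The scrapeAll function, the SELECTORS object, the placeholder URL, and the submit-button selector are assumptions for illustration; the other selectors are the ones quoted in the sections above.

```javascript
// Selectors quoted in the sections above. The submit-button selector is an
// assumption, since it is not given in the text.
const SELECTORS = {
  heading: 'body > div > div > h1',
  firstNavItem: '#nav-menu > li:nth-child(1)',
  paragraph: 'body > div > div > p',
  serviceItems: '#services ul li',
  submitButton: 'input[type="submit"]', // hypothetical
};

// Gathers every value covered in this post from an already-loaded page.
async function scrapeAll(page) {
  const innerText = (el) => el.innerText;
  return {
    heading: await page.$eval(SELECTORS.heading, innerText),
    firstNavItem: await page.$eval(SELECTORS.firstNavItem, innerText),
    paragraph: await page.$eval(SELECTORS.paragraph, innerText),
    // Array.from turns the NodeList into an array so map() can be used.
    services: await page.evaluate(
      (sel) => Array.from(document.querySelectorAll(sel)).map((el) => el.innerText),
      SELECTORS.serviceItems
    ),
    // Inputs expose their label through `value`, not `innerText`.
    submitButton: await page.$eval(SELECTORS.submitButton, (el) => el.value),
  };
}

async function main() {
  const puppeteer = require('puppeteer'); // installed with: npm install puppeteer
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('https://example.com/'); // placeholder: use the test page URL
    console.log(await scrapeAll(page));
  } finally {
    await browser.close();
  }
}

// Uncomment the next line, then run: node index.js
// main().catch(console.error);
```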