
How to get the text of element using puppeteer?

By Gulshan Saini
Published in Puppeteer
September 25, 2020

Puppeteer is a Node.js library that provides an API to automate Chrome or Chromium browsers. It can be used to get the inner text of any element on the page; however, the approach differs slightly depending on the type of element.

Let’s explore how we can scrape the inner text of headings, links, paragraphs, lists, tables, buttons, inputs, and text areas using Puppeteer.

We will be using the following test page, which contains many types of HTML elements.

https://gulshansainis.github.io/portfolio/

Boilerplate code

I will be using the code below as a starting point.
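Here is a minimal sketch of such a starting point. It assumes Puppeteer is installed with npm install puppeteer and that the file is saved as index.js. The names main, puppeteerLib and TEST_PAGE_URL are my own illustrative choices, and the optional parameter only exists so the function can be tried with a stand-in object; by default it loads the installed package.

```javascript
// index.js
const TEST_PAGE_URL = 'https://gulshansainis.github.io/portfolio/';

// `puppeteerLib` defaults to the installed package (assumes `npm install puppeteer`).
async function main(puppeteerLib = require('puppeteer')) {
  const browser = await puppeteerLib.launch();
  try {
    const page = await browser.newPage();
    await page.goto(TEST_PAGE_URL);

    // rest of code goes below
  } finally {
    // Close the browser even if one of the steps above throws.
    await browser.close();
  }
}

// Print any error instead of leaving an unhandled promise rejection.
main().catch(console.error);
```

The try/finally is a design choice rather than a requirement: it keeps a headless browser from being left running when a selector fails.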

Getting Heading text

First, we will get the text of the h1 heading, which is displayed inside the Hero element.

The selector of the h1 element is "body > div > div > h1"

To get the text of the heading, we need to read the element.textContent property.
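A minimal sketch of this step, assuming page is the Puppeteer page opened by the starting code. The helper name getHeadingText is hypothetical, and page.$eval is one way to do this, not necessarily the exact call used originally.

```javascript
// Returns the text of the h1 heading inside the Hero element.
async function getHeadingText(page) {
  // page.$eval finds the first element matching the selector and runs
  // the callback on it inside the browser, returning the result to Node.js.
  return page.$eval('body > div > div > h1', (el) => el.textContent);
}

// Usage, placed below the "// rest of code goes below" comment in index.js:
//   console.log(await getHeadingText(page));
```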

After you put the above code just below the comment // rest of code goes below and execute the file using the node index.js command, it should output the text Hey 👋, I'm Gulshan Saini on the terminal console.

Getting link text

Getting the text of a link is very similar to getting the text of a heading. We will get the text of the first item in the navigation list, i.e. Portfolio at the time of writing.

The CSS selector of the element is #nav-menu > li:nth-child(1)
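A sketch of the same technique applied to the navigation item, again assuming page comes from the starting code. The helper name getFirstNavText is hypothetical, and the trim() call is my addition to drop any whitespace the markup may place around the text.

```javascript
// Returns the text of the first item in the navigation list.
async function getFirstNavText(page) {
  return page.$eval('#nav-menu > li:nth-child(1)', (el) => el.textContent.trim());
}

// Usage, placed below the "// rest of code goes below" comment in index.js:
//   console.log(await getFirstNavText(page));
```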

Once you save the above code and run the script again, you should see the text Portfolio on the console.

Scraping paragraph element text

Next, we will target the paragraph element displayed inside the Hero element.

The CSS selector of the paragraph is body > div > div > p
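Only the selector changes for the paragraph. A sketch, assuming page comes from the starting code; the helper name getParagraphText is hypothetical.

```javascript
// Returns the text of the paragraph inside the Hero element.
async function getParagraphText(page) {
  return page.$eval('body > div > div > p', (el) => el.textContent);
}

// Usage, placed below the "// rest of code goes below" comment in index.js:
//   console.log(await getParagraphText(page));
```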

After saving the above code and running index.js, the text of the paragraph should be printed on the console.

Getting the text of all list elements

So far everything was simple, and the same technique was used to get the text each time. Let’s now see how we can iterate over list elements and print the text of each item.

We will select the list items in the Services section, which have the selector #services ul li

The code to get the innerText of each item looks like the following:
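This is a sketch built from the calls named in this section (page.evaluate, document.querySelectorAll and Array.from), assuming page comes from the starting code. The helper name getServiceTexts is hypothetical.

```javascript
// Returns the innerText of every list item in the Services section.
async function getServiceTexts(page) {
  // The callback runs inside the page, so `document` is the page's document.
  // The second argument of Array.from maps each node to its text while
  // the NodeList is converted to an array.
  return page.evaluate(() =>
    Array.from(document.querySelectorAll('#services ul li'), (item) => item.innerText)
  );
}

// Usage, placed below the "// rest of code goes below" comment in index.js:
//   const services = await getServiceTexts(page);
//   services.forEach((text) => console.log(text));
```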

Let’s understand what is happening here:

  • document.querySelectorAll('#services ul li') selects all nodes matching the selector
  • Array.from converts the nodes to an array, because document.querySelectorAll returns a NodeList instead of an array

Both steps run inside the page.evaluate method, which executes the function in the browser and returns the result to our script. Array.from takes the NodeList of elements matching the selector, i.e. document.querySelectorAll("#services ul li").

The services variable holds the inner text of all list elements in the order they appear on the page. After saving the above code, you should see the text of each list item on the console.

Getting the text of input element

Getting the text of an input element, including an input of type submit (i.e. a button), works differently because the text is contained in the value attribute.

To get the text of the input element, we need to use element.value instead of element.textContent.

We will use the code below to get the text of the input button that is of type submit.
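A sketch of this step, assuming page comes from the starting code. The selector of the submit input is not stated here, so input[type="submit"] is an assumption that matches the first submit input on the page; the helper name getSubmitButtonText is also hypothetical.

```javascript
// Returns the text shown on the submit button.
async function getSubmitButtonText(page) {
  // Assumed selector: the first <input type="submit"> on the page.
  // The visible text of an input lives in `value`, not `textContent`.
  return page.$eval('input[type="submit"]', (el) => el.value);
}

// Usage, placed below the "// rest of code goes below" comment in index.js:
//   console.log(await getSubmitButtonText(page));
```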

Once you save the above code and run node index.js, it should print the text of the button, i.e. Submit.

Final code & Test

Below is the final code, which covers all the scenarios above for getting the inner text of an element.
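The following is a sketch of how the pieces could fit into a single index.js, not the original listing. It assumes Puppeteer is installed with npm install puppeteer. The names textOf, scrapeTexts, main and puppeteerLib are hypothetical, and the selector of the submit input is an assumption because it is not stated in this article.

```javascript
// index.js
const TEST_PAGE_URL = 'https://gulshansainis.github.io/portfolio/';

// Reads the textContent of the first element matching `selector`.
const textOf = (page, selector) => page.$eval(selector, (el) => el.textContent);

// Collects the text of every element covered in this article.
async function scrapeTexts(page) {
  return {
    heading: await textOf(page, 'body > div > div > h1'),
    link: await textOf(page, '#nav-menu > li:nth-child(1)'),
    paragraph: await textOf(page, 'body > div > div > p'),
    // Runs in the browser: NodeList -> array of innerText strings.
    services: await page.evaluate(() =>
      Array.from(document.querySelectorAll('#services ul li'), (item) => item.innerText)
    ),
    // Assumed selector: the first <input type="submit"> on the page.
    submit: await page.$eval('input[type="submit"]', (el) => el.value),
  };
}

// `puppeteerLib` defaults to the installed package (assumes `npm install puppeteer`).
async function main(puppeteerLib = require('puppeteer')) {
  const browser = await puppeteerLib.launch();
  try {
    const page = await browser.newPage();
    await page.goto(TEST_PAGE_URL);
    console.log(await scrapeTexts(page));
  } finally {
    // Close the browser even if a selector fails.
    await browser.close();
  }
}

// Print any error instead of leaving an unhandled promise rejection.
main().catch(console.error);
```

To test it, run node index.js; the script prints one object holding the heading, link, paragraph, services and submit texts.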

