What are parent/child pages? Why do they matter?

What are parent and child pages?

Understanding how a website is structured helps you anticipate what data, and how much of it, you’re scraping with Listly. Simply put, a parent page is a page that links to multiple child pages. Let’s say you’re collecting product information from Amazon. A product listing page that previews multiple items at once would be a parent page, and the individual product detail pages would be its child pages. Information on websites can be organized in flat or hierarchical structures, and in many cases e-commerce sites use the latter.
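To make the parent/child relationship concrete, here is a minimal Python sketch that lists the child pages linked from a parent page. It is only an illustration and does not describe how Listly works internally. The URL `https://example.com/products`, the sample HTML, and the helper names `LinkCollector` and `child_page_urls` are all made up for this example, and it assumes that child pages live under the parent's URL path, which is not true of every site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def child_page_urls(parent_url, parent_html):
    """Return same-site URLs linked from the parent page that sit under its path."""
    collector = LinkCollector()
    collector.feed(parent_html)

    parent = urlparse(parent_url)
    parent_path = parent.path.rstrip("/") + "/"  # e.g. "/products/"

    children = []
    for href in collector.hrefs:
        absolute = urljoin(parent_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        is_child = (
            parsed.netloc == parent.netloc
            and parsed.path.startswith(parent_path)
        )
        if is_child and absolute not in children:
            children.append(absolute)
    return children


# Hypothetical listing (parent) page that links to detail (child) pages.
PARENT_URL = "https://example.com/products"
PARENT_HTML = """
<ul>
  <li><a href="/products/101">Kettle</a></li>
  <li><a href="/products/102">Toaster</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

children = child_page_urls(PARENT_URL, PARENT_HTML)
print(children)
```

In this sample, the two product links count as child pages, while the "About us" link is skipped because it sits outside the parent's path.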

So why do they matter?

Listly’s group extraction results can vary depending on the source URL, which is the URL of the single page on a website where web scraping starts. If the web pages you'd like to scrape are related as parent and child pages, take a little more care to make sure that the group extraction doesn't fail.