Capturing background requests with Puppeteer

By Thetechnotreat

November 16, 2020 in Tutorial, Web crawling


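Puppeteer is very handy for capturing background requests. A website can send many of them while it loads: stylesheets, scripts, fonts, AJAX calls and so on. Curl and wget fetch only the URL you give them and do not run the page's JavaScript, and Selenium does not make these requests easy to get at, so none of them gives you the full list. With Puppeteer we can listen for every request the page makes and filter the ones we want. Below is a code snippet that collects the Cloudflare URLs requested by flightradar24.com.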

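const puppeteer = require('puppeteer');

(async () => {
	let browser;
	// The try/catch sits inside the async function so it catches rejected awaits
	try {
		browser = await puppeteer.launch();
		const page = await browser.newPage();
		const cloudflare_urls = [];
		// Fires for every request the page issues, including background ones
		page.on('request', request => {
			if (request.url().includes('cloudflare')) {
				cloudflare_urls.push(request.url());
			}
		});
		await page.goto('https://www.flightradar24.com/data/airlines/arz');
		console.log(cloudflare_urls);
	} catch (err) {
		console.error(err);
	} finally {
		// Close the browser even if navigation failed
		if (browser) await browser.close();
	}
})();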

The snippet above visits https://www.flightradar24.com/data/airlines/arz and collects the URL of every request whose address contains 'cloudflare'. A run produced the following list:

[
  'https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css',
  'https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/css/bootstrap.min.css',
  'https://cdnjs.cloudflare.com/ajax/libs/toastr.js/2.1.4/toastr.min.css',
  'https://cdnjs.cloudflare.com/ajax/libs/moment.js/2.24.0/moment.min.js',
  'https://cdnjs.cloudflare.com/ajax/libs/moment-duration-format/2.2.2/moment-duration-format.min.js',
  'https://cdnjs.cloudflare.com/ajax/libs/URI.js/1.19.1/URI.min.js',
  'https://cdnjs.cloudflare.com/ajax/libs/localforage/1.4.3/localforage.min.js',
  'https://cdnjs.cloudflare.com/ajax/libs/highcharts/4.1.10/highcharts.js',
  'https://cdnjs.cloudflare.com/ajax/libs/toastr.js/2.1.4/toastr.min.js',
  'https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/fonts/fontawesome-webfont.woff2?v=4.7.0'
]

The same approach is handy for finding the AJAX calls or any other background requests a website sends.
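
If you only care about AJAX traffic, one option is to filter on the request's resource type instead of its URL. The sketch below is an illustration, not part of the original snippet: the names isBackgroundRequest and collectBackgroundUrls are made up for this example, and the 'networkidle2' wait is an assumption about when the page has finished firing requests.

```javascript
// Resource types Puppeteer reports for script-initiated calls
const BACKGROUND_TYPES = new Set(['xhr', 'fetch']);

// True when the request is an XHR or fetch call.
// `request` only needs a resourceType() method, like Puppeteer's request objects.
function isBackgroundRequest(request) {
	return BACKGROUND_TYPES.has(request.resourceType());
}

// Hypothetical helper: visit `url` and return the URLs of all
// XHR/fetch requests seen while the page loads.
async function collectBackgroundUrls(url) {
	// Required lazily so the predicate above can be used without a browser
	const puppeteer = require('puppeteer');
	const browser = await puppeteer.launch();
	try {
		const page = await browser.newPage();
		const urls = [];
		page.on('request', request => {
			if (isBackgroundRequest(request)) {
				urls.push(request.url());
			}
		});
		// Assumption: waiting for the network to go quiet catches late calls
		await page.goto(url, { waitUntil: 'networkidle2' });
		return urls;
	} finally {
		// Close the browser even if navigation failed
		await browser.close();
	}
}
```

To try it, call collectBackgroundUrls('https://www.flightradar24.com/data/airlines/arz').then(console.log).catch(console.error). The list you get back depends on what the page requests at the time, so it will vary between runs.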
