Scraping speedup

As a browser extension, NDS uses the browser itself to scrape data, so anything that makes page loading faster in the browser also makes NDS scraping faster.

Speed up scraping by making page loading faster

  • block advertisements
  • block images and videos if you do not need to scrape them
  • block JavaScript if you only need to scrape static content

Many extensions in the browser's web store can do this for you.
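To give an idea of what such blocking extensions configure, here is a minimal Python sketch that builds block rules in the JSON shape used by Chrome's declarativeNetRequest API. It is an illustration only and not part of NDS: the function build_block_rules and its option names are hypothetical. Note that blocking the "script" resource type only stops script files from loading; inline scripts still run, so switching JavaScript off completely is done in the browser's site settings. Ad blockers additionally match ad URLs against filter lists, which this sketch leaves out.

```python
import json

# Resource-type names as used by Chrome's declarativeNetRequest API.
# Which types to block is an assumption; adjust it to what you scrape.
BLOCK_OPTIONS = {
    "images": ["image"],
    "videos": ["media"],
    "javascript": ["script"],  # blocks script files only, not inline scripts
}


def build_block_rules(block_images=True, block_videos=True, block_javascript=False):
    """Return declarativeNetRequest-style rules blocking the selected
    resource types on every URL (hypothetical helper)."""
    selected = {
        "images": block_images,
        "videos": block_videos,
        "javascript": block_javascript,
    }
    rules = []
    for name, enabled in selected.items():
        if not enabled:
            continue
        rules.append({
            "id": len(rules) + 1,  # rule ids must be unique positive integers
            "priority": 1,
            "action": {"type": "block"},
            # No urlFilter: the rule matches requests for any URL.
            "condition": {"resourceTypes": BLOCK_OPTIONS[name]},
        })
    return rules


if __name__ == "__main__":
    # Static-content scraping: block images, videos and script files.
    print(json.dumps(build_block_rules(block_javascript=True), indent=2))
```

Keeping one rule per resource type makes it easy to switch each kind of blocking on or off independently.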

Beyond these methods, some scenarios call for a different approach:

  • scraping search results and then deep scraping each item: navigating forward to each detail page and back to the results page is time-consuming
  • scraping a list of URLs that share the same structure: can these URLs be scraped in parallel?

For the first scenario, we can split the deep scraping recipe into several smaller ones and integrate them into a workflow, so that the whole deep scraping process remains automatic.

Speed up scraping by splitting a complicated recipe into smaller ones

  • Step 1: create a recipe that scrapes the search result items only. The recipe simulates the search and paginates through the results, scraping each item's basic information and detail URL.
  • Step 2: create another recipe that takes the first recipe's output table as its input table, opens each detail URL in the input table one by one, and scrapes the details of each item. For more on detail scraping, please refer to Detail page scraping.
  • Step 3: update the first recipe's global trigger to start the second recipe once the first recipe is done. For more on integrating recipes into a workflow, please refer to Recipe workflow.
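The three steps above can be sketched as plain Python functions, where a "recipe" is modeled as a function from an input table to an output table. This is a hypothetical model for illustration, not how NDS represents recipes: search_recipe, detail_recipe and run_workflow are made-up names, and the example.com rows stand in for real scraped data. What it shows is that the detail recipe opens each detail URL directly, so it never navigates back to the search results.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
# A "recipe" is modeled as a function from an input table to an output table.
Recipe = Callable[[List[Row]], List[Row]]


def search_recipe(keywords: List[Row]) -> List[Row]:
    # Hypothetical stand-in for Step 1: one row per result item with its
    # basic info and detail URL. A real recipe would also paginate.
    return [
        {"title": f"{k['keyword']} item {i}",
         "detail_url": f"https://example.com/{k['keyword']}/{i}"}
        for k in keywords
        for i in range(1, 3)
    ]


def detail_recipe(items: List[Row]) -> List[Row]:
    # Hypothetical stand-in for Step 2: visit each detail URL one by one,
    # directly from the input table, with no back navigation.
    details = []
    for item in items:
        details.append({**item, "detail": f"details from {item['detail_url']}"})
    return details


def run_workflow(first: Recipe, second: Recipe, first_input: List[Row]) -> List[Row]:
    """Mimic Step 3: when `first` is done, its output table becomes
    the input table of `second`."""
    output_table = first(first_input)
    return second(output_table)


if __name__ == "__main__":
    for row in run_workflow(search_recipe, detail_recipe, [{"keyword": "laptop"}]):
        print(row)
```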

Now we have split a complicated recipe into several smaller ones. Notice that the second recipe is executed repeatedly, once for each URL. Next we will show how to run such repeated scraping in parallel.

Speed up scraping by running multiple recipe instances in parallel

As long as a recipe accepts argument input, whether URLs, keywords, or multiple arguments, NDS can accelerate it by running it in parallel in your own browser.

What you need to do is simple: when starting the recipe, enter multiple lines of arguments or specify an argument input table, and specify how many parallel instances to run.
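Conceptually, running a recipe over many arguments with a fixed number of instances is bounded parallelism. The sketch below models it in Python with a thread pool whose size plays the role of the instance count. It is not how NDS implements batch scraping; run_in_parallel, estimated_rounds and scrape_detail are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_parallel(recipe: Callable[[T], R], arguments: List[T], instances: int) -> List[R]:
    """Run `recipe` once per argument, with at most `instances` runs at a time.
    Results are returned in the same order as `arguments`."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    with ThreadPoolExecutor(max_workers=instances) as pool:
        return list(pool.map(recipe, arguments))


def estimated_rounds(num_arguments: int, instances: int) -> int:
    """Idealized model: if every run takes the same time, the batch
    finishes in ceil(N / k) rounds instead of N."""
    return math.ceil(num_arguments / instances)


def scrape_detail(url: str) -> str:
    # Hypothetical stand-in for one recipe run on one argument.
    return f"scraped {url}"


if __name__ == "__main__":
    urls = [f"https://example.com/item/{i}" for i in range(1, 11)]
    results = run_in_parallel(scrape_detail, urls, instances=4)
    print(results[0])                      # scraped https://example.com/item/1
    print(estimated_rounds(len(urls), 4))  # 3
```

In this idealized model, 10 arguments with 4 instances take about 3 rounds instead of 10. In practice, parallel tabs share the same CPU and network, so the real speedup is usually smaller than this estimate.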

For more details, please refer to Batch scraping.

The video demonstrates how to speed up scraping by running multiple instances.