Keyword search and scraping
So far, we have learned how to do list scraping and deep scraping with a list and its details.
More often, though, we need to scrape the search result list of a target website. For example:
- scraping sushi restaurants in Los Angeles on Yelp
- scraping hotels in San Jose on Google Maps
- scraping Bing's search result list
- scraping the price of the Surface Pro on Amazon
All of these tasks start from a query and then scrape data from the search result list, using either simple list scraping or deep scraping.
This section shows how to automate the whole query-and-scrape process with NDS.
We again take Amazon as the example.
We want to input the query keywords, submit the query, and then scrape the search results.
So all we need to do is add extra actions to the Start Transit node; all the other nodes from Multiple paginated list scraping are reusable.
Step 0: open the foregoing Amazon URL, click the NDS icon, click 'Pro Scrape' in the popup window, and select the 'Search and Scrape Result' scaffold:
The default template adds two actions after opening the URL.
'Input Text' with 3 parameters:
- target input element
- keywords to input
- a star for configuring the argument, which we will introduce later
'Click' with 1 parameter:
- target element to click
Once all three actions are configured, NDS will execute them one by one on each run:
- open the URL
- type the keywords in the query input box
- click the query button
- and then go to the next List node
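The order of execution can be pictured with a small Python sketch. This is only an illustration of the sequence described above, not NDS's internal format: the class names, the `is_argument` flag (standing in for the star option), the URL, and both selectors are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OpenUrl:
    url: str  # page to open first (placeholder value below)


@dataclass
class InputText:
    target: str                # selector of the input element (hypothetical)
    keywords: str              # text to type into the input
    is_argument: bool = False  # models the "star" option; its meaning is assumed


@dataclass
class Click:
    target: str  # selector of the element to click (hypothetical)


Action = Union[OpenUrl, InputText, Click]


def describe(actions: List[Action]) -> List[str]:
    """Return a readable trace of the actions in the order they would run."""
    trace = []
    for action in actions:
        if isinstance(action, OpenUrl):
            trace.append(f"open {action.url}")
        elif isinstance(action, InputText):
            trace.append(f"type '{action.keywords}' into {action.target}")
        elif isinstance(action, Click):
            trace.append(f"click {action.target}")
    # After the last action, control passes to the next List node.
    trace.append("go to next List node")
    return trace


# The three actions of the Start Transit node, in order.
start_transit = [
    OpenUrl("https://www.amazon.com/"),
    InputText(target="#search-box", keywords="Surface Pro"),
    Click(target="#search-button"),
]

for step in describe(start_transit):
    print(step)
```

Keeping the actions as an ordered list mirrors how the node is described here: each action runs only after the previous one has finished, and the List node starts once the search results have loaded.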
From here on, all remaining steps are identical to those described in the Multiple paginated list scraping section.
The recipe automates the query and the scraping, but we would like the query to be more flexible. For example:
- can we scrape a list of URLs if we already have such a list?
- can we change the keywords when starting the recipe?
- can we scrape the results for a list of keywords?
- can we scrape the results for a list of keywords when the number of pages to scrape differs for each keyword?
- can we simulate multiple inputs, such as keywords and location, for a search, and then scrape the result list?
We will answer these questions in the Batch scraping section.