What is a node?
To make scraping simple, NDS provides 3 kinds of node: Transit, List and Detail. You can define a complete recipe by combining these nodes.
A Transit node is a container of actions. You add actions to it and they are executed sequentially. A Transit node is usually used to open a URL, submit a search, or prepare for the next node.
A Transit node executes its actions one by one, from top to bottom. When it encounters the action '>Enter Next Node Here<', it enters the next node immediately, and resumes executing the remaining actions only after all following nodes have been processed. If the action list contains no '>Enter Next Node Here<', the Transit node enters the next node automatically after all actions have been executed.
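The execution order described above can be modelled with a short Python sketch. This is only an illustration of the ordering rule, not NDS code: the function names `run_transit` and `next_node`, the action strings, and the `log` list are all hypothetical, and the sketch assumes the action list contains at most one '>Enter Next Node Here<' marker.

```python
# Hypothetical model of a Transit node's execution order (not the NDS API).
ENTER_NEXT = ">Enter Next Node Here<"


def run_transit(actions, enter_next_node, log):
    """Run actions top to bottom; enter the next node at the marker,
    or after the last action when no marker is present."""
    entered = False
    for action in actions:
        if action == ENTER_NEXT:
            # Following nodes are fully processed before we continue.
            enter_next_node(log)
            entered = True
        else:
            log.append(action)
    if not entered:
        # No marker: enter the next node once every action has run.
        enter_next_node(log)
    return log


def next_node(log):
    # Stand-in for "all following nodes processed".
    log.append("<next node>")


with_marker = run_transit(["open url", ENTER_NEXT, "log out"], next_node, [])
without_marker = run_transit(["open url", "submit search"], next_node, [])
```

In this model, `with_marker` records "log out" after the next node has finished, while `without_marker` records the next node last.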
A List node handles scraping of repeating data, such as:
- rows on structured table
- items on eCommerce list page
- entries on search result page
All these pages have multiple blocks with a similar structure or layout, and each block contains similar fields, such as title, price or brief description. The following is a Google Maps screenshot:
Here each restaurant highlighted on the left is a block, and the restaurant name inside the block is highlighted as a field on the right.
Websites usually load more blocks when you scroll down or click a page-turning button.
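The block and field idea can be sketched in plain Python. The markup, the class names (`block`, `name`, `rating`) and the function `scrape_blocks` below are made up for illustration; in NDS, blocks and fields are declared in the List node itself rather than written as code.

```python
# Hypothetical illustration of "blocks" and "fields" (not the NDS API).
import xml.etree.ElementTree as ET

# Assumed sample markup: two blocks with the same structure.
PAGE = """
<div>
  <div class="block"><span class="name">Cafe A</span><span class="rating">4.5</span></div>
  <div class="block"><span class="name">Bistro B</span><span class="rating">4.1</span></div>
</div>
"""


def scrape_blocks(markup, fields):
    """Return one row per block, with one entry per requested field."""
    root = ET.fromstring(markup.strip())
    rows = []
    for block in root.findall(".//div[@class='block']"):
        row = {}
        for field in fields:
            element = block.find(f".//span[@class='{field}']")
            # A field missing from a block is recorded as None.
            row[field] = element.text if element is not None else None
        rows.append(row)
    return rows


rows = scrape_blocks(PAGE, ["name", "rating"])
```

Each block yields one row, and every row has the same set of fields, which is what makes repeating data suitable for a List node.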
A List node has 3 tabs: Data, Pages and Navigation.
- Data Tab
declares the block and its fields, and any actions to be executed before each block is processed
- Pages Tab
declares how to turn pages or scroll to load more blocks, and any actions to be executed before a new page/list is loaded
- Navigation Tab
declares how to navigate to the next node
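As a rough mental model, the Data and Pages tabs can be read as the two loops in the Python sketch below. Everything here is an assumption made for illustration: the callback names (`load_page`, `before_page`, `before_block`, `extract`), the rule that an empty page ends the list, and the exact point at which the pre-page actions run. The Navigation tab is left out because the text above does not say when navigation to the next node happens.

```python
# Hypothetical model of how a List node walks pages and blocks (not the NDS API).
def run_list_node(load_page, before_page, before_block, extract):
    """Collect one row per block across all pages."""
    rows = []
    page_no = 0
    while True:
        before_page(page_no)          # Pages tab: actions before a page is loaded
        blocks = load_page(page_no)   # Pages tab: turn the page or scroll
        if not blocks:                # Assumption: an empty page ends the list
            break
        for block in blocks:
            before_block(block)       # Data tab: actions before each block
            rows.append(extract(block))  # Data tab: read the declared fields
        page_no += 1
    return rows


# Usage with made-up data: two pages of blocks, then nothing more to load.
PAGES = [[{"title": "A"}, {"title": "B"}], [{"title": "C"}]]
events = []

titles = run_list_node(
    load_page=lambda n: PAGES[n] if n < len(PAGES) else [],
    before_page=lambda n: events.append(f"page {n}"),
    before_block=lambda block: events.append("block"),
    extract=lambda block: block["title"],
)
```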
Unlike a List node, a Detail node scrapes content that appears only once on the current page.
In List and Detail nodes, you can link the current page to the node via the pin icon after the node name. The browser then loads the linked page automatically whenever you switch to the node, either by clicking it or through navigation.