Webcrawler collector

Contents

Introduction

The webcrawler collector is included with Explorer by default.

Configuration

The webcrawler collector has three configuration options.

Uri

This is the URI at which the collector starts crawling. It is usually the root of a domain.

Example:

https://www.example.com

Exclude patterns

A list of regular expressions that are matched against the full URL of each link found during crawling. A link that matches any exclude pattern is skipped.

Example:

^https://www\.example\.com/whatever.*$
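
The filtering described above can be sketched in Python. This is only an illustration, not the collector's actual implementation: the function name `is_excluded`, the sample links, and the use of Python's `re` syntax are all assumptions, and the regular expression engine used by Explorer may behave differently.

```python
import re

# Hypothetical exclude patterns, in the same form as the example above.
EXCLUDE_PATTERNS = [r"^https://www\.example\.com/whatever.*$"]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the full URL matches any of the exclude patterns."""
    return any(re.search(pattern, url) for pattern in patterns)


# Hypothetical links found while crawling a page.
links = [
    "https://www.example.com/",
    "https://www.example.com/whatever/page.html",
    "https://www.example.com/about",
]

# Only links that match no exclude pattern remain to be crawled.
to_crawl = [url for url in links if not is_excluded(url)]
```

Because each pattern is tested against the full URL, anchoring with `^` and `$` and escaping literal dots keeps a pattern from matching more than intended.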

Custom User-Agent

Some web servers respond differently depending on the user agent. Here you can make the collector identify itself as a specific browser or robot. If you leave the field empty, a default user agent is used.
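
The fallback behaviour can be sketched in Python using the standard library. This is a rough illustration under stated assumptions: `build_request` is a hypothetical helper, and `ExampleCrawler/1.0` is a placeholder, since the collector's real default user agent is not stated here.

```python
from urllib.request import Request

# Placeholder value; the collector's real default user agent is not documented here.
DEFAULT_USER_AGENT = "ExampleCrawler/1.0"


def build_request(url, custom_user_agent=""):
    """Build a request that uses the custom user agent, or the default if the field is empty."""
    agent = custom_user_agent.strip() or DEFAULT_USER_AGENT
    return Request(url, headers={"User-Agent": agent})


# Field left empty: the default user agent is sent.
default_request = build_request("https://www.example.com")

# Field filled in: the collector identifies itself with the given string.
custom_request = build_request("https://www.example.com", "MyBot/2.0")
```

The request objects are only constructed, not sent, so the sketch can be run without network access.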

