
Spider

Traditional HTTP-based web crawler.

Accessor: `client.spider`

Actions

| Method | Description |
| --- | --- |
| `scan(url, context_name, subtree_only, in_scope)` | Start spidering |
| `scan_as_user(context_name, user_name, url, subtree_only)` | Spider as a user |
| `pause(scan_id)` | Pause spider |
| `pause_all` | Pause all spiders |
| `resume(scan_id)` | Resume spider |
| `resume_all` | Resume all spiders |
| `stop(scan_id)` | Stop spider |
| `stop_all` | Stop all spiders |
| `exclude_from_scan(regex)` | Exclude URL pattern |
| `clear_excluded_from_scan` | Clear exclusions |
| `add_allowed_resource(regex, enabled)` | Allow resource pattern |
| `remove_allowed_resource(regex)` | Remove allowed resource |
| `remove_all_allowed_resources` | Clear allowed resources |
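
Scope and lifecycle actions are typically combined with a scan. A minimal sketch, assuming a connected client as in the Example section below; the target URL and regex patterns are placeholders, and argument names follow the table above:

# Keep logout and admin pages out of the crawl (placeholder patterns)
client.spider.exclude_from_scan(".*logout.*")
client.spider.exclude_from_scan(".*/admin/.*")

# Start a scan, then pause and resume it by id
result = client.spider.scan(url: "http://target.com")
scan_id = result["scan"].as_s.to_i
client.spider.pause(scan_id)
client.spider.resume(scan_id)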

Options

| Method | Description |
| --- | --- |
| `set_option_max_depth(depth)` | Max crawl depth |
| `set_option_max_children(max)` | Max children per node |
| `set_option_max_duration(minutes)` | Max duration in minutes |
| `set_option_max_parse_size_bytes(bytes)` | Max response size to parse, in bytes |
| `set_option_user_agent(ua)` | Custom User-Agent |
| `set_option_request_wait_time(ms)` | Delay between requests, in milliseconds |
| `set_option_parse_comments(bool)` | Parse HTML comments |
| `set_option_parse_robots_txt(bool)` | Parse robots.txt |
| `set_option_post_form(bool)` | Submit POST forms |
| `set_option_process_form(bool)` | Process forms |
| `set_option_send_referer_header(bool)` | Send Referer header |
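
Options are plain setters, so they are usually applied once before starting a scan. A minimal sketch with illustrative values:

# Tune the spider before scanning (values are illustrative)
client.spider.set_option_max_depth(5)
client.spider.set_option_max_duration(10)        # minutes
client.spider.set_option_request_wait_time(200)  # milliseconds between requests
client.spider.set_option_user_agent("MyCrawler/1.0")
client.spider.set_option_parse_robots_txt(true)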

Views

| Method | Description |
| --- | --- |
| `status(scan_id)` | Spider progress (0-100) |
| `results(start, count)` | Discovered URLs |
| `full_results` | Complete results |
| `number_of_results` | URL count |
| `excluded_from_scan` | Exclusion list |
| `allowed_resources` | Allowed resources |
| `option_max_depth` | Current max depth |
| `option_max_children` | Current max children |
| `option_max_duration` | Current max duration |
| `option_user_agent` | Current User-Agent |
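
Views are read-only and can be queried at any time, for example to confirm the active configuration and scope. A minimal sketch; the printed values depend on the current session:

# Inspect current spider settings and scope
puts client.spider.option_max_depth
puts client.spider.option_user_agent
puts client.spider.excluded_from_scan
puts client.spider.number_of_results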

Example

# Start a spider scan against the target and grab its id
result = client.spider.scan(url: "http://target.com")
scan_id = result["scan"].as_s.to_i

# Poll the status view until the crawl reports 100% completion
loop do
  progress = client.spider.status(scan_id)["status"].as_s.to_i
  break if progress >= 100
  sleep 2.seconds
end

# Fetch the URLs discovered by the spider
urls = client.spider.results
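
If the results view returns the raw JSON payload with the discovered URLs under a "results" key (an assumption about the response shape; adjust to what the client actually returns), the list can be iterated like this:

# Print every URL the spider discovered (assumes a "results" array in the payload)
urls["results"].as_a.each do |url|
  puts url
end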