## Spider

Traditional HTTP-based web crawler.

Accessor: `client.spider`
### Actions

| Method | Description |
|--------|-------------|
| `scan(url, context_name, subtree_only, in_scope)` | Start spidering |
| `scan_as_user(context_name, user_name, url, subtree_only)` | Spider as user |
| `pause(scan_id)` | Pause spider |
| `pause_all` | Pause all spiders |
| `resume(scan_id)` | Resume spider |
| `resume_all` | Resume all |
| `stop(scan_id)` | Stop spider |
| `stop_all` | Stop all spiders |
| `exclude_from_scan(regex)` | Exclude URL pattern |
| `clear_excluded_from_scan` | Clear exclusions |
| `add_allowed_resource(regex, enabled)` | Allow resource pattern |
| `remove_allowed_resource(regex)` | Remove allowed resource |
| `remove_all_allowed_resources` | Clear allowed resources |
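The actions compose into a simple control flow. A minimal sketch, assuming `scan` returns a JSON-style response carrying the scan id (as in the example at the end of this section); the target URL and the exclusion regex are placeholders:

```crystal
# Keep the spider out of logout links so it doesn't end the session (pattern is illustrative)
client.spider.exclude_from_scan(".*logout.*")

# Start the crawl and keep the id for later control calls
scan_id = client.spider.scan(url: "http://target.com")["scan"].as_s.to_i

client.spider.pause(scan_id)   # suspend the crawl
client.spider.resume(scan_id)  # pick up where it left off
client.spider.stop(scan_id)    # stop this spider only
client.spider.stop_all         # or stop every running spider
```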
### Options

| Method | Description |
|--------|-------------|
| `set_option_max_depth(depth)` | Max crawl depth |
| `set_option_max_children(max)` | Max children per node |
| `set_option_max_duration(minutes)` | Max duration in minutes |
| `set_option_max_parse_size_bytes(bytes)` | Max response parse size |
| `set_option_user_agent(ua)` | Custom User-Agent |
| `set_option_request_wait_time(ms)` | Delay between requests (ms) |
| `set_option_parse_comments(bool)` | Parse HTML comments |
| `set_option_parse_robots_txt(bool)` | Parse robots.txt |
| `set_option_post_form(bool)` | Submit POST forms |
| `set_option_process_form(bool)` | Process forms |
| `set_option_send_referer_header(bool)` | Send Referer header |
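Setters are typically called before starting a scan so the crawl runs under the new limits. A short sketch; all values are arbitrary examples:

```crystal
# Tune the crawl before starting it; all values are arbitrary examples
client.spider.set_option_max_depth(5)            # follow links at most 5 levels deep
client.spider.set_option_max_children(20)        # crawl at most 20 children per node
client.spider.set_option_max_duration(10)        # give up after 10 minutes
client.spider.set_option_request_wait_time(200)  # wait 200 ms between requests
client.spider.set_option_user_agent("Mozilla/5.0 (compatible; MyCrawler)")
client.spider.set_option_parse_robots_txt(true)  # mine robots.txt for extra URLs
```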
### Views

| Method | Description |
|--------|-------------|
| `status(scan_id)` | Spider progress (0-100) |
| `results(start, count)` | Discovered URLs |
| `full_results` | Complete results |
| `number_of_results` | URL count |
| `excluded_from_scan` | Exclusion list |
| `allowed_resources` | Allowed resources |
| `option_max_depth` | Current max depth |
| `option_max_children` | Current max children |
| `option_max_duration` | Current max duration |
| `option_user_agent` | Current User-Agent |
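Views read back configuration and results once a scan has run. A sketch assuming the same JSON-style responses as the example below; the `start`/`count` paging semantics are inferred from the parameter names:

```crystal
puts client.spider.number_of_results  # total URLs discovered
puts client.spider.option_max_depth   # currently configured depth limit

# Page through discovered URLs; start/count paging is an assumption from the signature
page = client.spider.results(start: 0, count: 100)
```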
### Example

```crystal
# Start the spider and extract the scan id from the response
result = client.spider.scan(url: "http://target.com")
scan_id = result["scan"].as_s.to_i

# Poll the status view until the spider reports 100% progress
loop do
  progress = client.spider.status(scan_id)["status"].as_s.to_i
  break if progress >= 100
  sleep 2.seconds
end

# Fetch the discovered URLs
urls = client.spider.results
```
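The same polling pattern applies to an authenticated crawl via `scan_as_user`. In the sketch below, the context and user names are placeholders for an existing ZAP context and user, and the response shape is assumed to match `scan`:

```crystal
# Authenticated spider; context/user names are placeholders for your ZAP setup
result = client.spider.scan_as_user(
  context_name: "MyContext",
  user_name: "test_user",
  url: "http://target.com"
)
scan_id = result["scan"].as_s.to_i  # response shape assumed to match scan
```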