How to include or avoid URLs and paths in Application Scanning

One way to gain more control over Application Scanning coverage is to define specific paths on the asset that should or should not be assessed during the scan. Under Application Scanning settings, you can specify included and avoided paths/URLs.

  • Use included paths/URLs to specify root-relative paths that should be assessed during the scan.
    While Application Scanning can discover pages that are accessible from the main page, as well as those that are hidden, we can’t guarantee that it will find and assess every potential URL. By specifying included paths, you set paths that are mandatory for the scanner to assess.

Ex. including path /secret-admin-panel/ for scan profile example.com means the scan will assess the http(s)://example.com/secret-admin-panel/ URL, and discover all additional content that’s linked from that page.


Note that the limits of Application Scanning still apply: if the scope of the scan is too large, the scan may not cover the specified URL.

  • Use avoided paths/URLs to specify root-relative paths that should not be assessed during the scan.
    If you’re aware of any pages or other content that Application Scanning shouldn’t tamper with in any way, or that would take too much scanning time (ex. blog posts, product listings), you can easily avoid them.

Ex. avoiding path /blog for scan profile example.com means the scan won’t cover any URL starting with http(s)://example.com/blog

  • Use included and avoided paths/URLs in combination to assess a specific part of the web application.

Ex. avoiding path / while including path /blog/ for scan profile example.com means the scan will only cover URLs starting with http(s)://example.com/blog/, and won’t even assess the root page (http(s)://example.com/) of the website.

Ex. avoiding path /items/ while including path /items/1001 could be a useful combination if you have many product pages with the same structure under /items/ and want the scan to assess just one of them instead of spending time mapping out all of those pages.
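
The two combination examples above can be sketched in a few lines of Python. This is only an illustration, not the scanner's actual logic: the function name in_scope is hypothetical, it assumes an included path takes precedence over an avoided one (which is what both examples imply), and it uses plain prefix matching without wildcards.

```python
def in_scope(path, included, avoided):
    """Rough model of how included and avoided paths combine.

    Assumption: an included path wins over an avoided one,
    as the /blog/ and /items/1001 examples suggest.
    """
    # An explicitly included prefix always puts the path in scope.
    if any(path.startswith(prefix) for prefix in included):
        return True
    # Otherwise the path is in scope only if no avoided prefix matches it.
    return not any(path.startswith(prefix) for prefix in avoided)


# Avoid "/" and include "/blog/": only the blog is covered.
print(in_scope("/blog/post-1", ["/blog/"], ["/"]))        # True
print(in_scope("/", ["/blog/"], ["/"]))                   # False

# Avoid "/items/" and include "/items/1001": one product page is covered.
print(in_scope("/items/1001", ["/items/1001"], ["/items/"]))  # True
print(in_scope("/items/1002", ["/items/1001"], ["/items/"]))  # False
```

To confirm what a real scan actually covered, check the “Crawled URLs” finding rather than relying on a model like this.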

To see which URLs were assessed during the scan, look for the finding titled “Crawled URLs”, which lists the crawled URLs. You can find it on the Scan Reports page.

Setting up included/avoided paths/URLs

Under Application Scanning settings, use “Which paths/URLs must we include?” to add one or more included URLs. Use “Which paths/URLs must we avoid?” for paths you don't want us to touch.


Using wildcards

Both included and avoided paths support asterisk wildcards, in case you want to apply a rule to a group of pages.

Ex. the website provides product details under paths /product/5/details, /product/6/details, etc. To set all of these as included/avoided paths/URLs, you can use /product/*/details

Note that a trailing asterisk is implicitly added to every path by default: using /admin as an included/avoided path is equivalent to using /admin*. Using an asterisk explicitly removes this behavior, so /product/*/details is not the same as /product/*/details*.

There is no limit to the number of wildcards you can use in a path pattern. As an example, /blog/*/guestblog/*/details would be supported as well.
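
The wildcard rules above can be modelled roughly as follows. This is a sketch under stated assumptions, not the scanner's implementation: the function name path_matches is hypothetical, and it assumes that an asterisk matches any run of characters (including slashes) and that a pattern containing an explicit asterisk must match the whole path.

```python
import re


def path_matches(pattern, path):
    """Rough model of the wildcard rules described in this article."""
    if "*" not in pattern:
        # No explicit asterisk: the pattern behaves like a prefix,
        # so /admin is treated the same as /admin*.
        return path.startswith(pattern)
    # Explicit asterisks: each * stands for any characters (assumption),
    # and no trailing wildcard is added implicitly.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, path) is not None


print(path_matches("/admin", "/admin/users"))                              # True
print(path_matches("/product/*/details", "/product/5/details"))            # True
print(path_matches("/product/*/details", "/product/5/details/reviews"))    # False
print(path_matches("/product/*/details*", "/product/5/details/reviews"))   # True
```

The last two calls show the difference between /product/*/details and /product/*/details*: only the second pattern also covers pages below the details page.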

FAQ

Q: If I want to add a Recorded Login, would it be affected by the included/avoided paths setting?

A: Yes. All Application Scanning settings, including the list of included/avoided paths, are passed to the scanner at the very beginning of the scan. This means your settings will be respected when the recorded scenarios are replayed.

Q: Can I avoid scanning dynamic URLs?

A: No, it is currently not possible to exclude dynamic URLs from scanning.

Q: Will avoiding /language/select also avoid /en/language/select?

A: No. A path must start with a leading slash and is matched from the start of the root-relative URL, so to avoid /en/language/select you need to include the beginning part of the path as well. Here the following path would need to be avoided: /en/language/select

Q: Can I avoid a subdomain together with its path, e.g. shop-prod.example.com/products?

A: No. In this case you would need to either avoid the entire /products path for every subdomain, or add the subdomain shop-prod to the list of disallowed subdomains.

Q: Can I use a regular expression to include/avoid paths?

A: Regular expressions are currently not supported.

Q: Is there any difference between disallowing /product and /product/?

A: Yes. As the specified path is matched from the start of the root-relative URL, disallowing /product will avoid any path starting with that text, ex. /product/ and /products/productimages/, while disallowing /product/ will still allow us to scan /products/productimages/ if such paths exist on the website.
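
The difference comes down to prefix matching, which can be illustrated with a short Python sketch. The helper name avoided_by and the sample paths are hypothetical, and the sketch assumes plain prefix matching from the start of the root-relative path, as described in the answer above.

```python
def avoided_by(rule, paths):
    """Return the paths that start with the given avoided rule."""
    return [p for p in paths if p.startswith(rule)]


# Hypothetical paths on the scanned website.
site_paths = ["/product/", "/products/productimages/", "/about"]

# Without the trailing slash, the rule also catches /products/...
print(avoided_by("/product", site_paths))
# ['/product/', '/products/productimages/']

# With the trailing slash, only paths under /product/ are avoided.
print(avoided_by("/product/", site_paths))
# ['/product/']
```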

Q: If I disallow '/' and only allow '/products/', will that allow everything that starts with '/products/'? Or do I have to add an additional path if I want the scanner to go to e.g. /products/notebooks/example-notebook?

A: Allowing the path /products/ should be enough. However, if these paths are still missing from your Crawled URLs finding after running a new scan, you can try adding /products/notebooks/ to the list of allowed URLs.

Q: Is there a difference between allowing /products/notebooks and /products/*?

A: That depends on what you want to achieve.

If you want us to assess /products/notebooks specifically, allowing exactly this path (/products/notebooks) will be fine. If, however, you want to allow anything under /products, you can use a pattern (/products/*).

As an example, to allow /products/notebooks/A4 as well as /products/sketchbooks/A4, the pattern would be /products/*/A4.