How to include or avoid URLs and paths in Application Scanning

One way to get more control over Application Scanning coverage is to define specific paths on the asset that should or should not be assessed during the scan. Under Application Scanning settings, you can specify included and avoided paths/URLs.

  • Use included paths/URLs to specify root-relative paths that should be assessed during the scan.
    While Application Scanning can discover pages that are accessible from the main page, as well as those that are hidden, we can’t guarantee that we find and assess every potential URL. By specifying included paths, you set paths that the scanner must assess.

Ex. including path /secret-admin-panel/ for scan profile example.com means the scan will assess the http(s)://example.com/secret-admin-panel/ URL, and discover all additional content that’s linked from that page.

To include a large number of paths in the scan, you can also use Forced Browsing, which allows you to upload a list of root-relative paths to include in the scan.

Note that the limits of Application Scanning still apply. If the scope of the scan is too large, the scan may not cover the specified URL.

  • Use avoided paths/URLs to specify root-relative paths that should not be assessed during the scan.
    If you’re aware of any pages or other content that Application Scanning shouldn’t tamper with in any manner, or that would take too much scanning time (ex. blog posts, product listings), you can easily avoid them.

Ex. avoiding path /blog for scan profile example.com means the scan won’t cover any URL starting with http(s)://example.com/blog

  • Use included and avoided paths/URLs in combination to assess a specific part of the web application.

Ex. disallowing path /, while allowing path /blog/ for scan profile example.com means the scan will only cover URLs starting with http(s)://example.com/blog/, and won’t even assess the root page (http(s)://example.com/) of the website.

Ex. disallowing path /items/, while allowing path /items/1001 could be a useful combination if you have various product pages with the same structure under /items/, and you want to avoid the scan taking time to map out all those pages, and instead just want to assess only one.
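The two combinations above can be sketched as a small scope check. This is only an illustration, not the scanner's actual logic: the function name is_in_scope is hypothetical, matching is reduced to plain prefix comparison, and the rule that an included path overrides an avoided one is an assumption inferred from the examples above.

```python
def is_in_scope(path, included, avoided):
    """Hypothetical scope check for a root-relative path.

    Assumptions (not confirmed by the product documentation):
    - paths are compared as plain prefixes, wildcards are ignored here
    - a matching included path overrides any avoided path
    """
    if any(path.startswith(prefix) for prefix in included):
        return True
    if any(path.startswith(prefix) for prefix in avoided):
        return False
    # Paths that match neither list stay in scope.
    return True


# Avoid everything under /items/ except one representative product page.
included = ["/items/1001"]
avoided = ["/items/"]
for path in ["/items/1001", "/items/1002", "/about"]:
    print(path, is_in_scope(path, included, avoided))
```

With these inputs only /items/1002 is reported as out of scope, which mirrors the second example: one product page is assessed while the rest of /items/ is skipped.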

Included and avoided paths/URLs are applied to all origins the scan covers. Ex. if /admin is specified as an included path for the scan profile example.com, the site has a subdomain blog.example.com, and both sites are accessible via HTTP and HTTPS, the following URLs will be assessed:

http://example.com/admin

https://example.com/admin

http://blog.example.com/admin

https://blog.example.com/admin

To avoid this behavior for subdomains, the subdomain should be avoided.
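To make the expansion across origins concrete, here is a minimal sketch that combines one included path with every host and protocol in scope. The helper name expand_path is hypothetical, and the host list is taken from the example above.

```python
from itertools import product


def expand_path(path, hosts, schemes=("http", "https")):
    """Combine one root-relative path with every host and scheme."""
    return [f"{scheme}://{host}{path}" for host, scheme in product(hosts, schemes)]


urls = expand_path("/admin", ["example.com", "blog.example.com"])
for url in urls:
    print(url)
```

Removing blog.example.com from the host list leaves only the two example.com URLs, which corresponds to avoiding the subdomain.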

To see which URLs were assessed during the scan, check the finding titled “Crawled URLs”, which lists the crawled URLs. The finding “Forced Browsing” specifically reports on paths provided via Forced Browsing. You can find both in the scan results.

Setting up included/avoided paths/URLs

Under Application Scanning settings, use “Which paths/URLs must we include?” to add one or more included URLs. For a large number of URLs, consider using our Forced Browsing functionality. Use “Which paths/URLs must we avoid?” for paths you don't want us to touch.

Setting up forced browsing for a Scan Profile

Under Application Scanning settings, you will find “Use Forced Browsing to make sure we don't reach sensitive resources?”. Here, you can upload a file containing a list of root-relative paths (one per line), which are handled in the same manner as included paths. You can upload multiple files.

Once files are uploaded, you can individually toggle which files are active during the scan, and thereby select which paths are taken into consideration.
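If you maintain your paths in a script, the upload file can be generated rather than written by hand. The sketch below assumes a plain text file with one root-relative path per line, as described above; the helper name build_wordlist and the sample paths are hypothetical.

```python
def build_wordlist(paths):
    """Return wordlist text: one root-relative path per line.

    Blank entries are dropped, a leading slash is added where missing,
    and duplicates are removed while keeping the original order.
    """
    seen = set()
    lines = []
    for raw in paths:
        path = raw.strip()
        if not path:
            continue
        if not path.startswith("/"):
            path = "/" + path
        if path not in seen:
            seen.add(path)
            lines.append(path)
    return "".join(line + "\n" for line in lines)


wordlist = build_wordlist(["admin", " /backup.zip ", "", "/admin"])
print(wordlist, end="")
```

Save the returned text to a file of your choice and upload that file in the Forced Browsing setting.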

This feature is only accessible to dedicated customers. Please contact your customer support representative or support@detectify.com for more information.

Setting up forced browsing for all Scan Profiles

Under Account settings, “Team” tab, you can find “Forced Browsing for all scan profiles”, which applies to all Scan Profiles, including ones that have additional Forced Browsing files uploaded. Use “Configure” and then “Upload wordlist file” to add forced browsing paths. You can upload multiple files.

Once files are uploaded, you can individually toggle which files are active during the scan, and thereby select which paths are taken into consideration.

This feature is only accessible to dedicated customers. Please contact your customer support representative or support@detectify.com for more information.

Using wildcards

Both included and avoided paths support asterisk wildcards in case you want to apply a rule to a group of pages.

Ex. the website provides product details under paths /product/5/details, /product/6/details, etc. To set all of these as included/avoided paths/URLs, you can use /product/*/details

Note that a trailing asterisk is implicitly added to all paths by default. Using /admin as an included/avoided path is equivalent to using /admin*. Using an asterisk explicitly removes this implicit trailing wildcard: for example, /product/*/details is not the same as /product/*/details*.

There is no limit to the number of wildcards you can include in a path pattern. As an example, /blog/*/guestblog/*/details is supported as well.
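The wildcard rules can be illustrated with a small matcher. This is a sketch under stated assumptions, not the scanner's implementation: the function names are hypothetical, an asterisk is assumed to match any run of characters (including slashes), and the implicit trailing wildcard is only added when the pattern contains no explicit asterisk.

```python
import re


def pattern_to_regex(pattern):
    """Translate a path pattern into a compiled regular expression.

    Assumption: '*' matches any sequence of characters, and a pattern
    without an explicit '*' gets an implicit trailing wildcard.
    """
    if "*" not in pattern:
        pattern += "*"
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))


def matches(pattern, path):
    """Return True if the whole path matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(path) is not None


print(matches("/admin", "/admin/users"))                            # True
print(matches("/product/*/details", "/product/5/details"))          # True
print(matches("/product/*/details", "/product/5/details/reviews"))  # False
print(matches("/product/*/details*", "/product/5/details/reviews")) # True
```

The last two lines show why /product/*/details and /product/*/details* are different patterns: only the second one covers pages below the details page.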

FAQ

Q: If I want to add a Recorded Login, would it be affected by the included/avoided paths setting?

A: Yes, all the Application Scan Settings including the list of included/avoided paths will be passed to the scanner at the very beginning of the scan. This means when replaying the recorded scenarios, the settings provided by you will be respected.

Q: Can I avoid scanning dynamic URLs?

A: No, it is currently not possible to exclude dynamic URLs from scanning.

Q: Will avoiding /language/select also avoid /en/language/select?

A: No, if you wish to avoid /en/language/select you would need to include the beginning part of the path as well (which must start with a leading slash). Here the following path would need to be avoided: /en/language/select

Q: Can I avoid a subdomain together with its path, e.g. shop-prod.example.com/products?

A: No, in this case you would need to either avoid the entire /products path for all subdomains or add the subdomain shop-prod to the list of disallowed subdomains.

Q: Can I use a regular expression to include/avoid paths?

A: Regular expressions are currently not supported.

Q: Is there any difference between disallowing /product and /product/?

A: Yes. As the specified path is matched from the start of the root-relative URL, disallowing /product will disallow any path starting with that text, ex. /product/, /products/productimages/, while disallowing /product/ will still allow us to scan /products/productimages/ if such paths exist on the website.
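The difference comes down to prefix matching, which the following sketch shows with Python's str.startswith. The helper name avoided_by and the sample paths are hypothetical, and wildcards are left out for simplicity.

```python
def avoided_by(prefix, paths):
    """Return the paths that start with the given avoided prefix."""
    return [path for path in paths if path.startswith(prefix)]


site_paths = ["/product/", "/product/42", "/products/productimages/"]

print(avoided_by("/product", site_paths))
print(avoided_by("/product/", site_paths))
```

The first call returns all three paths, while the second leaves /products/productimages/ out of the avoided set.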

Q: If I disallow '/' and only allow '/products/', will that allow everything that starts with '/products/'? Or do I have to add an additional path if I want the scanner to go to e.g. /products/notebooks/example-notebook?

A: Allowing the path /products/ should be enough. However, if these paths are still missing from your Crawled URLs finding after running a new scan, you can try adding /products/notebooks/ to the list of allowed URLs.

Q: Is there a difference between allowing /products/notebooks and /products/*?

A: That depends on what you want to achieve.

If you want us to assess /products/notebooks specifically, allowing exactly this path (/products/notebooks) will be fine. However, if you want to allow anything under /products, you can use a pattern (/products/*).

As an example, a pattern that allows /products/notebooks/A4 as well as /products/sketchbooks/A4 would be /products/*/A4.