Extract endpoints marked as disallow in robots files to generate wordlists.

Overview

roboXtractor

This tool extracts the endpoints marked as disallow in a robots.txt file. It fetches the file directly from the target site and also offers a Wayback Machine query mode (1 query for each of the previous 5 years).
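
To make the extraction step concrete, here is a minimal sketch of pulling Disallow rules out of the body of a robots.txt file. It is not roboXtractor's actual code: the function name `extractDisallow` and the sample content are assumptions made for illustration, and the real tool may handle edge cases differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// extractDisallow (hypothetical helper) returns the path of every Disallow
// rule found in the body of a robots.txt file. Comments and empty rules
// are skipped.
func extractDisallow(body string) []string {
	var paths []string
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := scanner.Text()
		// Drop trailing comments such as "Disallow: /tmp # temp files".
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		// Field names in robots.txt are case-insensitive.
		key, value, found := strings.Cut(line, ":")
		if !found || !strings.EqualFold(strings.TrimSpace(key), "disallow") {
			continue
		}
		if path := strings.TrimSpace(value); path != "" {
			paths = append(paths, path)
		}
	}
	return paths
}

func main() {
	// Sample content, invented for this example.
	sample := "User-agent: *\nDisallow: /admin/\nAllow: /public/\ndisallow: /private # hidden\nDisallow:\n"
	for _, p := range extractDisallow(sample) {
		fmt.Println(p)
	}
}
```

Only the `Disallow` lines with a non-empty value survive; `Allow` rules and the empty `Disallow:` line are ignored.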

Possible uses of roboXtractor:

  • Generate a customized wordlist of endpoints for later use in a fuzzing tool (-m 1).
  • Generate a list of URLs to visit (-m 0).
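
The difference between the two modes can be sketched as follows. This is an illustration only: `render` is a hypothetical helper, and the exact output format of roboXtractor (for instance, whether slashes are stripped in wordlist mode) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// render (hypothetical helper) turns one disallowed path into the output
// for the selected mode: a full URL for mode 0, a bare endpoint otherwise.
func render(base, path string, mode uint) string {
	if mode == 0 {
		// URL mode: join the site and the path with exactly one slash.
		return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(path, "/")
	}
	// Wordlist mode: keep only the endpoint, without surrounding slashes.
	return strings.Trim(path, "/")
}

func main() {
	fmt.Println(render("https://example.com/", "/admin/login", 0))
	fmt.Println(render("https://example.com/", "/admin/login", 1))
}
```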


🛠️ Installation

If you want to make modifications locally and compile it, follow the instructions below:

> git clone https://github.com/Josue87/roboxtractor.git
> cd roboxtractor
> go build

If you are only interested in using the program:

> go get -u github.com/Josue87/roboxtractor

Note: If you are using Go 1.16 or higher and you get any errors, run the following command:

> go env -w GO111MODULE="auto"

🗒 Options

The flags that can be used to launch the tool:

| Flag | Type | Description | Example |
|------|------|-------------|---------|
| u | string | URL to extract endpoints marked as disallow in its robots.txt file. | -u https://example.com |
| m | uint | Extract URLs (0) or extract endpoints to generate a wordlist (1 or higher, the default). | -m 1 |
| wb | bool | Check the Wayback Machine. Checks 5 years (slow mode). | -wb |
| v | bool | Verbose mode. Displays additional information at each step. | -v |
| s | bool | Silent mode. Doesn't show the banner. | -s |

You can omit the -u flag and pipe a file with a list of URLs through standard input as follows:

cat urls.txt | roboxtractor -m 1 -v

Only the results are written to standard output. The banner and the informational messages shown with the -v flag are redirected to standard error.

👾 Usage

The following are some examples of use:

roboxtractor --help
cat urls.txt | roboxtractor -m 0 -v
roboxtractor -u https://www.example.com -m 1 -wb
cat urls.txt | roboxtractor -m 1 -s > ./customwordlist.txt
cat urls.txt | roboxtractor -s -v | sort -u > ./uniquewordlist.txt
echo http://example.com | roboxtractor -v
echo http://example.com | roboxtractor -v -wb
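
The -wb examples rely on the Wayback Machine mode, which issues one query for each of the previous 5 years. A rough sketch of how such queries could be built is shown below. The `waybackURLs` helper and the choice of January 1st as the timestamp are assumptions, not the tool's confirmed behavior. The `https://web.archive.org/web/<timestamp>/<url>` pattern is the archive's standard snapshot address, which resolves to the capture closest to the given timestamp.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// waybackURLs (hypothetical helper) builds one snapshot URL for robots.txt
// per previous year, starting with the year before currentYear.
func waybackURLs(target string, currentYear, years int) []string {
	base := strings.TrimRight(target, "/")
	urls := make([]string, 0, years)
	for i := 1; i <= years; i++ {
		// The timestamp is YYYYMMDDhhmmss; January 1st is an arbitrary choice.
		urls = append(urls, fmt.Sprintf(
			"https://web.archive.org/web/%d0101000000/%s/robots.txt",
			currentYear-i, base))
	}
	return urls
}

func main() {
	for _, u := range waybackURLs("https://example.com", time.Now().Year(), 5) {
		fmt.Println(u)
	}
}
```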

🚀 Examples

Let's take a look at some examples. We have the following file:

image

Extracting endpoints:

image

Extracting URLs:

image

Checking Wayback Machine:

image

GitHub's robots.txt file has many entries that are not useful as they stand, so a cleaning process removes duplicates and entries with *. Check the following image:

image

For example:

  • /gist/*/*/* is transformed into gist.
  • /*/tarball is transformed into tarball.
  • /, /* or similar entries are removed.
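
The cleaning rules listed above can be sketched in a few lines. This is one possible reading of them, not the tool's actual implementation: `cleanEntries` is a hypothetical name, and dropping only the path segments made up entirely of `*` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanEntries (hypothetical helper) removes wildcard-only segments,
// discards entries that end up empty and drops duplicates while keeping
// the original order.
func cleanEntries(entries []string) []string {
	seen := make(map[string]bool)
	var words []string
	for _, entry := range entries {
		var kept []string
		for _, seg := range strings.Split(entry, "/") {
			// Skip empty segments and segments consisting only of '*'.
			if strings.Trim(seg, "*") == "" {
				continue
			}
			kept = append(kept, seg)
		}
		word := strings.Join(kept, "/")
		if word == "" || seen[word] {
			continue
		}
		seen[word] = true
		words = append(words, word)
	}
	return words
}

func main() {
	entries := []string{"/gist/*/*/*", "/*/tarball", "/", "/*", "/gist/"}
	for _, w := range cleanEntries(entries) {
		fmt.Println(w)
	}
}
```

For the sample entries, this prints `gist` and `tarball`: the root and wildcard-only entries disappear, and the second `gist` is dropped as a duplicate.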

🤗 Thanks to

The idea came from a tweet by @remonsec, who did something similar in a bash script. Check the tweet.

Owner
Josué Encinar
Offensive Security Engineer