A search-engine URL collector written in Go. It supports Google, Bing, and Baidu, and can batch-collect URLs using Google dork syntax.

Overview

URL-Collector: a search-engine URL collector.

Usage

NAME:
   URL-Collector - Collect URLs based on dork

USAGE:
   url-collector

VERSION:
   v0.2

AUTHOR:
   无在无不在 <[email protected]>

COMMANDS:
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --input value, -i value          input from a file
   --output value, -o value         specify the output file
   --engine value, -e value         specify the search engine(google,bing,baidu,google-image) (default: "google-image")
   --routine-count value, -r value  specify the count of goroutine (default: 5)
   --keyword value, -k value        specify the keyword
   --config-file value, -c value    specify the config file
   --format value, -f value         specify output format(url、domain、protocol_domain) (default: "url")
   --help, -h                       show help (default: false)
   --version, -v                    print the version (default: false)
ERRO[0000] specify -f or -k please         
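The three `--format` values select which part of each collected URL is emitted. A minimal sketch of the assumed semantics using Go's `net/url` (this `formatResult` helper is illustrative, not the tool's actual code):

```go
package main

import (
	"fmt"
	"net/url"
)

// formatResult renders a collected URL in one of the three output
// formats accepted by -f/--format (assumed semantics).
func formatResult(raw, format string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return raw
	}
	switch format {
	case "domain":
		return u.Hostname() // e.g. "example.com"
	case "protocol_domain":
		return u.Scheme + "://" + u.Hostname() // e.g. "https://example.com"
	default: // "url"
		return raw
	}
}

func main() {
	raw := "https://example.com/item.php?id=1"
	fmt.Println(formatResult(raw, "url"))
	fmt.Println(formatResult(raw, "domain"))
	fmt.Println(formatResult(raw, "protocol_domain"))
}
```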

Screenshot


Demo

# Collect URLs for a given keyword
url-collector -k ".php?id="
# Batch-collect every keyword in a file and save the results to result.txt
url-collector -f google-dork.txt -o result.txt
# Chain with sqlmap
url-collector -f google-dork.txt -o result.txt && python3 sqlmap.py -m result.txt --batch --random-agents
# A Google mirror site is used by default; if you can reach Google directly, specify the engine manually
url-collector -e google -k ".php?id="
# Collect URLs via Baidu
url-collector -e baidu -k ".php?id=1"
# Put frequently used options into a config file
url-collector -c config.json
# Output domains only
url-collector -e baidu -k ".php?id=1" -f domain
# Output protocol plus domain
url-collector -e baidu -k ".php?id=1" -f protocol_domain
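Batch mode combined with the goroutine pool controlled by `-r/--routine-count` can be sketched like this; `runBatch` and its stand-in `collect` callback are illustrative assumptions, not the tool's source:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// runBatch fans keywords out to a fixed-size pool of worker
// goroutines (mirroring -r/--routine-count) and gathers the results.
func runBatch(keywords []string, workers int, collect func(string) string) []string {
	jobs := make(chan string)
	results := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for kw := range jobs {
				results <- collect(kw)
			}
		}()
	}
	go func() { // feed the pool
		for _, kw := range keywords {
			jobs <- kw
		}
		close(jobs)
	}()
	go func() { // close results once all workers finish
		wg.Wait()
		close(results)
	}()
	var out []string
	for r := range results {
		out = append(out, r)
	}
	sort.Strings(out) // completion order is nondeterministic; sort for stable output
	return out
}

func main() {
	dorks := []string{".php?id=", "inurl:admin", "site:example.com"}
	for _, r := range runBatch(dorks, 5, func(kw string) string {
		return "collected: " + kw // stand-in for the real search-and-parse step
	}) {
		fmt.Println(r)
	}
}
```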

config.json format

{
    "output_file_path": "",
    "input_file_path": "",
    "keyword": "inurl:.php?id=",
    "search_engine": "google-image",
    "format": "url",
    "base_url": {
        "google":       "https://www.google.com/search?q=$keyword",
        "google-image": "https://g.luciaz.me/search?q=$keyword",
        "bing":         "https://cn.bing.com/search?q=$keyword",
        "baidu":        "https://www.baidu.com/s?wd=$keyword"
    },
    "routine_count": 5,
    "black_list": [
        "gov",
        "baidu.com",
        "cache.baiducontent.com",
        "g3.luciaz.me",
        "www.youtube.com",
        "gitee.com",
        "github.com",
        "stackoverflow.com",
        "developer.aliyun.com",
        "cloud.tencent.com",
        "www.zhihu.com/question",
        "blog.51cto.com",
        "zhidao.baidu.com",
        "www.cnblogs.com",
        "coding.m.imooc.com",
        "weibo.cn",
        "www.taobao.com",
        "www.google.com",
        "go.microsoft.com",
        "facebook.com",
        "blog.csdn.net",
        "books.google.com",
        "policies.google.com",
        "webcache.googleusercontent.com",
        "translate.google.com"
    ]
}
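A rough sketch of how these config fields plausibly drive a query: the `$keyword` placeholder in a `base_url` template is substituted with the escaped keyword, and results matching a `black_list` entry are dropped. The helpers below are assumptions for illustration, not the tool's actual implementation:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildQueryURL substitutes the keyword into a base_url template,
// URL-escaping it first ($keyword is the placeholder used above).
func buildQueryURL(baseURL, keyword string) string {
	return strings.Replace(baseURL, "$keyword", url.QueryEscape(keyword), 1)
}

// blocked reports whether a result URL matches any black_list entry.
// Entries may include a path (e.g. "www.zhihu.com/question"), so the
// check runs against host+path; the real matching rule may differ.
func blocked(raw string, blackList []string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return true
	}
	for _, b := range blackList {
		if strings.Contains(u.Hostname()+u.Path, b) {
			return true
		}
	}
	return false
}

func main() {
	base := "https://www.baidu.com/s?wd=$keyword"
	fmt.Println(buildQueryURL(base, "inurl:.php?id="))
	fmt.Println(blocked("https://zhidao.baidu.com/q/1", []string{"zhidao.baidu.com"}))
}
```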

google-dork demo



Comments
  • Bug with Google advanced search syntax, plus a few small suggestions

    The bug: running -k "你好 site:.com" makes the program exit immediately, while -k "你好site:.com" works normally.

    Requested features:

    1. Add debug output: when this bug appeared I assumed my IP had been rate-limited, and only after troubleshooting found that -k was the cause
    2. Add a single-line display of collected URLs, e.g. by overwriting the current line
    3. Show the current collection progress
    opened by AdiEcho 1
  • go build fails

    The build does not succeed:

    [root@centos-s-1vcpu-1gb-sfo3-01 url-collector]# go build

    go: github.com/spf13/viper@v1.9.0 requires github.com/sagikazarmark/crypt@v0.1.0 requires google.golang.org/grpc@v1.40.0 requires github.com/golang/protobuf@v1.4.3 requires google.golang.org/protobuf@v1.23.0 requires github.com/golang/protobuf@v1.4.0 requires google.golang.org/protobuf@v1.21.0 requires github.com/golang/protobuf@v1.4.0-rc.4.0.20200313231945-b860323f09d0: invalid version: git fetch --unshallow -f origin in /root/go/pkg/mod/cache/vcs/6e18cbff36266c74e48dd81b4b672026ac74fb69c838ddb6240f256bb8edf590: exit status 128: fatal: git fetch-pack: expected shallow list

    My Go version: [root@centos-s-1vcpu-1gb-sfo3-01 url-collector]# go version go version go1.16.13 linux/amd64

    opened by 3hxks 0
  • Some suggestions

    1. Proxy support: an option to specify a proxy would make it possible to use several foreign search engines from Windows.

    2. config.json: could "keyword" in config.json be changed to a list or map, so results for several keywords can be collected in one run?

    3. Optimization: allow a rule file (loaded by default) plus a parameter for a target domain (site:), so the tool can automatically apply the rules to search one site or target for multiple sensitive-information leaks or weak spots. That matches how penetration tests against a specific target actually work. Just suggestions, nothing more.

    opened by komomon 1