Verify IP addresses of respectful crawlers like Googlebot with reverse and forward DNS lookups

goodbots - trust but verify

goodbots verifies the IP addresses of respectful crawlers like Googlebot by performing reverse and forward DNS lookups.

  1. Start with an IP address (e.g. 66.249.87.225).
  2. Perform a reverse DNS lookup to get a hostname (e.g. crawl-66-249-87-225.googlebot.com).
  3. Perform a forward DNS lookup on that hostname to get an IP (e.g. 66.249.87.225).
  4. Compare the original IP to the IP returned by the forward lookup.
  5. If they match, goodbots outputs the IP and hostname.
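The five steps above can be sketched in Go with the standard library's net.LookupAddr (reverse lookup) and net.LookupHost (forward lookup). This is a minimal illustration, not goodbots' actual code: the name verifyIP and the choice to accept any returned hostname whose forward lookup includes the original IP are assumptions. The lookups are passed in as functions so the logic can be exercised without network access.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// lookupFunc has the same shape as net.LookupAddr and net.LookupHost.
type lookupFunc func(string) ([]string, error)

// verifyIP (hypothetical name) does a reverse lookup on ip, then a forward
// lookup on each returned hostname, and reports the first hostname that
// resolves back to ip.
func verifyIP(ip string, reverse, forward lookupFunc) (string, bool) {
	want := net.ParseIP(ip)
	if want == nil {
		return "", false // not a valid IP address
	}
	names, err := reverse(ip)
	if err != nil {
		return "", false
	}
	for _, name := range names {
		host := strings.TrimSuffix(name, ".") // PTR answers typically end with a dot
		addrs, err := forward(host)
		if err != nil {
			continue
		}
		for _, addr := range addrs {
			// Compare parsed IPs rather than strings so equivalent forms match.
			if got := net.ParseIP(addr); got != nil && got.Equal(want) {
				return host, true // forward lookup confirms the reverse lookup
			}
		}
	}
	return "", false
}

func main() {
	// Pass IPs as command-line arguments; only verified IPs are printed.
	for _, ip := range os.Args[1:] {
		if host, ok := verifyIP(ip, net.LookupAddr, net.LookupHost); ok {
			fmt.Printf("%s\t%s\n", ip, host)
		}
	}
}
```

Comparing parsed addresses with net.IP.Equal, rather than raw strings, keeps the check correct if an IPv6 address is written in a non-canonical form.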

The Job-to-be-Done (#jtbd)

In search engine optimization (SEO), it is common to analyze a site's access logs (aka bot logs). These logs often contain requests from spoofed user agents pretending to be official search engine crawlers like Googlebot. To get an accurate picture of the site's crawl rate, we want to verify the IP address behind each crawler request.

Getting Started

How to install/build goodbots

Clone the repo:

git clone git@github.com:eywu/goodbots.git

Change to the /cmd/goodbots directory:

cd goodbots/cmd/goodbots

Build the binary/executable from main.go:

go build

How to use goodbots

Once you've built the binary from the main.go file that ships with goodbots (see above), you can feed it IPs via standard input.

Test a single IP

echo "203.208.60.1" | ./goodbots

Test a range of IPs with the prips command-line tool

prips 203.208.40.1 203.208.80.1 | ./goodbots

Test a list of IPs from a text or CSV file

./goodbots < ip-list.txt

Note: The text or CSV file must contain only one IP address per line.

Example:

66.249.87.224
203.208.23.146
203.208.23.126
203.208.60.227

Saving the results

goodbots prints to standard output with tab (\t) delimiters, so you can capture the results with an output redirect.

Example Output

203.208.60.1 crawl-203-208-60-1.googlebot.com
66.249.85.123 google-proxy-66-249-85-123.google.com
66.249.87.12 rate-limited-proxy-66-249-87-12.google.com
66.249.85.224 google-proxy-66-249-85-224.google.com

Save the verified bot IPs from a file named ip-list.txt to a file named saved-results.tsv:

./goodbots < ip-list.txt > saved-results.tsv

DNS Resolvers

goodbots randomly selects a public DNS resolver for each DNS lookup, which reduces the chance of being blocked or throttled by your DNS provider when you have many IPs to verify.
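One way to send each lookup to a randomly chosen resolver in Go is a net.Resolver with PreferGo set and a custom Dial function that ignores the system-configured server. This is a sketch, not goodbots' implementation: the server list (Google, Cloudflare and Quad9 public DNS) and the names pickServer and resolverFor are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"net"
	"os"
	"time"
)

// Assumed example list of public resolvers; goodbots' own list may differ.
var publicResolvers = []string{"8.8.8.8:53", "1.1.1.1:53", "9.9.9.9:53"}

// pickServer returns one server at random.
func pickServer(servers []string) string {
	return servers[rand.Intn(len(servers))]
}

// resolverFor returns a resolver that sends every query to server.
func resolverFor(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in DNS client, which calls Dial below
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, server) // override the default server
		},
	}
}

func main() {
	for _, ip := range os.Args[1:] {
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		// A new random pick per lookup spreads queries across providers.
		names, err := resolverFor(pickServer(publicResolvers)).LookupAddr(ctx, ip)
		cancel()
		fmt.Println(ip, names, err)
	}
}
```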

It uses these DNS providers:

Supported Crawlers

Domain verification is currently a little imprecise: goodbots only checks that the hostname contains the crawler's domain name and does NOT check the TLD.

Future improvements will test for more precise domains based on each crawler's specifications.

  • googlebot
    • .googlebot.
    • .google.
  • msnbot
    • .msn.
  • bingbot
    • .msn.
  • pinterest
    • .pinterest.
  • yandex
    • .yandex.
  • baidu
    • .baidu.
  • coccoc
    • .coccoc.
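The matching rule in this list can be sketched as a substring check on the verified hostname. This is an illustration under assumptions, not goodbots' source: the names matchCrawler and crawlerRules are hypothetical, and prepending a dot (so a bare "googlebot.com" still contains ".googlebot.") is a choice made here. Because msnbot and bingbot share ".msn.", the sketch reports whichever is listed first.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// crawlerRule pairs a crawler with the domain fragments listed above.
type crawlerRule struct {
	name  string
	frags []string
}

// An ordered slice (not a map) keeps matching deterministic.
var crawlerRules = []crawlerRule{
	{"googlebot", []string{".googlebot.", ".google."}},
	{"msnbot", []string{".msn."}},
	{"bingbot", []string{".msn."}},
	{"pinterest", []string{".pinterest."}},
	{"yandex", []string{".yandex."}},
	{"baidu", []string{".baidu."}},
	{"coccoc", []string{".coccoc."}},
}

// matchCrawler reports which crawler a hostname belongs to.
// Only the domain label is checked; the TLD is ignored.
func matchCrawler(host string) (string, bool) {
	// Normalize case and any trailing dot, then add a leading dot.
	h := "." + strings.ToLower(strings.TrimSuffix(host, "."))
	for _, r := range crawlerRules {
		for _, f := range r.frags {
			if strings.Contains(h, f) {
				return r.name, true
			}
		}
	}
	return "", false
}

func main() {
	for _, host := range os.Args[1:] {
		name, ok := matchCrawler(host)
		fmt.Println(host, name, ok)
	}
}
```

Because only the label is checked, a hostname such as crawl.googlebot.example.net would also match. The forward DNS step does not rule this out on its own, since whoever controls an IP's PTR record and that domain can make both lookups agree.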

Make it go faster!

By default, concurrency is set to only 10 simultaneous requests. To speed things up, increase that number in main.go before building the binary/executable.

Other uses of goodbots

While building goodbots, we created a general-purpose function that simply resolves the hostname of any IP address.

In main.go you can uncomment the line that calls ResolveNames() and comment out the GoodBots() function call.

This mode does not perform the forward DNS lookup, so it does not verify that the hostname resolves back to the same IP address. It also writes errors to the TSV output for any IP whose hostname lookup fails.

➜  goodbots git:(main) ✗ prips -i 50 66.100.0.0 66.200.0.0 | ./goodbots
66.100.0.50	(error)	lookup 50.0.100.66.in-addr.arpa. on 192.168.1.1:53: no such host
...
66.100.1.144	(error)	lookup 144.1.100.66.in-addr.arpa. on 192.168.1.1:53: no such host
66.100.0.150	WebGods
66.100.0.250	(error)	lookup 250.0.100.66.in-addr.arpa. on 192.168.1.1:53: no such host
...
66.100.4.76	(error)	lookup 76.4.100.66.in-addr.arpa. on 192.168.1.1:53: no such host
66.100.4.126	mail.esai.com

ToDo

Suggestions from others

via John Murch

  • Generate bad bot list for blacklist usage
  • Track search engine bot list over time to see changes

Other Resources


Written in Golang. Gopher courtesy of Gopherize.me.

Comments
  • make ForwardDNS safe (opened by sifr0x)

    The ForwardDNS func currently panics because ip can be empty. Previously the error was simply returned, but this does not help because the ip value remains empty on failure.

    This PR adds error handling before the return so the function no longer panics.
Releases (v0.0.1)
Owner: Eric Wu