Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Overview

Antch

Build Status Coverage Status Go Report Card GoDoc

Antch, inspired by Scrapy. If you're familiar with scrapy, you can quickly get started.

Antch is a fast, powerful and extensible web crawling & scraping framework for Go, used to crawl websites and extract structured data from their pages.

Get Started

Getting Started

Follow the Getting Started instructions to start your first spider.

Features

  • Polite, highly concurrent web crawler.
  • Powerful and customizable HTTP middleware.
  • Item data pipeline for the web spider.
  • Built-in proxy support (HTTP, HTTPS, SOCKS5).
  • Built-in XPath query support for HTML/XML documents.
  • Easy to use and integrate with your project.

Examples

BingWallpaper - Bing daily wallpaper.

Documentation

See https://github.com/antchfx/antch/wiki

You might also like...
Fast conversions across various Go types with a simple API.

Go Package: conv Get: go get -u github.com/cstockton/go-conv Example: // Basic types if got, err := conv.Bool(`TRUE`); err == nil { fmt.Printf("conv.

Stargather is fast GitHub repository stargazers information gathering tool

Stargather is fast GitHub repository stargazers information gathering tool that can scrapes: Organization, Location, Email, Twitter, Follow

Count Dracula is a fast metrics server that counts entries while automatically expiring old ones

In-Memory Expirable Key Counter This is a fast metrics server, ideal for tracking throttling. Put values to the server, and then count them. Values ex

Fast Entity Component System in Golang

ECS Fast Entity Component System in Golang This module is the ECS part of the game engine i'm writing in Go. Features: as fast as packages with automa

F' - A flight software and embedded systems framework

F´ (F Prime) is a component-driven framework that enables rapid development and deployment of spaceflight and other embedded software applications.

Tanzu Framework provides a set of building blocks to build atop of the Tanzu platform and leverages Carvel packaging

Tanzu Framework provides a set of building blocks to build atop of the Tanzu platform and leverages Carvel packaging and plugins to provide users with a much stronger, more integrated experience than the loose coupling and stand-alone commands of the previous generation of tools.

Highly customizable archive and index framework for EPITA
Highly customizable archive and index framework for EPITA

epitar.gz Highly customizable archive and index framework for EPITA. Get started

Web app built with Go/Golang and Buffalo, deployed on Heroku, using Heroku Postgres

hundred-go-buffalo Background Read Go Read Buffalo Read Getting Started on Heroku with Go Recommended Tools PowerShell terminal Chocolatey Windows pac

Radiant is used for rapid development of enterprise application in Go, including RESTful APIs, web apps and backend services.

Radiant is used for rapid development of enterprise application in Go, including RESTful APIs, web apps and backend services.

Comments
  • antch-getstarted demo looping forever

    antch-getstarted demo looping forever

    Hi, This is my first post to github - sorry if I did something wrong. When running antch-getstarted on my Win 7 (64bit) + golang 1.9, program outputs several json records and pausing/looping. Only Ctrl+c can terminate it. I'm new to golang, so I was not able to solve this issue internally. Thanks,

    opened by dlipiec 7
  • Allow option for antch to set HTTP2ForceUpgrade transport param

    Allow option for antch to set HTTP2ForceUpgrade transport param

    In follow up, I might propose to change crawler interface to allow a transport to be injected (or a http client). For now, this felt the minimal change to fix the bug where using a DialContext for HTTP Transport to use HTTP/2

    opened by vivekraghunathan 0
  • Can't set the request header

    Can't set the request header

    Sometimes we would build one request that having different request header ,but the antch don't have this setting method .Yeah ,Can we add this function in the antch spider?Just like doing one interface in the crawler settings.

    opened by liuzeng01 1
  • Please show another Exit Method ,ths

    Please show another Exit Method ,ths

    I had read the example spider, it used the signal chan as the exit method .But it's not very helpfully. Can you show more exit method ? For example , if the spider having done all the scrapy works ,it would exit automatically ?

    opened by liuzeng01 1
Owner
The open source web crawler framework project
null
An simple, easily extensible and concurrent health-check library for Go services

Healthcheck A simple and extensible RESTful Healthcheck API implementation for Go services. Health provides an http.Handlefunc for use as a healthchec

Ether Labs 234 Sep 27, 2022
An easy to use, extensible health check library for Go applications.

Try browsing the code on Sourcegraph! Go Health Check An easy to use, extensible health check library for Go applications. Table of Contents Example M

Claudemiro 434 Aug 25, 2022
Go-linq - A powerful language integrated query (LINQ) library for Golang

go-linq A powerful language integrated query (LINQ) library for Go. Written in v

Ahmet Alp Balkan 3.1k Sep 29, 2022
Entitas-Go is a fast Entity Component System Framework (ECS) Go 1.17 port of Entitas v1.13.0 for C# and Unity.

Entitas-Go Entitas-GO is a fast Entity Component System Framework (ECS) Go 1.17 port of Entitas v1.13.0 for C# and Unity. Code Generator Install the l

Vladislav Fedotov 19 Aug 15, 2022
Fast and secure initramfs generator

Booster - fast and secure initramfs generator Initramfs is a specially crafted small root filesystem that mounted at the early stages of Linux OS boot

Anatol Pomozov 282 Sep 28, 2022
The package manager for macOS you didn’t know you missed. Simple, functional, and fast.

Stew The package manager for macOS you didn’t know you missed. Built with simplicity, functionality, and most importantly, speed in mind. Installation

Stew 20 Mar 30, 2022
a really fast difficulty and pp calculator for osu!mania

gonia | mania star + pp calculator a very fast and accurate star + pp calculator for mania. gonia has low memory usage and very fast calculation times

Simon 5 Mar 10, 2022
Executor - Fast exec task with go and less mem ops

executor fast exec task with go and less mem ops Why we need executor? Go with g

Te Nguyen 4 Aug 23, 2022
A fast and easy-to-use gutenberg book downloader

Gutenberg Downloader A brief description of what this project does and who it's for Usage download books Download all english books as epubs with imag

Lukas f. Paluch 2 Jan 11, 2022
A simple Cron library for go that can execute closures or functions at varying intervals, from once a second to once a year on a specific date and time. Primarily for web applications and long running daemons.

Cron.go This is a simple library to handle scheduled tasks. Tasks can be run in a minimum delay of once a second--for which Cron isn't actually design

Robert K 212 Sep 18, 2022