A light libxml wrapper for Go

Overview

Gokogiri

LibXML bindings for the Go programming language.

By Zhigang Chen and Hampton Catlin

This version is a major rewrite of v0. The main changes are:

  • Separation of XML and HTML
  • More of the memory allocation/deallocation burden placed on Go
  • Fragment parsing -- no more deep-copy
  • Serialization
  • Some API adjustments

Installation

# Linux
sudo apt-get install libxml2-dev
# Mac
brew install libxml2

go get github.com/moovweb/gokogiri
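
Several of the build failures reported in the comments further down come from `go get` invoking `pkg-config` for `libxml-2.0` and not finding it. A minimal preflight sketch in Go is shown here; the function name `checkLibxml` and its parameter are illustrative and not part of gokogiri, and the check assumes that `pkg-config --exists` is how you want to probe for the library.

```go
package main

import (
	"fmt"
	"os/exec"
)

// checkLibxml reports whether the named pkg-config binary is on PATH and
// whether it can see the libxml-2.0 module. The name and parameter are
// hypothetical helpers for this sketch, not gokogiri API.
func checkLibxml(pkgConfig string) error {
	path, err := exec.LookPath(pkgConfig)
	if err != nil {
		return fmt.Errorf("%s not found on PATH: %w", pkgConfig, err)
	}
	// "pkg-config --exists MODULE" exits non-zero when the module is unknown.
	if err := exec.Command(path, "--exists", "libxml-2.0").Run(); err != nil {
		return fmt.Errorf("libxml-2.0 not visible to %s: %w", pkgConfig, err)
	}
	return nil
}

func main() {
	if err := checkLibxml("pkg-config"); err != nil {
		fmt.Println("not ready:", err)
		return
	}
	fmt.Println("pkg-config and libxml-2.0 found")
}
```

If the check fails on the second step, setting `PKG_CONFIG_PATH` to the directory holding `libxml-2.0.pc` (as one of the macOS reports below does) is the usual next thing to try.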

Running tests

go test github.com/moovweb/gokogiri/...

Basic example

package main

import (
  "io/ioutil"
  "log"
  "net/http"

  "github.com/moovweb/gokogiri"
)

func main() {
  // fetch and read a web page
  resp, err := http.Get("http://www.google.com")
  if err != nil {
    log.Fatal(err)
  }
  // close the response body once it has been read
  defer resp.Body.Close()

  page, err := ioutil.ReadAll(resp.Body)
  if err != nil {
    log.Fatal(err)
  }

  // parse the web page
  doc, err := gokogiri.ParseHtml(page)
  if err != nil {
    log.Fatal(err)
  }
  // important -- don't forget to free the resources when you're done!
  defer doc.Free()

  // perform operations on the parsed page -- consult the tests for examples
}
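
The example above frees a single document with `doc.Free()`. When many snippets are parsed in a loop, a `defer` placed directly in the loop body would only run when the enclosing function returns, so the sketch below moves the parse/use/free cycle into its own function. It is an illustration under assumptions: `freer`, `fakeDoc`, and `processOne` are hypothetical names, and `fakeDoc` stands in for a parsed document so the code runs without libxml.

```go
package main

import (
	"errors"
	"fmt"
)

// freer is a hypothetical interface for this sketch; it captures only the
// Free method that the basic example calls on a parsed document.
type freer interface {
	Free()
}

// fakeDoc stands in for a parsed document and counts how often Free is called.
type fakeDoc struct {
	freed *int
}

func (d fakeDoc) Free() { *d.freed++ }

// processOne parses one snippet, hands it to use, and frees it before
// returning. Because the defer lives in this function, Free runs once per
// snippet rather than once at the end of the whole batch.
func processOne(snippet []byte, parse func([]byte) (freer, error), use func(freer) error) error {
	doc, err := parse(snippet)
	if err != nil {
		return err // nothing was allocated, so nothing to free
	}
	defer doc.Free()
	return use(doc)
}

func main() {
	freed := 0
	parse := func(b []byte) (freer, error) {
		if len(b) == 0 {
			return nil, errors.New("empty snippet")
		}
		return fakeDoc{freed: &freed}, nil
	}
	use := func(freer) error { return nil }

	snippets := [][]byte{[]byte("<p>a</p>"), nil, []byte("<p>b</p>")}
	for _, s := range snippets {
		if err := processOne(s, parse, use); err != nil {
			fmt.Println("skipped:", err)
		}
	}
	fmt.Println("documents freed:", freed)
}
```

With the real library, `parse` would wrap a call such as `gokogiri.ParseHtml`, adapted to return the document behind the `freer` interface.
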
Comments
  • memory leak under heavy load

    I am parsing around 200-300 3 KB HTML snippets per second, which in itself proves how cool your lib is ;) Sadly, it's leaking memory at around 1-2 MB/min. Not constantly though, so I am guessing it could be some kind of error while parsing.

    If I can help you fix this, let me know.

    thx,

    paul

    opened by elmacnifico 20
  • can't seem to easily build on OS X

    The README is obviously outdated since the makefile is gone, but I still didn't manage to build/install on Mountain Lion:

    https://gist.github.com/4383203

    I installed libxml2 from Homebrew and updated the xpath import statement to reflect the path of the brew files. I tried to build and got some errors.

    An updated README would be much appreciated since this lib seems very useful.

    Thanks,

    Matt
    opened by mattetti 16
  • change import paths; fix `go get`

    Currently this package refers to imports by a path, gokogiri/..., that doesn't exist. This makes it impossible to install under any other path (e.g. "github.com/moovweb/gokogiri"). I think this can be resolved by turning all local references to gokogiri components into relative imports.

    For example, in gokogiri/html/document.go the reference to util should be "../util".

    ~$ pkg-config --cflags libxml-2.0 libxml-2.0
    -I/usr/include/libxml2  
    ~$ go get github.com/moovweb/gokogiri
    package gokogiri/html: unrecognized import path "gokogiri/html"
    package gokogiri/xml: unrecognized import path "gokogiri/xml"
    $ go get github.com/moovweb/gokogiri/html
    package gokogiri/util: unrecognized import path "gokogiri/util"
    package gokogiri/xml: unrecognized import path "gokogiri/xml"
    
    opened by jehiah 10
  • TestDisableOutputEscaping fails in Darwin

    Not sure why; it seems to work fine on other platforms (Windows and Linux included).

    Below is the output:

    gokogiri/xml $ go test .
    
    Testing: Basic Parsing [....]
    
    All (4) tests passed!
    
    Testing: Buffered Parsing [....]
    
    All (4) tests passed!
    --- FAIL: TestDisableOutputEscaping (0.00 seconds)
        node_test.go:364: TestDisableOutputEscaping (escaping disabled) Expected: <br/>
            Actual: &lt;br/&gt;
    FAIL
    FAIL    github.com/moovweb/gokogiri/xml 0.134s
    
    opened by mdayaram 9
  • clang: error: argument unused during compilation: '-fno-eliminate-unused-debug-types'

    I seem to get this both when trying to use Gokogiri and when I tried to go get gokogiri again. :S Hope this isn't just me being stupid haha.

    Cheers

    George

    opened by GeorgeMac 9
  • Better XPath support

    This pull request addresses both #42 and #39.

    Node.EvalXPath handles evaluating an XPath that returns a string or number instead of a nodeset. Unhandled return types are now coerced into a string.

    Node.SearchWithVariables and Node.EvalXPath both take a VariableScope that allows XPath expressions to resolve any variable names. This is specifically needed for my XSLT processor and may be useful in other contexts.

    opened by jbowtie 9
  • Inject HTML into a node

    There should be a way to inject HTML into a node. For instance,

    node.String() // ""
    node.Inject("<div />")
    node.String() // "<div />"
    

    And, furthermore, this new div has to be properly doc'd.

    node.FirstElement().Doc() == node.Doc()
    // and ensure this happens in C-world too!
    
    opened by HamptonMakes 9
  • Encoding support

    Gokogiri doesn't seem to support the encoding of some pages, although http://www.xmlsoft.org/encoding.html claims libxml will use iconv on unix systems. Here's a small test:

    package main
    
    import (
        "fmt"
        "io/ioutil"
        "net/http"
        "github.com/moovweb/gokogiri"
    )
    
    func get(url string) []byte {
        r, err := http.Get(url)
        if err != nil { panic(err) }
        body, err := ioutil.ReadAll(r.Body)
        if err != nil { panic(err) }
        return body
    }
    
    func main() {
        buf := get("http://bbs.chinaunix.net/thread-4080291-1-1.html")
        doc, err := gokogiri.ParseHtml(buf)
        if err != nil { panic(err) }
        fmt.Println("MetaEncoding:", doc.MetaEncoding())
        title, _ := doc.Search("//title")
        fmt.Println(title[0].Content())
    }
    

    Output:

    ~/gtest > go run gokogiritest.go
    MetaEncoding: gbk
    AIXÉÏlibxml2²»֧³Ögb2312±àÂë-AIX-ChinaUnix.net
    ~/gtest > go run gokogiritest.go | iconv -f gbk
    MetaEncoding: gbk
    AIX上libxml2不支持gb2312编码-AIX-ChinaUnix.net
    

    Any idea why it's not working? Did I misunderstand the libxml page?

    opened by lucy 8
  • cannot build, test, or install gokogiri

    I've tried several avenues, including what's detailed in the README. Here are the steps I took:

    [email protected]:~/incoming/gokogiri 1014:0% make test
    make: *** No rule to make target `test'.  Stop.
    
    [email protected]:~/incoming/gokogiri 1015:2% go build
    gokogiri.go:4:2: import "gokogiri/html": cannot find package
    gokogiri.go:5:2: import "gokogiri/xml": cannot find package
    
    [email protected]:~/incoming/gokogiri 1016:1% go get github.com/moovweb/gokogiri
    # pkg-config --cflags libxml-2.0 libxml-2.0
    exec: "pkg-config": executable file not found in $PATH
    
    [email protected]:~/incoming/gokogiri 1017:2% make install
    make: *** No rule to make target `install'.  Stop.
    
    [email protected]:~/incoming/gokogiri 1018:2% go test
    gokogiri.go:4:2: import "gokogiri/html": cannot find package
    gokogiri.go:5:2: import "gokogiri/xml": cannot find package
    
    opened by cmhobbs 8
  • Make gokogiri compile with go 1.6

    In Go 1.6 it is basically forbidden to pass a Go pointer to Go functions that are used as callbacks from C.

    Fix this by funneling those pointers through global variables.

    Fixes #92

    opened by nightlyone 7
  • Node.Search() uses the wrong XPath context

    Node.Search() should create a new XPath context using the current node instead of using the document context to allow searching from the current node.

    opened by sorin-ionescu 7
  • Get error when start Docker container

    I install libxml2 in the Dockerfile:

    RUN apt-get update && apt-get install -y build-essential libxml2 libxml2-dev libxmlsec1-dev

    When I start the container, I get this error: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory. How can I fix it?

    opened by seminmw 0
  • identifier "_Ctype_struct__xmlDoc" may conflict with identifiers generated by cgo

    I'm trying to install gokogiri on macOS 10.14.4 (Mojave) with Go 1.12.3. I've installed libxml2 using brew. Installing gokogiri with:

    LDFLAGS="-L/usr/local/opt/libxml2/lib" CPPFLAGS="-I/usr/local/opt/libxml2/include" PKG_CONFIG_PATH="/usr/local/opt/libxml2/lib/pkgconfig" go get github.com/moovweb/gokogiri

    Outputs the error:

    # github.com/moovweb/gokogiri/xml
    ../../github.com/moovweb/gokogiri/xml/document.go:330:19: identifier "_Ctype_struct__xmlDoc" may conflict with identifiers generated by cgo
    

    How may I compile gokogiri?

    opened by joaolsilva 4
  • pkg-config: exec: "pkg-config": executable file not found in %PATH%

    go get github.com/moovweb/gokogiri

    pkg-config --cflags libxml-2.0

    pkg-config: exec: "pkg-config": executable file not found in %PATH%

    pkg-config --cflags libxml-2.0 libxml-2.0

    pkg-config: exec: "pkg-config": executable file not found in %PATH%

    Is it normal for a go get statement to require a dependency in the environment path?

    opened by niski84 0
  • build constraints exclude all Go files in /moovweb/gokogiri/help, failed to build with arch=386

    $ GOOS=windows GOARCH=386 go build -o anan
    go build github.com/moovweb/gokogiri/help: build constraints exclude all Go files in /home/javier/go/src/github.com/moovweb/gokogiri/help
    go build github.com/moovweb/gokogiri/xpath: build constraints exclude all Go files in /home/javier/go/src/github.com/moovweb/gokogiri/xpath
    

    any thoughts?

    opened by oguzhantopcu 1
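
The Go 1.6 pull request in the comments above describes funneling Go pointers through global variables instead of handing them to C. One common shape for that idea is a global registry that gives C an integer handle and looks the Go value up again inside the callback. The sketch below is not gokogiri's actual code: `register`, `unregister`, and `dispatch` are hypothetical names, and `dispatch` only plays the role of the exported Go function that C would call, so the example runs without cgo.

```go
package main

import (
	"fmt"
	"sync"
)

// A global registry mapping integer handles to Go callbacks. In a cgo
// setting, only the integer handle would cross into C, never a Go pointer.
var (
	mu       sync.Mutex
	nextID   uintptr
	handlers = map[uintptr]func(string){}
)

// register stores fn and returns the handle that would be passed to C.
func register(fn func(string)) uintptr {
	mu.Lock()
	defer mu.Unlock()
	nextID++
	handlers[nextID] = fn
	return nextID
}

// unregister drops the handle so the callback can be garbage collected.
func unregister(id uintptr) {
	mu.Lock()
	defer mu.Unlock()
	delete(handlers, id)
}

// dispatch stands in for the Go function that C calls back with the handle.
// It reports whether a callback was registered under id.
func dispatch(id uintptr, msg string) bool {
	mu.Lock()
	fn, ok := handlers[id]
	mu.Unlock()
	if !ok {
		return false
	}
	fn(msg) // called outside the lock so the callback may register others
	return true
}

func main() {
	var got []string
	id := register(func(s string) { got = append(got, s) })

	dispatch(id, "parse error at line 3")
	unregister(id)

	fmt.Println(got, dispatch(id, "ignored after unregister"))
}
```

Since Go 1.17 the standard library offers `runtime/cgo.Handle` for the same purpose, which may be preferable to a hand-rolled map in new code.
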
Owner
Moovweb