Golang HTML to plaintext conversion library

Overview

html2text

Converts HTML into text of the markdown-flavored variety

Introduction

Ensure your emails are readable by all!

Turns HTML into raw text, useful for sending fancy HTML emails with an equivalently nicely formatted TXT document as a fallback (e.g. for people who don't allow HTML emails or have other display issues).

html2text is a simple golang package for rendering HTML into plaintext.

There are still lots of improvements to be had, but FWIW this has worked fine for my [basic] HTML-2-text needs.

It requires go 1.x or newer ;)

Download the package

go get jaytaylor.com/html2text

Example usage

package main

import (
	"fmt"

	"jaytaylor.com/html2text"
)

func main() {
	inputHTML := `
<html>
  <head>
    <title>My Mega Service</title>
    <link rel="stylesheet" href="main.css">
    <style type="text/css">body { color: #fff; }</style>
  </head>

  <body>
    <div class="logo">
      <a href="http://jaytaylor.com/"><img src="/logo-image.jpg" alt="Mega Service"/></a>
    </div>

    <h1>Welcome to your new account on my service!</h1>

    <p>
      Here is some more information:

      <ul>
        <li>Link 1: <a href="https://example.com">Example.com</a></li>
        <li>Link 2: <a href="https://example2.com">Example2.com</a></li>
        <li>Something else</li>
      </ul>
    </p>

    <table>
      <thead>
        <tr><th>Header 1</th><th>Header 2</th></tr>
      </thead>
      <tfoot>
        <tr><td>Footer 1</td><td>Footer 2</td></tr>
      </tfoot>
      <tbody>
        <tr><td>Row 1 Col 1</td><td>Row 1 Col 2</td></tr>
        <tr><td>Row 2 Col 1</td><td>Row 2 Col 2</td></tr>
      </tbody>
    </table>
  </body>
</html>`

	text, err := html2text.FromString(inputHTML, html2text.Options{PrettyTables: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}

Output:

Mega Service ( http://jaytaylor.com/ )

******************************************
Welcome to your new account on my service!
******************************************

Here is some more information:

* Link 1: Example.com ( https://example.com )
* Link 2: Example2.com ( https://example2.com )
* Something else

+-------------+-------------+
|  HEADER 1   |  HEADER 2   |
+-------------+-------------+
| Row 1 Col 1 | Row 1 Col 2 |
| Row 2 Col 1 | Row 2 Col 2 |
+-------------+-------------+
|  FOOTER 1   |  FOOTER 2   |
+-------------+-------------+

Unit-tests

Running the unit-tests is straightforward and standard:

go test

License

Permissive MIT license.

Contact

You are more than welcome to open issues and send pull requests if you find a bug or want a new feature.

If you appreciate this library please feel free to drop me a line and tell me! It's always nice to hear from people who have benefitted from my work.

Email: jay at (my github username).com

Twitter: @jtaylor

Alternatives

https://github.com/k3a/html2text - Lightweight

Issues
  • Pretty Tables: large tables break down (unfortunately) with index out of range...

    I'm really loving your html2text functionality, especially because you turn some HTML features into quasi-Markdown text, which is more than awesome, it's close to perfection! Thank you very much!

    I have an application which attempts to write HTML logs to a WebSocket, but, when it fails to do so, it prints the logs out to the console, which, of course, looks awful for a human to read! I'm already using ANSI colour escapes to at least give some idea of what is urgent and what is not, but clearly a more legible output was necessary... and your package does exactly that, thank you!

    Unfortunately, with very large tables, it seems to fail with:

    
    goroutine 23 [running]:
    github.com/olekukonko/tablewriter.(*Table).printHeading(0xc4200126c0)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/olekukonko/tablewriter/table.go:333 +0x47d
    github.com/olekukonko/tablewriter.(*Table).Render(0xc4200126c0)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/olekukonko/tablewriter/table.go:112 +0x3a
    github.com/jaytaylor/html2text.(*textifyTraverseContext).handleTableElement(0xc4201fc0f0, 0xc42018b3b0, 0x1, 0x0)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:270 +0x5f9
    github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc4201fc0f0, 0xc42018b3b0, 0xc4201d0160, 0xc420178000)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:217 +0x6ed
    github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc4201fc0f0, 0xc42018b3b0, 0x7, 0x444868)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:325 +0x46
    github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc4201fc0f0, 0xc42018b340, 0xa, 0xc4201f7098)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:331 +0x47
    github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc4201fc0f0, 0xc42018b340, 0xc4201a7a70, 0x81)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:160 +0x76a
    github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc4201fc0f0, 0xc42018b340, 0x68, 0x444868)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:325 +0x46
    github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc4201fc0f0, 0xc42018b2d0, 0xc4201f74ac, 0x1)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:331 +0x47
    github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc4201fc0f0, 0xc42018b2d0, 0x0, 0x0)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:228 +0x48b
    github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc4201fc0f0, 0xc42018b2d0, 0x0, 0x0)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:325 +0x46
    github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc4201fc0f0, 0xc42018b1f0, 0xc4201f78c0, 0x45d9c4)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:331 +0x47
    github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc4201fc0f0, 0xc42018b1f0, 0x932bc0, 0x1)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:228 +0x48b
    github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc4201fc0f0, 0xc42018b1f0, 0x7f1edd570000, 0x0)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:325 +0x46
    github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc4201fc0f0, 0xc42018b180, 0x410e58, 0xf0)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:331 +0x47
    github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc4201fc0f0, 0xc42018b180, 0x0, 0x402c7e)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:318 +0x77
    github.com/jaytaylor/html2text.FromHTMLNode(0xc42018b180, 0xc4201f7f1d, 0x1, 0x1, 0x0, 0x0, 0xc001, 0xc420200000)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:32 +0xe8
    github.com/jaytaylor/html2text.FromReader(0xba2ea0, 0xc420200000, 0xc4201f7f1d, 0x1, 0x1, 0xc000, 0x0, 0x0, 0x0)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:53 +0xce
    github.com/jaytaylor/html2text.FromString(0xc42061e000, 0xa046, 0xc4201f7f1d, 0x1, 0x1, 0x2, 0x0, 0xc4201f80b8, 0x4681d9)
    	/var/www/clients/client6/web61/home/bot/go/src/github.com/jaytaylor/html2text/html2text.go:59 +0x12c
    

    I'm assuming that the problem is mostly with tablewriter, and not with your package? My tables have some 50 rows and a dozen columns or so. I have not tried with smaller tables (there aren't any!).

    opened by GwynethLlewelyn 10
  • html2text is causing a fatal error

    Here is how I'm using it:

    parsedHTML, err := html2text.FromString(string(actualContent), html2text.Options{PrettyTables: true})
    if err != nil {
    	// handle it
    }
    

    Here is the error in our logs:

    Mar 16 07:38:43 fatal error: runtime: out of memory
    Mar 16 07:38:43 runtime stack:
    Mar 16 07:38:43 runtime.throw(0x9f18ac, 0x16)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/panic.go:605 +0x95
    Mar 16 07:38:43 runtime.sysMap(0xc45a090000, 0x100000, 0xc420000700, 0xd275b8)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/mem_linux.go:216 +0x1d0
    Mar 16 07:38:43 runtime.(*mheap).sysAlloc(0xd0e260, 0x100000, 0x7f6ed8163748)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/malloc.go:470 +0xd7
    Mar 16 07:38:43 runtime.(*mheap).grow(0xd0e260, 0xa, 0x0)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/mheap.go:887 +0x60
    Mar 16 07:38:43 runtime.(*mheap).allocSpanLocked(0xd0e260, 0xa, 0xd275c8, 0x7f6ed8145cb8)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/mheap.go:800 +0x334
    Mar 16 07:38:43 runtime.(*mheap).alloc_m(0xd0e260, 0xa, 0xc420030101, 0x414c1c)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/mheap.go:666 +0x118
    Mar 16 07:38:43 runtime.(*mheap).alloc.func1()
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/mheap.go:733 +0x4d
    Mar 16 07:38:43 runtime.systemstack(0xc420035f08)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/asm_amd64.s:360 +0xab
    Mar 16 07:38:43 runtime.(*mheap).alloc(0xd0e260, 0xa, 0xc420010101, 0x414284)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/mheap.go:732 +0xa1
    Mar 16 07:38:43 runtime.largeAlloc(0x13f20, 0x7f6ed83e0101, 0x7f6ed8203257)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/malloc.go:827 +0x98
    Mar 16 07:38:43 runtime.mallocgc.func1()
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/malloc.go:722 +0x46
    Mar 16 07:38:43 runtime.systemstack(0xc420016000)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/asm_amd64.s:344 +0x79
    Mar 16 07:38:43 runtime.mstart()
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/proc.go:1135
    Mar 16 07:38:43 goroutine 20 [running]:
    Mar 16 07:38:43 runtime.systemstack_switch()
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/asm_amd64.s:298 fp=0xc450ff5248 sp=0xc450ff5240 pc=0x454eb0
    Mar 16 07:38:43 runtime.mallocgc(0x13f20, 0x90af80, 0x301, 0x3b9)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/malloc.go:721 +0x7ae fp=0xc450ff52f0 sp=0xc450ff5248 pc=0x4108fe
    Mar 16 07:38:43 runtime.makeslice(0x90af80, 0x27e4, 0x27e4, 0xc45a076000, 0x27e4, 0x27e4)
    Mar 16 07:38:43 	/usr/local/Cellar/go/1.9.2/libexec/src/runtime/slice.go:54 +0x77 fp=0xc450ff5320 sp=0xc450ff52f0 pc=0x4409f7
    Mar 16 07:38:43 github.com/olekukonko/tablewriter.WrapWords(0xc4588f4000, 0x27e4, 0x27e4, 0x1, 0x3b9, 0x186a0, 0x27e4, 0xc44d854000, 0x5974)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/olekukonko/tablewriter/wrap.go:58 +0x14a fp=0xc450ff53e8 sp=0xc450ff5320 pc=0x703cca
    Mar 16 07:38:43 github.com/olekukonko/tablewriter.WrapString(0xc44d854000, 0x5974, 0x3b9, 0x9e24d8, 0x1, 0xc44d854000, 0x5974)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/olekukonko/tablewriter/wrap.go:36 +0x189 fp=0xc450ff54b8 sp=0xc450ff53e8 pc=0x7039a9
    Mar 16 07:38:43 github.com/olekukonko/tablewriter.(*Table).parseDimension(0xc42029d800, 0xc44e32e000, 0x5974, 0x0, 0x1, 0xc4204340e0, 0x0, 0x1)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/olekukonko/tablewriter/table.go:809 +0x35f fp=0xc450ff55c0 sp=0xc450ff54b8 pc=0x70292f
    Mar 16 07:38:43 github.com/olekukonko/tablewriter.(*Table).Append(0xc42029d800, 0xc4203d87d0, 0x1, 0x1)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/olekukonko/tablewriter/table.go:291 +0x103 fp=0xc450ff5688 sp=0xc450ff55c0 pc=0x6fde93
    Mar 16 07:38:43 github.com/olekukonko/tablewriter.(*Table).AppendBulk(0xc42029d800, 0xc4202ce360, 0x2, 0x2)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/olekukonko/tablewriter/table.go:303 +0x58 fp=0xc450ff56c8 sp=0xc450ff5688 pc=0x6fe108
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleTableElement(0xc42063c4b0, 0xc420450a80, 0xc4201977c0, 0x4409f7)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:282 +0x753 fp=0xc450ff5780 sp=0xc450ff56c8 pc=0x71c273
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc42063c4b0, 0xc420450a80, 0x410b88, 0xf0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:226 +0x5ef fp=0xc450ff5970 sp=0xc450ff5780 pc=0x71af5f
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c4b0, 0xc420450a80, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:345 +0x10b fp=0xc450ff59b8 sp=0xc450ff5970 pc=0x71c71b
    Mar 16 07:38:43 github.com/jaytaylor/html2text.FromHTMLNode(0xc420450a80, 0xc42042eea2, 0x1, 0x1, 0xc420197920, 0xc420197948, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:33 +0xea fp=0xc450ff5b10 sp=0xc450ff59b8 pc=0x71a45a
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).renderEachChild(0xc42063c000, 0xc420450a10, 0x0, 0x0, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:449 +0xb1 fp=0xc450ff5b70 sp=0xc450ff5b10 pc=0x71d441
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleTableElement(0xc42063c000, 0xc420450a10, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:315 +0x253 fp=0xc450ff5c28 sp=0xc450ff5b70 pc=0x71bd73
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc42063c000, 0xc420450a10, 0x440c64, 0xc42041d1e0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:226 +0x5ef fp=0xc450ff5e18 sp=0xc450ff5c28 pc=0x71af5f
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c000, 0xc420450a10, 0x90d8a0, 0xc42041d1e0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:345 +0x10b fp=0xc450ff5e60 sp=0xc450ff5e18 pc=0x71c71b
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc42063c000, 0xc4204509a0, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:351 +0x4e fp=0xc450ff5e98 sp=0xc450ff5e60 pc=0x71c7ce
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleTableElement(0xc42063c000, 0xc4204509a0, 0xc4203f2a10, 0x7f6ed8396cd8)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:301 +0x128 fp=0xc450ff5f50 sp=0xc450ff5e98 pc=0x71bc48
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc42063c000, 0xc4204509a0, 0xc4203f2ad0, 0x7f6ed8396cd8)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:226 +0x5ef fp=0xc450ff6140 sp=0xc450ff5f50 pc=0x71af5f
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c000, 0xc4204509a0, 0xc4203f2b00, 0xc420198318)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:345 +0x10b fp=0xc450ff6188 sp=0xc450ff6140 pc=0x71c71b
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc42063c000, 0xc420450930, 0xc4203f2ae0, 0x7f6ed8396cd8)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:351 +0x4e fp=0xc450ff61c0 sp=0xc450ff6188 pc=0x71c7ce
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc42063c000, 0xc420450930, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:243 +0x4dc fp=0xc450ff63b0 sp=0xc450ff61c0 pc=0x71ae4c
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c000, 0xc420450930, 0x90d8a0, 0x9e2601)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:345 +0x10b fp=0xc450ff63f8 sp=0xc450ff63b0 pc=0x71c71b
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc42063c000, 0xc4204508c0, 0x2, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:351 +0x4e fp=0xc450ff6430 sp=0xc450ff63f8 pc=0x71c7ce
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleTableElement(0xc42063c000, 0xc4204508c0, 0x1, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:274 +0x666 fp=0xc450ff64e8 sp=0xc450ff6430 pc=0x71c186
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc42063c000, 0xc4204508c0, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:226 +0x5ef fp=0xc450ff66d8 sp=0xc450ff64e8 pc=0x71af5f
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c000, 0xc4204508c0, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:345 +0x10b fp=0xc450ff6720 sp=0xc450ff66d8 pc=0x71c71b
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc42063c000, 0xc420450850, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:351 +0x4e fp=0xc450ff6758 sp=0xc450ff6720 pc=0x71c7ce
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc42063c000, 0xc420450850, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:162 +0xb96 fp=0xc450ff6948 sp=0xc450ff6758 pc=0x71b506
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c000, 0xc420450850, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:345 +0x10b fp=0xc450ff6990 sp=0xc450ff6948 pc=0x71c71b
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc42063c000, 0xc4200ffea0, 0x1, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:351 +0x4e fp=0xc450ff69c8 sp=0xc450ff6990 pc=0x71c7ce
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc42063c000, 0xc4200ffea0, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:162 +0xb96 fp=0xc450ff6bb8 sp=0xc450ff69c8 pc=0x71b506
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c000, 0xc4200ffea0, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:345 +0x10b fp=0xc450ff6c00 sp=0xc450ff6bb8 pc=0x71c71b
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc42063c000, 0xc4200fee70, 0x7f6ed83e8000, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:351 +0x4e fp=0xc450ff6c38 sp=0xc450ff6c00 pc=0x71c7ce
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc42063c000, 0xc4200fee70, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:162 +0xb96 fp=0xc450ff6e28 sp=0xc450ff6c38 pc=0x71b506
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c000, 0xc4200fee70, 0xc42029c900, 0xa0e170)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:345 +0x10b fp=0xc450ff6e70 sp=0xc450ff6e28 pc=0x71c71b
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc42063c000, 0xc4200fee00, 0x0, 0x10)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:351 +0x4e fp=0xc450ff6ea8 sp=0xc450ff6e70 pc=0x71c7ce
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc42063c000, 0xc4200fee00, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:243 +0x4dc fp=0xc450ff7098 sp=0xc450ff6ea8 pc=0x71ae4c
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c000, 0xc4200fee00, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:345 +0x10b fp=0xc450ff70e0 sp=0xc450ff7098 pc=0x71c71b
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc42063c000, 0xc4200fea10, 0xc420175500, 0xc420450860)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:351 +0x4e fp=0xc450ff7118 sp=0xc450ff70e0 pc=0x71c7ce
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).handleElement(0xc42063c000, 0xc4200fea10, 0x704801, 0x1200000a0a533)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:243 +0x4dc fp=0xc450ff7308 sp=0xc450ff7118 pc=0x71ae4c
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c000, 0xc4200fea10, 0x7f6ed83e8000, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:345 +0x10b fp=0xc450ff7350 sp=0xc450ff7308 pc=0x71c71b
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverseChildren(0xc42063c000, 0xc4200fe9a0, 0x410b88, 0xf0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:351 +0x4e fp=0xc450ff7388 sp=0xc450ff7350 pc=0x71c7ce
    Mar 16 07:38:43 github.com/jaytaylor/html2text.(*textifyTraverseContext).traverse(0xc42063c000, 0xc4200fe9a0, 0x0, 0x0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:333 +0x13c fp=0xc450ff73d0 sp=0xc450ff7388 pc=0x71c74c
    Mar 16 07:38:43 github.com/jaytaylor/html2text.FromHTMLNode(0xc4200fe9a0, 0xc42019963e, 0x1, 0x1, 0x0, 0x0, 0x3801, 0xc42013a4b0)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:33 +0xea fp=0xc450ff7528 sp=0xc450ff73d0 pc=0x71a45a
    Mar 16 07:38:43 github.com/jaytaylor/html2text.FromReader(0xcc1c00, 0xc42013a4b0, 0xc42019963e, 0x1, 0x1, 0x3800, 0xc4201995e0, 0xc4201995e0, 0x444469)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:54 +0xce fp=0xc450ff7578 sp=0xc450ff7528 pc=0x71a6fe
    Mar 16 07:38:43 github.com/jaytaylor/html2text.FromString(0xc42062e000, 0x3583, 0xc42019963e, 0x1, 0x1, 0x3583, 0x3000, 0x4000, 0x410427)
    Mar 16 07:38:43 	/Users/xxx/workspace/go/src/github.com/jaytaylor/html2text/html2text.go:60 +0x135 fp=0xc450ff75f0 sp=0xc450ff7578 pc=0x71a8b5
    Mar 16 07:38:43 xxx/controller.parseEmail(0xc420134000, 0x3833, 0xc450ff7a88, 0xc42037baa0, 0x1f) 
    
    opened by dmonay 8
  • Table formatting not useful

    I am using html2text in my unit tests to extract the "essence" from HTML-formatted emails. As you know, HTML-formatted emails use a specific HTML dialect that relies heavily on (deeply nested) tables. Since the introduction of the new table formatting, html2text has become completely useless for me. For now I will vendor the former version, but do you have any ideas on how to proceed? I could, for example, add a feature toggle to enable or disable formatted tables.

    opened by MarcGrol 7
  • CitationStyleLinks, OmitTableNodes options

    Citation Style Links

    Example:

    Here is a <a href="http://example.com/">link</a> to some page and <a href="http://google.com">another link</a> too
    

    Output:

    Here is a link [1] to some page and another link [2] too
    
    [1] http://example.com
    [2] http://google.com
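
The citation-style output described above can be sketched with the standard library alone. This is a minimal illustration and not the code from the pull request: `citationStyle` and `anchorRe` are hypothetical names, and the regular expression only handles simple `<a href="...">text</a>` anchors without nested tags. It keeps each URL exactly as written, so a trailing slash is preserved.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anchorRe matches only simple anchors of the form <a href="URL">text</a>.
// Assumption: this regexp is for illustration; a real converter would walk
// the parsed HTML tree instead.
var anchorRe = regexp.MustCompile(`<a href="([^"]*)">([^<]*)</a>`)

// citationStyle replaces each anchor with "text [n]" and appends a numbered
// list of the collected URLs after a blank line.
func citationStyle(in string) string {
	var refs []string
	body := anchorRe.ReplaceAllStringFunc(in, func(m string) string {
		sub := anchorRe.FindStringSubmatch(m)
		refs = append(refs, sub[1])
		return fmt.Sprintf("%s [%d]", sub[2], len(refs))
	})

	var b strings.Builder
	b.WriteString(body)
	if len(refs) > 0 {
		b.WriteString("\n")
	}
	for i, r := range refs {
		fmt.Fprintf(&b, "\n[%d] %s", i+1, r)
	}
	return b.String()
}

func main() {
	fmt.Println(citationStyle(`Here is a <a href="http://example.com/">link</a> to some page and <a href="http://google.com">another link</a> too`))
}
```

Collecting the URLs in a slice while rewriting the body keeps the reference numbers in document order without a second pass.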
    

    Omit Table Nodes

    Just like the OmitLinks option, but for tables. Inserts a "[Table]" text in the place of the table.

    opened by adtac 5
  • html2text is inserting space after span tag

    I have been working on a personal project, and I noticed that one space is inserted after every <span> tag. I am quite sure this is not the desired behavior of this tool.

    Example: some<span>thing</span>abc becomes some thing abc

    I would love to create a PR, but unfortunately I don't know what is going on in the code or which part should be changed (I am really new to Go).

    opened by GreatDanton 5
  • Properly trim white space of text nodes

    Line https://github.com/jaytaylor/html2text/blob/master/html2text.go#L340 should use the strings.TrimSpace function instead of strings.Trim to properly trim white spaces (especially unicode ones).

    data = strings.TrimSpace(spacingRe.ReplaceAllString(node.Data, " "))
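
The difference between the two calls can be seen in a small sketch. The cutset passed to `strings.Trim` below is an assumption for illustration (the one used in html2text.go may differ), and `trimASCII` and `trimUnicode` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// trimASCII trims only the ASCII whitespace characters listed in the cutset.
// Assumption: the cutset is illustrative and may not match html2text.go.
func trimASCII(s string) string {
	return strings.Trim(s, " \t\r\n")
}

// trimUnicode trims all leading and trailing white space as defined by
// Unicode, which is what strings.TrimSpace does.
func trimUnicode(s string) string {
	return strings.TrimSpace(s)
}

func main() {
	// U+00A0 is a no-break space and U+2003 is an em space.
	s := "\u00a0 hello\u2003"
	fmt.Printf("%q\n", trimASCII(s))
	fmt.Printf("%q\n", trimUnicode(s))
}
```

With this input, the cutset-based trim leaves the string untouched because its first and last characters are not in the cutset, while `strings.TrimSpace` returns just `hello`.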
    
    opened by Kleissner 4
  • Removes all newlines in HTML doc

    Example:

    <html><head></head><body><pre>foo1
    bar1
    
    foo2
    bar2
    
    foo3
    bar3
    
    foo4
    bar4
    
    foo5
    bar5
    
    foo6
    bar6
    
    </pre>
    </body></html>
    

    Results in:

    foo1 bar1 foo2 bar2 foo3 bar3 foo4 bar4 foo5 bar5 foo6 bar6
    
    opened by theanine 4
  • Handle files encoded as UTF-8 with BOM, and add some tests

    I ran into a problem when parsing HTML files encoded as UTF-8 with a BOM: some unexpected things occurred, for example the title, which should not exist in the results, showed up. I found that the BOM was the cause, so I added this test.

    Now the input is checked for a BOM header; if one is present, it is removed, and then everything works well.
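
The BOM check described above might look roughly like the following. This is a sketch under assumptions, not the code from the pull request: `stripBOM` and `utf8BOM` are hypothetical names, and only a single leading UTF-8 byte order mark is handled.

```go
package main

import (
	"fmt"
	"strings"
)

// utf8BOM is the UTF-8 encoding of U+FEFF, the byte order mark.
const utf8BOM = "\xef\xbb\xbf"

// stripBOM removes a single leading UTF-8 byte order mark, if present.
// Input without a BOM is returned unchanged.
func stripBOM(s string) string {
	return strings.TrimPrefix(s, utf8BOM)
}

func main() {
	in := utf8BOM + "<html><body>hi</body></html>"
	fmt.Println(stripBOM(in))
}
```

Stripping the mark before parsing means the HTML parser sees `<html>` as the first bytes of the document.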

    opened by ssor 4
  • Remove regex replaces.

    Problem: the library spends too much time locking regexes, according to the discussion[1]

    "If the lock there is dominating performance, then it means the regexp is super trivial. In that case you will get an even bigger speedup by just writing some code instead of doing a regexp match."

    So this commit replaces the regexes with a single O(n) pass over the bytes of the string that performs the replacements directly.

    [1] https://github.com/golang/go/issues/8232
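
As an illustration of the approach, here is a sketch of one such replacement: collapsing each run of ASCII whitespace into a single space with a byte loop instead of a regexp. The function name `collapseSpaces` is hypothetical, and which patterns the commit actually replaced is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseSpaces replaces every run of ASCII whitespace with one space in a
// single pass over the bytes. Bytes of multi-byte UTF-8 sequences are all
// >= 0x80, so they never match and are copied through unchanged.
func collapseSpaces(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	inSpace := false
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\v' {
			if !inSpace {
				b.WriteByte(' ')
				inSpace = true
			}
			continue
		}
		inSpace = false
		b.WriteByte(c)
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", collapseSpaces("a \n\t b  c"))
}
```

Because the loop holds no shared state, concurrent callers do not contend on anything, which is the cost the quoted discussion attributes to trivial regexps.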

    opened by artem-cliqz 3
  • html2text does not handle <pre> <code> right

    The <pre> and <code> tags should also convert "\n"; the current conversion just joins multiple lines into one line, so all source code becomes unreadable.

    opened by cnmade 3
  • Better formatting of links and lists; more tests

    The format of links and lists has been modified to be as close as possible to the format in premailer (https://github.com/premailer/premailer).

    Most of the tests are taken from https://github.com/premailer/premailer/blob/master/test/test_html_to_plain_text.rb

    TL;DR:

    <a href="http://hr.ef/">Caption</a> becomes Caption (http://hr.ef/)
    <li>1</li><li>2</li> becomes * 1\n* 2

    opened by korya 3
  • skip over noscript tags

    The contents of the noscript tag are not really considered part of the raw text of an HTML document.

    This patch skips over such tags and I've added a little test to ensure this is correct.

    opened by nathj07 0
  • do not add punctuation

    When writing the headers to the output there is no need to add punctuation. The extracted text should not contain anything that is not in the actual text.

    In addition, I updated the .gitignore to exclude .idea files, which are generated by my editor.

    opened by nathj07 0
  • Test suite failing

    After the merge of #49 I found the test suite is failing. Going back to the previous commit makes it pass again.

    starting phase `check'
    --- FAIL: TestLinks (0.00s)
    html2text_test.go:469: error: input did not match specified expression
        Input:
        >>>>
        http://www.google.com
        <<<<

        Output:
        >>>>
        http://www.google.com ( http://www.google.com )
        <<<<
    
        Expected:
        >>>>
        http://www.google.com
        <<<<
    

    --- FAIL: TestOmitLinks (0.00s)
    html2text_test.go:517: error: input did not match specified expression
        Input:
        >>>>
        <<<<

        Output:
        >>>>
        ( http://example.com/ )
        <<<<
    
        Expected:
        >>>>
    
        <<<<
    html2text_test.go:517: error: input did not match specified expression
        Input:
        >>>>
        <a href="http://example.com/">Link</a>
        <<<<
    
        Output:
        >>>>
        Link ( http://example.com/ )
        <<<<
    
        Expected:
        >>>>
        Link
        <<<<
    html2text_test.go:517: error: input did not match specified expression
        Input:
        >>>>
        <a href="http://example.com/"><span class="a">Link</span></a>
        <<<<
    
        Output:
        >>>>
        Link ( http://example.com/ )
        <<<<
    
        Expected:
        >>>>
        Link
        <<<<
    html2text_test.go:517: error: input did not match specified expression
        Input:
        >>>>
        <a href='http://example.com/'>
                <span class='a'>Link</span>
                </a>
        <<<<
    
        Output:
        >>>>
        Link ( http://example.com/ )
        <<<<
    
        Expected:
        >>>>
        Link
        <<<<
    html2text_test.go:517: error: input did not match specified expression
        Input:
        >>>>
        <a href="http://example.com/"><img src="http://example.ru/hello.jpg" alt="Example"></a>
        <<<<
    
        Output:
        >>>>
        Example ( http://example.com/ )
        <<<<
    
        Expected:
        >>>>
        Example
        <<<<
    

    FAIL
    FAIL github.com/jaytaylor/html2text 0.003s
    FAIL
    error: in phase 'check': uncaught exception:
    %exception #<&invoke-error program: "go" arguments: ("test" "github.com/jaytaylor/html2text") exit-status: 1 term-signal: #f stop-signal: #f>
    phase `check' failed after 1.0 seconds
    command "go" "test" "github.com/jaytaylor/html2text" failed with status 1

    opened by Millak 0
  •  Parsing problem

    Hi! When I try to parse the attached HTML file, the call takes a very long time to return.

    	// Read the attached sample file into memory.
    	b, err := ioutil.ReadFile("test.txt")
    	if err != nil {
    		log.Fatalf("reading test.txt: %v", err)
    	}

    	// Convert the HTML to plain text with pretty tables enabled.
    	text, err := html2text.FromString(string(b), html2text.Options{PrettyTables: true})
    	if err != nil {
    		log.Fatalf("converting HTML: %v", err)
    	}
    	log.Println(text)
    

    test.txt

    opened by VladLeb13 1
  • Signal killed when parsing eppi.ioe.ac.uk url

    Bug downloading url: http://eppi.ioe.ac.uk/cms/Projects/DepartmentofHealthandSocialCare/Publishedreviews/COVID-19Livingsystematicmapoftheevidence/tabid/3765/Default.aspx

    This is my code for parsing url:

    	// Get plain text content
    	plain, err := html2text.FromString(string(bodyBytes), html2text.Options{PrettyTables: true})

    I got the error: signal: killed

    It works pretty well on many other links. Thanks a lot, great job!!!

    opened by computerphysicslab 1
bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS

bluemonday bluemonday is a HTML sanitizer implemented in Go. It is fast and highly configurable. bluemonday takes untrusted user generated content as

Microcosm 2.3k Jun 24, 2022
character-set conversion library implemented in Go

mahonia character-set conversion library implemented in Go. Mahonia is a character-set conversion library implemented in Go. All data is compiled into

axgle 765 Jun 19, 2022
PipeIt is a text transformation, conversion, cleansing and extraction tool.

PipeIt PipeIt is a text transformation, conversion, cleansing and extraction tool. Features Split - split text to text array by given separator. Regex

Allen Dang 70 Apr 23, 2022
yview is a lightweight, minimalist and idiomatic template library based on golang html/template for building Go web application.

wview wview is a lightweight, minimalist and idiomatic template library based on golang html/template for building Go web application. Contents Instal

null 0 Dec 5, 2021
Golang library for converting Markdown to HTML. Good documentation is included.

md2html is a golang library for converting Markdown to HTML. Install go get github.com/wallblog/md2html Example package main import( "github.com/wa

null 0 Jan 11, 2022
A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library

goq Example import ( "log" "net/http" "astuart.co/goq" ) // Structured representation for github file name table type example struct { Title str

Andrew Stuart 217 May 30, 2022
htmlquery is golang XPath package for HTML query.

htmlquery Overview htmlquery is an XPath query package for HTML, lets you extract data or evaluate from HTML documents by an XPath expression. htmlque

null 491 Jun 24, 2022
Pagser is a simple, extensible, configurable parse and deserialize html page to struct based on goquery and struct tags for golang crawler

Pagser Pagser inspired by page parser. Pagser is a simple, extensible, configurable parse and deserialize html page to struct based on goquery and str

foolin 62 Jun 13, 2022
Frongo is a Golang package to create HTML/CSS components using only the Go language.

Frongo Frongo is a Go tool to make HTML/CSS document out of Golang code. It was designed with readability and usability in mind, so HTML objects are c

Rewan_ 21 Jul 29, 2021
golang program that simply converts html into markdown

Simply converts html to markdown Just a simple project I wrote in golang to convert html to markdown, surprisingly works decent for a lot of websites

null 1 Oct 23, 2021
⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.

html-to-markdown Convert HTML into Markdown with Go. It is using an HTML Parser to avoid the use of regexp as much as possible. That should prevent so

Johannes Kaufmann 366 Jun 24, 2022
Produces a set of tags from given source. Source can be either an HTML page, Markdown document or a plain text. Supports English, Russian, Chinese, Hindi, Spanish, Arabic, Japanese, German, Hebrew, French and Korean languages.

Tagify Gets STDIN, file or HTTP address as an input and returns a list of most popular words ordered by popularity as an output. More info about what

ZoomIO 21 Jun 16, 2022
Templating system for HTML and other text documents - go implementation

FAQ What is Kasia.go? Kasia.go is a Go implementation of the Kasia templating system. Kasia is primarily designed for HTML, but you can use it for any

Michał Derkacz 74 Mar 15, 2022
Take screenshots of websites and create PDF from HTML pages using chromium and docker

gochro is a small docker image with chromium installed and a golang based webserver to interact with it. It can be used to take screenshots of w

Christian Mehlmauer 48 Jun 12, 2022
export stripTags from html/template as strip.StripTags

HTML StripTags for Go This is a Go package containing an extracted version of the unexported stripTags function in html/template/html.go. ⚠️ This pack

John Wang 110 Jun 13, 2022
Simple Markdown to Html converter in Go.

Markdown To Html Converter Simple Example package main import ( "github.com/gopherzz/MTDGo/pkg/lexer" "github.com/gopherzz/MTDGo/pkg/parser" "fm

Nikita Kazeka 2 Jan 29, 2022
This command line converts Thunderbird's exported RSS .eml file to .html file

thunderbird-rss-html This command line tool converts .html to .epub with images fetching. Install > go get github.com/gonejack/thunderbird-rss-html Us

会有猫的 0 Dec 15, 2021
Develop Sites Faster with HTML-Includer!

HTML Includer Develop Sites Faster with HTML Includer! How to Install Install HTML Includer on your machine: go install github.com/GameWorkstore/html-

Game Workstore 0 Jan 1, 2022
HTML, CSS and SVG static renderer in pure Go

Web render This module implements a static renderer for the HTML, CSS and SVG formats. It consists for the main part of a Golang port of the awesome W

Benoit KUGLER 7 Apr 19, 2022