Package sanitize provides functions for sanitizing text in golang strings.

Overview

sanitize

Package sanitize provides functions to sanitize HTML and paths with Go (golang).

FUNCTIONS

sanitize.Accents(s string) string

Accents replaces a set of accented characters with ascii equivalents.
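
As a rough illustration of the idea, the sketch below folds a handful of accented runes to ASCII with a lookup table. The helper name foldAccents and the table contents are assumptions made for this example; the package's own table and implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// accentTable is a small illustrative mapping (an assumption for this
// sketch); it is not the table used by the sanitize package.
var accentTable = map[rune]string{
	'à': "a", 'á': "a", 'â': "a", 'ä': "a",
	'è': "e", 'é': "e", 'ê': "e", 'ë': "e",
	'ì': "i", 'í': "i", 'î': "i", 'ï': "i",
	'ò': "o", 'ó': "o", 'ô': "o", 'ö': "o",
	'ù': "u", 'ú': "u", 'û': "u", 'ü': "u",
	'ç': "c", 'ñ': "n", 'ß': "ss",
}

// foldAccents is a hypothetical helper: it copies s rune by rune,
// substituting the ASCII equivalent for any rune found in accentTable.
func foldAccents(s string) string {
	var b strings.Builder
	for _, r := range s {
		if repl, ok := accentTable[r]; ok {
			b.WriteString(repl)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(foldAccents("crème brûlée")) // creme brulee
}
```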

sanitize.BaseName(s string) string

BaseName makes a string safe to use in a file name, producing a sanitized basename by replacing . or / with -. Unlike Name, no attempt is made to normalise text as a path.
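
The separator substitution described above can be sketched with the standard library alone. flattenName is a hypothetical helper that performs only the . and / replacement; it is not the package's BaseName, which may do more.

```go
package main

import (
	"fmt"
	"strings"
)

// separatorsToDash replaces "." and "/" with "-" in a single pass.
// It is an assumption for this sketch, not code from the package.
var separatorsToDash = strings.NewReplacer(".", "-", "/", "-")

// flattenName is a hypothetical helper showing only the substitution step.
func flattenName(s string) string {
	return separatorsToDash.Replace(s)
}

func main() {
	fmt.Println(flattenName("reports/2023.final")) // reports-2023-final
}
```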

sanitize.HTML(s string) string

HTML strips html tags with a very simple parser, replaces common entities, and escapes < and > in the result. The result is intended to be used as plain text.
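
The three steps can be approximated with the standard library, as a sketch only. plainText and its regular expression are assumptions for illustration and are much cruder than a real parser; the package's HTML function is not implemented this way as far as this document states.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// anyTag is a deliberately naive tag matcher (an assumption for this sketch).
var anyTag = regexp.MustCompile(`<[^>]*>`)

// plainText is a hypothetical helper: strip tags, decode entities,
// then escape any < or > that remain so the result is safe as plain text.
func plainText(s string) string {
	s = anyTag.ReplaceAllString(s, "")
	s = html.UnescapeString(s)
	s = strings.ReplaceAll(s, "<", "&lt;")
	s = strings.ReplaceAll(s, ">", "&gt;")
	return s
}

func main() {
	fmt.Println(plainText("<p>Fish &amp; chips &lt;b&gt;</p>")) // Fish & chips &lt;b&gt;
}
```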

sanitize.HTMLAllowing(s string, args...[]string) (string, error)

HTMLAllowing parses html and allows certain tags and attributes from the lists optionally specified by args: args[0] is a list of allowed tags, args[1] is a list of allowed attributes. If either is missing, default sets are used.
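
The optional-argument convention can be sketched as follows. resolveAllowLists and both default lists are hypothetical names and values chosen for this example; the package's actual defaults are not listed here.

```go
package main

import "fmt"

// Hypothetical defaults for this sketch; the package's real default
// sets are not reproduced here.
var (
	defaultTags  = []string{"p", "a", "strong", "em"}
	defaultAttrs = []string{"href", "title"}
)

// resolveAllowLists is a hypothetical helper showing how a variadic
// ...[]string parameter can carry up to two optional lists:
// args[0] overrides the tags, args[1] overrides the attributes.
func resolveAllowLists(args ...[]string) (tags, attrs []string) {
	tags, attrs = defaultTags, defaultAttrs
	if len(args) > 0 {
		tags = args[0]
	}
	if len(args) > 1 {
		attrs = args[1]
	}
	return tags, attrs
}

func main() {
	tags, attrs := resolveAllowLists([]string{"p"})
	fmt.Println(tags, attrs) // [p] [href title]
}
```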

sanitize.Name(s string) string

Name makes a string safe to use in a file name by first finding the path basename, then replacing non-ascii characters.

sanitize.Path(s string) string

Path makes a string safe to use as a URL path.

Changes

Version 1.2

  • Adjusted HTML function to avoid linter warning
  • Added more tests from https://githubengineering.com/githubs-post-csp-journey/
  • Changed name of license file
  • Added badges and change log to readme

Version 1.1

Fixed typo in comments. Merged pull request from Povilas Balzaravicius (Pawka):

  • replace br tags with newline even when they contain a space

Version 1.0

First release

Comments
  • improving regex speed 2x faster

    Instantiating a regex is expensive, so it should be kept global for speed. This PR improves performance by more than 2x.

    Original speed:

    BenchmarkPath              10000        106790 ns/op
    BenchmarkName              20000         62479 ns/op
    BenchmarkHTMLAllowed        2000       1149116 ns/op
    

    Improvements in this pull request:

    BenchmarkPath              30000         51607 ns/op
    BenchmarkName             100000         23429 ns/op
    BenchmarkHTMLAllowed        3000        501696 ns/op
    

    Benchmarked on Go1.4.1

    opened by eduncan911 5
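
The point made in this pull request can be sketched as follows: compile the pattern once at package level instead of on every call. The pattern and both function names are assumptions for illustration and are not taken from the package.

```go
package main

import (
	"fmt"
	"regexp"
)

// separators is compiled once, when the package is initialised.
// The pattern itself is a hypothetical example.
var separators = regexp.MustCompile(`[ &_=+:]`)

// slowDashes recompiles the pattern on every call, which is the cost
// the pull request describes.
func slowDashes(s string) string {
	re := regexp.MustCompile(`[ &_=+:]`)
	return re.ReplaceAllString(s, "-")
}

// fastDashes reuses the package-level compiled pattern.
func fastDashes(s string) string {
	return separators.ReplaceAllString(s, "-")
}

func main() {
	fmt.Println(slowDashes("a b_c"), fastDashes("a b_c")) // a-b-c a-b-c
}
```

Both functions return the same result; only the amount of work per call differs. A compiled *regexp.Regexp is safe for concurrent use, so sharing it globally does not require locking.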
  • Don't require // for the mailto: scheme

    Mailto links have the form <a href="mailto:[email protected]">...</a> with no // after the colon. This removes the // from the expression to make mailto: href attributes work correctly.

    opened by tabacco 1
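
A scheme check in the spirit of this change might look like the sketch below, where http and https require // but mailto: does not. The pattern, the helper name hrefAllowed, and the example address are all assumptions; the package's actual expression is not shown in this document.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// allowedHref is a hypothetical pattern: "//" is required after http(s):
// but not after mailto:.
var allowedHref = regexp.MustCompile(`^(https?://|mailto:)`)

// hrefAllowed is a hypothetical helper reporting whether href starts
// with one of the accepted schemes.
func hrefAllowed(href string) bool {
	return allowedHref.MatchString(strings.TrimSpace(href))
}

func main() {
	fmt.Println(hrefAllowed("mailto:someone@example.com")) // true
	fmt.Println(hrefAllowed("https://example.com"))        // true
	fmt.Println(hrefAllowed("javascript:alert(1)"))        // false
}
```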
  • Path function

    The Path function handles the vector "http://localhost:8080/?file=..\etc/passwd" correctly, but for "http://localhost:8080/?file=../etc/passwd" the resulting path is "/etc/passwd".

    opened by zyayaa 1
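
Whether a result such as "/etc/passwd" is harmful depends on how the caller uses it. One caller-side precaution, sketched here under the assumption that files are served from a fixed base directory, is to join the value onto that base and confirm the cleaned result is still inside it. resolveUnder and the base path are hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveUnder is a hypothetical helper: it joins name onto base and
// reports whether the cleaned result still lies inside base.
// It assumes base is not the root directory "/".
func resolveUnder(base, name string) (string, bool) {
	base = path.Clean(base)
	full := path.Join(base, name) // Join cleans the result, resolving ".." segments
	if full != base && !strings.HasPrefix(full, base+"/") {
		return "", false
	}
	return full, true
}

func main() {
	fmt.Println(resolveUnder("/srv/files", "/etc/passwd"))   // /srv/files/etc/passwd true
	fmt.Println(resolveUnder("/srv/files", "../etc/passwd")) //  false
}
```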
  • Added strange quote character

    I have encountered a strange quote character in a recipe I was trying to crawl. I thought you might be interested in adding it to your sanitizing list.

    opened by uraza 1
  • Differentiate Accents() and Umlauts() functions

    As stated in PR https://github.com/kennygrant/sanitize/pull/22, the sanitize package was missing an Accents()-like function to transform strings to their umlaut variants. That PR has been merged, but I personally think it misinterpreted how the Accents() function was intended to work. This commit should clarify the real distinction between the two.

    opened by streambinder 0
  • how to remove all of <script>...</script>?

    <span style="color:#999;font-size:8px;">
            <script type="text/javascript">
                //something
            </script>
    </span>
    

    how to remove all of <script>...</script>?

    opened by herozzm 0
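
One possible approach to the question above, shown only as a sketch, is to delete each script element together with its contents before any other sanitizing. dropScripts and its pattern are assumptions for this example, and a regular expression is fragile against malformed markup, so a tokenizer-based approach is likely to be more robust.

```go
package main

import (
	"fmt"
	"regexp"
)

// scriptBlock matches an opening <script ...> tag, everything up to the
// nearest closing </script>, and the closing tag itself.
// (?i) ignores case; (?s) lets "." match newlines.
var scriptBlock = regexp.MustCompile(`(?is)<script\b[^>]*>.*?</script\s*>`)

// dropScripts is a hypothetical helper that removes script elements
// along with their contents.
func dropScripts(s string) string {
	return scriptBlock.ReplaceAllString(s, "")
}

func main() {
	in := `<span><script type="text/javascript">//something</script></span>`
	fmt.Println(dropScripts(in)) // <span></span>
}
```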
  • Multiple lines are joined after sanitizing.

    Go code:

    package main
    
    import (
        "fmt"
    
        "github.com/kennygrant/sanitize"
    )
    
    func main() {
        content := `<p>LINE 1<br />
    LINE 2<br />
    LINE 3</p>`
        fmt.Println(sanitize.HTML(content))
    }
    

    Will provide:

    LINE 1LINE 2LINE 3
    

    New lines are missing. I can fix this myself, but I want to be sure you'll merge my PR, as the latest commit is 1 year old.

    opened by Pawka 0
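
The behaviour requested here (and noted in the Version 1.1 changes) can be sketched by turning br tags into newlines before the remaining tags are stripped. textWithBreaks and both patterns are assumptions for illustration, not the package's code.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// brTag matches <br>, <br/>, and <br /> in any letter case.
	brTag = regexp.MustCompile(`(?i)<br\s*/?>`)
	// anyTag is a deliberately naive matcher for the remaining tags.
	anyTag = regexp.MustCompile(`<[^>]*>`)
)

// textWithBreaks is a hypothetical helper: br tags become newlines,
// then every other tag is removed.
func textWithBreaks(s string) string {
	s = brTag.ReplaceAllString(s, "\n")
	return anyTag.ReplaceAllString(s, "")
}

func main() {
	fmt.Printf("%q\n", textWithBreaks("<p>LINE 1<br />LINE 2<br>LINE 3</p>"))
	// "LINE 1\nLINE 2\nLINE 3"
}
```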
  • making code and comments conform to GoLang's RFC spec

    just tweaking documentation to match GoLang's RFC specs for commenting:

    http://blog.golang.org/godoc-documenting-go-code

    FYI, I also use Dave Cheney's godoc2md package for documenting my projects:

    https://github.com/davecheney/godoc2md

    You can see an example of how it outputs on another project I work on:

    https://github.com/eduncan911/es

    I did not write that README.md by hand; godoc2md generated it from my code's documentation.

    It's nice as you can change your "Usage:" comments to actually give example code, and it shows up as examples.

    opened by eduncan911 0
  • update import directory of net/html

    If I'm not mistaken, the html package moved a few weeks ago to https://godoc.org/golang.org/x/net/html

    This causes an error in one of my application's build steps, so I'd be glad if you could consider updating the dependency path. I tested my repo after making the changes and it passed.

    opened by jhvst 0
  • Use the list of accents from beastaugh/urlify

    The list of accents there is more complete, and will give a better translation.

    Path to file: https://github.com/beastaugh/urlify/blob/master/lib/urlify/accents.rb

    opened by frankbille 0
  • sanitize.HTMLAllowing() breaks when encountering a self-closing iframe tag

    package main
    
    import (
    	"fmt"
    
    	"github.com/kennygrant/sanitize"
    )
    
    func main() {
    	input1 := `<iframe></iframe><script>alert('uh oh');</script><p>hello</p>`
    	input2 := `<iframe /><script>alert('uh oh');</script><p>hello</p>`
    
    	allowedTags := []string{"p"}
    
    	output1, _ := sanitize.HTMLAllowing(input1, allowedTags)
    	fmt.Println(output1) // <p>hello</p>
    
    	output2, _ := sanitize.HTMLAllowing(input2, allowedTags)
    	fmt.Println(output2) // &lt;script&gt;alert(&#39;uh oh&#39;);&lt;/script&gt;&lt;p&gt;hello&lt;/p&gt;
    }
    
    opened by dy-dx 2
  • Sanitize doesn't adequately protect HTML

    This has the makings of a great sanitization library but right now it appears to have some vulnerabilities, based on a quick read-through of the clear and well-written code.

    https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.md

    To quote the first cheatsheet: Even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into.

    It might be useful to develop a test suite based on this: https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet

    For example, escaping only <> isn't enough. OWASP used to have a list (as follows), but now even this isn't sufficient.

    &  -> &amp;
    < -> &lt;
    > -> &gt;
    " -> &quot;
    ' -> &#x27;
    / -> &#x2F;
    \n ->  <br>
    

    Also have a look at how https://github.com/microcosm-cc/bluemonday does it.

    This is another OWASP cheat sheet that might be valuable:

    https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Input_Validation_Cheat_Sheet.md

    opened by jamieson99 1
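
The character table quoted in the issue above can be expressed as a single-pass replacer, as in this sketch. escapeTable and escapeText are hypothetical names; the newline rule is left out, and, as the issue itself notes, this list alone is not considered sufficient protection against XSS.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeTable encodes the six characters from the list quoted above.
// strings.Replacer scans the input once, so the "&" inside an already
// produced entity is never escaped a second time.
var escapeTable = strings.NewReplacer(
	"&", "&amp;",
	"<", "&lt;",
	">", "&gt;",
	`"`, "&quot;",
	"'", "&#x27;",
	"/", "&#x2F;",
)

// escapeText is a hypothetical helper applying the table to s.
func escapeText(s string) string {
	return escapeTable.Replace(s)
}

func main() {
	fmt.Println(escapeText(`<i>'a' & "b"</i>`))
	// &lt;i&gt;&#x27;a&#x27; &amp; &quot;b&quot;&lt;&#x2F;i&gt;
}
```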
Owner
Kenny Grant
Building websites, mostly with Go