A Go library for fetching, parsing, and updating RSS feeds.

RSS is a small library for simplifying the parsing of RSS and Atom feeds. The package could do with more testing, but it conforms to the RSS 1.0, 2.0, and Atom 1.0 specifications, to the best of my ability. I've tested it with about 15 different feeds, and it seems to work fine with them.

If anyone has any problems with feeds being parsed incorrectly, please let me know so that I can debug and improve the package.

Dependencies:

go get github.com/axgle/mahonia

Example usage:

package main

import "github.com/SlyMarbo/rss"

func main() {
	feed, err := rss.Fetch("http://example.com/rss")
	if err != nil {
		// handle error.
	}
	
	// ... Some time later ...
	
	err = feed.Update()
	if err != nil {
		// handle error.
	}
}

The output structure is pretty much as you'd expect:

type Feed struct {
	Nickname    string              // This is not set by the package, but could be helpful.
	Title       string
	Description string
	Link        string              // Link to the creator's website.
	UpdateURL   string              // URL of the feed itself.
	Image       *Image              // Feed icon.
	Items       []*Item
	ItemMap     map[string]struct{} // Used in checking whether an item has been seen before.
	Refresh     time.Time           // Earliest time this feed should next be checked.
	Unread      uint32              // Number of unread items. Used by aggregators.
}

type Item struct {
	Title     string
	Summary   string
	Content   string
	Link      string
	Date      time.Time
	DateValid bool
	ID        string
	Read      bool
}

type Image struct {
	Title   string
	URL     string
	Height  uint32
	Width   uint32
}
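
As a rough illustration of how a set like ItemMap can be used to tell new items from ones already seen, here is a minimal sketch. It does not import the package; the item type and the unseen helper are hypothetical stand-ins that mirror only the ID and Title fields shown above.

```go
package main

import "fmt"

// item is a hypothetical stand-in that mirrors two fields of Item above.
type item struct {
	Title string
	ID    string
}

// unseen returns the items whose ID is not yet in seen and records them,
// in the same spirit as a map[string]struct{} used as a set.
func unseen(seen map[string]struct{}, items []item) []item {
	var fresh []item
	for _, it := range items {
		if _, ok := seen[it.ID]; ok {
			continue // already known, skip it
		}
		seen[it.ID] = struct{}{}
		fresh = append(fresh, it)
	}
	return fresh
}

func main() {
	seen := map[string]struct{}{}
	first := unseen(seen, []item{{Title: "A", ID: "1"}, {Title: "B", ID: "2"}})
	second := unseen(seen, []item{{Title: "B", ID: "2"}, {Title: "C", ID: "3"}})
	fmt.Println(len(first), len(second)) // prints "2 1"
}
```

The empty struct value takes no space, which is why map[string]struct{} is a common way to express a set in Go.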

The library does its best to follow the appropriate specifications and not to set the Refresh time too soon. It currently follows all update time management methods in the RSS 1.0, 2.0, and Atom 1.0 specifications. If a feed does not provide one, the library defaults to 10-minute intervals. If you are having issues with feed providers dropping connections, please let me know and I can increase this default, or you can increase the Refresh time manually. The Feed.Update method uses this Refresh time, so if Update seems to be returning very quickly with no new items, it's likely not making a request due to the provider's Refresh interval.
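
The Refresh behaviour described above can be pictured with the following sketch. It is not the package's actual code; nextRefresh, due, defaultInterval, and start are hypothetical names, and the only detail taken from this README is the 10-minute fallback.

```go
package main

import (
	"fmt"
	"time"
)

// defaultInterval matches the 10-minute fallback described in the README.
const defaultInterval = 10 * time.Minute

// start is an arbitrary fetch time used only for the demonstration.
var start = time.Date(2024, time.January, 1, 12, 0, 0, 0, time.UTC)

// nextRefresh returns the earliest time a feed should be checked again.
// ttl is the interval requested by the provider; zero means none was given.
func nextRefresh(fetched time.Time, ttl time.Duration) time.Time {
	if ttl <= 0 {
		ttl = defaultInterval
	}
	return fetched.Add(ttl)
}

// due reports whether a request should be made at time now.
func due(refresh, now time.Time) bool {
	return !now.Before(refresh)
}

func main() {
	refresh := nextRefresh(start, 0)
	fmt.Println(due(refresh, start.Add(5*time.Minute)))  // false: too early
	fmt.Println(due(refresh, start.Add(10*time.Minute))) // true: interval elapsed
}
```

An update attempted before the refresh time would simply return without making a request, which matches the "returns very quickly with no new items" symptom.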

This is seeing thorough use in RS3, but development is still active.

Issues
  • Parse metadata element: author

    https://indieweb.org/payment#Implementations

    Would also parse atom:link (s) within items

      <author>
      	<name>Mark Pilgrim</name>
      	<email>[email protected]</email>
      	<uri>https://mysite.com</uri>
      	<atom:link rel="payment" type="application/bitcoin-paymentrequest" href="bitcoin:abc7askjdfg"/>
      </author>
    
    <entry>
    <id>abc</id>
    <title>Iabc</title>
    <link href="https://siasky.net/CADcPfMxnOgtgwllK9-kp12sIy9L8De7br9nvNFcslCKRg" rel="alternate"/>
    <summary>
    The story of abc
    </summary>
    <atom:link rel="payment" type="application/bitcoin-paymentrequest" href="bitcoin:abc7askjdfg"/>
    </entry>
    
    opened by t-900-a 2
  • CDATA tags inside content not parsed

    package main
    
    import (
    	"github.com/SlyMarbo/rss"
    )
    
    func main() {
    	feed, err := rss.Fetch("http://www.ruanyifeng.com/blog/atom.xml")
    	if err != nil {
    		// handle error.
    	}
    
    	// ... Some time later ...
    
    	err = feed.Update()
    	if err != nil {
    		// handle error.
    	}
    }
    

    Related issue on other library: https://github.com/mmcdole/gofeed/issues/98

    opened by gonejack 1
  • Possible to parse other XML fields?

    I would like to get the value of the field <newznab:attr name="group" value="alt.binaries.teevee"/> so ending up with the value alt.binaries.teevee.

    How do I do so?

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/" encoding="utf-8">
     <channel>
      <atom:link href="https://REMOVED.com/api" rel="self" type="application/rss+xml"/>
      <title>REMOVED</title>
      <description>API Details</description>
      <link>https://REMOVED.com/</link>
      <language>en-gb</language>
      <webMaster>[email protected]</webMaster>
      <category>Stuff</category>
      <generator>Me</generator>
      <ttl>10</ttl>
      <docs>https://removed.com/apihelp/</docs>
      <image url="https://removed.com/themes/shared/img/logo.png" title="REMOVED" link="https://removed.com/" description="Visit REMOVED"/>
      <newznab:response offset="0" total="125000"/>
      <item>
       <title>Fair.Go.2017.09.18.HDTV.x264-FiHTV </title>
       <guid isPermaLink="true">https://REMOVED.com/details/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d</guid>
       <link>https://REMOVED.com/getnzb/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d.nzb&amp;i=1&amp;r=3bc4e94ef14337e4e2b490a3897c48f6</link>
       <comments>https://REMOVED.com/details/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d#comments</comments>
       <pubDate>Tue, 19 Sep 2017 10:18:21 +0200</pubDate>
       <category>TV &gt; SD</category>
       <description>Fair.Go.2017.09.18.HDTV.x264-FiHTV </description>
       <enclosure url="https://REMOVED.com/getnzb/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d.nzb&amp;i=1&amp;r=3bc4e94ef14337e4e2b490a3897c48f6" length="168013625" type="application/x-nzb"/>
       <newznab:attr name="category" value="5030"/>
       <newznab:attr name="size" value="168013625"/>
       <newznab:attr name="files" value="17"/>
       <newznab:attr name="poster" value="[email protected] (yeahsure)"/>
       <newznab:attr name="prematch" value="1"/>
       <newznab:attr name="info" value="https://REMOVED.com/api?t=info&amp;id=427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d&amp;r=3bc4e94ef14337e4e2b490a3897c48f6"/>
       <newznab:attr name="grabs" value="0"/>
       <newznab:attr name="comments" value="0"/>
       <newznab:attr name="password" value="0"/>
       <newznab:attr name="usenetdate" value="Tue, 19 Sep 2017 10:07:47 +0200"/>
       <newznab:attr name="group" value="alt.binaries.teevee"/>
      </item>
    </channel>
    </rss>
    
    opened by Fossil01 3
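
One possible approach to the question above, independent of this package, is to decode the extra elements with the standard encoding/xml package. This is only a sketch: the rssDoc, rssItem, and attr types and the attrValue helper are hypothetical, the sample is a trimmed version of the feed in the issue, and whether the rss package itself exposes such fields is not stated here.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// attr holds one <newznab:attr name="..." value="..."/> element.
type attr struct {
	Name  string `xml:"name,attr"`
	Value string `xml:"value,attr"`
}

// rssItem picks out the title and the namespaced attr elements of an item.
type rssItem struct {
	Title string `xml:"title"`
	Attrs []attr `xml:"http://www.newznab.com/DTD/2010/feeds/attributes/ attr"`
}

type rssDoc struct {
	Items []rssItem `xml:"channel>item"`
}

// sample is a trimmed version of the feed quoted in the issue.
const sample = `<rss version="2.0" xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/">
 <channel>
  <item>
   <title>Fair.Go.2017.09.18.HDTV.x264-FiHTV</title>
   <newznab:attr name="size" value="168013625"/>
   <newznab:attr name="group" value="alt.binaries.teevee"/>
  </item>
 </channel>
</rss>`

// attrValue returns the value of the first newznab attr with the given name.
func attrValue(data []byte, name string) (string, error) {
	var doc rssDoc
	if err := xml.Unmarshal(data, &doc); err != nil {
		return "", err
	}
	for _, it := range doc.Items {
		for _, a := range it.Attrs {
			if a.Name == name {
				return a.Value, nil
			}
		}
	}
	return "", fmt.Errorf("attribute %q not found", name)
}

func main() {
	v, err := attrValue([]byte(sample), "group")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v) // prints "alt.binaries.teevee"
}
```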
  • Support RSS Content Module

    opened by terinjokes 7
  • Item.Date issue with timezone (UTC vs PDT)

    In an Item for the feed that I'm pulling from...

    <pubDate>Tue, 19 Jul 2016 13:14:13 PDT</pubDate>

    On my local laptop with local time set to PDT, I don't have a problem: 2016-07-19 13:14:13 -0700 PDT

    On my server with local time set to UTC, I have this odd timezone problem: 2016-07-19 13:14:13 +0000 PDT

    opened by homingli 11
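
The output above is consistent with documented behaviour of Go's time.Parse: when a zone abbreviation such as PDT has no defined offset in the current location, the time is recorded in a fabricated location with that abbreviation and a zero offset. The sketch below shows one possible workaround and is not what the package does. The zoneOffsets table, the pubDateLayout constant, and the parseWithZone helper are assumptions made for illustration, and abbreviations are ambiguous in general, so the table only covers the case in this issue.

```go
package main

import (
	"fmt"
	"time"
)

// pubDateLayout is the layout matching the pubDate in the issue.
const pubDateLayout = time.RFC1123

// zoneOffsets maps a few abbreviations to offsets in seconds east of UTC.
// This table is an assumption; abbreviations are not globally unique.
var zoneOffsets = map[string]int{
	"PST": -8 * 3600,
	"PDT": -7 * 3600,
}

// parseWithZone parses value and, if the abbreviation is in zoneOffsets but
// the parsed offset differs, reinterprets the wall-clock time in that zone.
func parseWithZone(layout, value string) (time.Time, error) {
	parsed, err := time.Parse(layout, value)
	if err != nil {
		return time.Time{}, err
	}
	name, offset := parsed.Zone()
	want, known := zoneOffsets[name]
	if !known || offset == want {
		return parsed, nil
	}
	loc := time.FixedZone(name, want)
	return time.Date(parsed.Year(), parsed.Month(), parsed.Day(),
		parsed.Hour(), parsed.Minute(), parsed.Second(), parsed.Nanosecond(), loc), nil
}

func main() {
	d, err := parseWithZone(pubDateLayout, "Tue, 19 Jul 2016 13:14:13 PDT")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d.UTC()) // the same instant regardless of the machine's local zone
}
```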
  • make enclosure return error on parsing 'none' value

    Sorry about the last pull request; I had some trouble with my git setup.

    Please check whether this is what you meant. I was a little confused by that answer.

    Thanks!

    opened by vasconcelosvcd 7
  • Enclosures attributes not parsed

    http://cyber.law.harvard.edu/rss/rss.html#ltenclosuregtSubelementOfLtitemgt

    According to the spec, the enclosure tag has 3 attributes. The Rss2_0Enclosure is set up to parse those attributes as sub-elements.

    opened by adamveld12 7
  • Problem with Feed Link attribute

    I can successfully parse http://www.ft.com/rss/home/europe feed, except that the Feed.Link attribute is the same as RSS feed address. So:

    RSS address is http://www.ft.com/rss/home/europe
    <link> attribute in the RSS feed is http://www.ft.com/home/europe
    Feed.Link value is http://www.ft.com/rss/home/europe (same as RSS address, not as <link> attribute in the RSS feed)

    I had the same issue with some other feeds as well.

    opened by rbatukaev 6
  • Fix the image compatible issue

    I found a potential image issue of the RSS parsing process. See the details below:

    An image to associate with your podcast. It must not be blocked to Googlebot. You can provide an image using any of the following tags:
    
    <itunes:image>
    <image>
      <link>...</link>
      <title>...</title>
      <url>...</url>
    </image>
    When using RSS <image> tags, you must include a nested <link> element that points to the podcast homepage, and a nested <title> tag that matches the <title> element in the homepage.
    
    Example: <itunes:image href="https://google.com/google_podcast_cover_art.jpg"/>
    

    Reference

    • https://support.google.com/podcast-publishers/answer/9889544?hl=en
    opened by LinuxSuRen 5
  • accept unparsable dates

    Sadly, there are feeds out there that do not follow the standard. Often I encounter unparsable date or pubDate elements:

    "Wed May 10 2017 00:00:00 GMT+0000 (Coordinated Universal Time)"
    "Wed, 24 May 2017, 11:05"
    "31.05.2017"
    

    I've decided to not accept such dates that do not follow the RFCs but to drop them.

    This PR changes the behaviour so that parsing of such a feed no longer stops. Instead, the default value of 0001-01-01 00:00:00 +0000 UTC is used.

    opened by ghost 4
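
The behaviour this PR describes can be sketched roughly as follows. This is not the PR's code; the layouts list and the parseDate helper are hypothetical, and the boolean result plays the same role as the Item.DateValid field shown in the README.

```go
package main

import (
	"fmt"
	"time"
)

// layouts lists the formats this sketch is willing to accept (an assumption).
var layouts = []string{time.RFC1123Z, time.RFC1123, time.RFC3339}

// parseDate tries each layout in turn. On failure it returns the zero
// time.Time, which prints as 0001-01-01 00:00:00 +0000 UTC, and false.
func parseDate(s string) (time.Time, bool) {
	for _, layout := range layouts {
		if d, err := time.Parse(layout, s); err == nil {
			return d, true
		}
	}
	return time.Time{}, false
}

func main() {
	for _, s := range []string{
		"Tue, 19 Sep 2017 10:18:21 +0200",
		"Wed, 24 May 2017, 11:05",
		"31.05.2017",
	} {
		d, ok := parseDate(s)
		fmt.Println(ok, d.IsZero(), s)
	}
}
```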
  • Some fixes + changes

    Hello

    The Go 1.1 change should be trivial to pull. The Authors support for Atom does not break anything but adds new things. The database change is probably something you don't want.

    opened by taruti 4
  • What happened to rss.CacheParsedItemIDs

    My code no longer works. I assume this was changed recently. How do we reproduce this behaviour in the new versions?

    See: https://gowalker.org/github.com/SlyMarbo/rss#CacheParsedItemIDs

    I only remember that I had to use this function to work around a bug I was encountering. I can't remember exactly why I was using it; it was a while ago now. It likely had to do with either running out of memory in a long-running process or seeing updates to feeds come through.

    opened by zaddok 3
  • Enclosure fix

    Fixes parsing failures on http://amharic.voanews.com/api/zt$gteitjt

    The enclosure attributes fields need to be marked as attr. Updated the test files to test for this issue.

    opened by shebaw 3
  • Not returning items for FeedBurner

    Example http://feeds.feedburner.com/ImgurGallery?format=xml

    I also created a custom FeedBurner for this same feed and attempted to convert it into several different formats.

    I am running inside Google App Engine and have substituted http.Client with urlfetch.Client in rss.go. There have been a handful of other feeds that I have had trouble with, but your library has worked great for 95% of everything I've hit so far. Great work!

    working as intended 
    opened by Z0M813K1LL3R 3
  • Support HTTP Basic Authentication

    I suggest adding support for HTTP Basic Authentication. This would allow access to password-protected feeds. net/http.Request provides SetBasicAuth, which could be used.

    Do you have plans and/or time to implement this? If not, I'd try to add this and then submit a PR.

    enhancement 
    opened by ghost 2
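
For reference, the SetBasicAuth call mentioned in this request works as in the sketch below. It uses only net/http; newFeedRequest is a hypothetical helper, and how such a request would be wired into the rss package is not covered here.

```go
package main

import (
	"fmt"
	"net/http"
)

// newFeedRequest builds a GET request for a feed URL with HTTP Basic
// Authentication credentials attached. It is a hypothetical helper.
func newFeedRequest(url, user, pass string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass) // sets the Authorization header
	return req, nil
}

func main() {
	req, err := newFeedRequest("http://example.com/rss", "user", "secret")
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	// No request is sent here; the sketch only shows the header being set.
	fmt.Println(req.Header.Get("Authorization") != "") // prints "true"
}
```

The resulting request could then be sent with any http.Client and the response body handed to a feed parser.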
Owner
Jamie Hall