docs: add html2rss-config #922

Merged
merged 4 commits into from
Aug 2, 2025
1 change: 1 addition & 0 deletions feed-directory/index.html
@@ -3,6 +3,7 @@
title: Feed Directory
nav_order: 2
noindex: true
has_children: true
---

<div class="text-center">
23 changes: 20 additions & 3 deletions get-involved/contributing.md
@@ -25,7 +25,24 @@ Here are some of the ways you can contribute to the `html2rss` project:

Are you missing an RSS feed for a website? You can create your own feed config and share it with the community. It's a great way to get started with `html2rss` and help other users.

[**Learn how to create a feed config**](https://github.com/html2rss/html2rss-configs)
The html2rss "ecosystem" is a community project, and we welcome contributions of all kinds: new feed configs, feature suggestions and implementations, bug fixes, documentation improvements, and any other kind of help.

How you add a new feed config is up to you; the manual route described below is the preferred one. When your config is ready, please [submit a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork)!

After you're done, you can test your feed config by running `bundle exec html2rss feed lib/html2rss/configs/<domainname.tld>/<path>.yml`.

#### Preferred way: manually

1. Fork the `html2rss-configs` Git repository and run `bundle install` (requires Ruby >= 3.3).
2. Create a new folder and file following this convention: `lib/html2rss/configs/<domainname.tld>/<path>.yml`
3. Create the feed config in the `<path>.yml` file.
4. Add the following spec to `spec/html2rss/configs/<domainname.tld>/<path>_spec.rb`:

```ruby
RSpec.describe '<domainname.tld>/<path>' do
  include_examples 'config.yml', described_class
end
```

### 2. Improve this Website

@@ -37,13 +54,13 @@ This website is built with Jekyll and is hosted on GitHub Pages. If you have any

The [`html2rss-web`](https://github.com/html2rss/html2rss-web) project is a web application that allows you to create and manage your RSS feeds through a user-friendly interface. You can host your own public instance to help other users create feeds.

[**Learn how to host a public instance**](https://github.com/html2rss/html2rss-web/wiki/Instances)
[**Learn how to host a public instance**]({{ '/web-application/how-to/deployment' | relative_url }})

### 4. Improve the `html2rss` Gem

Are you a Ruby developer? You can help us improve the core `html2rss` gem. Whether you're fixing a bug, adding a new feature, or improving the documentation, your contributions are welcome.

[**Check out the repository on GitHub**](https://github.com/html2rss/html2rss)
[**Check out the documentation for the `html2rss` Gem**]({{ '/ruby-gem/' | relative_url }})

### 5. Report Bugs & Discuss Features

173 changes: 173 additions & 0 deletions html2rss-configs/index.md
@@ -0,0 +1,173 @@
---
layout: default
title: html2rss-configs
has_children: false
nav_order: 5
---

# Creating Feed Configurations

Welcome to the guide for `html2rss-configs`. This document explains how to create your own configuration files to convert any website into an RSS feed.

You can find a list of all community-contributed configurations in the [Feed Directory]({{ '/feed-directory/' | relative_url }}).

---

## Core Concepts

An `html2rss` config is a YAML file that defines how to extract data from a web page. It consists of two main building blocks: `channel` and `selectors`.

### The `channel` Block

The `channel` block contains metadata about the RSS feed itself, such as its title and the source URL.

**Example:**

```yaml
channel:
  url: https://example.com/blog
  title: My Awesome Blog
```

For a complete list of all available channel options, please see the [Channel Reference]({{ '/ruby-gem/reference/channel/' | relative_url }}).

### The `selectors` Block

The `selectors` block is the core of the configuration, defining the rules for extracting content. It always contains an `items` selector to identify the list of articles and individual selectors for the data points within each item (e.g., `title`, `link`).

**Example:**

```yaml
selectors:
  items:
    selector: "article.post"
  title:
    selector: "h2 a"
  link:
    selector: "h2 a"
```

For a comprehensive guide on all available selectors, extractors, and post-processors, please see the [Selectors Reference]({{ '/ruby-gem/reference/selectors/' | relative_url }}).

---

## Tutorial: Your First Config

This tutorial walks you through creating a basic configuration file from scratch.

### Step 1: Identify the Target Content

First, identify the HTML structure of the website you want to create a feed for. For this example, we'll use a simple blog structure:

```html
<div class="posts">
  <article class="post">
    <h2><a href="/post/1">First Post</a></h2>
    <p>This is the summary of the first post.</p>
  </article>
  <article class="post">
    <h2><a href="/post/2">Second Post</a></h2>
    <p>This is the summary of the second post.</p>
  </article>
</div>
```

### Step 2: Create the Config File and Define the Channel

Create a new YAML file (e.g., `my-blog.yml`) and define the `channel`:

```yaml
# my-blog.yml
channel:
  url: https://example.com/blog
  title: My Awesome Blog
  description: The latest news from my awesome blog.
```

### Step 3: Define the Selectors

Next, add the `selectors` block to extract the content for each post.

```yaml
# my-blog.yml
selectors:
  items:
    selector: "article.post"
  title:
    selector: "h2 a"
  link:
    selector: "h2 a"
  description:
    selector: "p"
```

- `items`: This CSS selector identifies the container for each article.
- `title`, `link`, `description`: These selectors target the specific data points within each item. For a `link` selector, `html2rss` defaults to extracting the `href` attribute from the matched `<a>` tag.
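
Putting Steps 2 and 3 together, both blocks live in the same file. The Ruby sketch below uses only the standard library to parse the combined `my-blog.yml` content and check that the building blocks described in this guide are present. `config_problems` is a hypothetical helper written for this illustration, not part of `html2rss`, and its checks are assumptions about a minimal config rather than the gem's actual validation rules.

```ruby
require 'yaml'

# The tutorial's channel and selectors blocks combined into one document,
# as they would appear together in my-blog.yml.
CONFIG_YAML = <<~YAML
  channel:
    url: https://example.com/blog
    title: My Awesome Blog
    description: The latest news from my awesome blog.
  selectors:
    items:
      selector: "article.post"
    title:
      selector: "h2 a"
    link:
      selector: "h2 a"
    description:
      selector: "p"
YAML

# Hypothetical helper (not part of html2rss): returns a list of structural
# problems, or an empty list if the two main blocks look complete.
def config_problems(config)
  channel = config['channel']
  selectors = config['selectors']
  problems = []

  problems << 'missing `channel` block' unless channel.is_a?(Hash)
  problems << 'missing `selectors` block' unless selectors.is_a?(Hash)
  problems << 'missing `channel.url`' if channel.is_a?(Hash) && !channel.key?('url')

  if selectors.is_a?(Hash)
    items = selectors['items']
    has_items = items.is_a?(Hash) && items.key?('selector')
    problems << 'missing `selectors.items.selector`' unless has_items
  end

  problems
end

problems = config_problems(YAML.safe_load(CONFIG_YAML))
puts problems.empty? ? 'config looks structurally complete' : problems.join("\n")
```

A check like this only catches structural slips such as a missing block or broken indentation; running the config through `html2rss` itself remains the real test.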

---

## Advanced Techniques

### Handling Pagination

To aggregate content from multiple pages, use the `pagination` option within the `items` selector.

```yaml
selectors:
  items:
    selector: ".post-listing .post"
    pagination:
      selector: ".pagination .next-page"
      limit: 5 # Optional: sets the maximum number of pages to follow
```
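
As a rough mental model of what following pagination means, here is a minimal Ruby sketch that walks an in-memory stand-in for a paginated site. `PAGES` and `collect_items` are hypothetical names for this illustration and do not come from `html2rss`. The sketch assumes the first page counts toward `limit`, which may differ from how `html2rss` counts pages.

```ruby
# In-memory stand-in for a paginated site: each page has items and an optional
# link to the next page (what ".pagination .next-page" would point at).
PAGES = {
  '/blog' => { items: %w[post-1 post-2], next_page: '/blog?page=2' },
  '/blog?page=2' => { items: %w[post-3 post-4], next_page: '/blog?page=3' },
  '/blog?page=3' => { items: %w[post-5], next_page: nil }
}.freeze

# Hypothetical helper: collects items starting at `start`, visiting at most
# `limit` pages in total (assumption: the first page counts toward the limit).
def collect_items(pages, start, limit:)
  items = []
  path = start
  visited = 0

  while path && visited < limit
    page = pages.fetch(path)
    items.concat(page[:items])
    path = page[:next_page]
    visited += 1
  end

  items
end

p collect_items(PAGES, '/blog', limit: 2) # => ["post-1", "post-2", "post-3", "post-4"]
p collect_items(PAGES, '/blog', limit: 5) # stops early: only three pages exist
```

Because the loop is bounded by `limit`, a site whose "next" link cycles back to an earlier page cannot cause an endless crawl.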

### Dynamic Feeds with Parameters

Use the `parameters` block to create flexible configs. This is useful for feeds based on search terms, categories, or regions.

```yaml
# news-search.yml
parameters:
  query:
    type: string
    default: "technology"

channel:
  url: "https://news.example.com/search?q={query}"
  title: "News results for '{query}'"
```
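
To make the placeholder idea concrete, the following Ruby sketch shows one way `{query}` could be filled in from a supplied value or the declared default. `expand_placeholders` is a hypothetical helper for illustration only; it is not how `html2rss` implements parameters, and it skips URL encoding for brevity.

```ruby
# Hypothetical helper: replaces each {name} placeholder with the given value,
# falling back to the parameter's declared default. An undeclared placeholder
# with no given value raises KeyError.
def expand_placeholders(template, declared, given = {})
  template.gsub(/\{(\w+)\}/) do
    name = Regexp.last_match(1)
    given.fetch(name) { declared.fetch(name).fetch('default') }.to_s
  end
end

declared = { 'query' => { 'type' => 'string', 'default' => 'technology' } }
url = 'https://news.example.com/search?q={query}'

puts expand_placeholders(url, declared)
# prints https://news.example.com/search?q=technology
puts expand_placeholders(url, declared, { 'query' => 'ruby' })
# prints https://news.example.com/search?q=ruby
```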

---

## Contributing Your Config

Have you created a config that others might find useful? We strongly encourage you to contribute it to the project! By sharing your config, you make it available to all users of the public `html2rss-web` service and the Feed Directory.

To contribute, please [create a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) to the `html2rss-configs` repository.

---

## Usage and Integration

### With `html2rss-web`

Once your pull request has been reviewed and merged, your config becomes available on the public [`html2rss-web`]({{ '/web-application/' | relative_url }}) instance. You can then access it at the path `/<domainname.tld>/<path>.rss`.
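
The mapping from a config file to its feed path can be sketched in Ruby as follows. `feed_path_for` and `CONFIG_ROOT` are hypothetical names used only for illustration; the sketch assumes the feed path mirrors the config's location under `lib/html2rss/configs/`.

```ruby
CONFIG_ROOT = 'lib/html2rss/configs'

# Hypothetical helper: maps a config file path to its feed path on html2rss-web.
def feed_path_for(config_file)
  relative = config_file.delete_prefix("#{CONFIG_ROOT}/")
  "/#{relative.delete_suffix('.yml')}.rss"
end

puts feed_path_for('lib/html2rss/configs/example.com/blog.yml')
# prints /example.com/blog.rss
```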

### Programmatic Usage in Ruby

You can also use `html2rss-configs` programmatically in your Ruby applications.

Add this to your Gemfile:

```ruby
gem 'html2rss-configs', git: 'https://github.com/html2rss/html2rss-configs.git'
```

And use it in your code:

```ruby
require 'html2rss/configs'

config = Html2rss::Configs.find_by_name('domainname.tld/whatever')
rss = Html2rss.feed(config)
```