morpheus.controllers.rss_controller.RSSController

class RSSController(
    feed_input,
    batch_size=128,
    run_indefinitely=None,
    enable_cache=False,
    cache_dir='./.cache/http',
    cooldown_interval=600,
    request_timeout=2.0,
    strip_markup=False,
    stop_after=0,
    interval_secs=600,
    should_stop_fn=None,
    df_type='cudf',
)

Bases: object

RSSController handles fetching and processing of RSS feed entries.

Parameters:
feed_input : str or list[str]

The URL(s) or file path(s) of the RSS feed(s).

batch_size : int, optional, default = 128

Number of feed items to accumulate before creating a DataFrame.

run_indefinitely : bool, optional

Whether to run the processing indefinitely. If True, the controller continues fetching and processing feed entries. If False, the controller stops once the feed has been fully fetched and processed. If no value is provided and feed_input is a URL, the controller runs indefinitely. Default is None.

enable_cache : bool, optional, default = False

Enable caching of RSS feed request data.

cache_dir : str, optional, default = "./.cache/http"

Cache directory for storing RSS feed request data.

cooldown_interval : int, optional, default = 600

Cooldown interval, in seconds, to wait after a failure fetching or parsing the feed.

request_timeout : float, optional, default = 2.0

Timeout, in seconds, for the request that fetches the feed.

strip_markup : bool, optional, default = False

When True, strip HTML and XML markup from the content, summary and title fields.

stop_after : int, optional, default = 0

Stop ingesting after emitting stop_after records (rows in the DataFrame). Useful for testing. Disabled if 0.

interval_secs : float, optional, default = 600

Interval in seconds between fetching new feed items.

should_stop_fn : Callable[[], bool], optional, default = None

Function that returns a boolean indicating whether the controller should stop processing feed entries.
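As a rough illustration of how interval_secs, cooldown_interval, should_stop_fn and run_indefinitely might interact, the sketch below polls a feed in a loop. It is not the controller's implementation: poll_feeds and fetch_once are hypothetical names, and the choice to back off for cooldown_interval and retry in indefinite mode, but re-raise in one-shot mode, is an assumption. The sleep function is injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def poll_feeds(
    fetch_once: Callable[[], Iterable[dict]],
    run_indefinitely: bool,
    interval_secs: float = 600,
    cooldown_interval: float = 600,
    should_stop_fn: Optional[Callable[[], bool]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Hypothetical polling loop; fetch_once returns the entries from one pass over the feed(s)."""
    # Without a stop function, only run_indefinitely controls termination.
    should_stop = should_stop_fn or (lambda: False)

    while not should_stop():
        try:
            entries = list(fetch_once())
        except Exception:
            if not run_indefinitely:
                raise  # assumption: a one-shot run surfaces the error to the caller
            sleep(cooldown_interval)  # back off after a fetch/parse failure, then retry
            continue
        yield from entries
        if not run_indefinitely:
            break  # one full pass over the feed(s) is enough
        sleep(interval_secs)  # wait before checking for new items
```

In this sketch should_stop_fn is consulted before every fetch, so it can end an otherwise indefinite run.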

Attributes:
run_indefinitely

Property that determines whether to run the source indefinitely.

Methods

feed_generator(subscription)

Fetch RSS feed entries and yield them as MessageMeta objects.

fetch_dataframes()

Fetch and process RSS feed entries.

get_feed_stats(feed_url)

Get stats for a feed URL.

is_url(feed_input)

Check whether the provided input is a valid URL.

parse_feeds()

Parse the RSS feed using the feedparser library.

feed_generator(subscription)

Fetch RSS feed entries and yield them as MessageMeta objects.

fetch_dataframes()

Fetch and process RSS feed entries.

Yields:
DataFrameType

A DataFrame containing feed entry data.

Raises:
Exception

If there is an error fetching or processing feed entries.
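fetch_dataframes yields one DataFrame per batch of entries. A minimal sketch of the batching rules described by batch_size and stop_after, assuming entries are accumulated until a batch is full and that the stop_after limit counts emitted rows; batch_entries is a hypothetical helper, and plain lists of dictionaries stand in for DataFrames.

```python
from typing import Iterable, Iterator, List


def batch_entries(entries: Iterable[dict], batch_size: int = 128, stop_after: int = 0) -> Iterator[List[dict]]:
    """Hypothetical helper: group entries into batches, honouring an optional row limit."""
    batch: List[dict] = []
    emitted = 0  # rows already yielded in full batches
    for entry in entries:
        # stop_after == 0 disables the limit; otherwise stop once the limit is reached.
        if stop_after and emitted + len(batch) >= stop_after:
            break
        batch.append(entry)
        if len(batch) == batch_size:
            yield batch
            emitted += len(batch)
            batch = []
    if batch:
        yield batch  # flush the final, possibly partial, batch
```

With 10 entries, batch_size=4 and stop_after=6, this sketch yields batches of 4 and 2 rows and then stops.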

get_feed_stats(feed_url)

Get stats for a feed URL.

Parameters:
feed_url : str

Feed URL that is part of feed_input passed to the constructor.

Returns:
FeedStats

FeedStats instance for the given feed URL if it exists.

Raises:
ValueError

If the feed URL is not one of the feed URLs provided to the constructor.
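The lookup contract of get_feed_stats can be sketched as a dictionary keyed by feed URL that raises ValueError for URLs that were not part of feed_input. FeedStatsRegistry, record and the success_count/failure_count fields are hypothetical; the real FeedStats class may track different fields.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeedStats:
    # Hypothetical fields; the real FeedStats may record different information.
    success_count: int = 0
    failure_count: int = 0


class FeedStatsRegistry:
    """Hypothetical per-URL stats store mirroring the documented get_feed_stats contract."""

    def __init__(self, feed_urls: List[str]) -> None:
        # One stats object per URL known at construction time.
        self._stats: Dict[str, FeedStats] = {url: FeedStats() for url in feed_urls}

    def record(self, feed_url: str, success: bool) -> None:
        stats = self.get_feed_stats(feed_url)
        if success:
            stats.success_count += 1
        else:
            stats.failure_count += 1

    def get_feed_stats(self, feed_url: str) -> FeedStats:
        if feed_url not in self._stats:
            raise ValueError(f"Feed URL not in the configured feed input: {feed_url}")
        return self._stats[feed_url]
```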

classmethod is_url(feed_input)

Check whether the provided input is a valid URL.

Parameters:
feed_input : str

The URL string to check.

Returns:
bool

True if feed_input is a valid URL, False otherwise.
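One plausible way to implement a check like is_url with the standard library is shown below. This is an assumption, not the Morpheus implementation: the sketch accepts only http and https URLs that include a host, so a local path such as ./feeds/local.xml is treated as a file.

```python
from urllib.parse import urlparse


def is_url(feed_input: str) -> bool:
    """Hypothetical URL check: require an http(s) scheme and a network location."""
    try:
        parsed = urlparse(feed_input)
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. an unclosed IPv6 bracket.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```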

parse_feeds()

Parse the RSS feed using the feedparser library.

Yields:
feedparser.FeedParserDict

The parsed feed content.
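When strip_markup is enabled, markup is removed from the content, summary and title fields of each parsed entry. The sketch below uses the standard library's html.parser as a stand-in for whatever the controller actually uses, and simplifies entries to flat dictionaries (feedparser stores content as a list of dictionaries). remove_markup, clean_entry and MARKUP_FIELDS are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML/XML fragment."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def remove_markup(text: str) -> str:
    extractor = _TextExtractor()
    extractor.feed(text)
    extractor.close()  # flush any buffered trailing text
    return "".join(extractor.parts).strip()


MARKUP_FIELDS = ("title", "summary", "content")


def clean_entry(entry: Dict[str, object], strip_markup: bool = False) -> Dict[str, object]:
    """Return a copy of entry, stripping markup from the text fields when requested."""
    if not strip_markup:
        return dict(entry)
    cleaned: Dict[str, object] = {}
    for key, value in entry.items():
        if key in MARKUP_FIELDS and isinstance(value, str):
            cleaned[key] = remove_markup(value)
        else:
            cleaned[key] = value  # other fields, such as links, are left untouched
    return cleaned
```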

property run_indefinitely

Property that determines whether to run the source indefinitely.
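The default resolution of run_indefinitely described for the constructor can be sketched as follows. resolve_run_indefinitely is a hypothetical helper; the URL test (an http or https scheme plus a host) is an assumption, and so is treating a list as URL-based when any entry is a URL, since the documentation only describes the single-URL case.

```python
from typing import List, Optional, Union
from urllib.parse import urlparse


def _looks_like_url(value: str) -> bool:
    # Assumed URL test: http(s) scheme and a non-empty host.
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def resolve_run_indefinitely(
    feed_input: Union[str, List[str]], run_indefinitely: Optional[bool] = None
) -> bool:
    """Hypothetical resolution of the run_indefinitely default."""
    if run_indefinitely is not None:
        return run_indefinitely  # an explicit setting always wins
    inputs = [feed_input] if isinstance(feed_input, str) else list(feed_input)
    return any(_looks_like_url(item) for item in inputs)
```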