nemoguardrails.kb.utils

Module Contents

Functions

Name | Description
split_markdown_in_topic_chunks | Splits markdown content into topic chunks.

API

nemoguardrails.kb.utils.split_markdown_in_topic_chunks(
content: str,
max_chunk_size: int = 400
) -> typing.List[dict]

Splits markdown content into topic chunks.

This function takes markdown content as input and divides it into topic chunks based on headings and subsections. Each chunk has a title and a body, and max_chunk_size limits how large a chunk can be.

Parameters:

  • content (str): The markdown content to be split.
  • max_chunk_size (int): The maximum size of a chunk. Default is 400.

Returns: List[dict]: A list of dictionaries, each representing a topic chunk with 'title' and 'body' keys.
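
As a small illustration of working with the documented return shape, the sketch below operates on a hand-written list of dictionaries with 'title' and 'body' keys. The sample values and the helper name `summarize_chunks` are hypothetical and are not output produced by the library.

```python
from typing import Dict, List


def summarize_chunks(chunks: List[Dict[str, str]]) -> List[str]:
    """Return one 'title (N chars)' line per chunk (hypothetical helper)."""
    return [f"{c['title']} ({len(c['body'])} chars)" for c in chunks]


# Hand-written sample in the documented shape; real output may differ.
sample = [
    {"title": "Introduction", "body": "This is an introduction."},
    {"title": "Section 1", "body": "Content of section 1."},
]
summary = summarize_chunks(sample)
```

The helper reads only the two documented keys, so it should keep working if a chunk carries additional fields.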

Example:

from nemoguardrails.kb.utils import split_markdown_in_topic_chunks

content = """# Introduction

This is an introduction.
## Section 1

Content of section 1."""
chunks = split_markdown_in_topic_chunks(content, max_chunk_size=500)

Note:

  • The function treats lines that start with '#' as headings.
  • Meta information can be included at the beginning of the markdown in a block delimited by triple backticks.
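
The two notes above can be pictured with a minimal, self-contained sketch. It is not the library's implementation: the function name `split_by_headings` is hypothetical, it ignores `max_chunk_size`, and it simply skips a leading block delimited by triple backticks rather than parsing it.

```python
from typing import Dict, List, Optional

FENCE = "`" * 3  # triple backticks, built this way to keep the example easy to embed


def split_by_headings(content: str) -> List[Dict[str, str]]:
    """Simplified illustration: one chunk per '#' heading, with no size limit."""
    lines = content.strip().split("\n")

    # Skip an optional leading meta block delimited by triple backticks.
    if lines and lines[0].startswith(FENCE):
        closing = next(
            (i for i, line in enumerate(lines[1:], start=1) if line.startswith(FENCE)),
            len(lines) - 1,  # unclosed block: treat everything as meta
        )
        lines = lines[closing + 1:]

    chunks: List[Dict[str, str]] = []
    title: Optional[str] = None
    body: List[str] = []
    for line in lines:
        if line.startswith("#"):
            # A new heading closes the previous chunk, if any.
            if title is not None:
                chunks.append({"title": title, "body": "\n".join(body).strip()})
            title, body = line.lstrip("#").strip(), []
        elif title is not None:
            # Text before the first heading is dropped in this sketch.
            body.append(line)
    if title is not None:
        chunks.append({"title": title, "body": "\n".join(body).strip()})
    return chunks


doc = "# Introduction\n\nThis is an introduction.\n## Section 1\n\nContent of section 1."
chunks = split_by_headings(doc)
with_meta = split_by_headings(FENCE + "\nkey: value\n" + FENCE + "\n" + doc)
```

In this sketch a meta block at the top does not change the resulting chunks; how the real function uses that meta information is not shown here.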