MarkdownTextSplitter
If your content is in Markdown format, use MarkdownTextSplitter. This class splits your content into documents based on the Markdown headers. For example, if you have the following Markdown content:
# Header 1
This is some content.
## Header 2
This is some more content.
# Header 3
This is even more content.
Then the MarkdownTextSplitter will split the content into three documents:
import { MarkdownTextSplitter } from "langchain/text_splitter";
const text = `# Header 1
This is some content.
## Header 2
This is some more content.
# Header 3
This is even more content.`;
const splitter = new MarkdownTextSplitter();
const output = await splitter.createDocuments([text], {
  metadata: "something",
});
/*
[
  {
    "pageContent": "# Header 1\n\nThis is some content.",
    "metadata": "something"
  },
  {
    "pageContent": "## Header 2\n\nThis is some more content.",
    "metadata": "something"
  },
  {
    "pageContent": "# Header 3\n\nThis is even more content.",
    "metadata": "something"
  }
]
*/
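To make the idea of header-based splitting concrete, here is a minimal, dependency-free sketch. It is not the MarkdownTextSplitter implementation and does not use LangChain at all: `splitOnHeaders` and the `Section` type are hypothetical names introduced only for illustration, and the sketch assumes that every line beginning with one to six `#` characters followed by whitespace starts a new section.

```typescript
// Hypothetical illustration only: NOT part of LangChain and not how
// MarkdownTextSplitter is implemented. It shows the basic idea of cutting
// Markdown text into one section per header.
interface Section {
  pageContent: string;
  metadata: Record<string, unknown>;
}

function splitOnHeaders(
  text: string,
  metadata: Record<string, unknown> = {}
): Section[] {
  // Assumption: a header is a line starting with 1-6 "#" characters
  // followed by whitespace (ATX-style headers).
  const headerPattern = /^#{1,6}\s/;
  const groups: string[][] = [];

  for (const line of text.split("\n")) {
    // Open a new group at each header, and also at the very first line
    // so that any text before the first header is kept.
    if (headerPattern.test(line) || groups.length === 0) {
      groups.push([]);
    }
    groups[groups.length - 1].push(line);
  }

  return groups
    .map((lines) => lines.join("\n").trim())
    .filter((content) => content.length > 0)
    // Copy the metadata so each section owns its own object.
    .map((content) => ({ pageContent: content, metadata: { ...metadata } }));
}

const sample = `# Header 1
This is some content.
## Header 2
This is some more content.
# Header 3
This is even more content.`;

const result = splitOnHeaders(sample, { source: "something" });
console.log(result.map((section) => section.pageContent));
```

For the sample text this yields three sections, one per header. The sketch ignores chunk size limits and would also treat a line starting with `#` inside a fenced code block as a header, so it is only a starting point for understanding the behavior described above.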