Docs
HTML Parser

HTML Parser

Copy paste from HTML to Slate.

HTML

By default, when you paste content into the Slate editor, it will utilize the clipboard's 'text/plain'data. While this is suitable for certain scenarios, there are times when you want users to be able to paste content while preserving its formatting. To achieve this, your editor should be capable of handling 'text/html'data.
To experience the seamless preservation of formatting, simply copy and paste rendered HTML rich text content (not the source code) from another website into this editor. You'll notice that the formatting of the pasted content is maintained.

Many Plate plugins include HTML deserialization rules. These rules define how HTML elements and styles are mapped to Plate's node types and attributes.

HTML -> Slate

Usage

The editor.api.html.deserialize function allows you to convert HTML content into Slate value:

import { createPlateEditor } from '@udecode/plate-common/react';
 
const editor = createPlateEditor({
  plugins: [
    // all plugins that you want to deserialize
  ]
})
editor.children = editor.api.html.deserialize('<p>Hello, world!</p>')

Plugin Deserialization Rules

Here's a comprehensive list of plugins that support HTML deserialization, along with their corresponding HTML elements and styles:

Text Formatting

  • BoldPlugin: <strong>, <b>, or style="font-weight: 600|700|bold"
  • ItalicPlugin: <em>, <i>, or style="font-style: italic"
  • UnderlinePlugin: <u> or style="text-decoration: underline"
  • StrikethroughPlugin: <s>, <del>, <strike>, or style="text-decoration: line-through"
  • SubscriptPlugin: <sub> or style="vertical-align: sub"
  • SuperscriptPlugin: <sup> or style="vertical-align: super"
  • CodePlugin: <code> or style="font-family: Consolas" (not within a <pre> tag)
  • KbdPlugin: <kbd>

Paragraphs and Headings

  • ParagraphPlugin: <p>
  • HeadingPlugin: <h1>, <h2>, <h3>, <h4>, <h5>, <h6>

Lists

  • ListPlugin:
    • Unordered List: <ul>
    • Ordered List: <ol>
    • List Item: <li>
  • IndentListPlugin:
    • List Item: <li>
    • Parses aria-level attribute for indentation

Blocks

  • BlockquotePlugin: <blockquote>
  • CodeBlockPlugin:
    • Deserializes <pre> elements
    • Deserializes <p> elements with fontFamily: 'Consolas' style
    • Splits content into code lines
    • Preserves language information if available
  • HorizontalRulePlugin: <hr>
  • LinkPlugin: <a>
  • ImagePlugin: <img>
  • MediaEmbedPlugin: <iframe>

Tables

  • TablePlugin:
    • Table: <table>
    • Table Row: <tr>
    • Table Cell: <td>
    • Table Header Cell: <th>

Text Styling

  • FontBackgroundColorPlugin: style="background-color: *"
  • FontColorPlugin: style="color: *"
  • FontFamilyPlugin: style="font-family: *"
  • FontSizePlugin: style="font-size: *"
  • FontWeightPlugin: style="font-weight: *"
  • HighlightPlugin: <mark>

Layout and Formatting

  • AlignPlugin: style="text-align: *"
  • LineHeightPlugin: style="line-height: *"

Deserialization Properties

Plugins can define various properties to control HTML deserialization:

  • parse: A function to parse an HTML element into a Plate node
  • query: A function that determines if the deserializer should be applied
  • rules: An array of rule objects that define valid HTML elements and attributes
  • isElement: Indicates if the plugin deserializes elements
  • isLeaf: Indicates if the plugin deserializes leaf nodes
  • attributeNames: List of HTML attribute names to store in node.attributes
  • withoutChildren: Excludes child nodes from deserialization
  • rules: Array of rule objects for element matching
    • validAttribute: Valid element attributes
    • validClassName: Valid CSS class name
    • validNodeName: Valid HTML tag names
    • validStyle: Valid CSS styles

Extending Deserialization

You can extend or customize the deserialization behavior of any plugin. Here's an example of how you might extend the CodeBlockPlugin:

import { CodeBlockPlugin } from '@udecode/plate-code-block/react';
 
const CustomCodeBlockPlugin = CodeBlockPlugin.extend({
  parsers: {
    html: {
      deserializer: {
        parse: ({ element }) => {
          const language = element.getAttribute('data-language');
          const textContent = element.textContent || '';
          const lines = textContent.split('\n');
          
          return {
            type: CodeBlockPlugin.key,
            language,
            children: lines.map((line) => ({
              type: CodeLinePlugin.key,
              children: [{ text: line }],
            })),
          };
        },
        rules: [
          ...CodeBlockPlugin.parsers.html.deserializer.rules,
          { validAttribute: 'data-language' },
        ],
      },
    },
  },
});

This customization adds support for a data-language attribute in code block deserialization and preserves the language information.

Advanced Deserialization Example

The IndentListPlugin provides a more complex deserialization process:

  1. It transforms HTML list structures into indented paragraphs.
  2. It handles nested lists by preserving the indentation level.
  3. It uses the aria-level attribute to determine the indentation level.

Here's a simplified version of its deserialization logic:

export const IndentListPlugin = createTSlatePlugin<IndentListConfig>({
  // ... other configurations ...
  parsers: {
    html: {
      deserializer: {
        isElement: true,
        parse: ({ editor, element, getOptions }) => ({
          indent: Number(element.getAttribute('aria-level')),
          listStyleType: element.style.listStyleType,
          type: editor.getType(ParagraphPlugin),
        }),
        rules: [
          {
            validNodeName: 'LI',
          },
        ],
      },
    },
  },
});

API

editor.api.html.deserialize

Deserialize HTML string into Slate value.

Parameters

    options.element HTMLElement | string

    The HTML element or string to deserialize.

    options.collapseWhiteSpace optional boolean

    Flag to enable or disable the removal of whitespace from the serialized HTML.

    • Default: true (Whitespace will be removed.)

Returns

Collapse all

    The deserialized Slate value.

Slate -> React -> HTML

Installation

npm install @udecode/plate-html

Usage

// ...
import { HtmlReactPlugin } from '@udecode/plate-html/react';
import { DndProvider } from 'react-dnd';
import { HTML5Backend } from 'react-dnd-html5-backend';
 
const editor = createPlateEditor({ 
  plugins: [
    HtmlReactPlugin
    // all plugins that you want to serialize
  ],
  override: {
    // do not forget to add your custom components, otherwise it won't work
    components: createPlateUI(),
  },
});
 
const html = editor.api.htmlReact.serialize({
  nodes: editor.children,
  // if you use @udecode/plate-dnd
  dndWrapper: (props) => <DndProvider backend={HTML5Backend} {...props} />,
});

API

editor.api.htmlReact.serialize

Convert Slate Nodes into HTML string.

Parameters

Collapse all

    Options to control the HTML serialization process.

Returns

Collapse all

    A HTML string representing the Slate nodes.