Warning
This is an internal project, and is not intended for public use. No support or stability guarantees are provided.
HAST is the core data structure used throughout this package for syntax highlighting and markdown processing. Understanding HAST helps you work effectively with the docs infra.
HAST (Hypertext Abstract Syntax Tree) is a specification for representing HTML as an abstract syntax tree. In this package, HAST serves as an intermediate format that bridges the gap between raw source code and rendered React components.
Key Characteristics:
Using HAST as an intermediate format provides several critical benefits for documentation infrastructure:
Syntax highlighting is computed once during compilation, not on every page load or request. This means:
No parsing or highlighting libraries are needed in the client bundle or server runtime:
Note
While server-side highlighting (e.g., with React Server Components) can offload work from the client, it still requires processing on every request. Build-time HAST generation eliminates this overhead entirely by computing highlighting once and caching the results.
HAST can be converted to any framework's component format:
Unlike plain HTML strings, HAST can be converted to actual React components with custom handlers using hastToJsx():
This is not possible with plain HTML strings, which can only be rendered via dangerouslySetInnerHTML without component-level control.
HAST is part of the unified ecosystem, which provides a vast collection of plugins for transforming content:
This plugin architecture allows you to build sophisticated content processing pipelines without writing transform logic from scratch. Since HAST is already a structured tree, you can traverse and modify it directly without the overhead of parsing HTML strings or the fragility of regex replacements.
Webpack can cache HAST as JSON between builds:
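The property that makes this caching possible can be sketched as follows: a HAST tree is plain data (objects, arrays, strings), so it survives a JSON round trip unchanged. The `HastNode` type here is a simplified stand-in written for this example, not a type exported by the package.

```typescript
// Simplified stand-in for HAST node shapes (an assumption for this sketch).
type HastNode =
  | { type: 'root'; children: HastNode[] }
  | { type: 'element'; tagName: string; properties?: Record<string, unknown>; children?: HastNode[] }
  | { type: 'text'; value: string };

const tree: HastNode = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'span',
      properties: { className: ['token', 'keyword'] },
      children: [{ type: 'text', value: 'const' }],
    },
  ],
};

// Serialize once (for example at build time) and parse later (for example at runtime).
const serialized: string = JSON.stringify(tree);
const restored = JSON.parse(serialized) as HastNode;

// Re-serializing the restored tree yields the same string, so nothing was lost.
const roundTripIsLossless: boolean = JSON.stringify(restored) === serialized;
```

Because the serialized form is an ordinary string, any cache that can store JSON can store the highlighted result.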
The typical flow of data through the HAST pipeline:
Source Code
↓
Parse & Format (Build Time)
↓
HAST (Syntax-highlighted AST)
↓
Serialize to JSON
↓
Store in Bundle
↓
Load at Runtime
↓
Convert to React.ReactNode
↓
Render Components
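The flow above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions: `toyHighlight` is a hypothetical stand-in for the real parser (it only recognizes three keywords), and `renderToHtml` produces an HTML string instead of React nodes so that the example has no dependencies.

```typescript
type HastText = { type: 'text'; value: string };
type HastElement = {
  type: 'element';
  tagName: string;
  properties: { className: string[] };
  children: HastContent[];
};
type HastContent = HastElement | HastText;
type HastRoot = { type: 'root'; children: HastContent[] };

// Build time: hypothetical toy highlighter, a stand-in for a real grammar-based parser.
const KEYWORDS = new Set(['const', 'let', 'function']);
function toyHighlight(source: string): HastRoot {
  const children = source
    .split(/(\s+)/) // the capture group keeps whitespace runs as separate parts
    .filter((part) => part !== '')
    .map((part): HastContent =>
      KEYWORDS.has(part)
        ? {
            type: 'element',
            tagName: 'span',
            properties: { className: ['token', 'keyword'] },
            children: [{ type: 'text', value: part }],
          }
        : { type: 'text', value: part },
    );
  return { type: 'root', children };
}

// Build time: serialize the tree so it can be stored in the bundle.
const json: string = JSON.stringify(toyHighlight('const x = 1'));

// Runtime: load the JSON and convert it to output without re-highlighting.
function escapeHtml(text: string): string {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}
function renderToHtml(node: HastRoot | HastContent): string {
  if (node.type === 'text') return escapeHtml(node.value);
  const inner = node.children.map((child) => renderToHtml(child)).join('');
  if (node.type === 'root') return inner;
  return `<${node.tagName} class="${node.properties.className.join(' ')}">${inner}</${node.tagName}>`;
}

const html: string = renderToHtml(JSON.parse(json) as HastRoot);
// html === '<span class="token keyword">const</span> x = 1'
```

The expensive step (`toyHighlight` here) runs only once; the runtime step is a cheap tree walk over already-parsed data.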
A HAST tree consists of nodes with the following structure:
interface HastRoot {
  type: 'root';
  children: HastContent[];
}
interface HastElement {
  type: 'element';
  tagName: string;
  properties?: Record<string, any>;
  children?: HastContent[];
}
interface HastText {
  type: 'text';
  value: string;
}
type HastContent = HastElement | HastText;
Here's what a simple HAST structure looks like for highlighted code:
{
  "type": "root",
  "children": [
    {
      "type": "element",
      "tagName": "code",
      "properties": { "className": ["language-ts"] },
      "children": [
        {
          "type": "element",
          "tagName": "span",
          "properties": { "className": ["token", "keyword"] },
          "children": [{ "type": "text", "value": "const" }]
        },
        { "type": "text", "value": " " },
        {
          "type": "element",
          "tagName": "span",
          "properties": { "className": ["token", "variable"] },
          "children": [{ "type": "text", "value": "x" }]
        }
      ]
    }
  ]
}
This represents: const x with syntax highlighting classes applied.
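To make the relationship between the tree and its text concrete, here is a minimal sketch that recovers the plain text by concatenating every text node in document order. `toPlainText` is a hypothetical helper written for this example, and the types are simplified copies of the interfaces above.

```typescript
type HastText = { type: 'text'; value: string };
type HastElement = {
  type: 'element';
  tagName: string;
  properties?: Record<string, unknown>;
  children?: HastContent[];
};
type HastContent = HastElement | HastText;
type HastRoot = { type: 'root'; children: HastContent[] };

// Hypothetical helper: depth-first walk that joins all text node values.
function toPlainText(node: HastRoot | HastContent): string {
  if (node.type === 'text') return node.value;
  return (node.children ?? []).map((child) => toPlainText(child)).join('');
}

// The same structure as the JSON example above.
const exampleTree: HastRoot = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'code',
      properties: { className: ['language-ts'] },
      children: [
        {
          type: 'element',
          tagName: 'span',
          properties: { className: ['token', 'keyword'] },
          children: [{ type: 'text', value: 'const' }],
        },
        { type: 'text', value: ' ' },
        {
          type: 'element',
          tagName: 'span',
          properties: { className: ['token', 'variable'] },
          children: [{ type: 'text', value: 'x' }],
        },
      ],
    },
  ],
};

const plainText: string = toPlainText(exampleTree); // "const x"
```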
When working with HAST from untrusted sources (e.g., JSON from external APIs or user-generated content), sanitize the HAST tree before converting it to React components:
import { hastToJsx } from '@mui/internal-docs-infra/hastToJsx';
import { sanitize } from 'hast-util-sanitize';
// Untrusted HAST from external source
const untrustedHast = await fetch('/api/user-content').then((r) => r.json());
// Sanitize before rendering
const sanitizedHast = sanitize(untrustedHast);
const safeContent = hastToJsx(sanitizedHast);
The hast-util-sanitize package removes potentially dangerous elements and attributes, preventing XSS attacks.
Note
For HAST generated at build time by this package (from loadPrecomputedCodeHighlighter or loadPrecomputedTypes), sanitization is not necessary since the content comes from your own source files.
- parseSource: Converts source code to HAST with syntax highlighting using Starry Night
- hastToJsx: Converts HAST nodes to React elements
- hastOrJsonToJsx: Handles both HAST and serialized JSON formats

HAST transformers follow a naming convention based on which AST they operate on:

Rehype plugins (operate on HAST/HTML AST):
- transformHtmlCodeBlock: Processes <pre><code> blocks in HAST and precomputes syntax highlighting data
- transformHtmlCodeInline: Applies inline syntax highlighting to HAST
- transformHtml* functions work with HAST nodes

Remark plugins (operate on MDAST/Markdown AST):
- transformMarkdownCode: Groups markdown code fences with variants and converts to semantic HTML/HAST
- transformMarkdown* functions work with MDAST, often producing HAST as output

React hooks (consume HAST at runtime):
- useTypes: Converts type metadata (HAST) to React nodes

Enhancers are rehype plugins that add visual annotations to an existing HAST tree without changing its plain text output. They may add wrapper elements, CSS class names, data-* attributes, or metadata, but must never modify or remove text nodes.
This constraint prevents layout shift: during initial render, the browser shows a plain text version of the code. When the enhanced HAST replaces it, the text must remain identical so the swap is seamless.
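The constraint can be checked mechanically: extract the plain text before and after enhancement and compare. The sketch below uses a hypothetical `markKeywords` function as a stand-in for an enhancer. Real enhancers are rehype plugins; this plain function only illustrates the invariant, and the `data-keyword` attribute is invented for the example.

```typescript
type HastText = { type: 'text'; value: string };
type HastElement = {
  type: 'element';
  tagName: string;
  properties?: Record<string, unknown>;
  children?: HastContent[];
};
type HastContent = HastElement | HastText;
type HastRoot = { type: 'root'; children: HastContent[] };

// Concatenates all text nodes in document order.
function toPlainText(node: HastRoot | HastContent): string {
  if (node.type === 'text') return node.value;
  return (node.children ?? []).map((child) => toPlainText(child)).join('');
}

// Hypothetical enhancer: adds a data attribute to keyword tokens.
// Text nodes are returned untouched, so the plain text cannot change.
function markKeywords(node: HastContent): HastContent {
  if (node.type === 'text') return node;
  const classes = node.properties?.className;
  const isKeyword = Array.isArray(classes) && classes.includes('keyword');
  return {
    ...node,
    properties: isKeyword ? { ...node.properties, 'data-keyword': '' } : node.properties,
    children: node.children?.map((child) => markKeywords(child)),
  };
}

const original: HastRoot = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'span',
      properties: { className: ['token', 'keyword'] },
      children: [{ type: 'text', value: 'const' }],
    },
    { type: 'text', value: ' x' },
  ],
};

const enhanced: HastRoot = {
  type: 'root',
  children: original.children.map((child) => markKeywords(child)),
};

// The invariant every enhancer must satisfy.
const textPreserved: boolean = toPlainText(original) === toPlainText(enhanced);
```

A check like this could run in a unit test for each enhancer, so that a regression that alters text is caught before it causes layout shift.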
Built-in enhancers:
- enhanceCodeInline: Fixes tag bracket wrapping, reclassifies syntax tokens, and styles nullish values
- enhanceCodeTypes: Links type identifiers and properties to documentation anchors
- enhanceCodeEmphasis: Adds emphasis styling to code elements

Enhancers vs transformers:
| Aspect | Enhancers | Transformers |
|---|---|---|
| Purpose | Add visual annotations to HAST | Modify the source code itself |
| Operates on | HAST (parsed syntax tree) | Plain text source code |
| Plain text impact | Must NOT change plain text output | Can change plain text output |
Enhancers are used in two contexts:
- sourceEnhancers passed to CodeHighlighter or useCode (these use the SourceEnhancer function signature)
- enhancers passed to abstractCreateTypes (these are standard rehype PluggableList entries)

HAST is used extensively throughout the docs-infra package:
The CodeHighlighter component works with HAST:
The loadPrecomputedTypes loader:
Using HAST provides measurable performance improvements over both client-side and server-side highlighting:
For a documentation site with 100 code examples:
Client-side highlighting:
Server-side highlighting (per request):
Build-time HAST (this approach):
HAST is part of the unified ecosystem:
Build-time HAST is ideal for:
Note
Build-time HAST can still be enhanced and transformed at server or client render-time without reparsing. Since it's already a structured tree, you can traverse and modify it efficiently for dynamic customization while keeping the expensive syntax highlighting precomputed.
Server-side rendering (RSC) works well for:
Client-side rendering is necessary for:
Consider hybrid approaches:
Tip
The choice isn't binary - you can use different approaches for different types of content in the same application. For example, use build-time HAST for documentation pages while using server-side rendering for user-generated content sections.
You can still customize precomputed HAST at render-time on the server or client without reparsing HTML or using regex.
import { hastToJsx } from '@mui/internal-docs-infra/pipeline/hastUtils';
const components = {
  code: (props: any) => <CodeBlock {...props} />, // enhance code blocks
  a: (props: any) => <a {...props} rel={props.rel ?? 'noopener noreferrer'} />, // enforce safe links
};
export function RenderHast({ hast }: { hast: any }) {
  return <>{hastToJsx(hast, components)}</>;
}
import type { Root } from 'hast';
import { sanitize } from 'hast-util-sanitize';
// Adds a default rel attribute to every link that does not already set one.
// Mutates the tree in place and returns it.
function addNoopener(hast: Root): Root {
  const stack: any[] = [hast];
  while (stack.length) {
    const node: any = stack.pop();
    if (node.type === 'element' && node.tagName === 'a') {
      node.properties = {
        ...(node.properties || {}),
        rel: node.properties?.rel ?? 'noopener noreferrer',
      };
    }
    if (node.children) stack.push(...node.children);
  }
  return hast;
}
// Server or client render-time
// `hast` is the precomputed tree; `hastToJsx` and `components` come from the previous snippet
const sanitized = sanitize(hast) as Root; // if the source is untrusted
const enhanced = addNoopener(sanitized);
const jsx = hastToJsx(enhanced, components);
Do:
Don't: