first commit

yuin 2019-04-26 20:27:01 +09:00
commit dd89404e04
46 changed files with 24605 additions and 0 deletions

13
.gitignore vendored Normal file

@ -0,0 +1,13 @@
# Binaries for programs and plugins
*.exe
*.exe~
*.dll
*.so
*.dylib
# Test binary, build with `go test -c`
*.test
*.pprof
# Output of the go coverage tool, specifically when used with LiteIDE
*.out

7
Makefile Normal file

@ -0,0 +1,7 @@
.PHONY: test
test:
	go test -coverprofile=profile.out -coverpkg=github.com/yuin/goldmark,github.com/yuin/goldmark/ast,github.com/yuin/goldmark/extension,github.com/yuin/goldmark/extension/ast,github.com/yuin/goldmark/parser,github.com/yuin/goldmark/renderer,github.com/yuin/goldmark/renderer/html,github.com/yuin/goldmark/text,github.com/yuin/goldmark/util ./...
cov: test
	go tool cover -html=profile.out

129
README.md Normal file

@ -0,0 +1,129 @@
goldmark
==========================================
> A Markdown parser written in Go. Easy to extend, standards-compliant, well-structured.

goldmark is compliant with CommonMark 0.29.
Motivation
----------------------
I need a Markdown parser for Go that meets the following conditions:

- Easy to extend.
    - Markdown is poor in document expressions compared with other light markup languages such as reStructuredText.
    - Markdown syntax has already been extended many times, e.g. PHP Markdown Extra and GitHub Flavored Markdown.
- Standards-compliant.
    - Markdown has many dialects.
    - GitHub Flavored Markdown is widely used and is based on CommonMark, regardless of whether CommonMark is a good specification or not.
    - CommonMark is too complicated and hard to implement.
- Well-structured.
    - AST-based, and preserves the source position of nodes.
- Written in pure Go.

[golang-commonmark](https://gitlab.com/golang-commonmark/markdown) may be a good choice, but it appears to be a copy of [markdown-it](https://github.com/markdown-it).

[blackfriday.v2](https://github.com/russross/blackfriday/tree/v2) is a fast and widely used implementation, but it is not CommonMark-compliant and cannot be extended from outside the package, since its AST consists of structs rather than interfaces.

As mentioned above, CommonMark is too complicated and hard to implement, so Markdown parsers based on CommonMark barely exist.
Usage
----------------------
Convert Markdown documents in CommonMark-compliant mode:
```go
var buf bytes.Buffer
if err := goldmark.Convert(source, &buf); err != nil {
	panic(err)
}
```
Customize a parser and a renderer:
```go
md := goldmark.New(
	goldmark.WithExtensions(extension.GFM),
	goldmark.WithParserOptions(
		parser.WithHeadingID(),
	),
	goldmark.WithRendererOptions(
		html.WithSoftLineBreak(),
		html.WithXHTML(),
	),
)
var buf bytes.Buffer
if err := md.Convert(source, &buf); err != nil {
	panic(err)
}
```
Parser and Renderer options
------------------------------
### Parser options
| Functional option | Type | Description |
| ----------------- | ---- | ----------- |
| `parser.WithBlockParsers` | List of `util.PrioritizedSlice` whose elements are `parser.BlockParser` | Parsers for parsing block level elements. |
| `parser.WithInlineParsers` | List of `util.PrioritizedSlice` whose elements are `parser.InlineParser` | Parsers for parsing inline level elements. |
| `parser.WithParagraphTransformers` | List of `util.PrioritizedSlice` whose elements are `parser.ParagraphTransformer` | Transformers for transforming paragraph nodes. |
| `parser.WithHeadingID` | `-` | Enables custom heading IDs (`{#custom-id}`) and auto-generated heading IDs. |
| `parser.WithFilterTags` | `...string` | HTML tag names forbidden in HTML blocks and raw HTML. |
### HTML Renderer options
| Functional option | Type | Description |
| ----------------- | ---- | ----------- |
| `html.WithWriter` | `html.Writer` | `html.Writer` for writing contents to an `io.Writer`. |
| `html.WithSoftLineBreak` | `-` | Render new lines as `<br>` if enabled, otherwise as `\n`. |
| `html.WithXHTML` | `-` | Render as XHTML. |
### Built-in extensions
- `extension.Table`
- `extension.Strikethrough`
- `extension.Linkify`
- `extension.TaskList`
- `extension.GFM`
    - This extension enables Table, Strikethrough, Linkify and TaskList.
      In addition, this extension adds some tags to `parser.FilterTags`.
Create extensions
--------------------
**TODO**
See the `extension` directory for examples of extensions.

Summary:

1. Define an AST node as a struct that embeds `ast.BaseBlock` or `ast.BaseInline`.
2. Write a parser that implements `parser.BlockParser` or `parser.InlineParser`.
3. Write a renderer that implements `renderer.NodeRenderer`.
4. Define your goldmark extension that implements `goldmark.Extender`.
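
The embedding pattern behind step 1 can be illustrated without goldmark itself. The following is only a minimal sketch using the standard library: `Node`, `Base`, `Text`, `Highlight` and `Render` are hypothetical stand-ins invented for this example, not the goldmark API. It shows how a custom node gets its tree plumbing from an embedded base type, and how a renderer can dispatch on the concrete node type.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, heavily reduced stand-in for an AST node interface.
type Node interface {
	Kind() string
	Children() []Node
	AppendChild(child Node)
}

// Base is a stand-in base type that supplies the shared tree plumbing.
type Base struct {
	children []Node
}

// Children returns the child nodes in insertion order.
func (b *Base) Children() []Node { return b.children }

// AppendChild adds child to the end of the children.
func (b *Base) AppendChild(child Node) { b.children = append(b.children, child) }

// Text is a leaf node that carries literal content.
type Text struct {
	Base
	Value string
}

// Kind returns the name of this node type.
func (t *Text) Kind() string { return "Text" }

// Highlight is the custom node: it embeds Base and only adds what is
// specific to it (here: nothing but its kind).
type Highlight struct {
	Base
}

// Kind returns the name of this node type.
func (h *Highlight) Kind() string { return "Highlight" }

// Render is a stand-in for a node renderer: it turns a node and its
// children into HTML. Text is written as-is, without escaping.
func Render(n Node) string {
	var sb strings.Builder
	switch v := n.(type) {
	case *Text:
		sb.WriteString(v.Value)
	case *Highlight:
		sb.WriteString("<mark>")
		for _, c := range v.Children() {
			sb.WriteString(Render(c))
		}
		sb.WriteString("</mark>")
	}
	return sb.String()
}

func main() {
	h := &Highlight{}
	h.AppendChild(&Text{Value: "important"})
	fmt.Println(h.Kind(), Render(h)) // prints "Highlight <mark>important</mark>"
}
```

In a real extension, the embedded type would be `ast.BaseBlock` or `ast.BaseInline`, and parsing and rendering would go through the interfaces listed in steps 2 to 4.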
Benchmark
--------------------
You can run this benchmark in the `_benchmark` directory.

blackfriday v2 is the fastest, but it is not CommonMark-compliant and not extensible.

Although goldmark builds a clean, extensible AST and is fully compliant with
CommonMark, it is reasonably fast and consumes less memory than the others.
```
BenchmarkGoldMark-4 200 7291402 ns/op 2259603 B/op 16867 allocs/op
BenchmarkGolangCommonMark-4 200 7709939 ns/op 3053760 B/op 18682 allocs/op
BenchmarkBlackFriday-4 300 5776369 ns/op 3356386 B/op 17480 allocs/op
```
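
When writing a benchmark of this kind, setup work such as loading the input is usually kept out of the measurement by calling `b.ResetTimer()` after it. The following is a minimal sketch using only the standard library; `benchmarkUpper` and its generated input are hypothetical stand-ins for a real converter and for `_data.md`.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// benchmarkUpper is a hypothetical benchmark: bytes.ToUpper stands in for a
// Markdown converter, and the generated input stands in for _data.md.
func benchmarkUpper(b *testing.B) {
	// Setup: build the input once.
	source := bytes.Repeat([]byte("# heading\n\nsome *text*\n"), 1000)
	b.ReportAllocs()
	// Exclude the setup above from the measurement.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = bytes.ToUpper(source)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	result := testing.Benchmark(benchmarkUpper)
	fmt.Println(result.String(), result.MemString())
}
```

The printed columns correspond to the iteration count, `ns/op`, `B/op` and `allocs/op` values shown in the table above.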
Donation
--------------------
BTC: 1NEDSyUmo4SMTDP83JJQSWi1MvQUGGNMZB
License
--------------------
MIT
Author
--------------------
Yusuke Inuzuka

9702
_benchmark/_data.md Normal file

File diff suppressed because it is too large

@ -0,0 +1,55 @@
package main
import (
"bytes"
"io/ioutil"
"testing"
"github.com/yuin/goldmark"
"github.com/yuin/goldmark/renderer/html"
"gitlab.com/golang-commonmark/markdown"
"gopkg.in/russross/blackfriday.v2"
)
func BenchmarkGoldMark(b *testing.B) {
	source, err := ioutil.ReadFile("_data.md")
	if err != nil {
		b.Fatal(err)
	}
	markdown := goldmark.New(goldmark.WithRendererOptions(html.WithXHTML()))
	var out bytes.Buffer
	// Convert an empty document once before the timed loop.
	markdown.Convert([]byte(""), &out)
	// Exclude the setup above from the measurement.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		out.Reset()
		if err := markdown.Convert(source, &out); err != nil {
			b.Fatal(err)
		}
	}
}
func BenchmarkGolangCommonMark(b *testing.B) {
	source, err := ioutil.ReadFile("_data.md")
	if err != nil {
		b.Fatal(err)
	}
	md := markdown.New(markdown.XHTMLOutput(true))
	// Exclude the setup above from the measurement.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		md.RenderToString(source)
	}
}
func BenchmarkBlackFriday(b *testing.B) {
	source, err := ioutil.ReadFile("_data.md")
	if err != nil {
		b.Fatal(err)
	}
	// Exclude the setup above from the measurement.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		blackfriday.Run(source)
	}
}

5194
_test/spec.json Normal file

File diff suppressed because it is too large

342
ast/ast.go Normal file

@ -0,0 +1,342 @@
// Package ast defines AST nodes that represent markdown elements.
package ast
import (
"bytes"
"fmt"
textm "github.com/yuin/goldmark/text"
"strings"
)
// A NodeType indicates what type a node belongs to.
type NodeType int
const (
// BlockNode indicates that a node is a block node.
BlockNode NodeType = iota + 1
// InlineNode indicates that a node is an inline node.
InlineNode
)
// A Node interface defines basic AST node functionalities.
type Node interface {
// Type returns a type of this node.
Type() NodeType
// NextSibling returns a next sibling node of this node.
NextSibling() Node
// PreviousSibling returns a previous sibling node of this node.
PreviousSibling() Node
// Parent returns a parent node of this node.
Parent() Node
// SetParent sets a parent node to this node.
SetParent(Node)
// SetPreviousSibling sets a previous sibling node to this node.
SetPreviousSibling(Node)
// SetNextSibling sets a next sibling node to this node.
SetNextSibling(Node)
// HasChildren returns true if this node has any children, otherwise false.
HasChildren() bool
// ChildCount returns a total number of children.
ChildCount() int
// FirstChild returns a first child of this node.
FirstChild() Node
// LastChild returns a last child of this node.
LastChild() Node
// AppendChild appends a node child to the tail of the children.
AppendChild(self, child Node)
// RemoveChild removes a node child from this node.
// If the node child is not a child of this node, RemoveChild does nothing.
RemoveChild(self, child Node)
// RemoveChildren removes all children from this node.
RemoveChildren(self Node)
// ReplaceChild replaces a node v1 with a node insertee.
// If v1 is not a child of this node, ReplaceChild appends insertee to the
// tail of the children.
ReplaceChild(self, v1, insertee Node)
// InsertBefore inserts a node insertee before a node v1.
// If v1 is not a child of this node, InsertBefore appends insertee to the
// tail of the children.
InsertBefore(self, v1, insertee Node)
// InsertAfter inserts a node insertee after a node v1.
// If v1 is not a child of this node, InsertAfter appends insertee to the
// tail of the children.
InsertAfter(self, v1, insertee Node)
// Dump dumps an AST tree structure to stdout.
// This function is intended purely for debugging.
// level is an indent level. Implementers should indent the output with
// 2 * level spaces.
Dump(source []byte, level int)
// Text returns text values of this node.
Text(source []byte) []byte
// HasBlankPreviousLines returns true if the row before this node is blank,
// otherwise false.
// This method is valid only for block nodes.
HasBlankPreviousLines() bool
// SetBlankPreviousLines sets whether the row before this node is blank.
// This method is valid only for block nodes.
SetBlankPreviousLines(v bool)
// Lines returns text segments that hold positions in a source.
// This method is valid only for block nodes.
Lines() *textm.Segments
// SetLines sets text segments that hold positions in a source.
// This method is valid only for block nodes.
SetLines(*textm.Segments)
// IsRaw returns true if contents should be rendered as 'raw' contents.
IsRaw() bool
}
// A BaseNode struct implements the Node interface.
type BaseNode struct {
firstChild Node
lastChild Node
parent Node
next Node
prev Node
}
func ensureIsolated(v Node) {
if p := v.Parent(); p != nil {
p.RemoveChild(p, v)
}
}
// HasChildren implements Node.HasChildren .
func (n *BaseNode) HasChildren() bool {
return n.firstChild != nil
}
// SetPreviousSibling implements Node.SetPreviousSibling .
func (n *BaseNode) SetPreviousSibling(v Node) {
n.prev = v
}
// SetNextSibling implements Node.SetNextSibling .
func (n *BaseNode) SetNextSibling(v Node) {
n.next = v
}
// PreviousSibling implements Node.PreviousSibling .
func (n *BaseNode) PreviousSibling() Node {
return n.prev
}
// NextSibling implements Node.NextSibling .
func (n *BaseNode) NextSibling() Node {
return n.next
}
// RemoveChild implements Node.RemoveChild .
func (n *BaseNode) RemoveChild(self, v Node) {
if v.Parent() != self {
return
}
prev := v.PreviousSibling()
next := v.NextSibling()
if prev != nil {
prev.SetNextSibling(next)
} else {
n.firstChild = next
}
if next != nil {
next.SetPreviousSibling(prev)
} else {
n.lastChild = prev
}
v.SetParent(nil)
v.SetPreviousSibling(nil)
v.SetNextSibling(nil)
}
// RemoveChildren implements Node.RemoveChildren .
func (n *BaseNode) RemoveChildren(self Node) {
	for c := n.firstChild; c != nil; {
		// Remember the next sibling before the link to it is cleared.
		next := c.NextSibling()
		c.SetParent(nil)
		c.SetPreviousSibling(nil)
		c.SetNextSibling(nil)
		c = next
	}
	n.firstChild = nil
	n.lastChild = nil
}
// FirstChild implements Node.FirstChild .
func (n *BaseNode) FirstChild() Node {
return n.firstChild
}
// LastChild implements Node.LastChild .
func (n *BaseNode) LastChild() Node {
return n.lastChild
}
// ChildCount implements Node.ChildCount .
func (n *BaseNode) ChildCount() int {
count := 0
for c := n.firstChild; c != nil; c = c.NextSibling() {
count++
}
return count
}
// Parent implements Node.Parent .
func (n *BaseNode) Parent() Node {
return n.parent
}
// SetParent implements Node.SetParent .
func (n *BaseNode) SetParent(v Node) {
n.parent = v
}
// AppendChild implements Node.AppendChild .
func (n *BaseNode) AppendChild(self, v Node) {
ensureIsolated(v)
if n.firstChild == nil {
n.firstChild = v
v.SetNextSibling(nil)
v.SetPreviousSibling(nil)
} else {
last := n.lastChild
last.SetNextSibling(v)
v.SetPreviousSibling(last)
}
v.SetParent(self)
n.lastChild = v
}
// ReplaceChild implements Node.ReplaceChild .
func (n *BaseNode) ReplaceChild(self, v1, insertee Node) {
n.InsertBefore(self, v1, insertee)
n.RemoveChild(self, v1)
}
// InsertAfter implements Node.InsertAfter .
func (n *BaseNode) InsertAfter(self, v1, insertee Node) {
n.InsertBefore(self, v1.NextSibling(), insertee)
}
// InsertBefore implements Node.InsertBefore .
func (n *BaseNode) InsertBefore(self, v1, insertee Node) {
if v1 == nil {
n.AppendChild(self, insertee)
return
}
ensureIsolated(insertee)
if v1.Parent() == self {
c := v1
prev := c.PreviousSibling()
if prev != nil {
prev.SetNextSibling(insertee)
insertee.SetPreviousSibling(prev)
} else {
n.firstChild = insertee
insertee.SetPreviousSibling(nil)
}
insertee.SetNextSibling(c)
c.SetPreviousSibling(insertee)
insertee.SetParent(self)
}
}
// Text implements Node.Text .
func (n *BaseNode) Text(source []byte) []byte {
var buf bytes.Buffer
for c := n.firstChild; c != nil; c = c.NextSibling() {
buf.Write(c.Text(source))
}
return buf.Bytes()
}
// DumpHelper is a helper function to implement Node.Dump.
// name is the name of the node.
// kv holds pairs of an attribute name and an attribute value.
// cb is a function called after the name and attributes have been written.
func DumpHelper(v Node, source []byte, level int, name string, kv map[string]string, cb func(int)) {
indent := strings.Repeat(" ", level)
fmt.Printf("%s%s {\n", indent, name)
indent2 := strings.Repeat(" ", level+1)
if v.Type() == BlockNode {
fmt.Printf("%sRawText: \"", indent2)
for i := 0; i < v.Lines().Len(); i++ {
line := v.Lines().At(i)
fmt.Printf("%s", line.Value(source))
}
fmt.Printf("\"\n")
fmt.Printf("%sHasBlankPreviousLines: %v\n", indent2, v.HasBlankPreviousLines())
}
if kv != nil {
for name, value := range kv {
fmt.Printf("%s%s: %s\n", indent2, name, value)
}
}
if cb != nil {
cb(level + 1)
}
for c := v.FirstChild(); c != nil; c = c.NextSibling() {
c.Dump(source, level+1)
}
fmt.Printf("%s}\n", indent)
}
// WalkStatus represents a current status of the Walk function.
type WalkStatus int
const (
	// WalkStop indicates that no more walking is needed.
	WalkStop WalkStatus = iota + 1
	// WalkSkipChildren indicates that Walk won't walk the children of the
	// current node.
	WalkSkipChildren
	// WalkContinue indicates that Walk can continue to walk.
	WalkContinue
)
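// Walker is a function that will be called when Walk finds a
// new node.
// entering is true before Walk visits the children of n, and false after it
// has visited them.
// If Walker returns an error, Walk stops walking immediately.
type Walker func(n Node, entering bool) (WalkStatus, error)

// Walk walks an AST tree using the depth-first search algorithm.
func Walk(n Node, walker Walker) error {
	_, err := walkHelper(n, walker)
	return err
}

// walkHelper walks n and its descendants. It reports WalkStop to its caller
// so that a stop requested deep in the tree also ends the walk of the
// remaining siblings and ancestors.
func walkHelper(n Node, walker Walker) (WalkStatus, error) {
	status, err := walker(n, true)
	if err != nil || status == WalkStop {
		return WalkStop, err
	}
	if status != WalkSkipChildren {
		for c := n.FirstChild(); c != nil; c = c.NextSibling() {
			if st, err := walkHelper(c, walker); err != nil || st == WalkStop {
				return WalkStop, err
			}
		}
	}
	status, err = walker(n, false)
	if err != nil || status == WalkStop {
		return WalkStop, err
	}
	return WalkContinue, nil
}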

364
ast/block.go Normal file

@ -0,0 +1,364 @@
package ast
import (
"fmt"
textm "github.com/yuin/goldmark/text"
"strings"
)
// A BaseBlock struct implements the Node interface.
type BaseBlock struct {
BaseNode
blankPreviousLines bool
lines *textm.Segments
}
// Type implements Node.Type
func (b *BaseBlock) Type() NodeType {
return BlockNode
}
// IsRaw implements Node.IsRaw
func (b *BaseBlock) IsRaw() bool {
return false
}
// HasBlankPreviousLines implements Node.HasBlankPreviousLines.
func (b *BaseBlock) HasBlankPreviousLines() bool {
return b.blankPreviousLines
}
// SetBlankPreviousLines implements Node.SetBlankPreviousLines.
func (b *BaseBlock) SetBlankPreviousLines(v bool) {
b.blankPreviousLines = v
}
// Lines implements Node.Lines
func (b *BaseBlock) Lines() *textm.Segments {
if b.lines == nil {
b.lines = textm.NewSegments()
}
return b.lines
}
// SetLines implements Node.SetLines
func (b *BaseBlock) SetLines(v *textm.Segments) {
b.lines = v
}
// A Document struct is a root node of Markdown text.
type Document struct {
BaseBlock
}
// Dump implements Node.Dump.
func (n *Document) Dump(source []byte, level int) {
DumpHelper(n, source, level, "Document", nil, nil)
}
// NewDocument returns a new Document node.
func NewDocument() *Document {
return &Document{
BaseBlock: BaseBlock{},
}
}
// A TextBlock struct is a node whose lines
// should be rendered without any containers.
type TextBlock struct {
BaseBlock
}
// Dump implements Node.Dump.
func (n *TextBlock) Dump(source []byte, level int) {
DumpHelper(n, source, level, "TextBlock", nil, nil)
}
// NewTextBlock returns a new TextBlock node.
func NewTextBlock() *TextBlock {
return &TextBlock{
BaseBlock: BaseBlock{},
}
}
// A Paragraph struct represents a paragraph of Markdown text.
type Paragraph struct {
BaseBlock
}
// Dump implements Node.Dump.
func (n *Paragraph) Dump(source []byte, level int) {
DumpHelper(n, source, level, "Paragraph", nil, nil)
}
// NewParagraph returns a new Paragraph node.
func NewParagraph() *Paragraph {
return &Paragraph{
BaseBlock: BaseBlock{},
}
}
// IsParagraph returns true if the given node is a *Paragraph,
// otherwise false.
func IsParagraph(node Node) bool {
_, ok := node.(*Paragraph)
return ok
}
// A Heading struct represents headings like SetextHeading and ATXHeading.
type Heading struct {
BaseBlock
// Level is the level of this heading.
// This value is between 1 and 6.
Level int
// ID is the ID of this heading.
ID []byte
}
// Dump implements Node.Dump.
func (n *Heading) Dump(source []byte, level int) {
m := map[string]string{
"Level": fmt.Sprintf("%d", n.Level),
}
DumpHelper(n, source, level, "Heading", m, nil)
}
// NewHeading returns a new Heading node.
func NewHeading(level int) *Heading {
return &Heading{
BaseBlock: BaseBlock{},
Level: level,
}
}
// A ThemanticBreak struct represents a thematic break of Markdown text.
type ThemanticBreak struct {
BaseBlock
}
// Dump implements Node.Dump.
func (n *ThemanticBreak) Dump(source []byte, level int) {
DumpHelper(n, source, level, "ThemanticBreak", nil, nil)
}
// NewThemanticBreak returns a new ThemanticBreak node.
func NewThemanticBreak() *ThemanticBreak {
return &ThemanticBreak{
BaseBlock: BaseBlock{},
}
}
// A CodeBlock struct represents an indented code block of Markdown text.
type CodeBlock struct {
BaseBlock
}
// IsRaw implements Node.IsRaw.
func (n *CodeBlock) IsRaw() bool {
return true
}
// Dump implements Node.Dump.
func (n *CodeBlock) Dump(source []byte, level int) {
DumpHelper(n, source, level, "CodeBlock", nil, nil)
}
// NewCodeBlock returns a new CodeBlock node.
func NewCodeBlock() *CodeBlock {
return &CodeBlock{
BaseBlock: BaseBlock{},
}
}
// A FencedCodeBlock struct represents a fenced code block of Markdown text.
type FencedCodeBlock struct {
BaseBlock
// Info is the info text of this fenced code block.
Info *Text
}
// IsRaw implements Node.IsRaw.
func (n *FencedCodeBlock) IsRaw() bool {
return true
}
// Dump implements Node.Dump.
func (n *FencedCodeBlock) Dump(source []byte, level int) {
m := map[string]string{}
if n.Info != nil {
m["Info"] = fmt.Sprintf("\"%s\"", n.Info.Text(source))
}
DumpHelper(n, source, level, "FencedCodeBlock", m, nil)
}
// NewFencedCodeBlock returns a new FencedCodeBlock node.
func NewFencedCodeBlock(info *Text) *FencedCodeBlock {
return &FencedCodeBlock{
BaseBlock: BaseBlock{},
Info: info,
}
}
// A Blockquote struct represents a blockquote block of Markdown text.
type Blockquote struct {
BaseBlock
}
// Dump implements Node.Dump.
func (n *Blockquote) Dump(source []byte, level int) {
DumpHelper(n, source, level, "Blockquote", nil, nil)
}
// NewBlockquote returns a new Blockquote node.
func NewBlockquote() *Blockquote {
return &Blockquote{
BaseBlock: BaseBlock{},
}
}
// A List struct represents a list of Markdown text.
type List struct {
BaseBlock
// Marker is a marker character like '-', '+', ')' and '.'.
Marker byte
// IsTight is true if this list is a 'tight' list.
// See https://spec.commonmark.org/0.29/#loose for details.
IsTight bool
// Start is an initial number of this ordered list.
// If this list is not an ordered list, Start is 0.
Start int
}
// IsOrdered returns true if this list is an ordered list, otherwise false.
func (l *List) IsOrdered() bool {
return l.Marker == '.' || l.Marker == ')'
}
// CanContinue returns true if this list can continue with
// the given marker and list type, otherwise false.
func (l *List) CanContinue(marker byte, isOrdered bool) bool {
return marker == l.Marker && isOrdered == l.IsOrdered()
}
// Dump implements Node.Dump.
func (l *List) Dump(source []byte, level int) {
name := "List"
if l.IsOrdered() {
name = "OrderedList"
}
m := map[string]string{
"Marker": fmt.Sprintf("%c", l.Marker),
"Tight": fmt.Sprintf("%v", l.IsTight),
}
if l.IsOrdered() {
m["Start"] = fmt.Sprintf("%d", l.Start)
}
DumpHelper(l, source, level, name, m, nil)
}
// NewList returns a new List node.
func NewList(marker byte) *List {
return &List{
BaseBlock: BaseBlock{},
Marker: marker,
IsTight: true,
}
}
// A ListItem struct represents a list item of Markdown text.
type ListItem struct {
BaseBlock
// Offset is an offset position of this item.
Offset int
}
// Dump implements Node.Dump.
func (n *ListItem) Dump(source []byte, level int) {
DumpHelper(n, source, level, "ListItem", nil, nil)
}
// NewListItem returns a new ListItem node.
func NewListItem(offset int) *ListItem {
return &ListItem{
BaseBlock: BaseBlock{},
Offset: offset,
}
}
// HTMLBlockType represents the kinds of HTML blocks.
// See https://spec.commonmark.org/0.29/#html-blocks
type HTMLBlockType int
const (
// HTMLBlockType1 represents type 1 html blocks
HTMLBlockType1 HTMLBlockType = iota + 1
// HTMLBlockType2 represents type 2 html blocks
HTMLBlockType2
// HTMLBlockType3 represents type 3 html blocks
HTMLBlockType3
// HTMLBlockType4 represents type 4 html blocks
HTMLBlockType4
// HTMLBlockType5 represents type 5 html blocks
HTMLBlockType5
// HTMLBlockType6 represents type 6 html blocks
HTMLBlockType6
// HTMLBlockType7 represents type 7 html blocks
HTMLBlockType7
)
// An HTMLBlock struct represents an html block of Markdown text.
type HTMLBlock struct {
BaseBlock
// HTMLBlockType is the type of this html block.
HTMLBlockType HTMLBlockType
// ClosureLine is a line that closes this html block.
ClosureLine textm.Segment
}
// IsRaw implements Node.IsRaw.
func (n *HTMLBlock) IsRaw() bool {
return true
}
// HasClosure returns true if this html block has a closure line,
// otherwise false.
func (n *HTMLBlock) HasClosure() bool {
return n.ClosureLine.Start >= 0
}
// Dump implements Node.Dump.
func (n *HTMLBlock) Dump(source []byte, level int) {
indent := strings.Repeat(" ", level)
fmt.Printf("%s%s {\n", indent, "HTMLBlock")
indent2 := strings.Repeat(" ", level+1)
fmt.Printf("%sRawText: \"", indent2)
for i := 0; i < n.Lines().Len(); i++ {
s := n.Lines().At(i)
fmt.Print(string(source[s.Start:s.Stop]))
}
fmt.Printf("\"\n")
for c := n.FirstChild(); c != nil; c = c.NextSibling() {
c.Dump(source, level+1)
}
if n.HasClosure() {
cl := n.ClosureLine
fmt.Printf("%sClosure: \"%s\"\n", indent2, string(cl.Value(source)))
}
fmt.Printf("%s}\n", indent)
}
// NewHTMLBlock returns a new HTMLBlock node.
func NewHTMLBlock(typ HTMLBlockType) *HTMLBlock {
return &HTMLBlock{
BaseBlock: BaseBlock{},
HTMLBlockType: typ,
ClosureLine: textm.NewSegment(-1, -1),
}
}

371
ast/inline.go Normal file

@ -0,0 +1,371 @@
package ast
import (
"fmt"
"strings"
textm "github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
// A BaseInline struct implements the Node interface.
type BaseInline struct {
BaseNode
}
// Type implements Node.Type
func (b *BaseInline) Type() NodeType {
return InlineNode
}
// IsRaw implements Node.IsRaw
func (b *BaseInline) IsRaw() bool {
return false
}
// HasBlankPreviousLines implements Node.HasBlankPreviousLines.
func (b *BaseInline) HasBlankPreviousLines() bool {
panic("can not call with inline nodes.")
}
// SetBlankPreviousLines implements Node.SetBlankPreviousLines.
func (b *BaseInline) SetBlankPreviousLines(v bool) {
panic("can not call with inline nodes.")
}
// Lines implements Node.Lines
func (b *BaseInline) Lines() *textm.Segments {
panic("can not call with inline nodes.")
}
// SetLines implements Node.SetLines
func (b *BaseInline) SetLines(v *textm.Segments) {
panic("can not call with inline nodes.")
}
// A Text struct represents a textual content of the Markdown text.
type Text struct {
BaseInline
// Segment is a position in a source text.
Segment textm.Segment
flags uint8
}
const (
textSoftLineBreak = 1 << iota
textHardLineBreak
textRaw
)
// Inline implements Inline.Inline.
func (n *Text) Inline() {
}
// SoftLineBreak returns true if this node ends with a new line,
// otherwise false.
func (n *Text) SoftLineBreak() bool {
return n.flags&textSoftLineBreak != 0
}
// SetSoftLineBreak sets whether this node ends with a new line.
func (n *Text) SetSoftLineBreak(v bool) {
if v {
n.flags |= textSoftLineBreak
} else {
n.flags = n.flags &^ textSoftLineBreak
}
}
// IsRaw returns true if this text should be rendered without unescaping
// back slash escapes and resolving references.
func (n *Text) IsRaw() bool {
return n.flags&textRaw != 0
}
// SetRaw sets whether this text should be rendered as raw contents.
func (n *Text) SetRaw(v bool) {
if v {
n.flags |= textRaw
} else {
n.flags = n.flags &^ textRaw
}
}
// HardLineBreak returns true if this node ends with a hard line break.
// See https://spec.commonmark.org/0.29/#hard-line-breaks for details.
func (n *Text) HardLineBreak() bool {
return n.flags&textHardLineBreak != 0
}
// SetHardLineBreak sets whether this node ends with a hard line break.
func (n *Text) SetHardLineBreak(v bool) {
if v {
n.flags |= textHardLineBreak
} else {
n.flags = n.flags &^ textHardLineBreak
}
}
// Merge merges a Node n into this node.
// Merge returns true if given node has been merged, otherwise false.
func (n *Text) Merge(node Node, source []byte) bool {
t, ok := node.(*Text)
if !ok {
return false
}
if n.Segment.Stop != t.Segment.Start || t.Segment.Padding != 0 || source[n.Segment.Stop-1] == '\n' || t.IsRaw() != n.IsRaw() {
return false
}
n.Segment.Stop = t.Segment.Stop
n.SetSoftLineBreak(t.SoftLineBreak())
n.SetHardLineBreak(t.HardLineBreak())
return true
}
// Text implements Node.Text.
func (n *Text) Text(source []byte) []byte {
return n.Segment.Value(source)
}
// Dump implements Node.Dump.
func (n *Text) Dump(source []byte, level int) {
fmt.Printf("%sText: \"%s\"\n", strings.Repeat(" ", level), strings.TrimRight(string(n.Text(source)), "\n"))
}
// NewText returns a new Text node.
func NewText() *Text {
return &Text{
BaseInline: BaseInline{},
}
}
// NewTextSegment returns a new Text node with given source position.
func NewTextSegment(v textm.Segment) *Text {
return &Text{
BaseInline: BaseInline{},
Segment: v,
}
}
// NewRawTextSegment returns a new Text node with given source position.
// The new node should be rendered as raw contents.
func NewRawTextSegment(v textm.Segment) *Text {
t := &Text{
BaseInline: BaseInline{},
Segment: v,
}
t.SetRaw(true)
return t
}
// MergeOrAppendTextSegment merges a given s into the last child of the parent if
// it can be merged, otherwise creates a new Text node and appends it after the
// current last child.
func MergeOrAppendTextSegment(parent Node, s textm.Segment) {
last := parent.LastChild()
t, ok := last.(*Text)
if ok && t.Segment.Stop == s.Start && !t.SoftLineBreak() {
ts := t.Segment
t.Segment = ts.WithStop(s.Stop)
} else {
parent.AppendChild(parent, NewTextSegment(s))
}
}
// MergeOrReplaceTextSegment merges a given s into a previous sibling of the node n
// if a previous sibling of the node n is *Text, otherwise replaces Node n with s.
func MergeOrReplaceTextSegment(parent Node, n Node, s textm.Segment) {
prev := n.PreviousSibling()
if t, ok := prev.(*Text); ok && t.Segment.Stop == s.Start && !t.SoftLineBreak() {
t.Segment = t.Segment.WithStop(s.Stop)
parent.RemoveChild(parent, n)
} else {
parent.ReplaceChild(parent, n, NewTextSegment(s))
}
}
// A CodeSpan struct represents a code span of Markdown text.
type CodeSpan struct {
BaseInline
}
// Inline implements Inline.Inline .
func (n *CodeSpan) Inline() {
}
// IsBlank returns true if this node consists of spaces, otherwise false.
func (n *CodeSpan) IsBlank(source []byte) bool {
for c := n.FirstChild(); c != nil; c = c.NextSibling() {
text := c.(*Text).Segment
if !util.IsBlank(text.Value(source)) {
return false
}
}
return true
}
// Dump implements Node.Dump
func (n *CodeSpan) Dump(source []byte, level int) {
DumpHelper(n, source, level, "CodeSpan", nil, nil)
}
// NewCodeSpan returns a new CodeSpan node.
func NewCodeSpan() *CodeSpan {
return &CodeSpan{
BaseInline: BaseInline{},
}
}
// An Emphasis struct represents an emphasis of Markdown text.
type Emphasis struct {
BaseInline
// Level is a level of the emphasis.
Level int
}
// Inline implements Inline.Inline.
func (n *Emphasis) Inline() {
}
// Dump implements Node.Dump.
func (n *Emphasis) Dump(source []byte, level int) {
DumpHelper(n, source, level, fmt.Sprintf("Emphasis(%d)", n.Level), nil, nil)
}
// NewEmphasis returns a new Emphasis node with given level.
func NewEmphasis(level int) *Emphasis {
return &Emphasis{
BaseInline: BaseInline{},
Level: level,
}
}
type baseLink struct {
BaseInline
// Destination is a destination(URL) of this link.
Destination []byte
// Title is a title of this link.
Title []byte
}
// Inline implements Inline.Inline.
func (n *baseLink) Inline() {
}
func (n *baseLink) Dump(source []byte, level int) {
m := map[string]string{}
m["Destination"] = string(n.Destination)
m["Title"] = string(n.Title)
DumpHelper(n, source, level, "Link", m, nil)
}
// A Link struct represents a link of the Markdown text.
type Link struct {
baseLink
}
// NewLink returns a new Link node.
func NewLink() *Link {
c := &Link{
baseLink: baseLink{
BaseInline: BaseInline{},
},
}
return c
}
// An Image struct represents an image of the Markdown text.
type Image struct {
baseLink
}
// Dump implements Node.Dump.
func (n *Image) Dump(source []byte, level int) {
m := map[string]string{}
m["Destination"] = string(n.Destination)
m["Title"] = string(n.Title)
DumpHelper(n, source, level, "Image", m, nil)
}
// NewImage returns a new Image node.
func NewImage(link *Link) *Image {
c := &Image{
baseLink: baseLink{
BaseInline: BaseInline{},
},
}
c.Destination = link.Destination
c.Title = link.Title
for n := link.FirstChild(); n != nil; {
next := n.NextSibling()
link.RemoveChild(link, n)
c.AppendChild(c, n)
n = next
}
return c
}
// AutoLinkType defines the kinds of autolinks.
type AutoLinkType int
const (
// AutoLinkEmail indicates that an autolink is an email address.
AutoLinkEmail AutoLinkType = iota + 1
// AutoLinkURL indicates that an autolink is a generic URL.
AutoLinkURL
)
// An AutoLink struct represents an autolink of the Markdown text.
type AutoLink struct {
BaseInline
// Value is a link text of this node.
Value *Text
// AutoLinkType is the type of this autolink.
AutoLinkType AutoLinkType
}
// Inline implements Inline.Inline.
func (n *AutoLink) Inline() {}
// Dump implements Node.Dump.
func (n *AutoLink) Dump(source []byte, level int) {
segment := n.Value.Segment
m := map[string]string{
"Value": string(segment.Value(source)),
}
DumpHelper(n, source, level, "AutoLink", m, nil)
}
// NewAutoLink returns a new AutoLink node.
func NewAutoLink(typ AutoLinkType, value *Text) *AutoLink {
return &AutoLink{
BaseInline: BaseInline{},
Value: value,
AutoLinkType: typ,
}
}
// A RawHTML struct represents an inline raw HTML of the Markdown text.
type RawHTML struct {
BaseInline
}
// Inline implements Inline.Inline.
func (n *RawHTML) Inline() {}
// Dump implements Node.Dump.
func (n *RawHTML) Dump(source []byte, level int) {
DumpHelper(n, source, level, "RawHTML", nil, nil)
}
// NewRawHTML returns a new RawHTML node.
func NewRawHTML() *RawHTML {
return &RawHTML{
BaseInline: BaseInline{},
}
}

53
commonmark_test.go Normal file

@ -0,0 +1,53 @@
package goldmark
import (
"bytes"
"encoding/json"
"io/ioutil"
"testing"
"github.com/yuin/goldmark/renderer/html"
)
type commonmarkSpecTestCase struct {
Markdown string `json:"markdown"`
HTML string `json:"html"`
Example int `json:"example"`
StartLine int `json:"start_line"`
EndLine int `json:"end_line"`
Section string `json:"section"`
}
func TestSpec(t *testing.T) {
bs, err := ioutil.ReadFile("_test/spec.json")
if err != nil {
t.Fatal(err)
}
var testCases []commonmarkSpecTestCase
if err := json.Unmarshal(bs, &testCases); err != nil {
t.Fatal(err)
}
markdown := New(WithRendererOptions(html.WithXHTML()))
for _, testCase := range testCases {
var out bytes.Buffer
if err := markdown.Convert([]byte(testCase.Markdown), &out); err != nil {
t.Fatal(err)
}
if !bytes.Equal(bytes.TrimSpace(out.Bytes()), bytes.TrimSpace([]byte(testCase.HTML))) {
format := `============= case %d ================
Markdown:
-----------
%s
Expected:
----------
%s
Actual:
---------
%s
`
t.Errorf(format, testCase.Example, testCase.Markdown, testCase.HTML, out.Bytes())
}
}
}
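The comparison strategy used by TestSpec can be sketched in isolation. This is a minimal, standard-library-only illustration, not the project's test: `specCase`, `decodeCases`, and `sameOutput` are hypothetical names, the JSON sample is made up inline rather than read from `_test/spec.json`, and the rendered bytes are a stand-in for real converter output.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// specCase mirrors a subset of the fields read from the spec JSON file.
// The type name is hypothetical.
type specCase struct {
	Markdown string `json:"markdown"`
	HTML     string `json:"html"`
	Example  int    `json:"example"`
}

// decodeCases parses a JSON array of spec cases.
func decodeCases(data []byte) ([]specCase, error) {
	var cases []specCase
	if err := json.Unmarshal(data, &cases); err != nil {
		return nil, err
	}
	return cases, nil
}

// sameOutput reports whether two renderings are equal once leading and
// trailing whitespace is ignored, which is how the test compares results.
func sameOutput(got, want []byte) bool {
	return bytes.Equal(bytes.TrimSpace(got), bytes.TrimSpace(want))
}

func main() {
	// Made-up sample data; the real cases live in _test/spec.json.
	sample := []byte(`[{"markdown": "*hi*\n", "html": "<p><em>hi</em></p>\n", "example": 1}]`)
	cases, err := decodeCases(sample)
	if err != nil {
		panic(err)
	}
	for _, c := range cases {
		rendered := []byte("<p><em>hi</em></p>") // stand-in for converter output
		fmt.Println(c.Example, sameOutput(rendered, []byte(c.HTML)))
	}
}
```

Trimming both sides keeps the test insensitive to a trailing newline, at the cost of not detecting whitespace-only differences at the edges of the output.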


@ -0,0 +1,23 @@
// Package ast defines AST nodes that represent extension elements.
package ast
import (
gast "github.com/yuin/goldmark/ast"
)
// A Strikethrough struct represents a strikethrough of GFM text.
type Strikethrough struct {
gast.BaseInline
}
// Inline implements Inline.Inline.
func (n *Strikethrough) Inline() {
}
// Dump implements Node.Dump.
func (n *Strikethrough) Dump(source []byte, level int) {
gast.DumpHelper(n, source, level, "Strikethrough", nil, nil)
}
// NewStrikethrough returns a new Strikethrough node.
func NewStrikethrough() *Strikethrough {
return &Strikethrough{}
}

113
extension/ast/table.go Normal file

@ -0,0 +1,113 @@
package ast
import (
"fmt"
gast "github.com/yuin/goldmark/ast"
"strings"
)
// Alignment is a text alignment of table cells.
type Alignment int
const (
// AlignLeft indicates text should be left justified.
AlignLeft Alignment = iota + 1
// AlignRight indicates text should be right justified.
AlignRight
// AlignCenter indicates text should be centered.
AlignCenter
// AlignNone indicates text should be aligned in the default manner.
AlignNone
)
func (a Alignment) String() string {
switch a {
case AlignLeft:
return "left"
case AlignRight:
return "right"
case AlignCenter:
return "center"
case AlignNone:
return "none"
}
return ""
}
// A Table struct represents a table of Markdown(GFM) text.
type Table struct {
gast.BaseBlock
// Alignments holds the alignments of the columns.
Alignments []Alignment
}
// Dump implements Node.Dump.
func (n *Table) Dump(source []byte, level int) {
gast.DumpHelper(n, source, level, "Table", nil, func(level int) {
indent := strings.Repeat(" ", level)
fmt.Printf("%sAlignments {\n", indent)
for i, alignment := range n.Alignments {
indent2 := strings.Repeat(" ", level+1)
fmt.Printf("%s%s", indent2, alignment.String())
if i != len(n.Alignments)-1 {
fmt.Println("")
}
}
fmt.Printf("\n%s}\n", indent)
})
}
// NewTable returns a new Table node.
func NewTable() *Table {
return &Table{
Alignments: []Alignment{},
}
}
// A TableRow struct represents a table row of Markdown(GFM) text.
type TableRow struct {
gast.BaseBlock
Alignments []Alignment
}
// Dump implements Node.Dump.
func (n *TableRow) Dump(source []byte, level int) {
gast.DumpHelper(n, source, level, "TableRow", nil, nil)
}
// NewTableRow returns a new TableRow node with the given column alignments.
func NewTableRow(alignments []Alignment) *TableRow {
return &TableRow{Alignments: alignments}
}
// A TableHeader struct represents a table header of Markdown(GFM) text.
type TableHeader struct {
*TableRow
}
// NewTableHeader returns a new TableHeader node.
func NewTableHeader(row *TableRow) *TableHeader {
return &TableHeader{row}
}
// A TableCell struct represents a table cell of a Markdown(GFM) text.
type TableCell struct {
gast.BaseBlock
Alignment Alignment
}
// Dump implements Node.Dump.
func (n *TableCell) Dump(source []byte, level int) {
gast.DumpHelper(n, source, level, "TableCell", nil, nil)
}
// NewTableCell returns a new TableCell node.
func NewTableCell() *TableCell {
return &TableCell{
Alignment: AlignNone,
}
}

27
extension/ast/tasklist.go Normal file

@ -0,0 +1,27 @@
package ast
import (
"fmt"
gast "github.com/yuin/goldmark/ast"
)
// A TaskCheckBox struct represents a checkbox of a task list.
type TaskCheckBox struct {
gast.BaseInline
IsChecked bool
}
// Dump implements Node.Dump.
func (n *TaskCheckBox) Dump(source []byte, level int) {
m := map[string]string{
"Checked": fmt.Sprintf("%v", n.IsChecked),
}
gast.DumpHelper(n, source, level, "TaskCheckBox", m, nil)
}
// NewTaskCheckBox returns a new TaskCheckBox node.
func NewTaskCheckBox(checked bool) *TaskCheckBox {
return &TaskCheckBox{
IsChecked: checked,
}
}

32
extension/gfm.go Normal file

@ -0,0 +1,32 @@
package extension
import (
"github.com/yuin/goldmark"
"github.com/yuin/goldmark/parser"
)
type gfm struct {
}
// GFM is an extension that provides GitHub Flavored Markdown functionality.
var GFM = &gfm{}
var filterTags = []string{
"title",
"textarea",
"style",
"xmp",
"iframe",
"noembed",
"noframes",
"script",
"plaintext",
}
func (e *gfm) Extend(m goldmark.Markdown) {
m.Parser().AddOption(parser.WithFilterTags(filterTags...))
Linkify.Extend(m)
Table.Extend(m)
Strikethrough.Extend(m)
TaskList.Extend(m)
}

125
extension/linkify.go Normal file

@ -0,0 +1,125 @@
package extension
import (
"bytes"
"github.com/yuin/goldmark"
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/parser"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
"regexp"
)
var wwwURLRegxp = regexp.MustCompile(`^www\.[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b(?:[-a-zA-Z0-9@:%_\+.~#?&//=\(\);]*)`)
var urlRegexp = regexp.MustCompile(`^(?:http|https|ftp):\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=\(\);]*)`)
var emailRegexp = regexp.MustCompile(`^[a-zA-Z0-9\.\-_\+]+@([a-zA-Z0-9\.\-_]+)`)
type linkifyParser struct {
}
var defaultLinkifyParser = &linkifyParser{}
// NewLinkifyParser returns a new InlineParser that can parse
// text that looks like a URL.
func NewLinkifyParser() parser.InlineParser {
return defaultLinkifyParser
}
func (s *linkifyParser) Trigger() []byte {
// ' ' indicates any whitespace and the head of a line
return []byte{' ', '*', '_', '~', '('}
}
func (s *linkifyParser) Parse(parent ast.Node, block text.Reader, pc parser.Context) ast.Node {
line, segment := block.PeekLine()
consumes := 0
start := segment.Start
c := line[0]
// advance if current position is not a line head.
if c == ' ' || c == '*' || c == '_' || c == '~' || c == '(' {
consumes++
start++
line = line[1:]
}
var m []int
typ := ast.AutoLinkType(ast.AutoLinkURL)
m = urlRegexp.FindSubmatchIndex(line)
if m == nil {
m = wwwURLRegxp.FindSubmatchIndex(line)
}
if m != nil {
lastChar := line[m[1]-1]
if lastChar == '.' {
m[1]--
} else if lastChar == ')' {
closing := 0
for i := m[1] - 1; i >= m[0]; i-- {
if line[i] == ')' {
closing++
} else if line[i] == '(' {
closing--
}
}
if closing > 0 {
m[1]--
}
} else if lastChar == ';' {
i := m[1] - 2
for ; i >= m[0]; i-- {
if util.IsAlphaNumeric(line[i]) {
continue
}
break
}
if i != m[1]-2 {
if line[i] == '&' {
m[1] -= m[1] - i
}
}
}
}
if m == nil {
typ = ast.AutoLinkEmail
m = emailRegexp.FindSubmatchIndex(line)
if m == nil || bytes.IndexByte(line[m[2]:m[3]], '.') < 0 {
return nil
}
lastChar := line[m[1]-1]
if lastChar == '.' {
m[1]--
} else if lastChar == '-' || lastChar == '_' {
return nil
}
}
if m == nil {
return nil
}
if consumes != 0 {
s := segment.WithStop(segment.Start + 1)
ast.MergeOrAppendTextSegment(parent, s)
}
consumes += m[1]
block.Advance(consumes)
n := ast.NewTextSegment(text.NewSegment(start, start+m[1]))
return ast.NewAutoLink(typ, n)
}
func (s *linkifyParser) CloseBlock(parent ast.Node, pc parser.Context) {
// nothing to do
}
type linkify struct {
}
// Linkify is an extension that allows you to parse text that looks like a URL.
var Linkify = &linkify{}
func (e *linkify) Extend(m goldmark.Markdown) {
m.Parser().AddOption(parser.WithInlineParsers(
util.Prioritized(NewLinkifyParser(), 999),
))
}
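The trailing-character handling in Parse above (drop a final '.', and drop a final ')' only when the candidate has more closing than opening parentheses) can be sketched on plain strings. This is an illustrative reduction under stated assumptions: `trimURLEnd` is a hypothetical helper, it works on a whole string rather than on regexp match indices, and it omits the ';' entity check.

```go
package main

import "fmt"

// trimURLEnd drops one trailing '.' or one unbalanced trailing ')' from a
// URL candidate. The function name is hypothetical.
func trimURLEnd(candidate string) string {
	end := len(candidate)
	if end == 0 {
		return candidate
	}
	switch candidate[end-1] {
	case '.':
		end--
	case ')':
		// Count ')' minus '(' over the whole candidate.
		closing := 0
		for i := end - 1; i >= 0; i-- {
			switch candidate[i] {
			case ')':
				closing++
			case '(':
				closing--
			}
		}
		// A surplus of ')' means the last one belongs to the surrounding text.
		if closing > 0 {
			end--
		}
	}
	return candidate[:end]
}

func main() {
	fmt.Println(trimURLEnd("www.example.com/a_(b)"))
	fmt.Println(trimURLEnd("www.example.com/search?q=(x))"))
	fmt.Println(trimURLEnd("www.example.com."))
}
```

Only one character is removed per call, matching the single `m[1]--` in the parser; a candidate with several surplus parentheses would keep all but the last.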

111
extension/strikethrough.go Normal file

@ -0,0 +1,111 @@
package extension
import (
"github.com/yuin/goldmark"
gast "github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/extension/ast"
"github.com/yuin/goldmark/parser"
"github.com/yuin/goldmark/renderer"
"github.com/yuin/goldmark/renderer/html"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
type strikethroughDelimiterProcessor struct {
}
func (p *strikethroughDelimiterProcessor) IsDelimiter(b byte) bool {
return b == '~'
}
func (p *strikethroughDelimiterProcessor) CanOpenCloser(opener, closer *parser.Delimiter) bool {
return opener.Char == closer.Char
}
func (p *strikethroughDelimiterProcessor) OnMatch(consumes int) gast.Node {
return ast.NewStrikethrough()
}
var defaultStrikethroughDelimiterProcessor = &strikethroughDelimiterProcessor{}
type strikethroughParser struct {
}
var defaultStrikethroughParser = &strikethroughParser{}
// NewStrikethroughParser returns a new InlineParser that parses
// strikethrough expressions.
func NewStrikethroughParser() parser.InlineParser {
return defaultStrikethroughParser
}
func (s *strikethroughParser) Trigger() []byte {
return []byte{'~'}
}
func (s *strikethroughParser) Parse(parent gast.Node, block text.Reader, pc parser.Context) gast.Node {
before := block.PrecendingCharacter()
line, segment := block.PeekLine()
node := parser.ScanDelimiter(line, before, 2, defaultStrikethroughDelimiterProcessor)
if node == nil {
return nil
}
node.Segment = segment.WithStop(segment.Start + node.OriginalLength)
block.Advance(node.OriginalLength)
pc.PushDelimiter(node)
return node
}
func (s *strikethroughParser) CloseBlock(parent gast.Node, pc parser.Context) {
// nothing to do
}
// StrikethroughHTMLRenderer is a renderer.NodeRenderer implementation that
// renders Strikethrough nodes.
type StrikethroughHTMLRenderer struct {
html.Config
}
// NewStrikethroughHTMLRenderer returns a new StrikethroughHTMLRenderer.
func NewStrikethroughHTMLRenderer(opts ...html.Option) renderer.NodeRenderer {
r := &StrikethroughHTMLRenderer{
Config: html.NewConfig(),
}
for _, opt := range opts {
opt.SetHTMLOption(&r.Config)
}
return r
}
// Render implements renderer.NodeRenderer.Render.
func (r *StrikethroughHTMLRenderer) Render(writer util.BufWriter, source []byte, n gast.Node, entering bool) (gast.WalkStatus, error) {
switch node := n.(type) {
case *ast.Strikethrough:
return r.renderStrikethrough(writer, source, node, entering), nil
}
return gast.WalkContinue, renderer.NotSupported
}
func (r *StrikethroughHTMLRenderer) renderStrikethrough(w util.BufWriter, source []byte, n *ast.Strikethrough, entering bool) gast.WalkStatus {
if entering {
w.WriteString("<del>")
} else {
w.WriteString("</del>")
}
return gast.WalkContinue
}
type strikethrough struct {
}
// Strikethrough is an extension that allows you to use strikethrough expressions like '~~text~~'.
var Strikethrough = &strikethrough{}
func (e *strikethrough) Extend(m goldmark.Markdown) {
m.Parser().AddOption(parser.WithInlineParsers(
util.Prioritized(NewStrikethroughParser(), 500),
))
m.Renderer().AddOption(renderer.WithNodeRenderers(
util.Prioritized(NewStrikethroughHTMLRenderer(), 500),
))
}

233
extension/table.go Normal file

@ -0,0 +1,233 @@
package extension
import (
"bytes"
"fmt"
"regexp"
"github.com/yuin/goldmark"
gast "github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/extension/ast"
"github.com/yuin/goldmark/parser"
"github.com/yuin/goldmark/renderer"
"github.com/yuin/goldmark/renderer/html"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
var tableDelimRegexp = regexp.MustCompile(`^[\s\-\|\:]+$`)
var tableDelimLeft = regexp.MustCompile(`^\s*\:\-+\s*$`)
var tableDelimRight = regexp.MustCompile(`^\s*\-+\:\s*$`)
var tableDelimCenter = regexp.MustCompile(`^\s*\:\-+\:\s*$`)
var tableDelimNone = regexp.MustCompile(`^\s*\-+\s*$`)
type tableParagraphTransformer struct {
}
var defaultTableParagraphTransformer = &tableParagraphTransformer{}
// NewTableParagraphTransformer returns a new ParagraphTransformer
// that can transform paragraphs into tables.
func NewTableParagraphTransformer() parser.ParagraphTransformer {
return defaultTableParagraphTransformer
}
func (b *tableParagraphTransformer) Transform(node *gast.Paragraph, pc parser.Context) {
lines := node.Lines()
if lines.Len() < 2 {
return
}
alignments := b.parseDelimiter(lines.At(1), pc)
if alignments == nil {
return
}
header := b.parseRow(lines.At(0), alignments, pc)
if header == nil || len(alignments) != header.ChildCount() {
return
}
table := ast.NewTable()
table.Alignments = alignments
table.AppendChild(table, ast.NewTableHeader(header))
if lines.Len() > 2 {
for i := 2; i < lines.Len(); i++ {
table.AppendChild(table, b.parseRow(lines.At(i), alignments, pc))
}
}
node.Parent().InsertBefore(node.Parent(), node, table)
node.Parent().RemoveChild(node.Parent(), node)
return
}
func (b *tableParagraphTransformer) parseRow(segment text.Segment, alignments []ast.Alignment, pc parser.Context) *ast.TableRow {
line := segment.Value(pc.Source())
pos := 0
pos += util.TrimLeftSpaceLength(line)
limit := len(line)
limit -= util.TrimRightSpaceLength(line)
row := ast.NewTableRow(alignments)
if len(line) > 0 && line[pos] == '|' {
pos++
}
if len(line) > 0 && line[limit-1] == '|' {
limit--
}
for i := 0; pos < limit; i++ {
closure := util.FindClosure(line[pos:], byte(0), '|', true, false)
if closure < 0 {
closure = len(line[pos:])
}
node := ast.NewTableCell()
segment := text.NewSegment(segment.Start+pos, segment.Start+pos+closure)
segment = segment.TrimLeftSpace(pc.Source())
segment = segment.TrimRightSpace(pc.Source())
node.Lines().Append(segment)
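// A row may contain more cells than the delimiter row declares;
// extra cells keep the default alignment instead of indexing out of range.
if i < len(alignments) {
node.Alignment = alignments[i]
}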
row.AppendChild(row, node)
pos += closure + 1
}
return row
}
func (b *tableParagraphTransformer) parseDelimiter(segment text.Segment, pc parser.Context) []ast.Alignment {
line := segment.Value(pc.Source())
if !tableDelimRegexp.Match(line) {
return nil
}
cols := bytes.Split(line, []byte{'|'})
if util.IsBlank(cols[0]) {
cols = cols[1:]
}
if len(cols) > 0 && util.IsBlank(cols[len(cols)-1]) {
cols = cols[:len(cols)-1]
}
var alignments []ast.Alignment
for _, col := range cols {
if tableDelimLeft.Match(col) {
if alignments == nil {
alignments = []ast.Alignment{}
}
alignments = append(alignments, ast.AlignLeft)
} else if tableDelimRight.Match(col) {
if alignments == nil {
alignments = []ast.Alignment{}
}
alignments = append(alignments, ast.AlignRight)
} else if tableDelimCenter.Match(col) {
if alignments == nil {
alignments = []ast.Alignment{}
}
alignments = append(alignments, ast.AlignCenter)
} else if tableDelimNone.Match(col) {
if alignments == nil {
alignments = []ast.Alignment{}
}
alignments = append(alignments, ast.AlignNone)
} else {
return nil
}
}
return alignments
}
// TableHTMLRenderer is a renderer.NodeRenderer implementation that
// renders Table nodes.
type TableHTMLRenderer struct {
html.Config
}
// NewTableHTMLRenderer returns a new TableHTMLRenderer.
func NewTableHTMLRenderer(opts ...html.Option) renderer.NodeRenderer {
r := &TableHTMLRenderer{
Config: html.NewConfig(),
}
for _, opt := range opts {
opt.SetHTMLOption(&r.Config)
}
return r
}
// Render implements renderer.NodeRenderer.Render.
func (r *TableHTMLRenderer) Render(writer util.BufWriter, source []byte, n gast.Node, entering bool) (gast.WalkStatus, error) {
switch node := n.(type) {
case *ast.Table:
return r.renderTable(writer, source, node, entering), nil
case *ast.TableHeader:
return r.renderTableHeader(writer, source, node, entering), nil
case *ast.TableRow:
return r.renderTableRow(writer, source, node, entering), nil
case *ast.TableCell:
return r.renderTableCell(writer, source, node, entering), nil
}
return gast.WalkContinue, renderer.NotSupported
}
func (r *TableHTMLRenderer) renderTable(w util.BufWriter, source []byte, n *ast.Table, entering bool) gast.WalkStatus {
if entering {
w.WriteString("<table>\n")
} else {
w.WriteString("</table>\n")
}
return gast.WalkContinue
}
func (r *TableHTMLRenderer) renderTableHeader(w util.BufWriter, source []byte, n *ast.TableHeader, entering bool) gast.WalkStatus {
if entering {
w.WriteString("<thead>\n")
w.WriteString("<tr>\n")
} else {
w.WriteString("</tr>\n")
w.WriteString("</thead>\n")
// Open <tbody> only when body rows follow; the last TableRow closes it.
// A header-only table must not emit a stray </tbody>.
if n.NextSibling() != nil {
w.WriteString("<tbody>\n")
}
}
return gast.WalkContinue
}
func (r *TableHTMLRenderer) renderTableRow(w util.BufWriter, source []byte, n *ast.TableRow, entering bool) gast.WalkStatus {
if entering {
w.WriteString("<tr>\n")
} else {
w.WriteString("</tr>\n")
if n.Parent().LastChild() == n {
w.WriteString("</tbody>\n")
}
}
return gast.WalkContinue
}
func (r *TableHTMLRenderer) renderTableCell(w util.BufWriter, source []byte, n *ast.TableCell, entering bool) gast.WalkStatus {
tag := "td"
if n.Parent().Parent().FirstChild() == n.Parent() {
tag = "th"
}
if entering {
align := ""
if n.Alignment != ast.AlignNone {
align = fmt.Sprintf(` align="%s"`, n.Alignment.String())
}
fmt.Fprintf(w, "<%s%s>", tag, align)
} else {
fmt.Fprintf(w, "</%s>\n", tag)
}
return gast.WalkContinue
}
type table struct {
}
// Table is an extension that allows you to use GFM tables.
var Table = &table{}
func (e *table) Extend(m goldmark.Markdown) {
m.Parser().AddOption(parser.WithParagraphTransformers(
util.Prioritized(NewTableParagraphTransformer(), 200),
))
m.Renderer().AddOption(renderer.WithNodeRenderers(
util.Prioritized(NewTableHTMLRenderer(), 500),
))
}
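The delimiter-row logic of parseDelimiter can be tried out without the rest of the package. The following is a minimal sketch using only the standard library: it reuses the regular expressions from this file, but `parseAlignments` is a hypothetical name, alignments are returned as plain strings instead of ast.Alignment values, and `strings.TrimSpace` stands in for util.IsBlank.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	delimRow    = regexp.MustCompile(`^[\s\-\|\:]+$`)
	delimLeft   = regexp.MustCompile(`^\s*\:\-+\s*$`)
	delimRight  = regexp.MustCompile(`^\s*\-+\:\s*$`)
	delimCenter = regexp.MustCompile(`^\s*\:\-+\:\s*$`)
	delimNone   = regexp.MustCompile(`^\s*\-+\s*$`)
)

// parseAlignments returns one alignment name per column of a table
// delimiter row, or nil if the line is not a valid delimiter row.
func parseAlignments(line string) []string {
	if !delimRow.MatchString(line) {
		return nil
	}
	cols := strings.Split(line, "|")
	// Leading and trailing pipes produce empty first and last columns.
	if strings.TrimSpace(cols[0]) == "" {
		cols = cols[1:]
	}
	if len(cols) > 0 && strings.TrimSpace(cols[len(cols)-1]) == "" {
		cols = cols[:len(cols)-1]
	}
	var alignments []string
	for _, col := range cols {
		switch {
		case delimLeft.MatchString(col):
			alignments = append(alignments, "left")
		case delimRight.MatchString(col):
			alignments = append(alignments, "right")
		case delimCenter.MatchString(col):
			alignments = append(alignments, "center")
		case delimNone.MatchString(col):
			alignments = append(alignments, "none")
		default:
			return nil
		}
	}
	return alignments
}

func main() {
	fmt.Println(parseAlignments("| :--- | ---: | :---: | --- |"))
	fmt.Println(parseAlignments("| a | b |") == nil)
}
```

Because `append` on a nil slice allocates as needed, the repeated `if alignments == nil` initialisation in the original is not required for correctness; the sketch omits it.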

118
extension/tasklist.go Normal file

@ -0,0 +1,118 @@
package extension
import (
"github.com/yuin/goldmark"
gast "github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/extension/ast"
"github.com/yuin/goldmark/parser"
"github.com/yuin/goldmark/renderer"
"github.com/yuin/goldmark/renderer/html"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
"regexp"
)
var taskListRegexp = regexp.MustCompile(`^\[([\sxX])\]\s*`)
type taskCheckBoxParser struct {
}
var defaultTaskCheckBoxParser = &taskCheckBoxParser{}
// NewTaskCheckBoxParser returns a new InlineParser that can parse
// checkboxes in list items.
// This parser must take precedence over the parser.LinkParser.
func NewTaskCheckBoxParser() parser.InlineParser {
return defaultTaskCheckBoxParser
}
func (s *taskCheckBoxParser) Trigger() []byte {
return []byte{'['}
}
func (s *taskCheckBoxParser) Parse(parent gast.Node, block text.Reader, pc parser.Context) gast.Node {
// The given AST structure must look like:
// - List
// - ListItem : parent.Parent
// - TextBlock : parent
// (current line)
if parent.Parent() == nil || parent.Parent().FirstChild() != parent {
return nil
}
if _, ok := parent.Parent().(*gast.ListItem); !ok {
return nil
}
line, _ := block.PeekLine()
m := taskListRegexp.FindSubmatchIndex(line)
if m == nil {
return nil
}
value := line[m[2]:m[3]][0]
block.Advance(m[1])
checked := value == 'x' || value == 'X'
return ast.NewTaskCheckBox(checked)
}
func (s *taskCheckBoxParser) CloseBlock(parent gast.Node, pc parser.Context) {
// nothing to do
}
// TaskCheckBoxHTMLRenderer is a renderer.NodeRenderer implementation that
// renders checkboxes in list items.
type TaskCheckBoxHTMLRenderer struct {
html.Config
}
// NewTaskCheckBoxHTMLRenderer returns a new TaskCheckBoxHTMLRenderer.
func NewTaskCheckBoxHTMLRenderer(opts ...html.Option) renderer.NodeRenderer {
r := &TaskCheckBoxHTMLRenderer{
Config: html.NewConfig(),
}
for _, opt := range opts {
opt.SetHTMLOption(&r.Config)
}
return r
}
// Render implements renderer.NodeRenderer.Render.
func (r *TaskCheckBoxHTMLRenderer) Render(writer util.BufWriter, source []byte, n gast.Node, entering bool) (gast.WalkStatus, error) {
switch node := n.(type) {
case *ast.TaskCheckBox:
return r.renderTaskCheckBox(writer, source, node, entering), nil
}
return gast.WalkContinue, renderer.NotSupported
}
func (r *TaskCheckBoxHTMLRenderer) renderTaskCheckBox(w util.BufWriter, source []byte, n *ast.TaskCheckBox, entering bool) gast.WalkStatus {
if !entering {
return gast.WalkContinue
}
if n.IsChecked {
w.WriteString(`<input checked="" disabled="" type="checkbox"`)
} else {
w.WriteString(`<input disabled="" type="checkbox"`)
}
if r.XHTML {
w.WriteString(" />")
} else {
w.WriteString(">")
}
return gast.WalkContinue
}
type taskList struct {
}
// TaskList is an extension that allows you to use GFM task lists.
var TaskList = &taskList{}
func (e *taskList) Extend(m goldmark.Markdown) {
m.Parser().AddOption(parser.WithInlineParsers(
util.Prioritized(NewTaskCheckBoxParser(), 0),
))
m.Renderer().AddOption(renderer.WithNodeRenderers(
util.Prioritized(NewTaskCheckBoxHTMLRenderer(), 500),
))
}
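How taskListRegexp separates the checkbox from the rest of the line can be shown on plain strings. This is a sketch under assumptions: `parseCheckBox` is a hypothetical helper, and it skips the AST checks (first child of a ListItem) that the real parser performs before matching.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as taskListRegexp above.
var checkBox = regexp.MustCompile(`^\[([\sxX])\]\s*`)

// parseCheckBox reports whether line starts with a task checkbox, whether it
// is checked, and the text that follows the checkbox.
func parseCheckBox(line string) (checked bool, rest string, ok bool) {
	m := checkBox.FindStringSubmatchIndex(line)
	if m == nil {
		return false, line, false
	}
	mark := line[m[2]] // the single character between the brackets
	return mark == 'x' || mark == 'X', line[m[1]:], true
}

func main() {
	for _, line := range []string{"[x] done", "[ ] todo", "[link](url)"} {
		checked, rest, ok := parseCheckBox(line)
		fmt.Printf("%q ok=%v checked=%v rest=%q\n", line, ok, checked, rest)
	}
}
```

Since '[' also starts a link, the checkbox parser has to run before the link parser, which is why the extension registers it with priority 0.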

1
go.mod Normal file

@ -0,0 +1 @@
module github.com/yuin/goldmark

144
markdown.go Normal file

@ -0,0 +1,144 @@
// Package goldmark implements functions to convert markdown text to a desired format.
package goldmark
import (
"github.com/yuin/goldmark/parser"
"github.com/yuin/goldmark/renderer"
"github.com/yuin/goldmark/renderer/html"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
"io"
)
// DefaultParser returns a new Parser that is configured by default values.
func DefaultParser() parser.Parser {
return parser.NewParser(parser.WithBlockParsers(parser.DefaultBlockParsers()...),
parser.WithInlineParsers(parser.DefaultInlineParsers()...),
parser.WithParagraphTransformers(parser.DefaultParagraphTransformers()...),
)
}
// DefaultRenderer returns a new Renderer that is configured by default values.
func DefaultRenderer() renderer.Renderer {
return renderer.NewRenderer(renderer.WithNodeRenderers(util.Prioritized(html.NewRenderer(), 1000)))
}
var defaultMarkdown = New()
// Convert interprets a UTF-8 byte source as Markdown and
// writes the rendered contents to a writer w.
func Convert(source []byte, w io.Writer) error {
return defaultMarkdown.Convert(source, w)
}
// A Markdown interface offers functions to convert Markdown text to
// a desired format.
type Markdown interface {
// Convert interprets a UTF-8 byte source as Markdown and writes the
// rendered contents to a writer w.
Convert(source []byte, writer io.Writer) error
// Parser returns a Parser that will be used for conversion.
Parser() parser.Parser
// SetParser sets a Parser to this object.
SetParser(parser.Parser)
// Renderer returns a Renderer that will be used for conversion.
Renderer() renderer.Renderer
// SetRenderer sets a Renderer to this object.
SetRenderer(renderer.Renderer)
}
// Option is a functional option type for Markdown objects.
type Option func(*markdown)
// WithExtensions adds extensions.
func WithExtensions(ext ...Extender) Option {
return func(m *markdown) {
m.extensions = append(m.extensions, ext...)
}
}
// WithParser allows you to override the default parser.
func WithParser(p parser.Parser) Option {
return func(m *markdown) {
m.parser = p
}
}
// WithParserOptions applies options for the parser.
func WithParserOptions(opts ...parser.Option) Option {
return func(m *markdown) {
for _, opt := range opts {
m.parser.AddOption(opt)
}
}
}
// WithRenderer allows you to override the default renderer.
func WithRenderer(r renderer.Renderer) Option {
return func(m *markdown) {
m.renderer = r
}
}
// WithRendererOptions applies options for the renderer.
func WithRendererOptions(opts ...renderer.Option) Option {
return func(m *markdown) {
for _, opt := range opts {
m.renderer.AddOption(opt)
}
}
}
type markdown struct {
parser parser.Parser
renderer renderer.Renderer
extensions []Extender
}
// New returns a new Markdown with given options.
func New(options ...Option) Markdown {
md := &markdown{
parser: DefaultParser(),
renderer: DefaultRenderer(),
extensions: []Extender{},
}
for _, opt := range options {
opt(md)
}
for _, e := range md.extensions {
e.Extend(md)
}
return md
}
func (m *markdown) Convert(source []byte, writer io.Writer) error {
reader := text.NewReader(source)
doc, _ := m.parser.Parse(reader)
return m.renderer.Render(writer, reader.Source(), doc)
}
func (m *markdown) Parser() parser.Parser {
return m.parser
}
func (m *markdown) SetParser(v parser.Parser) {
m.parser = v
}
func (m *markdown) Renderer() renderer.Renderer {
return m.renderer
}
func (m *markdown) SetRenderer(v renderer.Renderer) {
m.renderer = v
}
// An Extender interface is used for extending Markdown.
type Extender interface {
// Extend extends the Markdown.
Extend(Markdown)
}
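New applies options in the order given and runs extensions only afterwards, so the position of an option can matter. The sketch below illustrates that ordering with a stripped-down stand-in; `engine`, `withParser`, `withParserOption`, and `withExtension` are hypothetical names that only mimic the shape of markdown, WithParser, WithParserOptions, and WithExtensions.

```go
package main

import (
	"fmt"
	"strings"
)

// engine is a hypothetical stand-in for the markdown struct.
type engine struct {
	parser     []string        // names of enabled parser features
	extensions []func(*engine) // deferred until every option has run
}

type option func(*engine)

// withParser replaces the whole parser configuration.
func withParser(features ...string) option {
	return func(e *engine) { e.parser = append([]string{}, features...) }
}

// withParserOption modifies whichever parser is current when it runs.
func withParserOption(feature string) option {
	return func(e *engine) { e.parser = append(e.parser, feature) }
}

// withExtension only records the extension; it is applied later.
func withExtension(ext func(*engine)) option {
	return func(e *engine) { e.extensions = append(e.extensions, ext) }
}

func newEngine(opts ...option) *engine {
	e := &engine{parser: []string{"default"}}
	for _, opt := range opts {
		opt(e)
	}
	for _, ext := range e.extensions {
		ext(e)
	}
	return e
}

func main() {
	table := func(e *engine) { e.parser = append(e.parser, "table") }

	// The extension is deferred, so it extends the replacement parser.
	a := newEngine(withExtension(table), withParser("custom"))
	fmt.Println(strings.Join(a.parser, ","))

	// The parser option ran first and is lost when the parser is replaced.
	b := newEngine(withParserOption("ids"), withParser("custom"))
	fmt.Println(strings.Join(b.parser, ","))
}
```

If the real options behave like this sketch, passing WithParser or WithRenderer before WithParserOptions or WithRendererOptions avoids silently discarding the latter.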

146
parser/atx_heading.go Normal file

@ -0,0 +1,146 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
"regexp"
)
// A HeadingConfig struct is a data structure that holds configuration of the parsers related to headings.
type HeadingConfig struct {
HeadingID bool
}
// SetOption implements SetOptioner.
func (b *HeadingConfig) SetOption(name OptionName, value interface{}) {
switch name {
case HeadingID:
b.HeadingID = true
}
}
// A HeadingOption interface sets options for heading parsers.
type HeadingOption interface {
SetHeadingOption(*HeadingConfig)
}
// HeadingID is an option name that enables custom and auto IDs for headings.
var HeadingID OptionName = "HeadingID"
type withHeadingID struct {
}
func (o *withHeadingID) SetConfig(c *Config) {
c.Options[HeadingID] = true
}
func (o *withHeadingID) SetHeadingOption(p *HeadingConfig) {
p.HeadingID = true
}
// WithHeadingID is a functional option that enables custom heading ids and
// auto generated heading ids.
func WithHeadingID() interface {
Option
HeadingOption
} {
return &withHeadingID{}
}
var atxHeadingRegexp = regexp.MustCompile(`^[ ]{0,3}(#{1,6})(?:\s+(.*?)\s*([\s]#+\s*)?)?\n?$`)
type atxHeadingParser struct {
HeadingConfig
}
// NewATXHeadingParser returns a new BlockParser that can parse ATX headings.
func NewATXHeadingParser(opts ...HeadingOption) BlockParser {
p := &atxHeadingParser{}
for _, o := range opts {
o.SetHeadingOption(&p.HeadingConfig)
}
return p
}
func (b *atxHeadingParser) Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State) {
line, segment := reader.PeekLine()
pos := pc.BlockOffset()
i := pos
for ; i < len(line) && line[i] == '#'; i++ {
}
level := i - pos
if i == pos || level > 6 {
return nil, NoChildren
}
l := util.TrimLeftSpaceLength(line[i:])
if l == 0 {
return nil, NoChildren
}
start := i + l
stop := len(line) - util.TrimRightSpaceLength(line)
if stop <= start { // empty headings like '##[space]'
stop = start + 1
} else {
i = stop - 1
for ; i >= start && line[i] == '#'; i-- {
}
if i != stop-1 && !util.IsSpace(line[i]) {
i = stop - 1
}
i++
stop = i
}
node := ast.NewHeading(level)
if len(util.TrimRight(line[start:stop], []byte{'#'})) != 0 { // empty heading like '### ###'
node.Lines().Append(text.NewSegment(segment.Start+start, segment.Start+stop))
}
return node, NoChildren
}
func (b *atxHeadingParser) Continue(node ast.Node, reader text.Reader, pc Context) State {
return Close
}
func (b *atxHeadingParser) Close(node ast.Node, pc Context) {
if !b.HeadingID {
return
}
parseOrGenerateHeadingID(node.(*ast.Heading), pc)
}
func (b *atxHeadingParser) CanInterruptParagraph() bool {
return true
}
func (b *atxHeadingParser) CanAcceptIndentedLine() bool {
return false
}
var headingIDRegexp = regexp.MustCompile(`^(.*[^\\])({#([^}]+)}\s*)\n?$`)
var headingIDMap = NewContextKey()
func parseOrGenerateHeadingID(node *ast.Heading, pc Context) {
existsv := pc.Get(headingIDMap)
var exists map[string]bool
if existsv == nil {
exists = map[string]bool{}
pc.Set(headingIDMap, exists)
} else {
exists = existsv.(map[string]bool)
}
lastIndex := node.Lines().Len() - 1
lastLine := node.Lines().At(lastIndex)
line := lastLine.Value(pc.Source())
m := headingIDRegexp.FindSubmatchIndex(line)
var headingID []byte
if m != nil {
headingID = line[m[6]:m[7]]
lastLine.Stop -= m[5] - m[4]
node.Lines().Set(lastIndex, lastLine)
} else {
headingID = util.GenerateLinkID(line, exists)
}
node.ID = headingID
}
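The custom-ID branch of parseOrGenerateHeadingID can be illustrated on plain strings. This is a minimal sketch with assumptions: `splitHeadingID` is a hypothetical helper, the braces in the pattern are escaped for readability (intended to match the same text as headingIDRegexp), and the fallback that generates an ID through util.GenerateLinkID is left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same structure as headingIDRegexp above, with the braces escaped.
var headingID = regexp.MustCompile(`^(.*[^\\])(\{#([^}]+)\}\s*)\n?$`)

// splitHeadingID separates a trailing {#id} block from heading text.
// ok is false when the line carries no custom ID.
func splitHeadingID(line string) (text, id string, ok bool) {
	m := headingID.FindStringSubmatchIndex(line)
	if m == nil {
		return line, "", false
	}
	// m[4] is where the {#...} block starts; m[6]:m[7] is the ID itself.
	return line[:m[4]], line[m[6]:m[7]], true
}

func main() {
	for _, line := range []string{"Title {#custom-id}", "Plain title", `Escaped \{#nope}`} {
		text, id, ok := splitHeadingID(line)
		fmt.Printf("text=%q id=%q ok=%v\n", text, id, ok)
	}
}
```

The `[^\\]` before the opening brace is what lets an author write a literal `{#...}` in a heading by escaping the brace with a backslash.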

46
parser/auto_link.go Normal file

@ -0,0 +1,46 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"regexp"
)
type autoLinkParser struct {
}
var defaultAutoLinkParser = &autoLinkParser{}
// NewAutoLinkParser returns a new InlineParser that parses autolinks
// surrounded by '<' and '>' .
func NewAutoLinkParser() InlineParser {
return defaultAutoLinkParser
}
func (s *autoLinkParser) Trigger() []byte {
return []byte{'<'}
}
var emailAutoLinkRegexp = regexp.MustCompile(`^<([a-zA-Z0-9.!#$%&'*+\/=?^_` + "`" + `{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*)>`)
var autoLinkRegexp = regexp.MustCompile(`(?i)^<[A-Za-z][A-Za-z0-9.+-]{1,31}:[^<>\x00-\x20]*>`)
func (s *autoLinkParser) Parse(parent ast.Node, block text.Reader, pc Context) ast.Node {
line, segment := block.PeekLine()
match := emailAutoLinkRegexp.FindSubmatchIndex(line)
typ := ast.AutoLinkType(ast.AutoLinkEmail)
if match == nil {
match = autoLinkRegexp.FindSubmatchIndex(line)
typ = ast.AutoLinkURL
}
if match == nil {
return nil
}
value := ast.NewTextSegment(text.NewSegment(segment.Start+1, segment.Start+match[1]-1))
block.Advance(match[1])
return ast.NewAutoLink(typ, value)
}
func (s *autoLinkParser) CloseBlock(parent ast.Node, pc Context) {
// nothing to do
}
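The two-step matching in Parse (try the email pattern first, then the generic URI pattern) can be exercised on strings. The sketch below copies the two regular expressions from this file; `parseAutoLink` is a hypothetical helper that returns the kind as a string and the text between '<' and '>' instead of building AST nodes.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from emailAutoLinkRegexp and autoLinkRegexp above.
var emailAutoLink = regexp.MustCompile(`^<([a-zA-Z0-9.!#$%&'*+\/=?^_` + "`" + `{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*)>`)
var urlAutoLink = regexp.MustCompile(`(?i)^<[A-Za-z][A-Za-z0-9.+-]{1,31}:[^<>\x00-\x20]*>`)

// parseAutoLink classifies an autolink at the start of line and returns the
// text between the angle brackets.
func parseAutoLink(line string) (kind, value string, ok bool) {
	if m := emailAutoLink.FindStringIndex(line); m != nil {
		return "email", line[1 : m[1]-1], true
	}
	if m := urlAutoLink.FindStringIndex(line); m != nil {
		return "url", line[1 : m[1]-1], true
	}
	return "", "", false
}

func main() {
	for _, line := range []string{
		"<foo@bar.example.com> rest",
		"<http://example.com/a?b=c> rest",
		"<not a link>",
	} {
		kind, value, ok := parseAutoLink(line)
		fmt.Printf("kind=%q value=%q ok=%v\n", kind, value, ok)
	}
}
```

The slice `line[1 : m[1]-1]` corresponds to the segment `segment.Start+1` to `segment.Start+match[1]-1` in the parser, which excludes both angle brackets.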

65
parser/blockquote.go Normal file

@ -0,0 +1,65 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
type blockquoteParser struct {
}
var defaultBlockquoteParser = &blockquoteParser{}
// NewBlockquoteParser returns a new BlockParser that
// parses blockquotes.
func NewBlockquoteParser() BlockParser {
return defaultBlockquoteParser
}
func (b *blockquoteParser) process(reader text.Reader) bool {
line, _ := reader.PeekLine()
w, pos := util.IndentWidth(line, 0)
if w > 3 || pos >= len(line) || line[pos] != '>' {
return false
}
pos++
if pos >= len(line) || line[pos] == '\n' {
reader.Advance(pos)
return true
}
if line[pos] == ' ' || line[pos] == '\t' {
pos++
}
reader.Advance(pos)
if line[pos-1] == '\t' {
reader.SetPadding(2)
}
return true
}
func (b *blockquoteParser) Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State) {
if b.process(reader) {
return ast.NewBlockquote(), HasChildren
}
return nil, NoChildren
}
func (b *blockquoteParser) Continue(node ast.Node, reader text.Reader, pc Context) State {
if b.process(reader) {
return Continue | HasChildren
}
return Close
}
func (b *blockquoteParser) Close(node ast.Node, pc Context) {
// nothing to do
}
func (b *blockquoteParser) CanInterruptParagraph() bool {
return true
}
func (b *blockquoteParser) CanAcceptIndentedLine() bool {
return false
}

75
parser/code_block.go Normal file

@ -0,0 +1,75 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
type codeBlockParser struct {
}
// defaultCodeBlockParser is the shared BlockParser implementation that parses indented code blocks.
var defaultCodeBlockParser = &codeBlockParser{}
// NewCodeBlockParser returns a new BlockParser that
// parses code blocks.
func NewCodeBlockParser() BlockParser {
return defaultCodeBlockParser
}
func (b *codeBlockParser) Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State) {
line, segment := reader.PeekLine()
pos, padding := util.IndentPosition(line, reader.LineOffset(), 4)
if pos < 0 {
return nil, NoChildren
}
node := ast.NewCodeBlock()
reader.AdvanceAndSetPadding(pos, padding)
_, segment = reader.PeekLine()
node.Lines().Append(segment)
reader.Advance(segment.Len() - 1)
return node, NoChildren
}
func (b *codeBlockParser) Continue(node ast.Node, reader text.Reader, pc Context) State {
line, segment := reader.PeekLine()
if util.IsBlank(line) {
node.Lines().Append(segment.TrimLeftSpaceWidth(4, reader.Source()))
return Continue | NoChildren
}
pos, padding := util.IndentPosition(line, reader.LineOffset(), 4)
if pos < 0 {
return Close
}
reader.AdvanceAndSetPadding(pos, padding)
_, segment = reader.PeekLine()
node.Lines().Append(segment)
reader.Advance(segment.Len() - 1)
return Continue | NoChildren
}
func (b *codeBlockParser) Close(node ast.Node, pc Context) {
// trim trailing blank lines
lines := node.Lines()
length := lines.Len() - 1
source := pc.Source()
// Stop at index 0 so a block made only of blank lines cannot index out of range.
for length >= 0 {
line := lines.At(length)
if util.IsBlank(line.Value(source)) {
length--
} else {
break
}
}
lines.SetSliced(0, length+1)
}
func (b *codeBlockParser) CanInterruptParagraph() bool {
return false
}
func (b *codeBlockParser) CanAcceptIndentedLine() bool {
return true
}
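
The `Close` method above trims trailing blank lines from an indented code block. A minimal standalone sketch of that step, assuming lines are plain strings rather than `text.Segment` values (the name `trimTrailingBlankLines` is hypothetical and not part of goldmark):

```go
package main

import "strings"

// trimTrailingBlankLines drops blank lines from the end of lines.
// The index is checked against zero, so an input that is entirely
// blank yields an empty slice instead of indexing out of range.
func trimTrailingBlankLines(lines []string) []string {
	end := len(lines) - 1
	for end >= 0 && strings.TrimSpace(lines[end]) == "" {
		end--
	}
	return lines[:end+1]
}
```

Blank lines between two code lines are kept; only the tail is cut, which matches the intent stated in the `Close` comment.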

87
parser/code_span.go Normal file

@ -0,0 +1,87 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
type codeSpanParser struct {
}
var defaultCodeSpanParser = &codeSpanParser{}
// NewCodeSpanParser returns a new InlineParser that parses code spans
// surrounded by '`'.
func NewCodeSpanParser() InlineParser {
return defaultCodeSpanParser
}
func (s *codeSpanParser) Trigger() []byte {
return []byte{'`'}
}
func (s *codeSpanParser) Parse(parent ast.Node, block text.Reader, pc Context) ast.Node {
line, startSegment := block.PeekLine()
opener := 0
for ; opener < len(line) && line[opener] == '`'; opener++ {
}
block.Advance(opener)
l, pos := block.Position()
node := ast.NewCodeSpan()
for {
line, segment := block.PeekLine()
if line == nil {
block.SetPosition(l, pos)
return ast.NewTextSegment(startSegment.WithStop(startSegment.Start + opener))
}
for i := 0; i < len(line); i++ {
c := line[i]
if c == '`' {
oldi := i
for ; i < len(line) && line[i] == '`'; i++ {
}
closure := i - oldi
// i already points past the backtick run, so line[i] is the character that follows it.
if closure == opener && (i >= len(line) || line[i] != '`') {
segment := segment.WithStop(segment.Start + i - closure)
if !segment.IsEmpty() {
node.AppendChild(node, ast.NewRawTextSegment(segment))
}
block.Advance(i)
goto end
}
}
}
if !util.IsBlank(line) {
node.AppendChild(node, ast.NewRawTextSegment(segment))
}
block.AdvanceLine()
}
end:
if !node.IsBlank(pc.Source()) {
// Strip one leading and one trailing space, but only when both are present.
segment := node.FirstChild().(*ast.Text).Segment
shouldTrimmed := true
if !(!segment.IsEmpty() && pc.Source()[segment.Start] == ' ') {
shouldTrimmed = false
}
segment = node.LastChild().(*ast.Text).Segment
if !(!segment.IsEmpty() && pc.Source()[segment.Stop-1] == ' ') {
shouldTrimmed = false
}
if shouldTrimmed {
t := node.FirstChild().(*ast.Text)
segment := t.Segment
t.Segment = segment.WithStart(segment.Start + 1)
t = node.LastChild().(*ast.Text)
segment = node.LastChild().(*ast.Text).Segment
t.Segment = segment.WithStop(segment.Stop - 1)
}
}
return node
}
func (s *codeSpanParser) CloseBlock(parent ast.Node, pc Context) {
// nothing to do
}
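
The core of `Parse` above is the search for a closing backtick run whose length equals that of the opening run. A minimal single-line sketch of that search, assuming string input and ignoring the space stripping and multi-line handling done by the parser (the name `findCodeSpan` is hypothetical):

```go
package main

// findCodeSpan expects s to start with a backtick run. It returns the raw
// text between that run and the next backtick run of exactly the same
// length. ok is false when s has no opening run or no matching closing run.
func findCodeSpan(s string) (content string, ok bool) {
	opener := 0
	for opener < len(s) && s[opener] == '`' {
		opener++
	}
	if opener == 0 {
		return "", false
	}
	for i := opener; i < len(s); {
		if s[i] != '`' {
			i++
			continue
		}
		// Measure the whole run before comparing, so a longer run
		// is never mistaken for a shorter closing run.
		start := i
		for i < len(s) && s[i] == '`' {
			i++
		}
		if i-start == opener {
			return s[opener:start], true
		}
	}
	return "", false
}
```

Because each run is measured in full, a double-backtick span may contain single backticks, and a single-backtick opener is not closed by a double-backtick run.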

232
parser/delimiter.go Normal file

@ -0,0 +1,232 @@
package parser
import (
"fmt"
"strings"
"unicode"
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
// A DelimiterProcessor interface provides a set of functions for
// Delimiter nodes.
type DelimiterProcessor interface {
// IsDelimiter returns true if the given character is a delimiter, otherwise false.
IsDelimiter(byte) bool
// CanOpenCloser returns true if the given opener can be paired with the given closer, otherwise false.
CanOpenCloser(opener, closer *Delimiter) bool
// OnMatch is called when a matching opener and closer are found.
// OnMatch should return a new Node corresponding to the matched delimiter.
OnMatch(consumes int) ast.Node
}
// A Delimiter struct represents a delimiter like '*' of the Markdown text.
type Delimiter struct {
ast.BaseInline
Segment text.Segment
// CanOpen is set true if this delimiter can open a span for a new node.
// See https://spec.commonmark.org/0.29/#can-open-emphasis for details.
CanOpen bool
// CanClose is set true if this delimiter can close a span for a new node.
// See https://spec.commonmark.org/0.29/#can-close-emphasis for details.
CanClose bool
// Length is the remaining length of this delimiter.
Length int
// OriginalLength is the original length of this delimiter.
OriginalLength int
// Char is a character of this delimiter.
Char byte
// PreviousDelimiter is a previous sibling delimiter node of this delimiter.
PreviousDelimiter *Delimiter
// NextDelimiter is a next sibling delimiter node of this delimiter.
NextDelimiter *Delimiter
// Processor is a DelimiterProcessor associated with this delimiter.
Processor DelimiterProcessor
}
// Inline implements Inline.Inline.
func (d *Delimiter) Inline() {}
// Dump implements Node.Dump.
func (d *Delimiter) Dump(source []byte, level int) {
fmt.Printf("%sDelimiter: \"%s\"\n", strings.Repeat(" ", level), string(d.Text(source)))
}
// Text implements Node.Text
func (d *Delimiter) Text(source []byte) []byte {
return d.Segment.Value(source)
}
// ConsumeCharacters consumes n characters of this delimiter.
func (d *Delimiter) ConsumeCharacters(n int) {
d.Length -= n
d.Segment = d.Segment.WithStop(d.Segment.Start + d.Length)
}
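// CalcComsumption calculates how many characters should be used for opening
// a new span corresponding to the given closer. It returns 0 when the pair
// must not match (the "multiple of 3" rule for emphasis delimiters).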
func (d *Delimiter) CalcComsumption(closer *Delimiter) int {
if (d.CanClose || closer.CanOpen) && (d.OriginalLength+closer.OriginalLength)%3 == 0 && closer.OriginalLength%3 != 0 {
return 0
}
if d.Length >= 2 && closer.Length >= 2 {
return 2
}
return 1
}
// NewDelimiter returns a new Delimiter node.
func NewDelimiter(canOpen, canClose bool, length int, char byte, processor DelimiterProcessor) *Delimiter {
c := &Delimiter{
BaseInline: ast.BaseInline{},
CanOpen: canOpen,
CanClose: canClose,
Length: length,
OriginalLength: length,
Char: char,
PreviousDelimiter: nil,
NextDelimiter: nil,
Processor: processor,
}
return c
}
// ScanDelimiter scans a delimiter by given DelimiterProcessor.
func ScanDelimiter(line []byte, before rune, min int, processor DelimiterProcessor) *Delimiter {
i := 0
c := line[i]
j := i
if !processor.IsDelimiter(c) {
return nil
}
for ; j < len(line) && c == line[j]; j++ {
}
if (j - i) >= min {
after := rune(' ')
if j != len(line) {
after = util.ToRune(line, j)
}
isLeft, isRight, canOpen, canClose := false, false, false, false
beforeIsPunctuation := unicode.IsPunct(before)
beforeIsWhitespace := unicode.IsSpace(before)
afterIsPunctuation := unicode.IsPunct(after)
afterIsWhitespace := unicode.IsSpace(after)
isLeft = !afterIsWhitespace &&
(!afterIsPunctuation || beforeIsWhitespace || beforeIsPunctuation)
isRight = !beforeIsWhitespace &&
(!beforeIsPunctuation || afterIsWhitespace || afterIsPunctuation)
if line[i] == '_' {
canOpen = isLeft && (!isRight || beforeIsPunctuation)
canClose = isRight && (!isLeft || afterIsPunctuation)
} else {
canOpen = isLeft
canClose = isRight
}
return NewDelimiter(canOpen, canClose, j-i, c, processor)
}
return nil
}
// ProcessDelimiters processes the delimiter list in the context.
// Processing stops when the bottom is reached.
//
// If you implement an inline parser that can have other inline nodes as
// children, you should call this function when the nesting span has closed.
func ProcessDelimiters(bottom ast.Node, pc Context) {
if pc.LastDelimiter() == nil {
return
}
var closer *Delimiter
if bottom != nil {
for c := pc.LastDelimiter().PreviousSibling(); c != nil; {
if d, ok := c.(*Delimiter); ok {
closer = d
}
prev := c.PreviousSibling()
if prev == bottom {
break
}
c = prev
}
} else {
closer = pc.FirstDelimiter()
}
if closer == nil {
pc.ClearDelimiters(bottom)
return
}
for closer != nil {
if !closer.CanClose {
closer = closer.NextDelimiter
continue
}
consume := 0
found := false
maybeOpener := false
var opener *Delimiter
for opener = closer.PreviousDelimiter; opener != nil; opener = opener.PreviousDelimiter {
if opener.CanOpen && opener.Processor.CanOpenCloser(opener, closer) {
maybeOpener = true
consume = opener.CalcComsumption(closer)
if consume > 0 {
found = true
break
}
}
}
if !found {
if !maybeOpener && !closer.CanOpen {
pc.RemoveDelimiter(closer)
}
closer = closer.NextDelimiter
continue
}
opener.ConsumeCharacters(consume)
closer.ConsumeCharacters(consume)
node := opener.Processor.OnMatch(consume)
parent := opener.Parent()
child := opener.NextSibling()
for child != nil && child != closer {
next := child.NextSibling()
node.AppendChild(node, child)
child = next
}
parent.InsertAfter(parent, opener, node)
for c := opener.NextDelimiter; c != nil && c != closer; {
next := c.NextDelimiter
pc.RemoveDelimiter(c)
c = next
}
if opener.Length == 0 {
pc.RemoveDelimiter(opener)
}
if closer.Length == 0 {
next := closer.NextDelimiter
pc.RemoveDelimiter(closer)
closer = next
}
}
pc.ClearDelimiters(bottom)
}
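
`ScanDelimiter` derives `CanOpen` and `CanClose` from the characters around a delimiter run, and `CalcComsumption` decides how many characters a matched pair uses. The sketch below restates both computations as standalone functions. The names `flanking` and `consumption` are hypothetical, and `consumption` assumes nothing has been consumed yet, so remaining and original lengths coincide.

```go
package main

import "unicode"

// flanking reports whether a delimiter run made of char, preceded by before
// and followed by after, can open and/or close emphasis.
func flanking(char byte, before, after rune) (canOpen, canClose bool) {
	beforeSpace, afterSpace := unicode.IsSpace(before), unicode.IsSpace(after)
	beforePunct, afterPunct := unicode.IsPunct(before), unicode.IsPunct(after)
	// Left-flanking: not followed by whitespace, and not followed by
	// punctuation unless preceded by whitespace or punctuation.
	left := !afterSpace && (!afterPunct || beforeSpace || beforePunct)
	// Right-flanking is the mirror image.
	right := !beforeSpace && (!beforePunct || afterSpace || afterPunct)
	if char == '_' {
		// Underscores are stricter so that intraword '_' stays literal.
		return left && (!right || beforePunct), right && (!left || afterPunct)
	}
	return left, right
}

// consumption returns how many delimiter characters an opener/closer pair
// uses: 2 for strong emphasis, 1 for emphasis, 0 if the pair must not match.
func consumption(openerLen, closerLen int, openerCanClose, closerCanOpen bool) int {
	if (openerCanClose || closerCanOpen) && (openerLen+closerLen)%3 == 0 && closerLen%3 != 0 {
		return 0
	}
	if openerLen >= 2 && closerLen >= 2 {
		return 2
	}
	return 1
}
```

For example, `flanking('_', 'a', 'b')` reports that an underscore between two letters can neither open nor close, while the same position with `'*'` can do both.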

54
parser/emphasis.go Normal file

@ -0,0 +1,54 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
)
type emphasisDelimiterProcessor struct {
}
func (p *emphasisDelimiterProcessor) IsDelimiter(b byte) bool {
return b == '*' || b == '_'
}
func (p *emphasisDelimiterProcessor) CanOpenCloser(opener, closer *Delimiter) bool {
return opener.Char == closer.Char
}
func (p *emphasisDelimiterProcessor) OnMatch(consumes int) ast.Node {
return ast.NewEmphasis(consumes)
}
var defaultEmphasisDelimiterProcessor = &emphasisDelimiterProcessor{}
type emphasisParser struct {
}
var defaultEmphasisParser = &emphasisParser{}
// NewEmphasisParser returns a new InlineParser that parses emphasis.
func NewEmphasisParser() InlineParser {
return defaultEmphasisParser
}
func (s *emphasisParser) Trigger() []byte {
return []byte{'*', '_'}
}
func (s *emphasisParser) Parse(parent ast.Node, block text.Reader, pc Context) ast.Node {
before := block.PrecendingCharacter()
line, segment := block.PeekLine()
node := ScanDelimiter(line, before, 1, defaultEmphasisDelimiterProcessor)
if node == nil {
return nil
}
node.Segment = segment.WithStop(segment.Start + node.OriginalLength)
block.Advance(node.OriginalLength)
pc.PushDelimiter(node)
return node
}
func (s *emphasisParser) CloseBlock(parent ast.Node, pc Context) {
// nothing to do
}

96
parser/fcode_block.go Normal file

@ -0,0 +1,96 @@
package parser
import (
"bytes"
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
type fencedCodeBlockParser struct {
}
var defaultFencedCodeBlockParser = &fencedCodeBlockParser{}
// NewFencedCodeBlockParser returns a new BlockParser that
// parses fenced code blocks.
func NewFencedCodeBlockParser() BlockParser {
return defaultFencedCodeBlockParser
}
type fenceData struct {
char byte
indent int
length int
}
var fencedCodeBlockInfoKey = NewContextKey()
func (b *fencedCodeBlockParser) Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State) {
line, segment := reader.PeekLine()
pos := pc.BlockOffset()
if line[pos] != '`' && line[pos] != '~' {
return nil, NoChildren
}
findent := pos
fenceChar := line[pos]
i := pos
for ; i < len(line) && line[i] == fenceChar; i++ {
}
oFenceLength := i - pos
if oFenceLength < 3 {
return nil, NoChildren
}
var info *ast.Text
if i < len(line)-1 {
rest := line[i:]
left := util.TrimLeftSpaceLength(rest)
right := util.TrimRightSpaceLength(rest)
infoStart, infoStop := segment.Start+i+left, segment.Stop-right
value := rest[left : len(rest)-right]
if fenceChar == '`' && bytes.IndexByte(value, '`') > -1 {
return nil, NoChildren
} else if infoStart != infoStop {
info = ast.NewTextSegment(text.NewSegment(infoStart, infoStop))
}
}
pc.Set(fencedCodeBlockInfoKey, &fenceData{fenceChar, findent, oFenceLength})
node := ast.NewFencedCodeBlock(info)
return node, NoChildren
}
func (b *fencedCodeBlockParser) Continue(node ast.Node, reader text.Reader, pc Context) State {
line, segment := reader.PeekLine()
fdata := pc.Get(fencedCodeBlockInfoKey).(*fenceData)
w, pos := util.IndentWidth(line, 0)
if w < 4 {
i := pos
for ; i < len(line) && line[i] == fdata.char; i++ {
}
length := i - pos
if length >= fdata.length && util.IsBlank(line[i:]) {
reader.Advance(segment.Stop - segment.Start - 1 - segment.Padding)
return Close
}
}
pos, padding := util.DedentPosition(line, fdata.indent)
seg := text.NewSegmentPadding(segment.Start+pos, segment.Stop, padding)
node.Lines().Append(seg)
reader.AdvanceAndSetPadding(segment.Stop-segment.Start-pos-1, padding)
return Continue | NoChildren
}
func (b *fencedCodeBlockParser) Close(node ast.Node, pc Context) {
pc.Set(fencedCodeBlockInfoKey, nil)
}
func (b *fencedCodeBlockParser) CanInterruptParagraph() bool {
return true
}
func (b *fencedCodeBlockParser) CanAcceptIndentedLine() bool {
return false
}
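
The `Open` and `Continue` methods above recognise an opening fence (at least three identical `` ` `` or `~` characters plus an optional info string) and a closing fence that is at least as long. A simplified sketch of both checks, assuming string input, an opening line already stripped of indentation, and indentation made of spaces only (the names `openFence` and `closesFence` are hypothetical):

```go
package main

import "strings"

// openFence inspects one line and reports the fence character, the fence
// length and the trimmed info string. ok is false if the line is not an
// opening fence.
func openFence(line string) (char byte, length int, info string, ok bool) {
	if line == "" || (line[0] != '`' && line[0] != '~') {
		return 0, 0, "", false
	}
	char = line[0]
	for length < len(line) && line[length] == char {
		length++
	}
	if length < 3 {
		return 0, 0, "", false
	}
	info = strings.TrimSpace(line[length:])
	// A backtick fence must not contain backticks in its info string.
	if char == '`' && strings.Contains(info, "`") {
		return 0, 0, "", false
	}
	return char, length, info, true
}

// closesFence reports whether line closes a fence that was opened with
// char repeated length times.
func closesFence(line string, char byte, length int) bool {
	trimmed := strings.TrimLeft(line, " ")
	if len(line)-len(trimmed) > 3 {
		return false // indented too far to count as a fence
	}
	n := 0
	for n < len(trimmed) && trimmed[n] == char {
		n++
	}
	return n >= length && strings.TrimSpace(trimmed[n:]) == ""
}
```

The parser itself measures indentation with `util.IndentWidth`; counting spaces only is a simplification of this sketch.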

278
parser/html_block.go Normal file

@ -0,0 +1,278 @@
package parser
import (
"bytes"
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
"regexp"
"strings"
)
// An HTMLConfig struct holds the configuration of the parsers related to raw HTML.
type HTMLConfig struct {
FilterTags map[string]bool
}
// SetOption implements SetOptioner.
func (b *HTMLConfig) SetOption(name OptionName, value interface{}) {
switch name {
case FilterTags:
b.FilterTags = value.(map[string]bool)
}
}
// An HTMLOption interface sets options for the raw HTML parsers.
type HTMLOption interface {
SetHTMLOption(*HTMLConfig)
}
// FilterTags is an option name that specifies forbidden tag names.
const FilterTags OptionName = "FilterTags"
type withFilterTags struct {
value map[string]bool
}
func (o *withFilterTags) SetConfig(c *Config) {
c.Options[FilterTags] = o.value
}
func (o *withFilterTags) SetHTMLOption(p *HTMLConfig) {
p.FilterTags = o.value
}
// WithFilterTags is a functional option that specifies forbidden tag names.
func WithFilterTags(names ...string) interface {
Option
HTMLOption
} {
m := map[string]bool{}
for _, name := range names {
m[name] = true
}
return &withFilterTags{m}
}
var allowedBlockTags = map[string]bool{
"address": true,
"article": true,
"aside": true,
"base": true,
"basefont": true,
"blockquote": true,
"body": true,
"caption": true,
"center": true,
"col": true,
"colgroup": true,
"dd": true,
"details": true,
"dialog": true,
"dir": true,
"div": true,
"dl": true,
"dt": true,
"fieldset": true,
"figcaption": true,
"figure": true,
"footer": true,
"form": true,
"frame": true,
"frameset": true,
"h1": true,
"h2": true,
"h3": true,
"h4": true,
"h5": true,
"h6": true,
"head": true,
"header": true,
"hr": true,
"html": true,
"iframe": true,
"legend": true,
"li": true,
"link": true,
"main": true,
"menu": true,
"menuitem": true,
"meta": true,
"nav": true,
"noframes": true,
"ol": true,
"optgroup": true,
"option": true,
"p": true,
"param": true,
"section": true,
"source": true,
"summary": true,
"table": true,
"tbody": true,
"td": true,
"tfoot": true,
"th": true,
"thead": true,
"title": true,
"tr": true,
"track": true,
"ul": true,
}
var htmlBlockType1OpenRegexp = regexp.MustCompile(`(?i)^[ ]{0,3}<(script|pre|style)(?:\s.*|>.*|/>.*|)\n?$`)
var htmlBlockType1CloseRegexp = regexp.MustCompile(`(?i)^[ ]{0,3}(?:[^ ].*|)</(?:script|pre|style)>.*`)
var htmlBlockType2OpenRegexp = regexp.MustCompile(`^[ ]{0,3}<!\-\-`)
var htmlBlockType2Close = []byte{'-', '-', '>'}
var htmlBlockType3OpenRegexp = regexp.MustCompile(`^[ ]{0,3}<\?`)
var htmlBlockType3Close = []byte{'?', '>'}
var htmlBlockType4OpenRegexp = regexp.MustCompile(`^[ ]{0,3}<![A-Z]+.*\n?$`)
var htmlBlockType4Close = []byte{'>'}
var htmlBlockType5OpenRegexp = regexp.MustCompile(`^[ ]{0,3}<\!\[CDATA\[`)
var htmlBlockType5Close = []byte{']', ']', '>'}
var htmlBlockType6Regexp = regexp.MustCompile(`^[ ]{0,3}</?([a-zA-Z0-9]+)(?:\s.*|>.*|/>.*|)\n?$`)
var htmlBlockType7Regexp = regexp.MustCompile(`^[ ]{0,3}<(/)?([a-zA-Z0-9]+)(` + attributePattern + `*)(?:>|/>)\s*\n?$`)
type htmlBlockParser struct {
HTMLConfig
}
// NewHTMLBlockParser returns a new BlockParser that can parse HTML
// blocks.
func NewHTMLBlockParser(opts ...HTMLOption) BlockParser {
p := &htmlBlockParser{}
for _, o := range opts {
o.SetHTMLOption(&p.HTMLConfig)
}
return p
}
func (b *htmlBlockParser) Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State) {
var node *ast.HTMLBlock
line, segment := reader.PeekLine()
last := pc.LastOpenedBlock().Node
if pos := pc.BlockOffset(); line[pos] != '<' {
return nil, NoChildren
}
tagName := ""
if m := htmlBlockType1OpenRegexp.FindSubmatchIndex(line); m != nil {
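// Lower-case the name: the pattern is case-insensitive but the FilterTags lookup is not.
tagName = strings.ToLower(string(line[m[2]:m[3]]))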
node = ast.NewHTMLBlock(ast.HTMLBlockType1)
} else if htmlBlockType2OpenRegexp.Match(line) {
node = ast.NewHTMLBlock(ast.HTMLBlockType2)
} else if htmlBlockType3OpenRegexp.Match(line) {
node = ast.NewHTMLBlock(ast.HTMLBlockType3)
} else if htmlBlockType4OpenRegexp.Match(line) {
node = ast.NewHTMLBlock(ast.HTMLBlockType4)
} else if htmlBlockType5OpenRegexp.Match(line) {
node = ast.NewHTMLBlock(ast.HTMLBlockType5)
} else if match := htmlBlockType7Regexp.FindSubmatchIndex(line); match != nil {
isCloseTag := match[2] > -1 && bytes.Equal(line[match[2]:match[3]], []byte("/"))
hasAttr := match[6] != match[7]
tagName = strings.ToLower(string(line[match[4]:match[5]]))
_, ok := allowedBlockTags[tagName]
if ok {
node = ast.NewHTMLBlock(ast.HTMLBlockType6)
} else if tagName != "script" && tagName != "style" && tagName != "pre" && !ast.IsParagraph(last) && !(isCloseTag && hasAttr) { // type 7 can not interrupt paragraph
node = ast.NewHTMLBlock(ast.HTMLBlockType7)
}
}
if node == nil {
if match := htmlBlockType6Regexp.FindSubmatchIndex(line); match != nil {
tagName = strings.ToLower(string(line[match[2]:match[3]]))
_, ok := allowedBlockTags[tagName]
if ok {
node = ast.NewHTMLBlock(ast.HTMLBlockType6)
}
}
}
if node != nil {
if b.FilterTags != nil {
if _, ok := b.FilterTags[tagName]; ok {
return nil, NoChildren
}
}
reader.Advance(segment.Len() - 1)
node.Lines().Append(segment)
return node, NoChildren
}
return nil, NoChildren
}
func (b *htmlBlockParser) Continue(node ast.Node, reader text.Reader, pc Context) State {
htmlBlock := node.(*ast.HTMLBlock)
lines := htmlBlock.Lines()
line, segment := reader.PeekLine()
var closurePattern []byte
switch htmlBlock.HTMLBlockType {
case ast.HTMLBlockType1:
if lines.Len() == 1 {
firstLine := lines.At(0)
if htmlBlockType1CloseRegexp.Match(firstLine.Value(reader.Source())) {
return Close
}
}
if htmlBlockType1CloseRegexp.Match(line) {
htmlBlock.ClosureLine = segment
reader.Advance(segment.Len() - 1)
return Close
}
case ast.HTMLBlockType2:
closurePattern = htmlBlockType2Close
fallthrough
case ast.HTMLBlockType3:
if closurePattern == nil {
closurePattern = htmlBlockType3Close
}
fallthrough
case ast.HTMLBlockType4:
if closurePattern == nil {
closurePattern = htmlBlockType4Close
}
fallthrough
case ast.HTMLBlockType5:
if closurePattern == nil {
closurePattern = htmlBlockType5Close
}
if lines.Len() == 1 {
firstLine := lines.At(0)
if bytes.Contains(firstLine.Value(reader.Source()), closurePattern) {
return Close
}
}
if bytes.Contains(line, closurePattern) {
htmlBlock.ClosureLine = segment
reader.Advance(segment.Len() - 1)
return Close
}
case ast.HTMLBlockType6, ast.HTMLBlockType7:
if util.IsBlank(line) {
return Close
}
}
node.Lines().Append(segment)
reader.Advance(segment.Len() - 1)
return Continue | NoChildren
}
func (b *htmlBlockParser) Close(node ast.Node, pc Context) {
// nothing to do
}
func (b *htmlBlockParser) CanInterruptParagraph() bool {
return true
}
func (b *htmlBlockParser) CanAcceptIndentedLine() bool {
return false
}
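
In `Continue` above, HTML block types 2 to 5 each end on the first line that contains a fixed byte sequence, which the `switch` selects through a chain of `fallthrough` cases. One alternative way to express that selection is a lookup table. The sketch below assumes plain integers for the block types and is not how the package itself is written (`closers` and `closesHTMLBlock` are hypothetical names):

```go
package main

import "bytes"

// closers maps an HTML block type to the byte sequence that ends it.
var closers = map[int][]byte{
	2: []byte("-->"), // comment
	3: []byte("?>"),  // processing instruction
	4: []byte(">"),   // declaration
	5: []byte("]]>"), // CDATA section
}

// closesHTMLBlock reports whether line contains the closing sequence for
// blockType. Types without a fixed closing sequence always report false.
func closesHTMLBlock(blockType int, line []byte) bool {
	pattern, ok := closers[blockType]
	return ok && bytes.Contains(line, pattern)
}
```

Types 1, 6 and 7 are deliberately absent from the table: type 1 closes on a regular expression match and types 6 and 7 close on a blank line.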

366
parser/link.go Normal file

@ -0,0 +1,366 @@
package parser
import (
"fmt"
"regexp"
"strings"
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
var linkLabelStateKey = NewContextKey()
type linkLabelState struct {
ast.BaseInline
Segment text.Segment
IsImage bool
Prev *linkLabelState
Next *linkLabelState
First *linkLabelState
Last *linkLabelState
}
func newLinkLabelState(segment text.Segment, isImage bool) *linkLabelState {
return &linkLabelState{
Segment: segment,
IsImage: isImage,
}
}
func (s *linkLabelState) Text(source []byte) []byte {
return s.Segment.Value(source)
}
func (s *linkLabelState) Dump(source []byte, level int) {
fmt.Printf("%slinkLabelState: \"%s\"\n", strings.Repeat(" ", level), s.Text(source))
}
func pushLinkLabelState(pc Context, v *linkLabelState) {
tlist := pc.Get(linkLabelStateKey)
var list *linkLabelState
if tlist == nil {
list = v
v.First = v
v.Last = v
pc.Set(linkLabelStateKey, list)
} else {
list = tlist.(*linkLabelState)
l := list.Last
list.Last = v
l.Next = v
v.Prev = l
}
}
func removeLinkLabelState(pc Context, d *linkLabelState) {
tlist := pc.Get(linkLabelStateKey)
var list *linkLabelState
if tlist == nil {
return
}
list = tlist.(*linkLabelState)
if d.Prev == nil {
list = d.Next
if list != nil {
list.First = list
list.Last = d.Last
list.Prev = nil
pc.Set(linkLabelStateKey, list)
} else {
pc.Set(linkLabelStateKey, nil)
}
} else {
d.Prev.Next = d.Next
if d.Next != nil {
d.Next.Prev = d.Prev
}
}
if list != nil && d.Next == nil {
list.Last = d.Prev
}
d.Next = nil
d.Prev = nil
d.First = nil
d.Last = nil
}
type linkParser struct {
}
var defaultLinkParser = &linkParser{}
// NewLinkParser returns a new InlineParser that parses links.
func NewLinkParser() InlineParser {
return defaultLinkParser
}
func (s *linkParser) Trigger() []byte {
return []byte{'!', '[', ']'}
}
var linkDestinationRegexp = regexp.MustCompile(`\s*([^\s].+)`)
var linkTitleRegexp = regexp.MustCompile(`\s+(\)|["'\(].+)`)
var linkBottom = NewContextKey()
func (s *linkParser) Parse(parent ast.Node, block text.Reader, pc Context) ast.Node {
line, segment := block.PeekLine()
if line[0] == '!' {
if len(line) > 1 && line[1] == '[' {
block.Advance(1)
pc.Set(linkBottom, pc.LastDelimiter())
return processLinkLabelOpen(block, segment.Start+1, true, pc)
}
// A '!' that does not start an image must not be handled as ']'.
return nil
}
if line[0] == '[' {
pc.Set(linkBottom, pc.LastDelimiter())
return processLinkLabelOpen(block, segment.Start, false, pc)
}
// line[0] == ']'
tlist := pc.Get(linkLabelStateKey)
if tlist == nil {
return nil
}
last := tlist.(*linkLabelState).Last
if last == nil {
return nil
}
block.Advance(1)
removeLinkLabelState(pc, last)
if s.containsLink(last) { // a link in a link text is not allowed
ast.MergeOrReplaceTextSegment(last.Parent(), last, last.Segment)
return nil
}
labelValue := block.Value(text.NewSegment(last.Segment.Start+1, segment.Start))
if util.IsBlank(labelValue) && !last.IsImage {
ast.MergeOrReplaceTextSegment(last.Parent(), last, last.Segment)
return nil
}
c := block.Peek()
l, pos := block.Position()
var link *ast.Link
var hasValue bool
if c == '(' { // normal link
link = s.parseLink(parent, last, block, pc)
} else if c == '[' { // reference link
link, hasValue = s.parseReferenceLink(parent, last, block, pc)
if link == nil && hasValue {
ast.MergeOrReplaceTextSegment(last.Parent(), last, last.Segment)
return nil
}
}
if link == nil {
// maybe shortcut reference link
block.SetPosition(l, pos)
ssegment := text.NewSegment(last.Segment.Stop, segment.Start)
maybeReference := block.Value(ssegment)
ref, ok := pc.Reference(util.ToLinkReference(maybeReference))
if !ok {
ast.MergeOrReplaceTextSegment(last.Parent(), last, last.Segment)
return nil
}
link = ast.NewLink()
s.processLinkLabel(parent, link, last, pc)
link.Title = ref.Title()
link.Destination = ref.Destination()
}
if last.IsImage {
last.Parent().RemoveChild(last.Parent(), last)
return ast.NewImage(link)
}
last.Parent().RemoveChild(last.Parent(), last)
return link
}
func (s *linkParser) containsLink(last *linkLabelState) bool {
if last.IsImage {
return false
}
var c ast.Node
for c = last; c != nil; c = c.NextSibling() {
if _, ok := c.(*ast.Link); ok {
return true
}
}
return false
}
func processLinkLabelOpen(block text.Reader, pos int, isImage bool, pc Context) *linkLabelState {
start := pos
if isImage {
start--
}
state := newLinkLabelState(text.NewSegment(start, pos+1), isImage)
pushLinkLabelState(pc, state)
block.Advance(1)
return state
}
func (s *linkParser) processLinkLabel(parent ast.Node, link *ast.Link, last *linkLabelState, pc Context) {
var bottom ast.Node
if v := pc.Get(linkBottom); v != nil {
bottom = v.(ast.Node)
}
pc.Set(linkBottom, nil)
ProcessDelimiters(bottom, pc)
for c := last.NextSibling(); c != nil; {
next := c.NextSibling()
parent.RemoveChild(parent, c)
link.AppendChild(link, c)
c = next
}
}
func (s *linkParser) parseReferenceLink(parent ast.Node, last *linkLabelState, block text.Reader, pc Context) (*ast.Link, bool) {
_, orgpos := block.Position()
block.Advance(1) // skip '['
line, segment := block.PeekLine()
endIndex := util.FindClosure(line, '[', ']', false, true)
if endIndex < 0 {
return nil, false
}
block.Advance(endIndex + 1)
ssegment := segment.WithStop(segment.Start + endIndex)
maybeReference := block.Value(ssegment)
if util.IsBlank(maybeReference) { // collapsed reference link
ssegment = text.NewSegment(last.Segment.Stop, orgpos.Start-1)
maybeReference = block.Value(ssegment)
}
ref, ok := pc.Reference(util.ToLinkReference(maybeReference))
if !ok {
return nil, true
}
link := ast.NewLink()
s.processLinkLabel(parent, link, last, pc)
link.Title = ref.Title()
link.Destination = ref.Destination()
return link, true
}
func (s *linkParser) parseLink(parent ast.Node, last *linkLabelState, block text.Reader, pc Context) *ast.Link {
block.Advance(1) // skip '('
block.SkipSpaces()
var title []byte
var destination []byte
var ok bool
if block.Peek() == ')' { // empty link like '[link]()'
block.Advance(1)
} else {
destination, ok = parseLinkDestination(block)
if !ok {
return nil
}
block.SkipSpaces()
if block.Peek() == ')' {
block.Advance(1)
} else {
title, ok = parseLinkTitle(block)
if !ok {
return nil
}
block.SkipSpaces()
if block.Peek() == ')' {
block.Advance(1)
} else {
return nil
}
}
}
link := ast.NewLink()
s.processLinkLabel(parent, link, last, pc)
link.Destination = destination
link.Title = title
return link
}
func parseLinkDestination(block text.Reader) ([]byte, bool) {
block.SkipSpaces()
line, _ := block.PeekLine()
buf := []byte{}
if block.Peek() == '<' {
i := 1
for i < len(line) {
c := line[i]
if c == '\\' && i < len(line)-1 && util.IsPunct(line[i+1]) {
buf = append(buf, '\\', line[i+1])
i += 2
continue
} else if c == '>' {
block.Advance(i + 1)
return line[1:i], true
}
buf = append(buf, c)
i++
}
return nil, false
}
opened := 0
i := 0
for i < len(line) {
c := line[i]
if c == '\\' && i < len(line)-1 && util.IsPunct(line[i+1]) {
buf = append(buf, '\\', line[i+1])
i += 2
continue
} else if c == '(' {
opened++
} else if c == ')' {
opened--
if opened < 0 {
break
}
} else if util.IsSpace(c) {
break
}
buf = append(buf, c)
i++
}
block.Advance(i)
return line[:i], len(line[:i]) != 0
}
func parseLinkTitle(block text.Reader) ([]byte, bool) {
block.SkipSpaces()
opener := block.Peek()
if opener != '"' && opener != '\'' && opener != '(' {
return nil, false
}
closer := opener
if opener == '(' {
closer = ')'
}
line, _ := block.PeekLine()
pos := util.FindClosure(line[1:], opener, closer, false, true)
if pos < 0 {
return nil, false
}
pos += 2 // opener + closer
block.Advance(pos)
return line[1 : pos-1], true
}
func (s *linkParser) CloseBlock(parent ast.Node, pc Context) {
tlist := pc.Get(linkLabelStateKey)
if tlist == nil {
return
}
for s := tlist.(*linkLabelState); s != nil; {
next := s.Next
removeLinkLabelState(pc, s)
s.Parent().ReplaceChild(s.Parent(), s, ast.NewTextSegment(s.Segment))
s = next
}
}
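
`parseLinkDestination` above accepts a bare destination (one not wrapped in `<...>`) up to the first whitespace or the first unbalanced `)`, skipping backslash-escaped punctuation. A minimal sketch of that scan over a string, returning only the length of the destination (`scanDestination` and `isASCIIPunct` are hypothetical names, with `isASCIIPunct` standing in for `util.IsPunct`):

```go
package main

// scanDestination returns the length of the bare link destination at the
// start of s. Parentheses may appear as long as they are balanced.
func scanDestination(s string) int {
	depth := 0
	i := 0
	for i < len(s) {
		c := s[i]
		switch {
		case c == '\\' && i+1 < len(s) && isASCIIPunct(s[i+1]):
			i += 2 // an escaped character never opens or closes a parenthesis
			continue
		case c == '(':
			depth++
		case c == ')':
			depth--
			if depth < 0 {
				return i // this ')' belongs to the enclosing link syntax
			}
		case c == ' ' || c == '\t' || c == '\n':
			return i
		}
		i++
	}
	return i
}

// isASCIIPunct reports whether c is an ASCII punctuation character.
func isASCIIPunct(c byte) bool {
	return c >= '!' && c <= '/' || c >= ':' && c <= '@' ||
		c >= '[' && c <= '`' || c >= '{' && c <= '~'
}
```

A result of 0 corresponds to the case where the parser reports failure, since an empty bare destination is not accepted.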

163
parser/link_ref.go Normal file

@ -0,0 +1,163 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
type linkReferenceParagraphTransformer struct {
}
// LinkReferenceParagraphTransformer is a ParagraphTransformer implementation
// that parses link reference definitions and removes them from paragraphs.
var LinkReferenceParagraphTransformer = &linkReferenceParagraphTransformer{}
func (p *linkReferenceParagraphTransformer) Transform(node *ast.Paragraph, pc Context) {
lines := node.Lines()
block := text.NewBlockReader(pc.Source(), lines)
removes := [][2]int{}
for {
start, end := parseLinkReferenceDefinition(block, pc)
if start > -1 {
if start == end {
end++
}
removes = append(removes, [2]int{start, end})
continue
}
break
}
offset := 0
for _, remove := range removes {
if lines.Len() == 0 {
break
}
s := lines.Sliced(remove[1]-offset, lines.Len())
lines.SetSliced(0, remove[0]-offset)
lines.AppendAll(s)
offset = remove[1]
}
if lines.Len() == 0 {
t := ast.NewTextBlock()
t.SetBlankPreviousLines(node.HasBlankPreviousLines())
node.Parent().ReplaceChild(node.Parent(), node, t)
return
}
node.SetLines(lines)
}
func parseLinkReferenceDefinition(block text.Reader, pc Context) (int, int) {
block.SkipSpaces()
line, segment := block.PeekLine()
if line == nil {
return -1, -1
}
startLine, _ := block.Position()
width, pos := util.IndentWidth(line, 0)
if width > 3 {
return -1, -1
}
if width != 0 {
pos++
}
if line[pos] != '[' {
return -1, -1
}
open := segment.Start + pos + 1
closes := -1
block.Advance(pos + 1)
for {
line, segment = block.PeekLine()
if line == nil {
return -1, -1
}
closure := util.FindClosure(line, '[', ']', false, false)
if closure > -1 {
closes = segment.Start + closure
next := closure + 1
if next >= len(line) || line[next] != ':' {
return -1, -1
}
block.Advance(next + 1)
break
}
block.AdvanceLine()
}
if closes < 0 {
return -1, -1
}
label := block.Value(text.NewSegment(open, closes))
if util.IsBlank(label) {
return -1, -1
}
block.SkipSpaces()
destination, ok := parseLinkDestination(block)
if !ok {
return -1, -1
}
line, segment = block.PeekLine()
isNewLine := line == nil || util.IsBlank(line)
endLine, _ := block.Position()
_, spaces, _ := block.SkipSpaces()
opener := block.Peek()
if opener != '"' && opener != '\'' && opener != '(' {
if !isNewLine {
return -1, -1
}
ref := NewReference(label, destination, nil)
pc.AddReference(ref)
return startLine, endLine + 1
}
if spaces == 0 {
return -1, -1
}
block.Advance(1)
open = -1
closes = -1
closer := opener
if opener == '(' {
closer = ')'
}
for {
line, segment = block.PeekLine()
if line == nil {
return -1, -1
}
if open < 0 {
open = segment.Start
}
closure := util.FindClosure(line, opener, closer, false, true)
if closure > -1 {
closes = segment.Start + closure
block.Advance(closure + 1)
break
}
block.AdvanceLine()
}
if closes < 0 {
return -1, -1
}
line, segment = block.PeekLine()
if line != nil && !util.IsBlank(line) {
if !isNewLine {
return -1, -1
}
title := block.Value(text.NewSegment(open, closes))
ref := NewReference(label, destination, title)
pc.AddReference(ref)
return startLine, endLine
}
title := block.Value(text.NewSegment(open, closes))
endLine, _ = block.Position()
ref := NewReference(label, destination, title)
pc.AddReference(ref)
return startLine, endLine + 1
}

241
parser/list.go Normal file

@ -0,0 +1,241 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
"strconv"
)
type listItemType int
const (
notList listItemType = iota
bulletList
orderedList
)
// Same as
// `^(([ ]*)([\-\*\+]))(\s+.*)?\n?$`.FindSubmatchIndex or
// `^(([ ]*)(\d{1,9}[\.\)]))(\s+.*)?\n?$`.FindSubmatchIndex
func parseListItem(line []byte) ([6]int, listItemType) {
i := 0
l := len(line)
ret := [6]int{}
for ; i < l && line[i] == ' '; i++ {
}
if i > 3 {
return ret, notList
}
ret[0] = 0
ret[1] = i
ret[2] = i
var typ listItemType
if i < l && (line[i] == '-' || line[i] == '*' || line[i] == '+') {
i++
ret[3] = i
typ = bulletList
} else if i < l {
for ; i < l && util.IsNumeric(line[i]); i++ {
}
ret[3] = i
if ret[3] == ret[2] || ret[3]-ret[2] > 9 {
return ret, notList
}
if i < l && (line[i] == '.' || line[i] == ')') {
i++
ret[3] = i
} else {
return ret, notList
}
typ = orderedList
} else {
return ret, notList
}
if i < l && line[i] != '\n' {
w, _ := util.IndentWidth(line[i:], 0)
if w == 0 {
return ret, notList
}
}
ret[4] = i
ret[5] = len(line)
if i < l && line[ret[5]-1] == '\n' && line[i] != '\n' {
ret[5]--
}
return ret, typ
}
func matchesListItem(source []byte, strict bool) ([6]int, listItemType) {
m, typ := parseListItem(source)
if typ != notList && (!strict || m[1] < 4) {
return m, typ
}
return m, notList
}
func calcListOffset(source []byte, match [6]int) int {
offset := 0
if util.IsBlank(source[match[4]:]) { // list item starts with a blank line
offset = 1
} else {
offset, _ = util.IndentWidth(source[match[4]:], match[2])
if offset > 4 { // the content is an indented code block
offset = 1
}
}
return offset
}
func lastOffset(node ast.Node) int {
lastChild := node.LastChild()
if lastChild != nil {
return lastChild.(*ast.ListItem).Offset
}
return 0
}
type listParser struct {
}
var defaultListParser = &listParser{}
// NewListParser returns a new BlockParser that
// parses lists.
// This parser must take precedence over the ListItemParser.
func NewListParser() BlockParser {
return defaultListParser
}
func (b *listParser) Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State) {
last := pc.LastOpenedBlock().Node
if _, lok := last.(*ast.List); lok || pc.Get(skipListParser) != nil {
pc.Set(skipListParser, nil)
return nil, NoChildren
}
line, _ := reader.PeekLine()
match, typ := matchesListItem(line, true)
if typ == notList {
return nil, NoChildren
}
start := -1
if typ == orderedList {
number := line[match[2] : match[3]-1]
start, _ = strconv.Atoi(string(number))
}
if ast.IsParagraph(last) && last.Parent() == parent {
// we allow only lists starting with 1 to interrupt paragraphs.
if typ == orderedList && start != 1 {
return nil, NoChildren
}
// an empty list item cannot interrupt a paragraph.
if match[5]-match[4] == 1 {
return nil, NoChildren
}
}
marker := line[match[3]-1]
node := ast.NewList(marker)
if start > -1 {
node.Start = start
}
return node, HasChildren
}
func (b *listParser) Continue(node ast.Node, reader text.Reader, pc Context) State {
list := node.(*ast.List)
line, _ := reader.PeekLine()
if util.IsBlank(line) {
// A list item can begin with at most one blank line
if node.ChildCount() == 1 && node.LastChild().ChildCount() == 0 {
return Close
}
return Continue | HasChildren
}
// Thematic breaks take precedence over lists
if isThemanticBreak(line) {
isHeading := false
last := pc.LastOpenedBlock().Node
if ast.IsParagraph(last) {
c, ok := matchesSetextHeadingBar(line)
if ok && c == '-' {
isHeading = true
}
}
if !isHeading {
return Close
}
}
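// "offset" is the width indicated by the bar below, measured from the
// start of the line to the content of the last list item:
//    -  aaaaaaaa
// |----|
//
// If the indent of the current line is less than that offset, as in
// - a
//  - b <--- current line
// the line may start a new item of this list.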
offset := lastOffset(node)
indent, _ := util.IndentWidth(line, 0)
if indent < offset {
if indent < 4 {
match, typ := matchesListItem(line, false) // may have more than 3 leading spaces
if typ != notList && match[1]-offset < 4 {
marker := line[match[3]-1]
if !list.CanContinue(marker, typ == orderedList) {
return Close
}
return Continue | HasChildren
}
}
return Close
}
return Continue | HasChildren
}
func (b *listParser) Close(node ast.Node, pc Context) {
list := node.(*ast.List)
for c := node.FirstChild(); c != nil && list.IsTight; c = c.NextSibling() {
if c.FirstChild() != nil && c.FirstChild() != c.LastChild() {
for c1 := c.FirstChild().NextSibling(); c1 != nil; c1 = c1.NextSibling() {
if c1.HasBlankPreviousLines() {
list.IsTight = false
break
}
}
}
if c != node.FirstChild() {
if c.HasBlankPreviousLines() {
list.IsTight = false
}
}
}
if list.IsTight {
for child := node.FirstChild(); child != nil; child = child.NextSibling() {
for gc := child.FirstChild(); gc != nil; gc = gc.NextSibling() {
paragraph, ok := gc.(*ast.Paragraph)
if ok {
textBlock := ast.NewTextBlock()
textBlock.SetLines(paragraph.Lines())
child.ReplaceChild(child, paragraph, textBlock)
}
}
}
}
}
func (b *listParser) CanInterruptParagraph() bool {
return true
}
func (b *listParser) CanAcceptIndentedLine() bool {
return false
}
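
The comment above parseListItem describes the scanner in terms of two regular expressions. The following is a minimal standalone sketch of that regexp form, not the code used by this package. The names `classify`, `bulletRe`, and `orderedRe` are hypothetical, and the indentation is capped at three spaces (`[ ]{0,3}`) as an assumption that mirrors the `i > 3` check instead of the `[ ]*` written in the comment.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns adapted from the comment above parseListItem. The cap of three
// leading spaces is an assumption that mirrors the `i > 3` check.
var (
	bulletRe  = regexp.MustCompile(`^(([ ]{0,3})([\-\*\+]))(\s+.*)?\n?$`)
	orderedRe = regexp.MustCompile(`^(([ ]{0,3})(\d{1,9}[\.\)]))(\s+.*)?\n?$`)
)

// classify is a hypothetical helper that reports which kind of list item,
// if any, a single source line starts.
func classify(line string) string {
	switch {
	case bulletRe.MatchString(line):
		return "bullet"
	case orderedRe.MatchString(line):
		return "ordered"
	default:
		return "none"
	}
}

func main() {
	lines := []string{"- item\n", "3) item\n", "-item\n", "    - item\n"}
	for _, line := range lines {
		fmt.Printf("%q -> %s\n", line, classify(line))
	}
}
```

The hand-written scanner avoids regexp matching on every line, which is presumably why the source keeps the patterns only as documentation.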

81
parser/list_item.go Normal file

@ -0,0 +1,81 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
type listItemParser struct {
}
var defaultListItemParser = &listItemParser{}
// NewListItemParser returns a new BlockParser that
// parses list items.
func NewListItemParser() BlockParser {
return defaultListItemParser
}
var skipListParser = NewContextKey()
var skipListParserValue interface{} = true
func (b *listItemParser) Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State) {
list, lok := parent.(*ast.List)
if !lok { // list item must be a child of a list
return nil, NoChildren
}
offset := lastOffset(list)
line, _ := reader.PeekLine()
match, typ := matchesListItem(line, false)
if typ == notList {
return nil, NoChildren
}
if match[1]-offset > 3 {
return nil, NoChildren
}
itemOffset := calcListOffset(line, match)
node := ast.NewListItem(match[3] + itemOffset)
if match[5]-match[4] == 1 {
return node, NoChildren
}
pos, padding := util.IndentPosition(line[match[4]:], match[4], itemOffset)
child := match[3] + pos
reader.AdvanceAndSetPadding(child, padding)
return node, HasChildren
}
func (b *listItemParser) Continue(node ast.Node, reader text.Reader, pc Context) State {
line, _ := reader.PeekLine()
if util.IsBlank(line) {
return Continue | HasChildren
}
indent, _ := util.IndentWidth(line, reader.LineOffset())
offset := lastOffset(node.Parent())
if indent < offset && indent < 4 {
_, typ := matchesListItem(line, true)
// new list item found
if typ != notList {
pc.Set(skipListParser, skipListParserValue)
}
return Close
}
pos, padding := util.IndentPosition(line, reader.LineOffset(), offset)
reader.AdvanceAndSetPadding(pos, padding)
return Continue | HasChildren
}
func (b *listItemParser) Close(node ast.Node, pc Context) {
// nothing to do
}
func (b *listItemParser) CanInterruptParagraph() bool {
return true
}
func (b *listItemParser) CanAcceptIndentedLine() bool {
return false
}

62
parser/paragraph.go Normal file

@ -0,0 +1,62 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
)
type paragraphParser struct {
}
var defaultParagraphParser = &paragraphParser{}
// NewParagraphParser returns a new BlockParser that
// parses paragraphs.
func NewParagraphParser() BlockParser {
return defaultParagraphParser
}
func (b *paragraphParser) Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State) {
_, segment := reader.PeekLine()
segment = segment.TrimLeftSpace(reader.Source())
if segment.IsEmpty() {
return nil, NoChildren
}
node := ast.NewParagraph()
node.Lines().Append(segment)
reader.Advance(segment.Len() - 1)
return node, NoChildren
}
func (b *paragraphParser) Continue(node ast.Node, reader text.Reader, pc Context) State {
_, segment := reader.PeekLine()
segment = segment.TrimLeftSpace(reader.Source())
if segment.IsEmpty() {
return Close
}
node.Lines().Append(segment)
reader.Advance(segment.Len() - 1)
return Continue | NoChildren
}
func (b *paragraphParser) Close(node ast.Node, pc Context) {
lines := node.Lines()
if lines.Len() != 0 {
// trim trailing spaces
length := lines.Len()
lastLine := node.Lines().At(length - 1)
node.Lines().Set(length-1, lastLine.TrimRightSpace(pc.Source()))
}
if lines.Len() == 0 {
node.Parent().RemoveChild(node.Parent(), node)
return
}
}
func (b *paragraphParser) CanInterruptParagraph() bool {
return false
}
func (b *paragraphParser) CanAcceptIndentedLine() bool {
return false
}

987
parser/parser.go Normal file

@ -0,0 +1,987 @@
// Package parser contains types and functions for parsing Markdown text.
package parser
import (
"fmt"
"strings"
"sync"
"sync/atomic"
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
// A Reference interface represents a link reference in Markdown text.
type Reference interface {
// String implements Stringer.
String() string
// Label returns a label of the reference.
Label() []byte
// Destination returns a destination(URL) of the reference.
Destination() []byte
// Title returns a title of the reference.
Title() []byte
}
type reference struct {
label []byte
destination []byte
title []byte
}
// NewReference returns a new Reference.
func NewReference(label, destination, title []byte) Reference {
return &reference{label, destination, title}
}
func (r *reference) Label() []byte {
return r.label
}
func (r *reference) Destination() []byte {
return r.destination
}
func (r *reference) Title() []byte {
return r.title
}
func (r *reference) String() string {
return fmt.Sprintf("Reference{Label:%s, Destination:%s, Title:%s}", r.label, r.destination, r.title)
}
// ContextKey is a key that is used to set arbitrary values to the context.
type ContextKey int32
// New returns a new ContextKey value.
func (c *ContextKey) New() ContextKey {
return ContextKey(atomic.AddInt32((*int32)(c), 1))
}
var contextKey ContextKey
// NewContextKey returns a new ContextKey value.
func NewContextKey() ContextKey {
return contextKey.New()
}
// A Context interface holds information that is necessary to parse
// Markdown text.
type Context interface {
// String implements Stringer.
String() string
// Source returns a source of Markdown text.
Source() []byte
// Get returns a value associated with given key.
Get(ContextKey) interface{}
// Set sets given value to the context.
Set(ContextKey, interface{})
// AddReference adds given reference to this context.
AddReference(Reference)
// Reference returns (a reference, true) if a reference associated with
// given label exists, otherwise (nil, false).
Reference(label string) (Reference, bool)
// References returns a list of references.
References() []Reference
// BlockOffset returns the position of the first non-space character on
// the current line.
// This value is valid only for BlockParser.Open.
BlockOffset() int
// SetBlockOffset sets the position of the first non-space character on
// the current line.
// This value is valid only for BlockParser.Open.
SetBlockOffset(int)
// FirstDelimiter returns a first delimiter of the current delimiter list.
FirstDelimiter() *Delimiter
// LastDelimiter returns a last delimiter of the current delimiter list.
LastDelimiter() *Delimiter
// PushDelimiter appends given delimiter to the tail of the current
// delimiter list.
PushDelimiter(delimiter *Delimiter)
// RemoveDelimiter removes given delimiter from the current delimiter list.
RemoveDelimiter(d *Delimiter)
// ClearDelimiters clears the current delimiter list.
ClearDelimiters(bottom ast.Node)
// OpenedBlocks returns a list of nodes that are currently being parsed.
OpenedBlocks() []Block
// SetOpenedBlocks sets a list of nodes that are currently being parsed.
SetOpenedBlocks([]Block)
// LastOpenedBlock returns the last node that is currently being parsed.
LastOpenedBlock() Block
// SetLastOpenedBlock sets the last node that is currently being parsed.
SetLastOpenedBlock(Block)
}
// A Result interface holds a result of parsing Markdown text.
type Result interface {
// Reference returns (a reference, true) if a reference associated with
// given label exists, otherwise (nil, false).
Reference(label string) (Reference, bool)
}
type parseContext struct {
store []interface{}
source []byte
refs map[string]Reference
blockOffset int
delimiters *Delimiter
lastDelimiter *Delimiter
openedBlocks []Block
lastOpenedBlock Block
}
func newContext(source []byte) Context {
return &parseContext{
store: make([]interface{}, contextKey+1),
source: source,
refs: map[string]Reference{},
blockOffset: 0,
delimiters: nil,
lastDelimiter: nil,
openedBlocks: []Block{},
lastOpenedBlock: Block{},
}
}
func (p *parseContext) Get(key ContextKey) interface{} {
return p.store[key]
}
func (p *parseContext) Set(key ContextKey, value interface{}) {
p.store[key] = value
}
func (p *parseContext) BlockOffset() int {
return p.blockOffset
}
func (p *parseContext) SetBlockOffset(v int) {
p.blockOffset = v
}
func (p *parseContext) Source() []byte {
return p.source
}
func (p *parseContext) LastDelimiter() *Delimiter {
return p.lastDelimiter
}
func (p *parseContext) FirstDelimiter() *Delimiter {
return p.delimiters
}
func (p *parseContext) PushDelimiter(d *Delimiter) {
if p.delimiters == nil {
p.delimiters = d
p.lastDelimiter = d
} else {
l := p.lastDelimiter
p.lastDelimiter = d
l.NextDelimiter = d
d.PreviousDelimiter = l
}
}
func (p *parseContext) RemoveDelimiter(d *Delimiter) {
if d.PreviousDelimiter == nil {
p.delimiters = d.NextDelimiter
} else {
d.PreviousDelimiter.NextDelimiter = d.NextDelimiter
if d.NextDelimiter != nil {
d.NextDelimiter.PreviousDelimiter = d.PreviousDelimiter
}
}
if d.NextDelimiter == nil {
p.lastDelimiter = d.PreviousDelimiter
}
if p.delimiters != nil {
p.delimiters.PreviousDelimiter = nil
}
if p.lastDelimiter != nil {
p.lastDelimiter.NextDelimiter = nil
}
d.NextDelimiter = nil
d.PreviousDelimiter = nil
if d.Length != 0 {
ast.MergeOrReplaceTextSegment(d.Parent(), d, d.Segment)
} else {
d.Parent().RemoveChild(d.Parent(), d)
}
}
func (p *parseContext) ClearDelimiters(bottom ast.Node) {
if p.lastDelimiter == nil {
return
}
var c ast.Node
for c = p.lastDelimiter; c != nil && c != bottom; {
prev := c.PreviousSibling()
if d, ok := c.(*Delimiter); ok {
p.RemoveDelimiter(d)
}
c = prev
}
}
func (p *parseContext) AddReference(ref Reference) {
key := util.ToLinkReference(ref.Label())
if _, ok := p.refs[key]; !ok {
p.refs[key] = ref
}
}
func (p *parseContext) Reference(label string) (Reference, bool) {
v, ok := p.refs[label]
return v, ok
}
func (p *parseContext) References() []Reference {
ret := make([]Reference, 0, len(p.refs))
for _, v := range p.refs {
ret = append(ret, v)
}
return ret
}
func (p *parseContext) String() string {
refs := []string{}
for _, r := range p.refs {
refs = append(refs, r.String())
}
return fmt.Sprintf("Context{Store:%#v, Refs:%s}", p.store, strings.Join(refs, ","))
}
func (p *parseContext) OpenedBlocks() []Block {
return p.openedBlocks
}
func (p *parseContext) SetOpenedBlocks(v []Block) {
p.openedBlocks = v
}
func (p *parseContext) LastOpenedBlock() Block {
return p.lastOpenedBlock
}
func (p *parseContext) SetLastOpenedBlock(v Block) {
p.lastOpenedBlock = v
}
// State represents the parser's state.
// State is designed to be used as a bit flag.
type State int
const (
none State = 1 << iota
// Continue indicates parser can continue parsing.
Continue
// Close indicates parser cannot parse anymore.
Close
// HasChildren indicates parser may have child blocks.
HasChildren
// NoChildren indicates parser does not have child blocks.
NoChildren
)
// A Config struct is a data structure that holds configuration of the Parser.
type Config struct {
Options map[OptionName]interface{}
BlockParsers util.PrioritizedSlice /*<BlockParser>*/
InlineParsers util.PrioritizedSlice /*<InlineParser>*/
ParagraphTransformers util.PrioritizedSlice /*<ParagraphTransformer>*/
ASTTransformers util.PrioritizedSlice /*<ASTTransformer>*/
}
// NewConfig returns a new Config.
func NewConfig() *Config {
return &Config{
Options: map[OptionName]interface{}{},
BlockParsers: util.PrioritizedSlice{},
InlineParsers: util.PrioritizedSlice{},
ParagraphTransformers: util.PrioritizedSlice{},
ASTTransformers: util.PrioritizedSlice{},
}
}
// An Option interface is a functional option type for the Parser.
type Option interface {
SetConfig(*Config)
}
// OptionName is a name of parser options.
type OptionName string
// A Parser interface parses Markdown text into AST nodes.
type Parser interface {
// Parse parses given Markdown text into AST nodes.
Parse(reader text.Reader) (ast.Node, Result)
// AddOption adds given option to this parser.
AddOption(Option)
}
// A SetOptioner interface sets given option to the object.
type SetOptioner interface {
// SetOption sets given option to the object.
// Options the object does not support may be passed,
// so implementations must ignore options they do not recognize.
SetOption(name OptionName, value interface{})
}
// A BlockParser interface parses a block level element like Paragraph, List,
// Blockquote etc.
type BlockParser interface {
// Open parses the current line and returns a result of parsing.
//
// Open must not parse beyond the current line.
// If Open has been able to parse the current line, Open must advance the
// reader position by the consumed byte length.
//
// If Open has not been able to parse the current line, Open should return
// (nil, NoChildren). If Open has been able to parse the current line, Open
// should return a new Block node together with HasChildren or NoChildren.
Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State)
// Continue parses the current line and returns a result of parsing.
//
// Continue must not parse beyond the current line.
// If Continue has been able to parse the current line, Continue must advance
// the reader position by the consumed byte length.
//
// If Continue has not been able to parse the current line, Continue should
// return Close. If Continue has been able to parse the current line,
// Continue should return (Continue | NoChildren) or
// (Continue | HasChildren).
Continue(node ast.Node, reader text.Reader, pc Context) State
// Close will be called when the parser returns Close.
Close(node ast.Node, pc Context)
// CanInterruptParagraph returns true if the parser can interrupt paragraphs,
// otherwise false.
CanInterruptParagraph() bool
// CanAcceptIndentedLine returns true if the parser can open a new node when
// the given line is indented by more than 3 spaces.
CanAcceptIndentedLine() bool
}
// An InlineParser interface parses an inline level element like CodeSpan, Link etc.
type InlineParser interface {
// Trigger returns a list of characters that trigger the Parse method of
// this parser.
// Each trigger character must be a punctuation character or a space (' ').
// A space triggers this parser when the current character is any space
// character or is a non-punctuation character at the head of a line.
Trigger() []byte
// Parse parses the given block into an inline node.
//
// Parse can parse beyond the current line.
// If Parse has been able to parse the current line, it must advance the reader
// position by the consumed byte length.
Parse(parent ast.Node, block text.Reader, pc Context) ast.Node
// CloseBlock will be called when a block is closed.
CloseBlock(parent ast.Node, pc Context)
}
// A ParagraphTransformer transforms parsed Paragraph nodes.
// For example, link references are searched in parsed Paragraphs.
type ParagraphTransformer interface {
// Transform transforms given paragraph.
Transform(node *ast.Paragraph, pc Context)
}
// An ASTTransformer transforms the entire Markdown document AST.
type ASTTransformer interface {
// Transform transforms given AST tree.
Transform(node *ast.Document, pc Context)
}
// DefaultBlockParsers returns a new list of default BlockParsers.
// Priorities of default BlockParsers are:
//
// SetextHeadingParser, 100
// ThemanticBreakParser, 200
// ListParser, 300
// ListItemParser, 400
// CodeBlockParser, 500
// ATXHeadingParser, 600
// FencedCodeBlockParser, 700
// BlockquoteParser, 800
// HTMLBlockParser, 900
// ParagraphParser, 1000
func DefaultBlockParsers() []util.PrioritizedValue {
return []util.PrioritizedValue{
util.Prioritized(NewSetextHeadingParser(), 100),
util.Prioritized(NewThemanticBreakParser(), 200),
util.Prioritized(NewListParser(), 300),
util.Prioritized(NewListItemParser(), 400),
util.Prioritized(NewCodeBlockParser(), 500),
util.Prioritized(NewATXHeadingParser(), 600),
util.Prioritized(NewFencedCodeBlockParser(), 700),
util.Prioritized(NewBlockquoteParser(), 800),
util.Prioritized(NewHTMLBlockParser(), 900),
util.Prioritized(NewParagraphParser(), 1000),
}
}
// DefaultInlineParsers returns a new list of default InlineParsers.
// Priorities of default InlineParsers are:
//
// CodeSpanParser, 100
// LinkParser, 200
// AutoLinkParser, 300
// RawHTMLParser, 400
// EmphasisParser, 500
func DefaultInlineParsers() []util.PrioritizedValue {
return []util.PrioritizedValue{
util.Prioritized(NewCodeSpanParser(), 100),
util.Prioritized(NewLinkParser(), 200),
util.Prioritized(NewAutoLinkParser(), 300),
util.Prioritized(NewRawHTMLParser(), 400),
util.Prioritized(NewEmphasisParser(), 500),
}
}
// DefaultParagraphTransformers returns a new list of default ParagraphTransformers.
// Priorities of default ParagraphTransformers are:
//
// LinkReferenceParagraphTransformer, 100
func DefaultParagraphTransformers() []util.PrioritizedValue {
return []util.PrioritizedValue{
util.Prioritized(LinkReferenceParagraphTransformer, 100),
}
}
// A Block struct holds a node and its corresponding parser.
type Block struct {
// Node is a BlockNode.
Node ast.Node
// Parser is a BlockParser.
Parser BlockParser
}
type parser struct {
options map[OptionName]interface{}
blockParsers []BlockParser
inlineParsers [256][]InlineParser
inlineParsersList []InlineParser
paragraphTransformers []ParagraphTransformer
astTransformers []ASTTransformer
config *Config
initSync sync.Once
}
type withBlockParsers struct {
value []util.PrioritizedValue
}
func (o *withBlockParsers) SetConfig(c *Config) {
c.BlockParsers = append(c.BlockParsers, o.value...)
}
// WithBlockParsers is a functional option that allows you to add
// BlockParsers to the parser.
func WithBlockParsers(bs ...util.PrioritizedValue) Option {
return &withBlockParsers{bs}
}
type withInlineParsers struct {
value []util.PrioritizedValue
}
func (o *withInlineParsers) SetConfig(c *Config) {
c.InlineParsers = append(c.InlineParsers, o.value...)
}
// WithInlineParsers is a functional option that allows you to add
// InlineParsers to the parser.
func WithInlineParsers(bs ...util.PrioritizedValue) Option {
return &withInlineParsers{bs}
}
type withParagraphTransformers struct {
value []util.PrioritizedValue
}
func (o *withParagraphTransformers) SetConfig(c *Config) {
c.ParagraphTransformers = append(c.ParagraphTransformers, o.value...)
}
// WithParagraphTransformers is a functional option that allows you to add
// ParagraphTransformers to the parser.
func WithParagraphTransformers(ps ...util.PrioritizedValue) Option {
return &withParagraphTransformers{ps}
}
type withASTTransformers struct {
value []util.PrioritizedValue
}
func (o *withASTTransformers) SetConfig(c *Config) {
c.ASTTransformers = append(c.ASTTransformers, o.value...)
}
// WithASTTransformers is a functional option that allows you to add
// ASTTransformers to the parser.
func WithASTTransformers(ps ...util.PrioritizedValue) Option {
return &withASTTransformers{ps}
}
type withOption struct {
name OptionName
value interface{}
}
func (o *withOption) SetConfig(c *Config) {
c.Options[o.name] = o.value
}
// WithOption is a functional option that allows you to set
// an arbitrary option to the parser.
func WithOption(name OptionName, value interface{}) Option {
return &withOption{name, value}
}
// NewParser returns a new Parser with given options.
func NewParser(options ...Option) Parser {
config := NewConfig()
for _, opt := range options {
opt.SetConfig(config)
}
p := &parser{
options: map[OptionName]interface{}{},
config: config,
}
return p
}
func (p *parser) AddOption(o Option) {
o.SetConfig(p.config)
}
func (p *parser) addBlockParser(v util.PrioritizedValue, options map[OptionName]interface{}) {
bp, ok := v.Value.(BlockParser)
if !ok {
panic(fmt.Sprintf("%v is not a BlockParser", v.Value))
}
so, ok := v.Value.(SetOptioner)
if ok {
for oname, ovalue := range options {
so.SetOption(oname, ovalue)
}
}
p.blockParsers = append(p.blockParsers, bp)
}
func (p *parser) addInlineParser(v util.PrioritizedValue, options map[OptionName]interface{}) {
ip, ok := v.Value.(InlineParser)
if !ok {
panic(fmt.Sprintf("%v is not an InlineParser", v.Value))
}
tcs := ip.Trigger()
so, ok := v.Value.(SetOptioner)
if ok {
for oname, ovalue := range options {
so.SetOption(oname, ovalue)
}
}
p.inlineParsersList = append(p.inlineParsersList, ip)
for _, tc := range tcs {
if p.inlineParsers[tc] == nil {
p.inlineParsers[tc] = []InlineParser{}
}
p.inlineParsers[tc] = append(p.inlineParsers[tc], ip)
}
}
func (p *parser) addParagraphTransformer(v util.PrioritizedValue, options map[OptionName]interface{}) {
pt, ok := v.Value.(ParagraphTransformer)
if !ok {
panic(fmt.Sprintf("%v is not a ParagraphTransformer", v.Value))
}
so, ok := v.Value.(SetOptioner)
if ok {
for oname, ovalue := range options {
so.SetOption(oname, ovalue)
}
}
p.paragraphTransformers = append(p.paragraphTransformers, pt)
}
func (p *parser) addASTTransformer(v util.PrioritizedValue, options map[OptionName]interface{}) {
at, ok := v.Value.(ASTTransformer)
if !ok {
panic(fmt.Sprintf("%v is not an ASTTransformer", v.Value))
}
so, ok := v.Value.(SetOptioner)
if ok {
for oname, ovalue := range options {
so.SetOption(oname, ovalue)
}
}
p.astTransformers = append(p.astTransformers, at)
}
func (p *parser) Parse(reader text.Reader) (ast.Node, Result) {
p.initSync.Do(func() {
p.config.BlockParsers.Sort()
for _, v := range p.config.BlockParsers {
p.addBlockParser(v, p.config.Options)
}
p.config.InlineParsers.Sort()
for _, v := range p.config.InlineParsers {
p.addInlineParser(v, p.config.Options)
}
p.config.ParagraphTransformers.Sort()
for _, v := range p.config.ParagraphTransformers {
p.addParagraphTransformer(v, p.config.Options)
}
p.config.ASTTransformers.Sort()
for _, v := range p.config.ASTTransformers {
p.addASTTransformer(v, p.config.Options)
}
p.config = nil
})
root := ast.NewDocument()
pc := newContext(reader.Source())
p.parseBlocks(root, reader, pc)
blockReader := text.NewBlockReader(reader.Source(), nil)
p.walkBlock(root, func(node ast.Node) {
p.parseBlock(blockReader, node, pc)
})
for _, at := range p.astTransformers {
at.Transform(root, pc)
}
//root.Dump(reader.Source(), 0)
return root, pc
}
func (p *parser) transformParagraph(node *ast.Paragraph, pc Context) {
for _, pt := range p.paragraphTransformers {
pt.Transform(node, pc)
if node.Parent() == nil {
break
}
}
}
func (p *parser) closeBlocks(from, to int, pc Context) {
blocks := pc.OpenedBlocks()
last := pc.LastOpenedBlock()
for i := from; i >= to; i-- {
node := blocks[i].Node
if node.Parent() != nil {
blocks[i].Parser.Close(blocks[i].Node, pc)
paragraph, ok := node.(*ast.Paragraph)
if ok && node.Parent() != nil {
p.transformParagraph(paragraph, pc)
}
}
}
if from == len(blocks)-1 {
blocks = blocks[0:to]
} else {
blocks = append(blocks[0:to], blocks[from+1:]...)
}
l := len(blocks)
if l == 0 {
last.Node = nil
} else {
last = blocks[l-1]
}
pc.SetOpenedBlocks(blocks)
pc.SetLastOpenedBlock(last)
}
type blockOpenResult int
const (
paragraphContinuation blockOpenResult = iota + 1
newBlocksOpened
noBlocksOpened
)
func (p *parser) openBlocks(parent ast.Node, blankLine bool, reader text.Reader, pc Context) blockOpenResult {
result := blockOpenResult(noBlocksOpened)
continuable := false
lastBlock := pc.LastOpenedBlock()
if lastBlock.Node != nil {
continuable = ast.IsParagraph(lastBlock.Node)
}
retry:
shouldPeek := true
var currentLineNum int
var w int
var pos int
var line []byte
for _, bp := range p.blockParsers {
if shouldPeek {
currentLineNum, _ = reader.Position()
line, _ = reader.PeekLine()
w, pos = util.IndentWidth(line, 0)
pc.SetBlockOffset(pos)
shouldPeek = false
if line == nil || line[0] == '\n' {
break
}
}
if continuable && result == noBlocksOpened && !bp.CanInterruptParagraph() {
continue
}
if w > 3 && !bp.CanAcceptIndentedLine() {
continue
}
last := pc.LastOpenedBlock().Node
node, state := bp.Open(parent, reader, pc)
if l, _ := reader.Position(); l != currentLineNum {
panic("BlockParser.Open must not advance position beyond the current line")
}
if node != nil {
shouldPeek = true
node.SetBlankPreviousLines(blankLine)
if last != nil && last.Parent() == nil {
lastPos := len(pc.OpenedBlocks()) - 1
p.closeBlocks(lastPos, lastPos, pc)
}
parent.AppendChild(parent, node)
result = newBlocksOpened
be := Block{node, bp}
pc.SetOpenedBlocks(append(pc.OpenedBlocks(), be))
pc.SetLastOpenedBlock(be)
if state == HasChildren {
parent = node
goto retry // try child block
}
break // no children, cannot open more blocks on this line
}
}
if result == noBlocksOpened && continuable {
state := lastBlock.Parser.Continue(lastBlock.Node, reader, pc)
if state&Continue != 0 {
result = paragraphContinuation
}
}
return result
}
type lineStat struct {
lineNum int
level int
isBlank bool
}
func isBlankLine(lineNum, level int, stats []lineStat) ([]lineStat, bool) {
ret := false
for i := len(stats) - 1 - level; i >= 0; i-- {
s := stats[i]
if s.lineNum == lineNum && s.level == level {
ret = s.isBlank
continue
}
if s.lineNum < lineNum {
return stats[i:], ret
}
}
return stats[0:0], ret
}
func (p *parser) parseBlocks(parent ast.Node, reader text.Reader, pc Context) {
pc.SetLastOpenedBlock(Block{})
pc.SetOpenedBlocks([]Block{})
blankLines := make([]lineStat, 0, 64)
isBlank := false
for { // process blocks separated by blank lines
_, lines, ok := reader.SkipBlankLines()
if !ok {
return
}
// first, we try to open blocks
if p.openBlocks(parent, lines != 0, reader, pc) != newBlocksOpened {
return
}
lineNum, _ := reader.Position()
for i := 0; i < len(pc.OpenedBlocks()); i++ {
blankLines = append(blankLines, lineStat{lineNum - 1, i, lines != 0})
}
reader.AdvanceLine()
for len(pc.OpenedBlocks()) != 0 { // process opened blocks line by line
lastIndex := len(pc.OpenedBlocks()) - 1
for i := 0; i < len(pc.OpenedBlocks()); i++ {
be := pc.OpenedBlocks()[i]
line, _ := reader.PeekLine()
if line == nil {
p.closeBlocks(lastIndex, 0, pc)
reader.AdvanceLine()
return
}
lineNum, _ := reader.Position()
blankLines = append(blankLines, lineStat{lineNum, i, util.IsBlank(line)})
// If node is a paragraph, p.openBlocks determines whether it is continuable.
// So we do not process paragraphs here.
if !ast.IsParagraph(be.Node) {
state := be.Parser.Continue(be.Node, reader, pc)
if state&Continue != 0 {
// When current node is a container block and has no children,
// we try to open new child nodes
if state&HasChildren != 0 && i == lastIndex {
blankLines, isBlank = isBlankLine(lineNum-1, i, blankLines)
p.openBlocks(be.Node, isBlank, reader, pc)
break
}
continue
}
}
// current node may be closed or lazy continuation
blankLines, isBlank = isBlankLine(lineNum-1, i, blankLines)
thisParent := parent
if i != 0 {
thisParent = pc.OpenedBlocks()[i-1].Node
}
result := p.openBlocks(thisParent, isBlank, reader, pc)
if result != paragraphContinuation {
p.closeBlocks(lastIndex, i, pc)
}
break
}
reader.AdvanceLine()
}
}
}
func (p *parser) walkBlock(block ast.Node, cb func(node ast.Node)) {
for c := block.FirstChild(); c != nil; c = c.NextSibling() {
p.walkBlock(c, cb)
}
cb(block)
}
func (p *parser) parseBlock(block text.BlockReader, parent ast.Node, pc Context) {
if parent.IsRaw() {
return
}
escaped := false
source := block.Source()
block.Reset(parent.Lines())
for {
retry:
line, _ := block.PeekLine()
if line == nil {
break
}
lineLength := len(line)
l, startPosition := block.Position()
n := 0
softLinebreak := false
for i := 0; i < lineLength; i++ {
c := line[i]
if c == '\n' {
softLinebreak = true
break
}
isSpace := util.IsSpace(c)
isPunct := util.IsPunct(c)
if (isPunct && !escaped) || isSpace || i == 0 {
parserChar := c
if isSpace || (i == 0 && !isPunct) {
parserChar = ' '
}
ips := p.inlineParsers[parserChar]
if ips != nil {
block.Advance(n)
n = 0
savedLine, savedPosition := block.Position()
if i != 0 {
_, currentPosition := block.Position()
ast.MergeOrAppendTextSegment(parent, startPosition.Between(currentPosition))
_, startPosition = block.Position()
}
var inlineNode ast.Node
for _, ip := range ips {
inlineNode = ip.Parse(parent, block, pc)
if inlineNode != nil {
break
}
block.SetPosition(savedLine, savedPosition)
}
if inlineNode != nil {
parent.AppendChild(parent, inlineNode)
goto retry
}
}
}
if escaped {
escaped = false
n++
continue
}
if c == '\\' {
escaped = true
n++
continue
}
escaped = false
n++
}
if n != 0 {
block.Advance(n)
}
currentL, currentPosition := block.Position()
if l != currentL {
continue
}
diff := startPosition.Between(currentPosition)
stop := diff.Stop
hardlineBreak := false
if lineLength > 2 && line[lineLength-2] == '\\' && softLinebreak { // ends with \\n
stop--
hardlineBreak = true
} else if lineLength > 3 && line[lineLength-3] == ' ' && line[lineLength-2] == ' ' && softLinebreak { // ends with [space][space]\n
hardlineBreak = true
}
rest := diff.WithStop(stop)
textNode := ast.NewTextSegment(rest.TrimRightSpace(source))
textNode.SetSoftLineBreak(softLinebreak)
textNode.SetHardLineBreak(hardlineBreak)
parent.AppendChild(parent, textNode)
block.AdvanceLine()
}
ProcessDelimiters(nil, pc)
for _, ip := range p.inlineParsersList {
ip.CloseBlock(parent, pc)
}
}
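
The State type in parser.go is used as a bit flag: BlockParser.Continue returns combinations such as `Continue | HasChildren`, and parseBlocks tests them with `state&Continue != 0`. The sketch below copies only the constant block from the source to show how those combinations are built and tested. The `describe` function is a hypothetical helper added for illustration and is not part of the package.

```go
package main

import "fmt"

// State mirrors the declaration in parser.go: each constant occupies its
// own bit, so values can be combined with | and tested with &.
type State int

const (
	none State = 1 << iota // 1
	// Continue indicates parser can continue parsing.
	Continue // 2
	// Close indicates parser cannot parse anymore.
	Close // 4
	// HasChildren indicates parser may have child blocks.
	HasChildren // 8
	// NoChildren indicates parser does not have child blocks.
	NoChildren // 16
)

// describe is a hypothetical helper that interprets a State roughly the
// way the block-parsing loop does: Close wins, then Continue is checked,
// and HasChildren decides whether child blocks may be opened.
func describe(s State) string {
	switch {
	case s&Close != 0:
		return "close"
	case s&Continue != 0 && s&HasChildren != 0:
		return "continue, try children"
	case s&Continue != 0:
		return "continue"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(describe(Continue | HasChildren))
	fmt.Println(describe(Continue | NoChildren))
	fmt.Println(describe(Close))
}
```

Because the first constant is `1 << iota` with iota equal to 0, `none` has the value 1 and not 0, so a zero State carries no flag at all.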

126
parser/raw_html.go Normal file

@ -0,0 +1,126 @@
package parser
import (
"bytes"
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
"regexp"
)
type rawHTMLParser struct {
HTMLConfig
}
// NewRawHTMLParser returns a new InlineParser that can parse
// inline raw HTML.
func NewRawHTMLParser(opts ...HTMLOption) InlineParser {
p := &rawHTMLParser{}
for _, o := range opts {
o.SetHTMLOption(&p.HTMLConfig)
}
return p
}
func (s *rawHTMLParser) Trigger() []byte {
return []byte{'<'}
}
func (s *rawHTMLParser) Parse(parent ast.Node, block text.Reader, pc Context) ast.Node {
line, _ := block.PeekLine()
if len(line) > 1 && util.IsAlphaNumeric(line[1]) {
return s.parseMultiLineRegexp(openTagRegexp, block, pc)
}
if len(line) > 2 && line[1] == '/' && util.IsAlphaNumeric(line[2]) {
return s.parseMultiLineRegexp(closeTagRegexp, block, pc)
}
if bytes.HasPrefix(line, []byte("<!--")) {
return s.parseMultiLineRegexp(commentRegexp, block, pc)
}
if bytes.HasPrefix(line, []byte("<?")) {
return s.parseSingleLineRegexp(processingInstructionRegexp, block, pc)
}
if len(line) > 2 && line[1] == '!' && line[2] >= 'A' && line[2] <= 'Z' {
return s.parseSingleLineRegexp(declRegexp, block, pc)
}
if bytes.HasPrefix(line, []byte("<![CDATA[")) {
return s.parseMultiLineRegexp(cdataRegexp, block, pc)
}
return nil
}
var tagnamePattern = `([A-Za-z][A-Za-z0-9-]*)`
var attributePattern = `(?:\s+[a-zA-Z_:][a-zA-Z0-9:._-]*(?:\s*=\s*(?:[^\"'=<>` + "`" + `\x00-\x20]+|'[^']*'|"[^"]*"))?)`
var openTagRegexp = regexp.MustCompile("^<" + tagnamePattern + attributePattern + `*\s*/?>`)
var closeTagRegexp = regexp.MustCompile("^</" + tagnamePattern + `\s*>`)
// The alternation is grouped so that ^ anchors both alternatives.
var commentRegexp = regexp.MustCompile(`^(?:<!---->|<!--(?:-?[^>-])(?:-?[^-])*-->)`)
var processingInstructionRegexp = regexp.MustCompile(`^(?:<\?).*?(?:\?>)`)
var declRegexp = regexp.MustCompile(`^<![A-Z]+\s+[^>]*>`)
var cdataRegexp = regexp.MustCompile(`<!\[CDATA\[[\s\S]*?\]\]>`)
func (s *rawHTMLParser) parseSingleLineRegexp(reg *regexp.Regexp, block text.Reader, pc Context) ast.Node {
line, segment := block.PeekLine()
match := reg.FindSubmatchIndex(line)
if match == nil {
return nil
}
node := ast.NewRawHTML()
node.AppendChild(node, ast.NewRawTextSegment(segment.WithStop(segment.Start+match[1])))
block.Advance(match[1])
return node
}
var dummyMatch = [][]byte{}
func (s *rawHTMLParser) parseMultiLineRegexp(reg *regexp.Regexp, block text.Reader, pc Context) ast.Node {
sline, ssegment := block.Position()
var m [][]byte
if s.FilterTags != nil {
m = block.FindSubMatch(reg)
} else {
if block.Match(reg) {
m = dummyMatch
}
}
if m != nil {
// Patterns without a capture group (comments, CDATA) yield no tag name.
if s.FilterTags != nil && len(m) > 1 {
tagName := string(m[1])
if _, ok := s.FilterTags[tagName]; ok {
return nil
}
}
node := ast.NewRawHTML()
eline, esegment := block.Position()
block.SetPosition(sline, ssegment)
for {
line, segment := block.PeekLine()
if line == nil {
break
}
l, _ := block.Position()
start := segment.Start
if l == sline {
start = ssegment.Start
}
end := segment.Stop
if l == eline {
end = esegment.Start
}
node.AppendChild(node, ast.NewRawTextSegment(text.NewSegment(start, end)))
if l == eline {
block.Advance(end - start)
break
} else {
block.AdvanceLine()
}
}
return node
}
return nil
}
func (s *rawHTMLParser) CloseBlock(parent ast.Node, pc Context) {
// nothing to do
}
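
A note on anchoring alternations such as the comment pattern in parser/raw_html.go: in Go's regexp syntax, `^` in `^a|b` applies only to `a` unless the alternation is grouped. The sketch below compares an ungrouped and a grouped form of the comment pattern using only the standard `regexp` package. The names `loose` and `strict` are illustrative and not part of goldmark, and how `block.Match` treats match offsets is not shown in this commit, so this only demonstrates the regexp behaviour itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// loose anchors only the first alternative; strict anchors both.
// Both names are hypothetical and exist only for this comparison.
var (
	loose  = regexp.MustCompile(`^<!---->|<!--(?:-?[^>-])(?:-?[^-])*-->`)
	strict = regexp.MustCompile(`^(?:<!---->|<!--(?:-?[^>-])(?:-?[^-])*-->)`)
)

func main() {
	// The line starts with "<!--", but that first comment is malformed;
	// a well-formed comment follows later on the same line.
	line := "<!--> x <!-- ok -->"

	// The ungrouped pattern can match the later comment.
	fmt.Println("loose: ", loose.FindStringIndex(line))
	// The grouped pattern only accepts a match at the start of the line.
	fmt.Println("strict:", strict.FindStringIndex(line))
}
```

With the grouped form, a line that merely begins with `<!--` but is not a valid comment produces no match, which is the behaviour an inline parser triggered at `<` would usually want.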

109
parser/setext_headings.go Normal file

@ -0,0 +1,109 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
var temporaryParagraphKey = NewContextKey()
type setextHeadingParser struct {
HeadingConfig
}
func matchesSetextHeadingBar(line []byte) (byte, bool) {
start := 0
end := len(line)
space := util.TrimLeftLength(line, []byte{' '})
if space > 3 {
return 0, false
}
start += space
level1 := util.TrimLeftLength(line[start:end], []byte{'='})
c := byte('=')
var level2 int
if level1 == 0 {
level2 = util.TrimLeftLength(line[start:end], []byte{'-'})
c = '-'
}
end -= util.TrimRightSpaceLength(line[start:end])
if !((level1 > 0 && start+level1 == end) || (level2 > 0 && start+level2 == end)) {
return 0, false
}
return c, true
}
// NewSetextHeadingParser returns a new BlockParser that can parse Setext headings.
func NewSetextHeadingParser(opts ...HeadingOption) BlockParser {
p := &setextHeadingParser{}
for _, o := range opts {
o.SetHeadingOption(&p.HeadingConfig)
}
return p
}
func (b *setextHeadingParser) Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State) {
last := pc.LastOpenedBlock().Node
if last == nil {
return nil, NoChildren
}
paragraph, ok := last.(*ast.Paragraph)
if !ok || paragraph.Parent() != parent {
return nil, NoChildren
}
line, segment := reader.PeekLine()
c, ok := matchesSetextHeadingBar(line)
if !ok {
return nil, NoChildren
}
level := 1
if c == '-' {
level = 2
}
node := ast.NewHeading(level)
node.Lines().Append(segment)
pc.Set(temporaryParagraphKey, paragraph)
return node, NoChildren
}
func (b *setextHeadingParser) Continue(node ast.Node, reader text.Reader, pc Context) State {
return Close
}
func (b *setextHeadingParser) Close(node ast.Node, pc Context) {
heading := node.(*ast.Heading)
segment := node.Lines().At(0)
heading.Lines().Clear()
tmp := pc.Get(temporaryParagraphKey).(*ast.Paragraph)
pc.Set(temporaryParagraphKey, nil)
if tmp.Lines().Len() == 0 {
next := heading.NextSibling()
segment = segment.TrimLeftSpace(pc.Source())
if next == nil || !ast.IsParagraph(next) {
para := ast.NewParagraph()
para.Lines().Append(segment)
heading.Parent().InsertAfter(heading.Parent(), heading, para)
} else {
next.Lines().Unshift(segment)
}
heading.Parent().RemoveChild(heading.Parent(), heading)
} else {
heading.SetLines(tmp.Lines())
heading.SetBlankPreviousLines(tmp.HasBlankPreviousLines())
tmp.Parent().RemoveChild(tmp.Parent(), tmp)
}
if !b.HeadingID {
return
}
parseOrGenerateHeadingID(heading, pc)
}
func (b *setextHeadingParser) CanInterruptParagraph() bool {
return true
}
func (b *setextHeadingParser) CanAcceptIndentedLine() bool {
return false
}

71
parser/themantic_break.go Normal file

@ -0,0 +1,71 @@
package parser
import (
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/text"
"github.com/yuin/goldmark/util"
)
type themanticBreakParser struct {
}
var defaultThemanticBreakParser = &themanticBreakParser{}
// NewThemanticBreakParser returns a new BlockParser that
// parses thematic breaks.
func NewThemanticBreakParser() BlockParser {
return defaultThemanticBreakParser
}
func isThemanticBreak(line []byte) bool {
w, pos := util.IndentWidth(line, 0)
if w > 3 {
return false
}
mark := byte(0)
count := 0
for i := pos; i < len(line); i++ {
c := line[i]
if util.IsSpace(c) {
continue
}
if mark == 0 {
mark = c
count = 1
if mark == '*' || mark == '-' || mark == '_' {
continue
}
return false
}
if c != mark {
return false
}
count++
}
return count > 2
}
func (b *themanticBreakParser) Open(parent ast.Node, reader text.Reader, pc Context) (ast.Node, State) {
line, segment := reader.PeekLine()
if isThemanticBreak(line) {
reader.Advance(segment.Len() - 1)
return ast.NewThemanticBreak(), NoChildren
}
return nil, NoChildren
}
func (b *themanticBreakParser) Continue(node ast.Node, reader text.Reader, pc Context) State {
return Close
}
func (b *themanticBreakParser) Close(node ast.Node, pc Context) {
// nothing to do
}
func (b *themanticBreakParser) CanInterruptParagraph() bool {
return true
}
func (b *themanticBreakParser) CanAcceptIndentedLine() bool {
return false
}
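
The check in isThemanticBreak can be tried in isolation. The following is a minimal standalone sketch of the same rule (at most three columns of indentation, then three or more identical `*`, `-` or `_` marks separated only by whitespace). It does not use goldmark's `util` package: `isThematicBreakLine` is a hypothetical name, and the indentation handling is a simplification of `util.IndentWidth` that rejects any leading tab within the first four columns.

```go
package main

import "fmt"

// isThematicBreakLine reports whether line looks like a thematic break.
// Hypothetical helper for illustration; it mirrors isThemanticBreak but
// replaces util.IndentWidth and util.IsSpace with simple byte checks.
func isThematicBreakLine(line []byte) bool {
	// Count leading spaces; more than three means an indented code block.
	pos := 0
	for pos < len(line) && line[pos] == ' ' {
		pos++
	}
	// A tab within the first four columns always reaches column four.
	if pos > 3 || (pos < len(line) && line[pos] == '\t') {
		return false
	}
	mark := byte(0)
	count := 0
	for _, c := range line[pos:] {
		if c == ' ' || c == '\t' || c == '\n' || c == '\r' {
			continue // whitespace between marks is allowed
		}
		if mark == 0 {
			if c != '*' && c != '-' && c != '_' {
				return false
			}
			mark = c
			count = 1
			continue
		}
		if c != mark {
			return false // mixed marks or other text
		}
		count++
	}
	return count > 2
}

func main() {
	for _, s := range []string{"***\n", " - - -\n", "__\n", "*-*\n", "    ***\n"} {
		fmt.Printf("%q -> %v\n", s, isThematicBreakLine([]byte(s)))
	}
}
```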

588
renderer/html/html.go Normal file

@ -0,0 +1,588 @@
package html
import (
"bytes"
"fmt"
"strconv"
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/renderer"
"github.com/yuin/goldmark/util"
)
// A Config struct has configurations for the HTML based renderers.
type Config struct {
Writer Writer
SoftLineBreak bool
XHTML bool
}
// NewConfig returns a new Config with defaults.
func NewConfig() Config {
return Config{
Writer: DefaultWriter,
SoftLineBreak: false,
XHTML: false,
}
}
// SetOption implements renderer.NodeRenderer.SetOption.
func (c *Config) SetOption(name renderer.OptionName, value interface{}) {
switch name {
case SoftLineBreak:
c.SoftLineBreak = value.(bool)
case XHTML:
c.XHTML = value.(bool)
case TextWriter:
c.Writer = value.(Writer)
}
}
// An Option interface sets options for HTML based renderers.
type Option interface {
SetHTMLOption(*Config)
}
// TextWriter is an option name used in WithWriter.
const TextWriter renderer.OptionName = "Writer"
type withWriter struct {
value Writer
}
func (o *withWriter) SetConfig(c *renderer.Config) {
c.Options[TextWriter] = o.value
}
func (o *withWriter) SetHTMLOption(c *Config) {
c.Writer = o.value
}
// WithWriter is a functional option that allows you to set the given writer
// on the renderer.
func WithWriter(writer Writer) interface {
renderer.Option
Option
} {
return &withWriter{writer}
}
// SoftLineBreak is an option name used in WithSoftLineBreak.
const SoftLineBreak renderer.OptionName = "SoftLineBreak"
type withSoftLineBreak struct {
}
func (o *withSoftLineBreak) SetConfig(c *renderer.Config) {
c.Options[SoftLineBreak] = true
}
func (o *withSoftLineBreak) SetHTMLOption(c *Config) {
c.SoftLineBreak = true
}
// WithSoftLineBreak is a functional option that makes the renderer emit
// soft line breaks as '<br>'.
func WithSoftLineBreak() interface {
renderer.Option
Option
} {
return &withSoftLineBreak{}
}
// XHTML is an option name used in WithXHTML.
const XHTML renderer.OptionName = "XHTML"
type withXHTML struct {
}
func (o *withXHTML) SetConfig(c *renderer.Config) {
c.Options[XHTML] = true
}
func (o *withXHTML) SetHTMLOption(c *Config) {
c.XHTML = true
}
// WithXHTML is a functional option that indicates that nodes should be
// rendered as XHTML instead of HTML5.
func WithXHTML() interface {
Option
renderer.Option
} {
return &withXHTML{}
}
// A Renderer struct is an implementation of renderer.NodeRenderer that renders
// nodes as (X)HTML.
type Renderer struct {
Config
}
// NewRenderer returns a new Renderer with given options.
func NewRenderer(opts ...Option) renderer.NodeRenderer {
r := &Renderer{
Config: NewConfig(),
}
for _, opt := range opts {
opt.SetHTMLOption(&r.Config)
}
return r
}
// Render implements renderer.NodeRenderer.Render.
func (r *Renderer) Render(writer util.BufWriter, source []byte, n ast.Node, entering bool) (ast.WalkStatus, error) {
switch node := n.(type) {
// blocks
case *ast.Document:
return r.renderDocument(writer, source, node, entering), nil
case *ast.Heading:
return r.renderHeading(writer, source, node, entering), nil
case *ast.Blockquote:
return r.renderBlockquote(writer, source, node, entering), nil
case *ast.CodeBlock:
return r.renderCodeBlock(writer, source, node, entering), nil
case *ast.FencedCodeBlock:
return r.renderFencedCodeBlock(writer, source, node, entering), nil
case *ast.HTMLBlock:
return r.renderHTMLBlock(writer, source, node, entering), nil
case *ast.List:
return r.renderList(writer, source, node, entering), nil
case *ast.ListItem:
return r.renderListItem(writer, source, node, entering), nil
case *ast.Paragraph:
return r.renderParagraph(writer, source, node, entering), nil
case *ast.TextBlock:
return r.renderTextBlock(writer, source, node, entering), nil
case *ast.ThemanticBreak:
return r.renderThemanticBreak(writer, source, node, entering), nil
// inlines
case *ast.AutoLink:
return r.renderAutoLink(writer, source, node, entering), nil
case *ast.CodeSpan:
return r.renderCodeSpan(writer, source, node, entering), nil
case *ast.Emphasis:
return r.renderEmphasis(writer, source, node, entering), nil
case *ast.Image:
return r.renderImage(writer, source, node, entering), nil
case *ast.Link:
return r.renderLink(writer, source, node, entering), nil
case *ast.RawHTML:
return r.renderRawHTML(writer, source, node, entering), nil
case *ast.Text:
return r.renderText(writer, source, node, entering), nil
}
return ast.WalkContinue, renderer.NotSupported
}
func (r *Renderer) writeLines(w util.BufWriter, source []byte, n ast.Node) {
l := n.Lines().Len()
for i := 0; i < l; i++ {
line := n.Lines().At(i)
r.Writer.RawWrite(w, line.Value(source))
}
}
func (r *Renderer) renderDocument(w util.BufWriter, source []byte, n *ast.Document, entering bool) ast.WalkStatus {
// nothing to do
return ast.WalkContinue
}
func (r *Renderer) renderHeading(w util.BufWriter, source []byte, n *ast.Heading, entering bool) ast.WalkStatus {
if entering {
w.WriteString("<h")
w.WriteByte("0123456"[n.Level])
if n.ID != nil {
w.WriteString(` id="`)
w.Write(n.ID)
w.WriteByte('"')
}
w.WriteByte('>')
} else {
w.WriteString("</h")
w.WriteByte("0123456"[n.Level])
w.WriteString(">\n")
}
return ast.WalkContinue
}
func (r *Renderer) renderBlockquote(w util.BufWriter, source []byte, n *ast.Blockquote, entering bool) ast.WalkStatus {
if entering {
w.WriteString("<blockquote>\n")
} else {
w.WriteString("</blockquote>\n")
}
return ast.WalkContinue
}
func (r *Renderer) renderCodeBlock(w util.BufWriter, source []byte, n *ast.CodeBlock, entering bool) ast.WalkStatus {
if entering {
w.WriteString("<pre><code>")
r.writeLines(w, source, n)
} else {
w.WriteString("</code></pre>\n")
}
return ast.WalkContinue
}
func (r *Renderer) renderFencedCodeBlock(w util.BufWriter, source []byte, n *ast.FencedCodeBlock, entering bool) ast.WalkStatus {
if entering {
w.WriteString("<pre><code")
if n.Info != nil {
segment := n.Info.Segment
info := segment.Value(source)
i := 0
for ; i < len(info); i++ {
if info[i] == ' ' {
break
}
}
language := info[:i]
w.WriteString(" class=\"language-")
r.Writer.Write(w, language)
w.WriteString("\"")
}
w.WriteByte('>')
r.writeLines(w, source, n)
} else {
w.WriteString("</code></pre>\n")
}
return ast.WalkContinue
}
func (r *Renderer) renderHTMLBlock(w util.BufWriter, source []byte, n *ast.HTMLBlock, entering bool) ast.WalkStatus {
if entering {
l := n.Lines().Len()
for i := 0; i < l; i++ {
line := n.Lines().At(i)
w.Write(line.Value(source))
}
} else {
if n.HasClosure() {
closure := n.ClosureLine
w.Write(closure.Value(source))
}
}
return ast.WalkContinue
}
func (r *Renderer) renderList(w util.BufWriter, source []byte, n *ast.List, entering bool) ast.WalkStatus {
tag := "ul"
if n.IsOrdered() {
tag = "ol"
}
if entering {
w.WriteByte('<')
w.WriteString(tag)
if n.IsOrdered() && n.Start != 1 {
fmt.Fprintf(w, " start=\"%d\">\n", n.Start)
} else {
w.WriteString(">\n")
}
} else {
w.WriteString("</")
w.WriteString(tag)
w.WriteString(">\n")
}
return ast.WalkContinue
}
func (r *Renderer) renderListItem(w util.BufWriter, source []byte, n *ast.ListItem, entering bool) ast.WalkStatus {
if entering {
w.WriteString("<li>")
fc := n.FirstChild()
if fc != nil {
if _, ok := fc.(*ast.TextBlock); !ok {
w.WriteByte('\n')
}
}
} else {
w.WriteString("</li>\n")
}
return ast.WalkContinue
}
func (r *Renderer) renderParagraph(w util.BufWriter, source []byte, n *ast.Paragraph, entering bool) ast.WalkStatus {
if entering {
w.WriteString("<p>")
} else {
w.WriteString("</p>\n")
}
return ast.WalkContinue
}
func (r *Renderer) renderTextBlock(w util.BufWriter, source []byte, n *ast.TextBlock, entering bool) ast.WalkStatus {
if !entering {
if _, ok := n.NextSibling().(ast.Node); ok && n.FirstChild() != nil {
w.WriteByte('\n')
}
return ast.WalkContinue
}
return ast.WalkContinue
}
func (r *Renderer) renderThemanticBreak(w util.BufWriter, source []byte, n *ast.ThemanticBreak, entering bool) ast.WalkStatus {
if !entering {
return ast.WalkContinue
}
if r.XHTML {
w.WriteString("<hr />\n")
} else {
w.WriteString("<hr>\n")
}
return ast.WalkContinue
}
func (r *Renderer) renderAutoLink(w util.BufWriter, source []byte, n *ast.AutoLink, entering bool) ast.WalkStatus {
if !entering {
return ast.WalkContinue
}
w.WriteString(`<a href="`)
segment := n.Value.Segment
value := segment.Value(source)
if n.AutoLinkType == ast.AutoLinkEmail && !bytes.HasPrefix(bytes.ToLower(value), []byte("mailto:")) {
w.WriteString("mailto:")
}
w.Write(util.EscapeHTML(util.URLEscape(value, false)))
w.WriteString(`">`)
w.Write(util.EscapeHTML(value))
w.WriteString(`</a>`)
return ast.WalkContinue
}
func (r *Renderer) renderCodeSpan(w util.BufWriter, source []byte, n *ast.CodeSpan, entering bool) ast.WalkStatus {
if entering {
w.WriteString("<code>")
for c := n.FirstChild(); c != nil; c = c.NextSibling() {
segment := c.(*ast.Text).Segment
value := segment.Value(source)
if bytes.HasSuffix(value, []byte("\n")) {
r.Writer.RawWrite(w, value[:len(value)-1])
if c != n.LastChild() {
r.Writer.RawWrite(w, []byte(" "))
}
} else {
r.Writer.RawWrite(w, value)
}
}
return ast.WalkSkipChildren
}
w.WriteString("</code>")
return ast.WalkContinue
}
func (r *Renderer) renderEmphasis(w util.BufWriter, source []byte, n *ast.Emphasis, entering bool) ast.WalkStatus {
tag := "em"
if n.Level == 2 {
tag = "strong"
}
if entering {
w.WriteByte('<')
w.WriteString(tag)
w.WriteByte('>')
} else {
w.WriteString("</")
w.WriteString(tag)
w.WriteByte('>')
}
return ast.WalkContinue
}
func (r *Renderer) renderLink(w util.BufWriter, source []byte, n *ast.Link, entering bool) ast.WalkStatus {
if entering {
w.WriteString("<a href=\"")
w.Write(util.EscapeHTML(util.URLEscape(n.Destination, true)))
w.WriteByte('"')
if n.Title != nil {
w.WriteString(` title="`)
r.Writer.Write(w, n.Title)
w.WriteByte('"')
}
w.WriteByte('>')
} else {
w.WriteString("</a>")
}
return ast.WalkContinue
}
func (r *Renderer) renderImage(w util.BufWriter, source []byte, n *ast.Image, entering bool) ast.WalkStatus {
if !entering {
return ast.WalkContinue
}
w.WriteString("<img src=\"")
w.Write(util.EscapeHTML(util.URLEscape(n.Destination, true)))
w.WriteString(`" alt="`)
w.Write(n.Text(source))
w.WriteByte('"')
if n.Title != nil {
w.WriteString(` title="`)
r.Writer.Write(w, n.Title)
w.WriteByte('"')
}
if r.XHTML {
w.WriteString(" />")
} else {
w.WriteString(">")
}
return ast.WalkSkipChildren
}
func (r *Renderer) renderRawHTML(w util.BufWriter, source []byte, n *ast.RawHTML, entering bool) ast.WalkStatus {
return ast.WalkContinue
}
func (r *Renderer) renderText(w util.BufWriter, source []byte, n *ast.Text, entering bool) ast.WalkStatus {
if !entering {
return ast.WalkContinue
}
segment := n.Segment
if n.IsRaw() {
w.Write(segment.Value(source))
} else {
r.Writer.Write(w, segment.Value(source))
if n.HardLineBreak() || (n.SoftLineBreak() && r.SoftLineBreak) {
if r.XHTML {
w.WriteString("<br />\n")
} else {
w.WriteString("<br>\n")
}
} else if n.SoftLineBreak() {
w.WriteByte('\n')
}
}
return ast.WalkContinue
}
func readWhile(source []byte, index [2]int, pred func(byte) bool) (int, bool) {
j := index[0]
ok := false
for ; j < index[1]; j++ {
c1 := source[j]
if pred(c1) {
ok = true
continue
}
break
}
return j, ok
}
// A Writer interface writes textual contents to a writer.
type Writer interface {
// Write writes the given source to the writer, resolving references and
// unescaping backslash-escaped characters.
Write(writer util.BufWriter, source []byte)
// RawWrite writes the given source to the writer without resolving
// references or unescaping backslash-escaped characters.
RawWrite(writer util.BufWriter, source []byte)
}
type defaultWriter struct {
}
// htmlEscapeTable maps each byte that must be escaped in HTML to its entity;
// all other entries are nil.
var htmlEscapeTable = [256][]byte{
'"': []byte("&quot;"),
'&': []byte("&amp;"),
'<': []byte("&lt;"),
'>': []byte("&gt;"),
}
func escapeRune(writer util.BufWriter, r rune) {
if r < 256 {
v := htmlEscapeTable[byte(r)]
if v != nil {
writer.Write(v)
return
}
}
writer.WriteRune(util.ToValidRune(r))
}
func (d *defaultWriter) RawWrite(writer util.BufWriter, source []byte) {
n := 0
l := len(source)
for i := 0; i < l; i++ {
v := htmlEscapeTable[source[i]]
if v != nil {
writer.Write(source[i-n : i])
n = 0
writer.Write(v)
continue
}
n++
}
if n != 0 {
writer.Write(source[l-n:])
}
}
func (d *defaultWriter) Write(writer util.BufWriter, source []byte) {
escaped := false
ok := false
limit := len(source)
n := 0
for i := 0; i < limit; i++ {
c := source[i]
if escaped {
if util.IsPunct(c) {
d.RawWrite(writer, source[n:i-1])
n = i
escaped = false
continue
}
}
if c == '&' {
pos := i
next := i + 1
if next < limit && source[next] == '#' {
nnext := next + 1
// Guard against input that ends with "&#".
nc := byte(0)
if nnext < limit {
nc = source[nnext]
}
// code point like #x22;
if nc == 'x' || nc == 'X' {
start := nnext + 1
i, ok = readWhile(source, [2]int{start, limit}, util.IsHexDecimal)
if ok && i < limit && source[i] == ';' {
v, _ := strconv.ParseUint(util.BytesToReadOnlyString(source[start:i]), 16, 32)
d.RawWrite(writer, source[n:pos])
n = i + 1
escapeRune(writer, rune(v))
continue
}
// code point like #1234;
} else if nc >= '0' && nc <= '9' {
start := nnext
i, ok = readWhile(source, [2]int{start, limit}, util.IsNumeric)
if ok && i < limit && i-start < 8 && source[i] == ';' {
v, _ := strconv.ParseUint(util.BytesToReadOnlyString(source[start:i]), 10, 32)
d.RawWrite(writer, source[n:pos])
n = i + 1
escapeRune(writer, rune(v))
continue
}
}
} else {
start := next
i, ok = readWhile(source, [2]int{start, limit}, util.IsAlphaNumeric)
// entity reference
if ok && i < limit && source[i] == ';' {
name := util.BytesToReadOnlyString(source[start:i])
entity, ok := util.LookUpHTML5EntityByName(name)
if ok {
d.RawWrite(writer, source[n:pos])
n = i + 1
d.RawWrite(writer, entity.Characters)
continue
}
}
}
i = next - 1
}
if c == '\\' {
escaped = true
continue
}
escaped = false
}
d.RawWrite(writer, source[n:])
}
// DefaultWriter is a default implementation of the Writer.
var DefaultWriter = &defaultWriter{}
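
The table-driven escaping used by RawWrite can be sketched on its own. The example below is a minimal illustration, not goldmark's implementation: `escapeTable` and `escapeHTML` are hypothetical names, the table is written with indexed composite-literal keys, and a `bytes.Buffer` stands in for `util.BufWriter`.

```go
package main

import (
	"bytes"
	"fmt"
)

// escapeTable holds the replacement for each byte that needs escaping.
// Entries that are not listed stay nil, meaning "copy the byte as is".
var escapeTable = [256][]byte{
	'"': []byte("&quot;"),
	'&': []byte("&amp;"),
	'<': []byte("&lt;"),
	'>': []byte("&gt;"),
}

// escapeHTML copies source, replacing bytes that have a table entry.
// Runs of unescaped bytes are written in one call, as RawWrite does.
func escapeHTML(source []byte) []byte {
	var buf bytes.Buffer
	start := 0 // start of the pending run of plain bytes
	for i, c := range source {
		if v := escapeTable[c]; v != nil {
			buf.Write(source[start:i])
			buf.Write(v)
			start = i + 1
		}
	}
	buf.Write(source[start:])
	return buf.Bytes()
}

func main() {
	fmt.Println(string(escapeHTML([]byte(`<a href="x">R&D</a>`))))
}
```

Indexing a fixed 256-entry array by byte avoids a switch or map lookup per byte, and multi-byte UTF-8 sequences pass through untouched because none of their bytes has a table entry.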

161
renderer/renderer.go Normal file

@ -0,0 +1,161 @@
// Package renderer renders given AST to certain formats.
package renderer
import (
"bufio"
"io"
"github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/util"
"sync"
)
// A Config struct is a data structure that holds configuration of the Renderer.
type Config struct {
Options map[OptionName]interface{}
NodeRenderers util.PrioritizedSlice
}
// NewConfig returns a new Config.
func NewConfig() *Config {
return &Config{
Options: map[OptionName]interface{}{},
NodeRenderers: util.PrioritizedSlice{},
}
}
type notSupported struct {
}
func (e *notSupported) Error() string {
return "not supported by this renderer"
}
// NotSupported indicates that the given node cannot be rendered by this NodeRenderer.
var NotSupported = &notSupported{}
// An OptionName is a name of the option.
type OptionName string
// An Option interface is a functional option type for the Renderer.
type Option interface {
SetConfig(*Config)
}
type withNodeRenderers struct {
value []util.PrioritizedValue
}
func (o *withNodeRenderers) SetConfig(c *Config) {
c.NodeRenderers = append(c.NodeRenderers, o.value...)
}
// WithNodeRenderers is a functional option that allows you to add
// NodeRenderers to the renderer.
func WithNodeRenderers(ps ...util.PrioritizedValue) Option {
return &withNodeRenderers{ps}
}
type withOption struct {
name OptionName
value interface{}
}
func (o *withOption) SetConfig(c *Config) {
c.Options[o.name] = o.value
}
// WithOption is a functional option that allows you to set
// an arbitrary option on the renderer.
func WithOption(name OptionName, value interface{}) Option {
return &withOption{name, value}
}
// A SetOptioner interface sets given option to the object.
type SetOptioner interface {
// SetOption sets given option to the object.
// Unacceptable options may be passed.
// Thus implementations must ignore unacceptable options.
SetOption(name OptionName, value interface{})
}
// A NodeRenderer interface renders given AST node to given writer.
type NodeRenderer interface {
// Render renders given AST node to given writer.
Render(writer util.BufWriter, source []byte, n ast.Node, entering bool) (ast.WalkStatus, error)
}
// A Renderer interface renders given AST node to given
// writer with given Renderer.
type Renderer interface {
Render(w io.Writer, source []byte, n ast.Node) error
// AddOption adds the given option to this renderer.
AddOption(Option)
}
type renderer struct {
config *Config
options map[OptionName]interface{}
nodeRenderers []NodeRenderer
initSync sync.Once
}
// NewRenderer returns a new Renderer with given options.
func NewRenderer(options ...Option) Renderer {
config := NewConfig()
for _, opt := range options {
opt.SetConfig(config)
}
r := &renderer{
options: map[OptionName]interface{}{},
config: config,
}
return r
}
func (r *renderer) AddOption(o Option) {
o.SetConfig(r.config)
}
// Render renders given AST node to given writer with given Renderer.
func (r *renderer) Render(w io.Writer, source []byte, n ast.Node) error {
r.initSync.Do(func() {
r.options = r.config.Options
r.config.NodeRenderers.Sort()
r.nodeRenderers = make([]NodeRenderer, 0, len(r.config.NodeRenderers))
for _, v := range r.config.NodeRenderers {
nr, _ := v.Value.(NodeRenderer)
if se, ok := v.Value.(SetOptioner); ok {
for oname, ovalue := range r.options {
se.SetOption(oname, ovalue)
}
}
r.nodeRenderers = append(r.nodeRenderers, nr)
}
r.config = nil
})
writer, ok := w.(util.BufWriter)
if !ok {
writer = bufio.NewWriter(w)
}
err := ast.Walk(n, func(n ast.Node, entering bool) (ast.WalkStatus, error) {
var s ast.WalkStatus
var err error
for _, nr := range r.nodeRenderers {
s, err = nr.Render(writer, source, n, entering)
if err == NotSupported {
continue
}
break
}
return s, err
})
if err != nil {
return err
}
return writer.Flush()
}
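
The dispatch loop in Render tries each NodeRenderer in priority order and moves on only when one reports NotSupported. The sketch below reproduces that control flow with plain functions so it can be run without goldmark. All names (`errNotSupported`, `nodeRenderer`, `dispatch`, the two sample renderers) are hypothetical stand-ins, and node kinds are modelled as strings rather than `ast.Node` values.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotSupported plays the role of renderer.NotSupported in this sketch.
var errNotSupported = errors.New("not supported")

// nodeRenderer is a simplified stand-in for the NodeRenderer interface.
type nodeRenderer func(kind string) (string, error)

func renderHeading(kind string) (string, error) {
	if kind != "heading" {
		return "", errNotSupported
	}
	return "<h1>", nil
}

func renderParagraph(kind string) (string, error) {
	if kind != "paragraph" {
		return "", errNotSupported
	}
	return "<p>", nil
}

// dispatch mirrors the loop in Render: the first renderer that does not
// return errNotSupported decides the result; if every renderer declines,
// the last errNotSupported is returned to the caller.
func dispatch(renderers []nodeRenderer, kind string) (string, error) {
	var out string
	var err error
	for _, r := range renderers {
		out, err = r(kind)
		if err == errNotSupported {
			continue
		}
		break
	}
	return out, err
}

func main() {
	chain := []nodeRenderer{renderHeading, renderParagraph}
	for _, kind := range []string{"heading", "paragraph", "table"} {
		out, err := dispatch(chain, kind)
		fmt.Printf("%s: %q %v\n", kind, out, err)
	}
}
```

Because the sentinel is compared by identity, an extension can place its own renderer ahead of the default one and fall through to it simply by returning the sentinel for nodes it does not handle.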

492
text/reader.go Normal file

@ -0,0 +1,492 @@
package text
import (
"github.com/yuin/goldmark/util"
"io"
"regexp"
"unicode/utf8"
)
const invalidValue = -1
// EOF indicates the end of file.
const EOF = byte(0xff)
// A Reader interface provides abstracted method for reading text.
type Reader interface {
io.RuneReader
// Source returns a source of the reader.
Source() []byte
// Peek returns a byte at current position without advancing the internal pointer.
Peek() byte
// PeekLine returns the current line without advancing the internal pointer.
PeekLine() ([]byte, Segment)
// PrecendingCharacter returns a character just before current internal pointer.
PrecendingCharacter() rune
// Value returns a value of given segment.
Value(Segment) []byte
// LineOffset returns a distance from the line head to current position.
LineOffset() int
// Position returns current line number and position.
Position() (int, Segment)
// SetPosition sets current line number and position.
SetPosition(int, Segment)
// SetPadding sets padding to the reader.
SetPadding(int)
// Advance advances the internal pointer.
Advance(int)
// AdvanceAndSetPadding advances the internal pointer and add padding to the
// reader.
AdvanceAndSetPadding(int, int)
// AdvanceLine advances the internal pointer to the next line head.
AdvanceLine()
// SkipSpaces skips space characters and returns a non-blank line.
// If it reaches EOF, returns false.
SkipSpaces() (Segment, int, bool)
// SkipBlankLines skips blank lines and returns a non-blank line.
// If it reaches EOF, returns false.
SkipBlankLines() (Segment, int, bool)
// Match performs regular expression matching to current line.
Match(reg *regexp.Regexp) bool
// FindSubMatch performs regular expression searching against the current line.
FindSubMatch(reg *regexp.Regexp) [][]byte
}
type reader struct {
source []byte
sourceLength int
line int
peekedLine []byte
pos Segment
head int
}
// NewReader returns a new Reader that can read UTF-8 bytes.
func NewReader(source []byte) Reader {
r := &reader{
source: source,
sourceLength: len(source),
line: -1,
head: 0,
}
r.AdvanceLine()
return r
}
func (r *reader) Source() []byte {
return r.source
}
func (r *reader) Value(seg Segment) []byte {
return seg.Value(r.source)
}
func (r *reader) Peek() byte {
if r.pos.Start >= 0 && r.pos.Start < r.sourceLength {
if r.pos.Padding != 0 {
return space[0]
}
return r.source[r.pos.Start]
}
return EOF
}
func (r *reader) PeekLine() ([]byte, Segment) {
if r.pos.Start >= 0 && r.pos.Start < r.sourceLength {
if r.peekedLine == nil {
r.peekedLine = r.pos.Value(r.Source())
}
return r.peekedLine, r.pos
}
return nil, r.pos
}
// io.RuneReader interface
func (r *reader) ReadRune() (rune, int, error) {
return readRuneReader(r)
}
func (r *reader) LineOffset() int {
v := r.pos.Start - r.head
if r.pos.Padding > 0 {
v += util.TabWidth(v) - r.pos.Padding
}
return v
}
func (r *reader) PrecendingCharacter() rune {
if r.pos.Start <= 0 {
if r.pos.Padding != 0 {
return rune(' ')
}
return rune('\n')
}
i := r.pos.Start - 1
for ; i >= 0; i-- {
if utf8.RuneStart(r.source[i]) {
break
}
}
rn, _ := utf8.DecodeRune(r.source[i:])
return rn
}
func (r *reader) Advance(n int) {
if n < len(r.peekedLine) && r.pos.Padding == 0 {
r.pos.Start += n
r.peekedLine = nil
return
}
r.peekedLine = nil
l := r.sourceLength
for ; n > 0 && r.pos.Start < l; n-- {
if r.pos.Padding != 0 {
r.pos.Padding--
continue
}
if r.source[r.pos.Start] == '\n' {
r.AdvanceLine()
continue
}
r.pos.Start++
}
}
func (r *reader) AdvanceAndSetPadding(n, padding int) {
r.Advance(n)
if padding > r.pos.Padding {
r.SetPadding(padding)
}
}
func (r *reader) AdvanceLine() {
r.peekedLine = nil
r.pos.Start = r.pos.Stop
r.head = r.pos.Start
if r.pos.Start < 0 {
return
}
r.pos.Stop = invalidValue
for i := r.pos.Start; i < r.sourceLength; i++ {
c := r.source[i]
if c == '\n' {
r.pos.Stop = i + 1
break
}
}
r.line++
r.pos.Padding = 0
}
func (r *reader) Position() (int, Segment) {
return r.line, r.pos
}
func (r *reader) SetPosition(line int, pos Segment) {
r.line = line
r.pos = pos
}
func (r *reader) SetPadding(v int) {
r.pos.Padding = v
}
func (r *reader) SkipSpaces() (Segment, int, bool) {
return skipSpacesReader(r)
}
func (r *reader) SkipBlankLines() (Segment, int, bool) {
return skipBlankLinesReader(r)
}
func (r *reader) Match(reg *regexp.Regexp) bool {
return matchReader(r, reg)
}
func (r *reader) FindSubMatch(reg *regexp.Regexp) [][]byte {
return findSubMatchReader(r, reg)
}
// A BlockReader interface is a reader that is optimized for Blocks.
type BlockReader interface {
Reader
Reset(segment *Segments)
}
type blockReader struct {
source []byte
segments *Segments
segmentsLength int
line int
pos Segment
head int
last int
}
// NewBlockReader returns a new BlockReader.
func NewBlockReader(source []byte, segments *Segments) BlockReader {
r := &blockReader{
source: source,
}
if segments != nil {
r.Reset(segments)
}
return r
}
// Reset resets current state and sets new segments to the reader.
func (r *blockReader) Reset(segments *Segments) {
r.segments = segments
r.segmentsLength = segments.Len()
r.line = -1
r.head = 0
r.last = 0
r.pos.Start = -1
r.pos.Stop = -1
r.pos.Padding = 0
if r.segmentsLength > 0 {
last := r.segments.At(r.segmentsLength - 1)
r.last = last.Stop
}
r.AdvanceLine()
}
func (r *blockReader) Source() []byte {
return r.source
}
func (r *blockReader) Value(seg Segment) []byte {
line := r.segmentsLength - 1
ret := make([]byte, 0, seg.Stop-seg.Start+1)
for ; line >= 0; line-- {
if seg.Start >= r.segments.At(line).Start {
break
}
}
i := seg.Start
for ; line < r.segmentsLength; line++ {
s := r.segments.At(line)
if i < 0 {
i = s.Start
}
ret = s.ConcatPadding(ret)
for ; i < seg.Stop && i < s.Stop; i++ {
ret = append(ret, r.source[i])
}
i = -1
if s.Stop > seg.Stop {
break
}
}
return ret
}
// io.RuneReader interface
func (r *blockReader) ReadRune() (rune, int, error) {
return readRuneReader(r)
}
func (r *blockReader) PrecendingCharacter() rune {
if r.pos.Padding != 0 {
return rune(' ')
}
if r.pos.Start <= 0 {
return rune('\n')
}
i := r.pos.Start - 1
for ; i >= 0; i-- {
if utf8.RuneStart(r.source[i]) {
break
}
}
rn, _ := utf8.DecodeRune(r.source[i:])
return rn
}
func (r *blockReader) LineOffset() int {
v := r.pos.Start - r.head
if r.pos.Padding > 0 {
v += util.TabWidth(v) - r.pos.Padding
}
return v
}
func (r *blockReader) Peek() byte {
if r.line < r.segmentsLength && r.pos.Start >= 0 && r.pos.Start < r.last {
if r.pos.Padding != 0 {
return space[0]
}
return r.source[r.pos.Start]
}
return EOF
}
func (r *blockReader) PeekLine() ([]byte, Segment) {
if r.line < r.segmentsLength && r.pos.Start >= 0 && r.pos.Start < r.last {
return r.pos.Value(r.source), r.pos
}
return nil, r.pos
}
func (r *blockReader) Advance(n int) {
if n < r.pos.Stop-r.pos.Start && r.pos.Padding == 0 {
r.pos.Start += n
return
}
for ; n > 0; n-- {
if r.pos.Padding != 0 {
r.pos.Padding--
continue
}
if r.pos.Start >= r.pos.Stop-1 && r.pos.Stop < r.last {
r.AdvanceLine()
continue
}
r.pos.Start++
}
}
func (r *blockReader) AdvanceAndSetPadding(n, padding int) {
r.Advance(n)
if padding > r.pos.Padding {
r.SetPadding(padding)
}
}
func (r *blockReader) AdvanceLine() {
r.SetPosition(r.line+1, NewSegment(invalidValue, invalidValue))
r.head = r.pos.Start
}
func (r *blockReader) Position() (int, Segment) {
return r.line, r.pos
}
func (r *blockReader) SetPosition(line int, pos Segment) {
r.line = line
if pos.Start == invalidValue {
if r.line < r.segmentsLength {
r.pos = r.segments.At(line)
}
} else {
r.pos = pos
}
}
func (r *blockReader) SetPadding(v int) {
r.pos.Padding = v
}
func (r *blockReader) SkipSpaces() (Segment, int, bool) {
return skipSpacesReader(r)
}
func (r *blockReader) SkipBlankLines() (Segment, int, bool) {
return skipBlankLinesReader(r)
}
func (r *blockReader) Match(reg *regexp.Regexp) bool {
return matchReader(r, reg)
}
func (r *blockReader) FindSubMatch(reg *regexp.Regexp) [][]byte {
return findSubMatchReader(r, reg)
}
func skipBlankLinesReader(r Reader) (Segment, int, bool) {
lines := 0
for {
line, seg := r.PeekLine()
if line == nil {
return seg, lines, false
}
if util.IsBlank(line) {
lines++
r.AdvanceLine()
} else {
return seg, lines, true
}
}
}
func skipSpacesReader(r Reader) (Segment, int, bool) {
chars := 0
for {
line, segment := r.PeekLine()
if line == nil {
return segment, chars, false
}
for i, c := range line {
if util.IsSpace(c) {
chars++
r.Advance(1)
continue
}
return segment.WithStart(segment.Start + i + 1), chars, true
}
}
}
func matchReader(r Reader, reg *regexp.Regexp) bool {
oldline, oldseg := r.Position()
match := reg.FindReaderSubmatchIndex(r)
r.SetPosition(oldline, oldseg)
if match == nil {
return false
}
r.Advance(match[1] - match[0])
return true
}
func findSubMatchReader(r Reader, reg *regexp.Regexp) [][]byte {
oldline, oldseg := r.Position()
match := reg.FindReaderSubmatchIndex(r)
r.SetPosition(oldline, oldseg)
if match == nil {
return nil
}
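// Re-read the matched text as bytes: the indexes returned by
// FindReaderSubmatchIndex are byte offsets, not rune offsets.
buf := make([]byte, 0, match[1])
for len(buf) < match[1] {
rn, _, err := readRuneReader(r)
if err != nil {
break
}
buf = append(buf, string(rn)...)
}
result := make([][]byte, 0, len(match)/2)
for i := 0; i < len(match); i += 2 {
if match[i] < 0 || match[i+1] > len(buf) {
// The group did not participate in the match.
result = append(result, nil)
continue
}
result = append(result, buf[match[i]:match[i+1]])
}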
r.SetPosition(oldline, oldseg)
r.Advance(match[1] - match[0])
return result
}
func readRuneReader(r Reader) (rune, int, error) {
line, _ := r.PeekLine()
if line == nil {
return 0, 0, io.EOF
}
rn, size := utf8.DecodeRune(line)
if rn == utf8.RuneError {
return 0, 0, io.EOF
}
r.Advance(size)
return rn, size, nil
}
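
The helpers above call `regexp.FindReaderSubmatchIndex` on an `io.RuneReader`. The indexes it returns are byte offsets into the text that was read, and a group that did not participate in the match is reported as -1. The following is a minimal standalone sketch of that behaviour using only the standard library; `submatches` and `demoRe` are hypothetical names for illustration and are not part of goldmark.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// demoRe is an illustrative pattern with an optional middle group.
var demoRe = regexp.MustCompile(`^(é+)(x)?(b+)`)

// submatches returns the text of every group of re matched against s.
// Groups that did not participate in the match yield an empty string.
func submatches(re *regexp.Regexp, s string) []string {
	// loc holds pairs of byte offsets: loc[2*i] and loc[2*i+1] delimit group i.
	loc := re.FindReaderSubmatchIndex(strings.NewReader(s))
	if loc == nil {
		return nil
	}
	out := make([]string, 0, len(loc)/2)
	for i := 0; i < len(loc); i += 2 {
		if loc[i] < 0 {
			out = append(out, "")
			continue
		}
		// Slice the original bytes, never a []rune, with these offsets.
		out = append(out, s[loc[i]:loc[i+1]])
	}
	return out
}

func main() {
	// "é" is two bytes long, so the whole match spans bytes 0..6.
	fmt.Printf("%q\n", submatches(demoRe, "éébb rest"))
	// prints ["éébb" "éé" "" "bb"]
}
```

Because the offsets count bytes, multi-byte input is where a rune-indexed slice would go wrong; slicing the byte data directly avoids that.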

209
text/segment.go Normal file

@ -0,0 +1,209 @@
package text
import (
"bytes"
"github.com/yuin/goldmark/util"
)
var space = []byte(" ")
// A Segment struct holds information about source positions.
type Segment struct {
// Start is a start position of the segment.
Start int
// Stop is a stop position of the segment.
// The byte at Stop itself is not part of the segment (exclusive bound).
Stop int
// Padding is a padding length of the segment.
Padding int
}
// NewSegment returns a new Segment.
func NewSegment(start, stop int) Segment {
return Segment{
Start: start,
Stop: stop,
Padding: 0,
}
}
// NewSegmentPadding returns a new Segment with given padding.
func NewSegmentPadding(start, stop, n int) Segment {
return Segment{
Start: start,
Stop: stop,
Padding: n,
}
}
// Value returns a value of the segment.
func (t *Segment) Value(buffer []byte) []byte {
if t.Padding == 0 {
return buffer[t.Start:t.Stop]
}
result := make([]byte, 0, t.Padding+t.Stop-t.Start+1)
result = append(result, bytes.Repeat(space, t.Padding)...)
return append(result, buffer[t.Start:t.Stop]...)
}
// Len returns a length of the segment.
func (t *Segment) Len() int {
return t.Stop - t.Start + t.Padding
}
// Between returns a segment between this segment and given segment.
func (t *Segment) Between(other Segment) Segment {
if t.Stop != other.Stop {
panic("invalid state")
}
return NewSegmentPadding(
t.Start,
other.Start,
t.Padding-other.Padding,
)
}
// IsEmpty returns true if this segment is empty, otherwise false.
func (t *Segment) IsEmpty() bool {
return t.Start >= t.Stop && t.Padding == 0
}
// TrimRightSpace returns a new segment by slicing off all trailing
// space characters.
func (t *Segment) TrimRightSpace(buffer []byte) Segment {
v := buffer[t.Start:t.Stop]
l := util.TrimRightSpaceLength(v)
if l == len(v) {
return NewSegment(t.Start, t.Start)
}
return NewSegmentPadding(t.Start, t.Stop-l, t.Padding)
}
// TrimLeftSpace returns a new segment by slicing off all leading
// space characters including padding.
func (t *Segment) TrimLeftSpace(buffer []byte) Segment {
v := buffer[t.Start:t.Stop]
l := util.TrimLeftSpaceLength(v)
return NewSegment(t.Start+l, t.Stop)
}
// TrimLeftSpaceWidth returns a new segment by slicing off leading space
// characters until given width.
func (t *Segment) TrimLeftSpaceWidth(width int, buffer []byte) Segment {
padding := t.Padding
for ; width > 0; width-- {
if padding == 0 {
break
}
padding--
}
if width == 0 {
return NewSegmentPadding(t.Start, t.Stop, padding)
}
text := buffer[t.Start:t.Stop]
start := t.Start
for _, c := range text {
if start >= t.Stop-1 || width <= 0 {
break
}
if c == ' ' {
width--
} else if c == '\t' {
width -= 4
} else {
break
}
start++
}
if width < 0 {
padding = width * -1
}
return NewSegmentPadding(start, t.Stop, padding)
}
// WithStart returns a new Segment with same value except Start.
func (t *Segment) WithStart(v int) Segment {
return NewSegmentPadding(v, t.Stop, t.Padding)
}
// WithStop returns a new Segment with same value except Stop.
func (t *Segment) WithStop(v int) Segment {
return NewSegmentPadding(t.Start, v, t.Padding)
}
// ConcatPadding appends the padding, as space characters, to the given slice.
func (t *Segment) ConcatPadding(v []byte) []byte {
if t.Padding > 0 {
return append(v, bytes.Repeat(space, t.Padding)...)
}
return v
}
// Segments is a collection of the Segment.
type Segments struct {
values []Segment
}
// NewSegments returns a new Segments.
func NewSegments() *Segments {
return &Segments{
values: nil,
}
}
// Append appends given segment after the tail of the collection.
func (s *Segments) Append(t Segment) {
if s.values == nil {
s.values = make([]Segment, 0, 20)
}
s.values = append(s.values, t)
}
// AppendAll appends all elements of given segments after the tail of the collection.
func (s *Segments) AppendAll(t []Segment) {
if s.values == nil {
s.values = make([]Segment, 0, 20)
}
s.values = append(s.values, t...)
}
// Len returns the length of the collection.
func (s *Segments) Len() int {
if s.values == nil {
return 0
}
return len(s.values)
}
// At returns a segment at given index.
func (s *Segments) At(i int) Segment {
return s.values[i]
}
// Set sets given Segment.
func (s *Segments) Set(i int, v Segment) {
s.values[i] = v
}
// SetSliced replaces the collection with a subsliced value.
func (s *Segments) SetSliced(lo, hi int) {
s.values = s.values[lo:hi]
}
// Sliced returns a subslice of the collection.
func (s *Segments) Sliced(lo, hi int) []Segment {
return s.values[lo:hi]
}
// Clear deletes all elements of the collection.
func (s *Segments) Clear() {
s.values = nil
}
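// Unshift inserts the given Segment at the head of the collection.
func (s *Segments) Unshift(v Segment) {
// Grow by one element, then shift the existing elements right.
// This also works when the collection is still empty (nil).
s.values = append(s.values, Segment{})
copy(s.values[1:], s.values)
s.values[0] = v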
}
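
To make the role of `Padding` concrete, here is a minimal standalone sketch, assuming a simplified `segment` type that mirrors the `Start`, `Stop` and `Padding` fields above. It is not the goldmark type itself; the names `segment`, `value` and `length` are illustrative only. It shows how a tab that was only partly consumed by indentation can be represented as a segment that starts after the tab and carries the leftover columns as padding.

```go
package main

import (
	"bytes"
	"fmt"
)

// segment is a simplified stand-in for text.Segment: a half-open byte
// range [Start, Stop) in a source buffer, plus virtual leading spaces.
type segment struct {
	Start, Stop, Padding int
}

// value materialises the segment: Padding spaces followed by the bytes
// of src in [Start, Stop). Without padding no copy is made.
func (s segment) value(src []byte) []byte {
	if s.Padding == 0 {
		return src[s.Start:s.Stop]
	}
	out := make([]byte, 0, s.Padding+s.Stop-s.Start)
	out = append(out, bytes.Repeat([]byte(" "), s.Padding)...)
	return append(out, src[s.Start:s.Stop]...)
}

// length counts the padding as part of the segment.
func (s segment) length() int {
	return s.Stop - s.Start + s.Padding
}

func main() {
	src := []byte("\tfoo\n")
	// The tab spans 4 columns. If 2 of them are consumed as indentation,
	// the remainder starts after the tab with 2 columns of padding.
	s := segment{Start: 1, Stop: 4, Padding: 2}
	fmt.Printf("%q %d\n", s.value(src), s.length())
	// prints "  foo" 5
}
```

Keeping the padding as a number rather than rewriting the source lets segments keep pointing at the original bytes.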

2142
util/html5entities.go Normal file

File diff suppressed because it is too large.

538
util/util.go Normal file

@ -0,0 +1,538 @@
// Package util provides utility functions for the goldmark.
package util
import (
"bytes"
"fmt"
"io"
"net/url"
"regexp"
"sort"
"strconv"
"strings"
"unicode/utf8"
)
// IsBlank returns true if given string is all space characters.
func IsBlank(bs []byte) bool {
for _, b := range bs {
if IsSpace(b) {
continue
}
return false
}
return true
}
// DedentPosition dedents lines by given width.
func DedentPosition(bs []byte, width int) (pos, padding int) {
i := 0
l := len(bs)
w := 0
for ; i < l && w < width; i++ {
b := bs[i]
if b == ' ' {
w++
} else if b == '\t' {
w += 4
} else {
break
}
}
padding = w - width
if padding < 0 {
padding = 0
}
return i, padding
}
// VisualizeSpaces replaces invisible space characters with visible markers.
func VisualizeSpaces(bs []byte) []byte {
bs = bytes.Replace(bs, []byte(" "), []byte("[SPACE]"), -1)
bs = bytes.Replace(bs, []byte("\t"), []byte("[TAB]"), -1)
bs = bytes.Replace(bs, []byte("\n"), []byte("[NEWLINE]\n"), -1)
return bs
}
// TabWidth calculates actual width of a tab at given position.
func TabWidth(currentPos int) int {
return 4 - currentPos%4
}
// IndentPosition searches for the position in the given line at which the
// indent reaches the given width. If the line contains tab characters, the
// returned padding may be non-zero. For example, with currentPos==0 and width==2:
//
// position: 0 1
// [TAB]aaaa
// width: 1234 5678
//
// width=2 is in the tab character. In this case, IndentPosition returns
// (pos=1, padding=2)
func IndentPosition(bs []byte, currentPos, width int) (pos, padding int) {
w := 0
l := len(bs)
for i := 0; i < l; i++ {
b := bs[i]
if b == ' ' {
w++
} else if b == '\t' {
w += TabWidth(currentPos + w)
} else {
break
}
if w >= width {
return i + 1, w - width
}
}
return -1, -1
}
// IndentWidth calculates the indent width of the given line.
func IndentWidth(bs []byte, currentPos int) (width, pos int) {
l := len(bs)
for i := 0; i < l; i++ {
b := bs[i]
if b == ' ' {
width++
pos++
} else if b == '\t' {
width += TabWidth(currentPos + width)
pos++
} else {
break
}
}
return
}
// FirstNonSpacePosition returns the position of the first non-space
// character in the given line, or -1 if the line is blank.
func FirstNonSpacePosition(bs []byte) int {
i := 0
for ; i < len(bs); i++ {
c := bs[i]
if c == ' ' || c == '\t' {
continue
}
if c == '\n' {
return -1
}
return i
}
return -1
}
// FindClosure returns the position of the closure that closes the given opener,
// or -1 if there is none.
// If codeSpan is true, it ignores characters inside code spans.
// If allowNesting is true, closures that correspond to nested openers are
// skipped; otherwise a nested opener makes the search fail.
func FindClosure(bs []byte, opener, closure byte, codeSpan, allowNesting bool) int {
i := 0
opened := 1
codeSpanOpener := 0
for i < len(bs) {
c := bs[i]
if codeSpan && codeSpanOpener != 0 && c == '`' {
codeSpanCloser := 0
for ; i < len(bs); i++ {
if bs[i] == '`' {
codeSpanCloser++
} else {
break
}
}
if codeSpanCloser == codeSpanOpener {
codeSpanOpener = 0
}
} else if c == '\\' && i < len(bs)-1 && IsPunct(bs[i+1]) {
i += 2
continue
} else if codeSpan && codeSpanOpener == 0 && c == '`' {
for ; i < len(bs); i++ {
if bs[i] == '`' {
codeSpanOpener++
} else {
break
}
}
} else if (codeSpan && codeSpanOpener == 0) || !codeSpan {
if c == closure {
opened--
if opened == 0 {
return i
}
} else if c == opener {
if !allowNesting {
return -1
}
opened++
}
}
i++
}
return -1
}
// TrimLeft trims the bytes contained in b from the head of source.
// bytes.TrimLeft offers similar functionality, but takes its cutset as a
// string of UTF-8 code points; this version compares raw bytes.
func TrimLeft(source, b []byte) []byte {
i := 0
for ; i < len(source); i++ {
c := source[i]
found := false
for j := 0; j < len(b); j++ {
if c == b[j] {
found = true
break
}
}
if !found {
break
}
}
return source[i:]
}
// TrimRight trims the bytes contained in b from the tail of source.
func TrimRight(source, b []byte) []byte {
i := len(source) - 1
for ; i >= 0; i-- {
c := source[i]
found := false
for j := 0; j < len(b); j++ {
if c == b[j] {
found = true
break
}
}
if !found {
break
}
}
return source[:i+1]
}
// TrimLeftLength returns a length of leading specified characters.
func TrimLeftLength(source, s []byte) int {
return len(source) - len(TrimLeft(source, s))
}
// TrimRightLength returns a length of trailing specified characters.
func TrimRightLength(source, s []byte) int {
return len(source) - len(TrimRight(source, s))
}
// TrimLeftSpaceLength returns a length of leading space characters.
func TrimLeftSpaceLength(source []byte) int {
return TrimLeftLength(source, spaces)
}
// TrimRightSpaceLength returns a length of trailing space characters.
func TrimRightSpaceLength(source []byte) int {
return TrimRightLength(source, spaces)
}
// TrimLeftSpace returns a subslice of given string by slicing off all leading
// space characters.
func TrimLeftSpace(source []byte) []byte {
return TrimLeft(source, spaces)
}
// TrimRightSpace returns a subslice of given string by slicing off all trailing
// space characters.
func TrimRightSpace(source []byte) []byte {
return TrimRight(source, spaces)
}
// ReplaceSpaces replaces sequence of spaces with given repl.
func ReplaceSpaces(source []byte, repl byte) []byte {
var ret []byte
start := -1
for i, c := range source {
iss := IsSpace(c)
if start < 0 && iss {
start = i
continue
} else if start >= 0 && iss {
continue
} else if start >= 0 {
if ret == nil {
ret = make([]byte, 0, len(source))
ret = append(ret, source[:start]...)
}
ret = append(ret, repl)
start = -1
}
if ret != nil {
ret = append(ret, c)
}
}
if start >= 0 && ret != nil {
ret = append(ret, repl)
}
if ret == nil {
return source
}
return ret
}
// ToRune decodes the rune that contains the byte at pos and returns it.
func ToRune(source []byte, pos int) rune {
i := pos
for ; i >= 0; i-- {
if utf8.RuneStart(source[i]) {
break
}
}
r, _ := utf8.DecodeRune(source[i:])
return r
}
// ToValidRune returns 0xFFFD if given rune is invalid, otherwise v.
func ToValidRune(v rune) rune {
if v == 0 || !utf8.ValidRune(v) {
return rune(0xFFFD)
}
return v
}
// ToLinkReference converts the given bytes into a valid link reference string.
// It trims leading and trailing spaces, converts the text to lower case and
// replaces each run of spaces with a single space character.
func ToLinkReference(v []byte) string {
v = TrimLeftSpace(v)
v = TrimRightSpace(v)
return strings.ToLower(string(ReplaceSpaces(v, ' ')))
}
var escapeRegex = regexp.MustCompile(`\\.`)
var hexRefRegex = regexp.MustCompile(`#[xX][\da-fA-F]+;`)
var numRefRegex = regexp.MustCompile(`#\d{1,7};`)
var entityRefRegex = regexp.MustCompile(`&([a-zA-Z\d]+);`)
var entityLt = []byte("&lt;")
var entityGt = []byte("&gt;")
var entityAmp = []byte("&amp;")
var entityQuot = []byte("&quot;")
// EscapeHTML escapes characters that should be escaped in HTML text.
func EscapeHTML(v []byte) []byte {
result := make([]byte, 0, len(v)+10)
for _, c := range v {
switch c {
case '<':
result = append(result, entityLt...)
case '>':
result = append(result, entityGt...)
case '&':
result = append(result, entityAmp...)
case '"':
result = append(result, entityQuot...)
default:
result = append(result, c)
}
}
return result
}
// UnescapePunctuations unescapes backslash-escaped punctuations.
func UnescapePunctuations(v []byte) []byte {
return escapeRegex.ReplaceAllFunc(v, func(match []byte) []byte {
if IsPunct(match[1]) {
return []byte{match[1]}
}
return match
})
}
// ResolveNumericReferences resolves numeric references like "&#1234;".
func ResolveNumericReferences(v []byte) []byte {
buf := make([]byte, 6, 6)
v = hexRefRegex.ReplaceAllFunc(v, func(match []byte) []byte {
v, _ := strconv.ParseUint(string(match[2:len(match)-1]), 16, 32)
n := utf8.EncodeRune(buf, ToValidRune(rune(v)))
return buf[:n]
})
return numRefRegex.ReplaceAllFunc(v, func(match []byte) []byte {
// Base 10 is explicit: base 0 would read a leading "0" as octal.
v, _ := strconv.ParseUint(string(match[1:len(match)-1]), 10, 32)
n := utf8.EncodeRune(buf, ToValidRune(rune(v)))
return buf[:n]
})
}
// ResolveEntityNames resolves entity references like "&ouml;".
func ResolveEntityNames(v []byte) []byte {
return entityRefRegex.ReplaceAllFunc(v, func(match []byte) []byte {
entity, ok := LookUpHTML5EntityByName(string(match[1 : len(match)-1]))
if ok {
return entity.Characters
}
return match
})
}
// URLEscape escapes the given URL.
// If resolveReference is true, it first:
// 1. unescapes punctuations
// 2. resolves numeric references
// 3. resolves entity references
//
// URL-encoded values (%xx) are kept as is.
func URLEscape(v []byte, resolveReference bool) []byte {
if resolveReference {
v = UnescapePunctuations(v)
v = ResolveNumericReferences(v)
v = ResolveEntityNames(v)
}
result := make([]byte, 0, len(v)+10)
for i := 0; i < len(v); {
c := v[i]
if urlEscapeTable[c] == 1 {
result = append(result, c)
i++
continue
}
if c == '%' && i+2 < len(v) && IsHexDecimal(v[i+1]) && IsHexDecimal(v[i+2]) {
result = append(result, c, v[i+1], v[i+2])
i += 3
continue
}
u8len := utf8lenTable[c]
if u8len == 99 { // invalid utf8 leading byte, copy it through unchanged
result = append(result, c)
i++
continue
}
if c == ' ' {
result = append(result, '%', '2', '0')
i++
continue
}
result = append(result, []byte(url.QueryEscape(string(v[i:i+int(u8len)])))...)
i += int(u8len)
}
return result
}
// GenerateLinkID generates an ID for links.
func GenerateLinkID(value []byte, exists map[string]bool) []byte {
value = TrimLeftSpace(value)
value = TrimRightSpace(value)
result := []byte{}
for i := 0; i < len(value); {
v := value[i]
l := utf8lenTable[v]
i += int(l)
if l != 1 {
continue
}
if IsAlphaNumeric(v) {
result = append(result, v)
} else if v == ' ' {
result = append(result, '-')
}
}
if len(result) == 0 {
result = []byte("id")
}
if _, ok := exists[string(result)]; !ok {
exists[string(result)] = true
return result
}
for i := 1; ; i++ {
newResult := fmt.Sprintf("%s%d", result, i)
if _, ok := exists[newResult]; !ok {
exists[newResult] = true
return []byte(newResult)
}
}
}
var spaces = []byte(" \t\n\x0b\x0c\x0d")
var spaceTable = [256]int8{0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
var punctTable = [256]int8{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
// a-zA-Z0-9, ;/?:@&=+$,-_.!~*'()#
var urlEscapeTable = [256]int8{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
var utf8lenTable = [256]int8{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 99, 99, 99, 99, 99, 99, 99, 99}
// IsPunct returns true if given character is a punctuation, otherwise false.
func IsPunct(c byte) bool {
return punctTable[c] == 1
}
// IsSpace returns true if given character is a space, otherwise false.
func IsSpace(c byte) bool {
return spaceTable[c] == 1
}
// IsNumeric returns true if given character is a numeric, otherwise false.
func IsNumeric(c byte) bool {
return c >= '0' && c <= '9'
}
// IsHexDecimal returns true if given character is a hexadecimal digit, otherwise false.
func IsHexDecimal(c byte) bool {
return c >= '0' && c <= '9' || c >= 'a' && c <= 'f' || c >= 'A' && c <= 'F'
}
// IsAlphaNumeric returns true if given character is an ASCII letter or a digit, otherwise false.
func IsAlphaNumeric(c byte) bool {
return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
}
// A BufWriter is a subset of the bufio.Writer methods.
type BufWriter interface {
io.Writer
Available() int
Buffered() int
Flush() error
WriteByte(c byte) error
WriteRune(r rune) (size int, err error)
WriteString(s string) (int, error)
}
// A PrioritizedValue struct holds a pair of an arbitrary value and a priority.
type PrioritizedValue struct {
// Value is an arbitrary value that you want to prioritize.
Value interface{}
// Priority is a priority of the value.
Priority int
}
// PrioritizedSlice is a slice of PrioritizedValue.
type PrioritizedSlice []PrioritizedValue
// Sort sorts the PrioritizedSlice in ascending order.
func (s PrioritizedSlice) Sort() {
sort.Slice(s, func(i, j int) bool {
return s[i].Priority < s[j].Priority
})
}
// Remove removes given value from this slice.
func (s PrioritizedSlice) Remove(v interface{}) PrioritizedSlice {
i := 0
found := false
for ; i < len(s); i++ {
if s[i].Value == v {
found = true
break
}
}
if !found {
return s
}
return append(s[:i], s[i+1:]...)
}
// Prioritized returns a new PrioritizedValue.
func Prioritized(v interface{}, priority int) PrioritizedValue {
return PrioritizedValue{v, priority}
}
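
The doc comment of `IndentPosition` in this file describes a tab that straddles the requested width. The sketch below is a standalone rendition of the same tab-stop arithmetic, written only to work through that example; `tabWidth` and `indentPosition` are lower-case copies made for illustration, and the 4-column tab stop is the one `TabWidth` uses.

```go
package main

import "fmt"

// tabWidth returns how many columns a tab occupies when it starts at
// column col, assuming tab stops every 4 columns.
func tabWidth(col int) int {
	return 4 - col%4
}

// indentPosition scans leading spaces and tabs of line, starting at
// column col, until at least width columns are covered. It returns the
// byte position just after the last consumed character and the number
// of columns consumed beyond width (the padding). It returns (-1, -1)
// if the line is not indented that far.
func indentPosition(line []byte, col, width int) (pos, padding int) {
	w := 0
	for i, b := range line {
		switch b {
		case ' ':
			w++
		case '\t':
			w += tabWidth(col + w)
		default:
			return -1, -1
		}
		if w >= width {
			return i + 1, w - width
		}
	}
	return -1, -1
}

func main() {
	// A tab covers columns 0..3; asking for width 2 lands inside it,
	// so one byte is consumed and 2 columns are left over as padding.
	fmt.Println(indentPosition([]byte("\taaaa"), 0, 2)) // prints 1 2
	// Two spaces, then a tab that only needs 2 columns to reach column 4.
	fmt.Println(indentPosition([]byte("  \tx"), 0, 4)) // prints 3 0
	// No indentation at all.
	fmt.Println(indentPosition([]byte("x"), 0, 1)) // prints -1 -1
}
```

The leftover columns are what the reader and segment types carry around as padding, so that a partially consumed tab still renders with the right number of spaces.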

13
util/util_safe.go Normal file

@ -0,0 +1,13 @@
// +build appengine js
package util
// BytesToReadOnlyString returns a string converted from given bytes.
func BytesToReadOnlyString(b []byte) string {
return string(b)
}
// StringToReadOnlyBytes returns bytes converted from given string.
func StringToReadOnlyBytes(s string) []byte {
return []byte(s)
}

20
util/util_unsafe.go Normal file

@ -0,0 +1,20 @@
// +build !appengine,!js
package util
import (
"reflect"
"unsafe"
)
// BytesToReadOnlyString returns a string converted from given bytes.
func BytesToReadOnlyString(b []byte) string {
return *(*string)(unsafe.Pointer(&b))
}
// StringToReadOnlyBytes returns bytes converted from given string.
func StringToReadOnlyBytes(s string) []byte {
sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
bh := reflect.SliceHeader{Data: sh.Data, Len: sh.Len, Cap: sh.Len}
return *(*[]byte)(unsafe.Pointer(&bh))
}
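
The unsafe conversions above build string and slice headers by hand. As a hedged alternative sketch, and assuming a toolchain of Go 1.20 or newer, the same zero-copy conversions can be written with `unsafe.String`, `unsafe.SliceData`, `unsafe.Slice` and `unsafe.StringData`. The lower-case function names are illustrative and are not what this commit ships.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToReadOnlyString returns a string that shares memory with b.
// The caller must not modify b while the string is in use.
func bytesToReadOnlyString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToReadOnlyBytes returns a byte slice that shares memory with s.
// The caller must never write to the returned slice.
func stringToReadOnlyBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := []byte("goldmark")
	s := bytesToReadOnlyString(b)
	fmt.Println(s, len(stringToReadOnlyBytes(s)))
	// prints goldmark 8
}
```

In both variants the result aliases the input, so the "read-only" part of the names is a contract the caller has to honour; the compiler does not enforce it.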