Introduction
The Scripta compiler translates source text written in a markup language to an Elm representation of Html.
Markup Languages
The languages supported by Scripta are
- L0
- microLaTeX
- XMarkdown
Blocks
The text of these markup languages
should be thought of as
structured into blocks, the content of which
is written in an internal language.
For example, in microLaTeX, one might have the text
below. There are eight blocks, counting the theorem block itself, each of which
is separated from its neighbor by an empty line.
The first block is a paragraph; its content consists
of plain text followed by the TeX macro expression \italic{prime} followed by more plain text.
Let's talk about \italic{prime} numbers.

\begin{theorem}
There are infinitely many primes $p$, and in fact
there are infinitely many primes

\begin{equation}
p \equiv 1 \ \text{mod}\ 4
\end{equation}

and also

\begin{equation}
p \equiv 1 \ \text{mod}\ 8
\end{equation}

and so on.
\end{theorem}

The first paragraph of the theorem was known to Euclid.
The body of the theorem block consists of five
blocks: the three paragraph blocks beginning "There are infinitely many primes ...",
"and also", and "and so on.", together with the two
equation blocks. The blocks in the body of the
theorem block constitute the children of the
block. It is the job of the parser to (1) discover
the forest structure and (2) parse the content
of the blocks.
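As a rough illustration of the very first part of step (1), the sketch below groups the lines of a source text into chunks separated by empty lines. It is not Scripta's parser: the module name `BlockSplit` and the function `splitIntoChunks` are hypothetical, and nesting (the forest structure proper) is ignored entirely.

```elm
module BlockSplit exposing (splitIntoChunks)

{-| Hypothetical helper, not part of Scripta.
Group the lines of a source text into chunks separated by empty lines.
Nesting is ignored; each chunk is just a list of consecutive non-empty lines.
-}


splitIntoChunks : String -> List (List String)
splitIntoChunks source =
    let
        -- Accumulate lines of the current chunk in reverse order;
        -- an empty line closes the chunk if it has any lines.
        step line ( current, done ) =
            if String.trim line == "" then
                if List.isEmpty current then
                    ( [], done )

                else
                    ( [], List.reverse current :: done )

            else
                ( line :: current, done )

        ( last, chunks ) =
            List.foldl step ( [], [] ) (String.lines source)

        -- Close the final chunk if the text did not end with an empty line.
        finished =
            if List.isEmpty last then
                chunks

            else
                List.reverse last :: chunks
    in
    List.reverse finished
```

A real parser would then go on to inspect each chunk (for instance its first line and its indentation) to decide what kind of block it is and where it sits in the forest.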
Note that we can visualize the block structure as an outline, as below.
PARAGRAPH
THEOREM
  PARAGRAPH
  EQUATION
  PARAGRAPH
  EQUATION
  PARAGRAPH
PARAGRAPH
In some languages, e.g. L0 and Markdown, the block structure is literally given by the "outline" structure, that is, by indentation. Below is our example rewritten in L0:
Let's talk about [italic prime] numbers.

| theorem
There are infinitely many primes $p$, and in fact
there are infinitely many primes

  || equation
  p \equiv 1 \ \text{mod}\ 4

  and also

  || equation
  p \equiv 1 \ \text{mod}\ 8

  and so on.

The first paragraph of the theorem was known to Euclid.
Note that an outline is fully equivalent to a tree:
|-- PARAGRAPH
|-- THEOREM
    |- PARAGRAPH
    |- EQUATION
    |- PARAGRAPH
    |- EQUATION
    |- PARAGRAPH
|-- PARAGRAPH
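The equivalence between outlines and trees can be made concrete with a small sketch. The `Tree` type and the functions `toOutline` and `outlineAt` below are hypothetical and are not Scripta's own data structures; the `example` value follows the microLaTeX text above, with three paragraphs and two equations inside the theorem.

```elm
module Outline exposing (Tree(..), example, toOutline)

{-| Hypothetical illustration, not Scripta's block type:
a labeled tree, and a forest as a list of such trees.
-}


type Tree
    = Tree String (List Tree)


example : List Tree
example =
    [ Tree "PARAGRAPH" []
    , Tree "THEOREM"
        [ Tree "PARAGRAPH" []
        , Tree "EQUATION" []
        , Tree "PARAGRAPH" []
        , Tree "EQUATION" []
        , Tree "PARAGRAPH" []
        ]
    , Tree "PARAGRAPH" []
    ]


{-| Render a forest as outline lines, two spaces of indentation per level.
-}
toOutline : List Tree -> List String
toOutline forest =
    List.concatMap (outlineAt 0) forest


outlineAt : Int -> Tree -> List String
outlineAt depth (Tree label children) =
    (String.repeat (2 * depth) " " ++ label)
        :: List.concatMap (outlineAt (depth + 1)) children
```

Joining the result of `toOutline example` with newlines gives an indented outline of the example; going the other way, from indentation back to a forest, is the part the parser has to do.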
Internal Language
While the surface syntax in L0, microLaTeX and XMarkdown
depends on the language, the abstract syntax is the
same for all three. Indeed, text in the internal
language always parses to Either String (List Expr), where
type Expr
    = Fun String (List Expr) Meta
    | Text String Meta
    | Verbatim String String Meta
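To give a feel for this type, here is a sketch of what the first paragraph of the example might parse to. Several things are assumptions: the `Meta` record below is a placeholder holding only character offsets (the real `Meta` may carry different fields), the exact splitting of the text is a guess, the reading of `Verbatim` as "name, then content" is a guess, and `plainText` is a hypothetical helper rather than a Scripta function.

```elm
module ExprSketch exposing (Expr(..), Meta, example, plainText)

{-| Placeholder for illustration only; the real Meta may differ.
-}


type alias Meta =
    { begin : Int, end : Int }


type Expr
    = Fun String (List Expr) Meta
    | Text String Meta
    | Verbatim String String Meta


{-| A plausible parse of: Let's talk about \\italic{prime} numbers.
The macro becomes a Fun node whose argument is a Text node.
-}
example : List Expr
example =
    [ Text "Let's talk about " { begin = 0, end = 16 }
    , Fun "italic" [ Text "prime" { begin = 25, end = 29 } ] { begin = 17, end = 30 }
    , Text " numbers." { begin = 31, end = 39 }
    ]


{-| Hypothetical helper: strip the markup and keep only the text.
Assumes the second String of Verbatim is its content.
-}
plainText : Expr -> String
plainText expr =
    case expr of
        Fun _ args _ ->
            String.concat (List.map plainText args)

        Text str _ ->
            str

        Verbatim _ str _ ->
            str
```

Because the abstract syntax is shared, the L0 text `[italic prime]` would be expected to produce the same kind of `Fun "italic" ...` node, differing at most in the metadata.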
Block Definition
In the case of L0 and XMarkdown, a primitive block is defined by
type alias PrimitiveBlock =
    { indent : Int
    , lineNumber : Int
    , position : Int
    , content : List String
    , name : Maybe String
    , args : List String
    , properties : Dict String String
    , sourceText : String
    , blockType : PrimitiveBlockType
    , error : Maybe { error : String }
    }
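As a sketch of how such a record might be filled in, the value below describes the first paragraph of the L0 example. Much of it is assumed: the constructors of `PrimitiveBlockType` are not given in this document, so the three used here are placeholders, and the choice of a 1-based `lineNumber` and a 0-based `position` is a guess.

```elm
module PrimitiveBlockSketch exposing (PrimitiveBlock, PrimitiveBlockType(..), firstParagraph)

import Dict exposing (Dict)


{-| Placeholder constructors; the real type may differ.
-}
type PrimitiveBlockType
    = PBParagraph
    | PBOrdinary
    | PBVerbatim


type alias PrimitiveBlock =
    { indent : Int
    , lineNumber : Int
    , position : Int
    , content : List String
    , name : Maybe String
    , args : List String
    , properties : Dict String String
    , sourceText : String
    , blockType : PrimitiveBlockType
    , error : Maybe { error : String }
    }


{-| An unnamed, unindented paragraph block with a single line of content.
The numbering conventions used here are assumptions.
-}
firstParagraph : PrimitiveBlock
firstParagraph =
    { indent = 0
    , lineNumber = 1
    , position = 0
    , content = [ "Let's talk about [italic prime] numbers." ]
    , name = Nothing
    , args = []
    , properties = Dict.empty
    , sourceText = "Let's talk about [italic prime] numbers."
    , blockType = PBParagraph
    , error = Nothing
    }
```

Note that `content` still holds raw lines at this stage; parsing them into `Expr` values is a separate step.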
In the case of microLaTeX, there are two additional fields, level : Int and status : Status.