MDML Parser

A Python library for parsing and generating MDML (Markdown Metadata List) documents. MDML is a lightweight format for structured data with Markdown-like syntax.

Features

Resilient parsing with error recovery
Flexible syntax supporting inline and list formats
Rich metadata including dates, times, details and strikethrough
Markdown links with automatic URL extraction
Nested structures with sub-fields and sub-items
Array values using { ; } syntax
Raw text using pipe | | syntax
Wiki links using [[ ]] syntax
YAML frontmatter support
Multiple export formats (JSON, YAML, dict)
Generator for creating MDML from Python data structures

Installation

Installation instructions are coming soon.

Quick Start

from mdml import parse_document, generate_markup

content = """
name: `John Doe`
status: `active`, `2024-01-15`
website: [My Site](https://example.com)
"""
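# Parse the MDML text into a Document and export it as JSON
doc = parse_document(content)
print(doc.to_json())

# Generate MDML markup from a dictionary describing fields and their values
data = {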
    'fields': {
        'name': {
            'name': 'name',
            'is_list': False,
            'values': [
                {
                    'value': 'John Doe',
                    'datetime': '2026-01-24 14:45'
                }
            ]
        }
    }
}
markup = generate_markup(data)
print(markup)

MDML Syntax

Inline Fields

Simple key-value pairs:

name: `John Doe`
age: `30`
email: `john@example.com`

List Fields

Multiple values with timestamps and metadata:

tasks:
- `Write documentation`, `2024-01-15 14:30`
- `Review code` (urgent), `2024-01-15`
- ~~`Old task`~~

Details

Add contextual information in parentheses:

status: `completed` (reviewed by team)
price: `$99.99` (discounted)

Dates and Times

Append timestamps after values:

deadline: `Submit report`, `2024-12-31`
meeting: `Team sync`, `2024-01-20 10:00`

Strikethrough

Mark deprecated or completed items:

tasks:
- ~~`Obsolete task`~~
- `Active task`

invite: ~~https://t.me/+ABC~~ (expired)

Markdown Links

Links are automatically parsed and URLs are extracted:

website: [Visit our site](https://example.com)
documentation: [Read the docs](https://docs.example.com) (external)
references:
- [GitHub](https://github.com)
- [Stack Overflow](https://stackoverflow.com) (plz don't use), `2026-01-24`

The link text is stored in value and the URL in link_url.
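
To make the split between link text and URL concrete, here is a minimal standalone sketch using only the standard library. It is not the library's implementation: the `LINK_RE` pattern and the `extract_link` helper are hypothetical names chosen for illustration, and the pattern is deliberately simple (it does not handle nested brackets or URLs containing parentheses).

```python
import re

# Simplified pattern for [text](url); an illustrative assumption,
# not the pattern the mdml parser actually uses.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def extract_link(raw: str):
    """Return (text, url) for the first Markdown link in raw, or None."""
    match = LINK_RE.search(raw)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(extract_link("[Read the docs](https://docs.example.com) (external)"))
# ('Read the docs', 'https://docs.example.com')
```

In this sketch the first tuple element plays the role of value and the second the role of link_url; the trailing details in parentheses are left for a separate step.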

Arrays

Store multiple related values using curly braces and semicolons:

tags: { python ; parsing ; markdown }
colors: { red ; blue ; green }
languages: { en ; fr ; es }

Raw Text

For text with special characters, use pipe delimiters:

note: | This text can have: commas, (parentheses), and | pipes |
description: | Raw text preserves everything as-is |

The closing pipe is optional but recommended.
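
The following sketch shows one way the pipe delimiters could be stripped, assuming that only the first and last pipe are delimiters and that everything in between is kept verbatim. The `strip_raw` helper is hypothetical and is not part of the mdml API.

```python
def strip_raw(value: str) -> str:
    """Remove the opening pipe and, if present, the closing pipe."""
    text = value.strip()
    if text.startswith("|"):
        text = text[1:]
    # The closing pipe is optional, so only drop it when it is there.
    if text.endswith("|"):
        text = text[:-1]
    return text.strip()


print(strip_raw("| Text with: special, (chars) |"))
# Text with: special, (chars)
```

Under these assumptions, a value without a closing pipe that happens to end in a literal pipe would lose that character, which is one reason to always write the closing pipe.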

Wiki Links

Internal links using double brackets:

related: [[Other Document]]
see also: [[Configuration|Config guide]]
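
As a rough illustration of how the two wiki-link forms could be told apart, the sketch below splits a link into a target and a label. It assumes the common wiki convention that the part before the pipe is the target and the part after it is the display text; the README does not state this, and `parse_wiki_link` is a hypothetical helper rather than a library function.

```python
import re

# [[target]] or [[target|label]]; the target/label order is an assumption.
WIKI_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def parse_wiki_link(raw: str):
    """Return (target, label) for a wiki link, or None if raw is not one."""
    match = WIKI_RE.fullmatch(raw.strip())
    if match is None:
        return None
    target = match.group(1).strip()
    # Without an explicit label, fall back to the target itself.
    label = (match.group(2) or target).strip()
    return target, label


print(parse_wiki_link("[[Configuration|Config guide]]"))
# ('Configuration', 'Config guide')
```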

Nested Structures

Create hierarchical data with sub-fields and sub-items:

project:
- `Website Redesign`
	- status: `in progress`
	- priority: `high`
	- deadline: `2024-06-30`
	- `Phase 1: Research`
	- `Phase 2: Design`

Frontmatter

Optional YAML metadata at document start:

---
version: 1.0
author: John Doe
created: 2026-01-15
---

content: `Main document content`
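
For readers who want to see how a frontmatter block can be separated from the MDML body, here is a minimal sketch using only the standard library. It assumes the block must start on the very first line and end at the next `---` line; the `split_frontmatter` helper is hypothetical, and the YAML text is returned unparsed (a YAML library would be needed to turn it into a dictionary).

```python
def split_frontmatter(text: str):
    """Return (frontmatter_text, body_text); frontmatter is '' if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    # No closing delimiter: treat the whole text as body.
    return "", text


sample = "---\nversion: 1.0\nauthor: John Doe\n---\n\ncontent: `Main document content`\n"
frontmatter, body = split_frontmatter(sample)
print(frontmatter)
print(body.strip())
```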

API Reference

Parsing Functions

# Parse a complete document (content: str); returns a Document
doc = parse_document(content)

Generation Functions

# Generate MDML markup from a dictionary (data: Dict[str, Any]); returns str
markup = generate_markup(data)

Import Functions

# Create a Document from a dictionary (data: Dict[str, Any])
doc = from_dict(data)

# Create a Document from a JSON string (json_str: str)
doc = from_json(json_str)

Document Methods

# Get a field by name; returns Optional[Field]
field = doc.get_field('field_name')

# Get a specific value from a field; returns Optional[FieldValue]
value = doc.get_value('field_name', index=0)

# Get all values from a field; returns List[FieldValue]
values = doc.get_values('field_name')

# Check for parse errors; returns bool
has_errors = doc.has_errors()

# Export to different formats
json_str = doc.to_json(indent=2)   # str
yaml_str = doc.to_yaml()           # str
data_dict = doc.to_dict()          # Dict[str, Any]

Field Properties

field.name           # Field name
field.is_list        # Boolean: list or inline format
field.values         # List of FieldValue objects
field.first_value    # Most recent value
field.last_value     # Oldest value
field.parse_errors   # List of non-critical errors

FieldValue Properties

value.value              # Main text value
value.date               # Date string (YYYY-MM-DD)
value.time               # Time string (HH:MM)
value.datetime_obj       # Python datetime object
value.datetime_str       # Formatted datetime string
value.details            # Details from parentheses
value.is_strikethrough   # Boolean for strikethrough
value.is_array           # Boolean for array values
value.array_values       # List of array elements
value.is_raw             # Boolean for raw text
value.is_wiki_link       # Boolean for wiki links
value.link_url           # URL from markdown links
value.sub_items          # Dict of named sub-fields
value.list_sub_items     # List of sub-items
value.parse_error        # Error message if parsing failed

Error Handling

The parser is resilient: it recovers from malformed input and records the errors instead of aborting:

doc = parse_document(content)

# Check for errors
if doc.has_errors():
    print("Document errors:", doc.parse_errors)
    
    for field_name, field in doc.fields.items():
        if field.has_errors():
            print(f"Field '{field_name}' errors:", field.parse_errors)
        
        for value in field.values:
            if value.has_error():
                print(f"Value error: {value.parse_error}")
                print(f"Raw value: {value.value}")

Even with errors, the parser returns usable data structures with error annotations.

Advanced Usage

Working with Nested Data

doc = parse_document("""
project:
- `Alpha Release`
	- phase: `Planning`
	- deadline: `2024-06-30`
	- `Define requirements`
	- `Create timeline`
""")

project = doc.get_field('project')
first_item = project.first_value

# Access named sub-fields
phase = first_item.sub_items['phase'].value
deadline = first_item.sub_items['deadline'].value

# Access list sub-items
tasks = [item.value for item in first_item.list_sub_items]

Working with Arrays

doc = parse_document("tags: { python ; parsing ; data }")
value = doc.get_value('tags')

if value.is_array:
    print(value.array_values)  # ['python', 'parsing', 'data']

Working with Markdown Links

doc = parse_document("""
links:
- [GitHub](https://github.com)
- [Documentation](https://docs.example.com) (official)
""")

links = doc.get_field('links')
for link in links.values:
    print(f"Text: {link.value}")
    print(f"URL: {link.link_url}")
    if link.details:
        print(f"Details: {link.details}")

Working with Raw Text

doc = parse_document("note: | Text with: special, (chars) |")
value = doc.get_value('note')

if value.is_raw:
    print(value.value)  # 'Text with: special, (chars)'

Generating MDML

from mdml import MDMLGenerator
from mdml.models import Field, FieldValue

# Create field programmatically
field = Field(
    name='tasks',
    is_list=True,
    values=[
        FieldValue(
            value='Complete project',
            date='2024-01-20',
            time='15:00',
            details='high priority'
        )
    ],
    raw_content=''
)

# Generate markup
lines = MDMLGenerator.generate_field(field)
print('\n'.join(lines))

Exception Handling

Work in progress.

from mdml.exceptions import (
    MDMLException,        # Base exception
    MDMLParseError,       # Critical parsing errors
    MDMLFieldError,       # Field-specific errors
    MDMLValueError        # Value parsing errors
)

try:
    doc = parse_document(content)
except MDMLParseError as e:
    print(f"Parse error: {e}")
except MDMLException as e:
    print(f"MDML error: {e}")

Syntax Rules

Field Names

Lowercase letters, numbers, spaces, and underscores
Must start with a letter
Example: field_name, field name, field123
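
The field-name rules above can be expressed as a single regular expression. The sketch below is an illustration under that reading of the rules, not the check the parser performs; `FIELD_NAME_RE` and `is_valid_field_name` are hypothetical names.

```python
import re

# A lowercase letter first, then lowercase letters, digits, underscores, or spaces.
FIELD_NAME_RE = re.compile(r"[a-z][a-z0-9_ ]*")


def is_valid_field_name(name: str) -> bool:
    """Check a candidate field name against the rules listed above."""
    return FIELD_NAME_RE.fullmatch(name) is not None


for candidate in ("field_name", "field name", "field123", "123field", "Field"):
    print(candidate, is_valid_field_name(candidate))
```

Note that this pattern also accepts a trailing space; whether the real parser trims or rejects such names is not specified here.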

Values

Wrap inline values in backticks: `value`
Backticks are optional in list items unless the value has metadata
Use pipe delimiters for raw text: | raw text |

Dates and Times

Dates: ISO format YYYY-MM-DD
Times: 24-hour format HH:MM
Combined: `2024-01-20 15:30`
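
The date and time formats above map directly onto `datetime.strptime` format strings. The following is a minimal sketch of that mapping; `parse_timestamp` is a hypothetical helper, and the library's own parsing may differ.

```python
from datetime import datetime


def parse_timestamp(text: str):
    """Parse 'YYYY-MM-DD HH:MM' or 'YYYY-MM-DD'; return None if neither fits."""
    text = text.strip()
    # Try the combined format first, then the date-only format.
    for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


print(parse_timestamp("2024-01-20 15:30"))
# 2024-01-20 15:30:00
```

`strptime` is lenient about zero padding (it also accepts `2024-1-5`), so a strict check of the ISO form would need an additional test.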

Arrays

Use curly braces: { item1 ; item2 ; item3 }
Semicolon-separated values
Spaces around semicolons are trimmed
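
These three rules are enough to sketch an array splitter in a few lines. The `parse_array` helper below is hypothetical and only illustrates the rules; dropping empty items (for example from a trailing semicolon) is an assumption, not documented behavior.

```python
def parse_array(raw: str):
    """Split '{ a ; b ; c }' into ['a', 'b', 'c']; return None if not an array."""
    text = raw.strip()
    if not (text.startswith("{") and text.endswith("}")):
        return None
    inner = text[1:-1]
    # Trim whitespace around each item and skip empty ones (assumption).
    return [item.strip() for item in inner.split(";") if item.strip()]


print(parse_array("{ python ; parsing ; markdown }"))
# ['python', 'parsing', 'markdown']
```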

Links

Markdown format: [text](url)
Wiki links: [[link]]
URLs are automatically extracted

Nesting

Use tabs (not spaces) for indentation
List items start with -
Named sub-fields: - field: value
List sub-items: - value

Automatic RAW Detection

Text with spaces and no explicit formatting (backticks, links, strikethrough) is automatically treated as raw text:

description: This is automatically raw text
note: `This requires backticks because it's explicit`
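
The detection rule can be approximated with a simple predicate. The sketch below assumes that "explicit formatting" means the value starts with a backtick, a link bracket, or a strikethrough marker; the actual heuristic in the parser may be broader, and `is_auto_raw` is a hypothetical name.

```python
def is_auto_raw(value: str) -> bool:
    """Guess whether an unquoted value would be treated as raw text."""
    text = value.strip()
    # Explicit formatting markers checked here are an assumption.
    explicit = text.startswith(("`", "[", "~~"))
    return " " in text and not explicit


print(is_auto_raw("This is automatically raw text"))  # True
print(is_auto_raw("`explicit value`"))  # False
```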

License

Licensed under the Apache License, Version 2.0.

See the LICENSE file for details.

Contributing

Contributions welcome! Please submit issues and pull requests on GitHub.
