JSON-LD is required for AI training, AI indexing, and extended search engine snippets. Markdown files do not translate natively into JSON-LD. This project offers an easy way to transform Markdown into JSON-LD.
The markdown-to-structured-jsonld project transforms non-annotated Markdown files into structured, AI-training-ready data that complies with Schema.org JSON-LD. We aim to make it extensible to other schemas (e.g., Dublin Core). This README describes the Default Markdown to Structured Data JSON-LD Transformation, which maps Markdown structures (e.g., articles, FAQs) to Schema.org structured data types (e.g., Article, NewsArticle, FAQPage) using YAML front matter and document content, or minimal inference for plain Markdown. Markdown files can thus be transformed easily, which allows simple semantic data extraction for AI-driven applications like NLWeb’s conversational AI. The transformation uses Markdown as the root format to generate multiple outputs (JSON-LD, HTML, Java objects) and is compatible with traditional Markdown renderers (e.g., GitHub, CommonMark).
An extended Markdown format with inline annotations (e.g., [text]@{Type,property=value}) is available for explicit tagging. See the extended specification for details.
This specification is written in a formal and precise way so that AI systems can build parsers or transformers that convert this Markdown format into AI-optimized Schema.org data types.
This project is the basis of the Markdown to Schema.org JSON-LD for AI SEO website.
The idea behind this project is to make it easy for developers to transform Markdown content to JSON-LD, which enhances the processability of their content in AI, Large Language Models (LLMs), and Natural Language Web (NLWeb) contexts. Here's an expanded explanation:
JSON-LD (JavaScript Object Notation for Linked Data) provides structured data that can be more easily understood by machines. When content is transformed from Markdown to JSON-LD:
- Enhanced AI Understanding: AI systems can better comprehend the semantic meaning of content
- Improved Search Engine Visibility: Search engines like Google use structured data for rich results
- Better Content Processing: LLMs can process and reason about structured content more effectively
- Semantic Relationships: Relationships between content items become explicit rather than implicit
- Semantic Enrichment: Adds meaningful context to content through Schema.org vocabularies
- Content Discoverability: Makes content more discoverable by AI systems and search engines
- Consistent Structure: Provides a standardized way to represent Markdown content
- AI-Ready Content: Prepares content for optimal processing by AI and LLMs
- Metadata Preservation: Maintains important metadata from the original Markdown
This transformation is particularly valuable for:
- Content Publishers: Blogs, news sites, and documentation platforms seeking AI visibility
- Knowledge Bases: Making knowledge repositories more accessible to AI systems
- Technical Documentation: Enhancing the machine-readability of technical content
- FAQ Systems: Structuring question-answer pairs for better AI consumption
- Product Information: Structuring product details for improved search and AI integration
For JavaScript developers, we provide an npm package implementation. See the lib/README.md for installation instructions, API reference, and usage examples.
The transformer can process any Markdown file. For plain Markdown, it infers minimal metadata:
- Type: Defaults to `Article` (Schema.org).
- Schema: Defaults to `https://schema.org`.
- Name: Taken from the first H1 header, or the file name if no H1 exists.
- Description: Taken from YAML `description`, the first paragraph, or a 200-character excerpt if no distinct paragraph is found.
- ArticleBody: Concatenates all paragraphs, lists, and blockquotes.
- URL: Uses a default base URL (configurable, e.g., `https://www.iunera.com/blog/{{filename}}`) if no `base_url` is provided.
- Author, Publisher, Date, Keywords, Categories: Use transformer defaults (configurable) or remain empty if not specified.
This ensures that even minimal Markdown files produce valid JSON-LD, making the transformer versatile for various use cases.
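The inference rules above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function name `infer_minimal_metadata` is hypothetical, blocks are assumed to be separated by blank lines, and the URL template default is the example value given above.

```python
import re
from pathlib import Path


def infer_minimal_metadata(markdown: str, filename: str,
                           url_template: str = "https://www.iunera.com/blog/{{filename}}") -> dict:
    """Infer Article metadata from plain Markdown without YAML front matter (sketch)."""
    # Assumption: blocks are separated by blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", markdown) if b.strip()]
    stem = Path(filename).stem

    # Name: first H1 header, otherwise the file name without extension.
    h1 = re.search(r"^#[ \t]+(.+)$", markdown, flags=re.MULTILINE)
    name = h1.group(1).strip() if h1 else stem

    # ArticleBody: every block that is not a heading or a table.
    body_blocks = [b for b in blocks if not b.startswith(("#", "|"))]
    article_body = "\n\n".join(body_blocks)

    # Description: first plain paragraph, otherwise a 200-character excerpt.
    paragraphs = [b for b in body_blocks
                  if not re.match(r"(?:[-*+>]|\d+\.)\s", b)]
    description = paragraphs[0] if paragraphs else article_body[:200]

    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "name": name,
        "description": description,
        "articleBody": article_body,
        "url": url_template.replace("{{filename}}", stem),
    }
```

Author, publisher, date, keywords, and categories are left out here; in the transformer they would come from configurable defaults or stay empty.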
The transformer supports configuration options to customize metadata, either manually or via defaults, ensuring flexibility for files with incomplete or no YAML front matter. Options include:
- Manual Settings:
  - `--type <string>`: Set the document type (e.g., `NewsArticle`); overrides YAML `type`.
  - `--schema <url>`: Set the schema URL (e.g., `https://schema.org`); overrides YAML `schema`.
  - `--author <string | JSON-LD>`: Set the author(s) (e.g., `["Christian Schmitt", "Dr. Tim Frey"]`).
  - `--publisher <JSON-LD>`: Set publisher details (e.g., `{"@type": "Organization", "name": "Iunera", "@id": "https://www.iunera.com"}`).
  - `--date <ISO date>`: Set publication date (e.g., `2025-06-01`).
  - `--base-url <url>`: Set base URL (e.g., `https://www.iunera.com/blog/`).
  - `--keywords <list>`: Set keywords.
  - `--categories <list>`: Set categories.
  - `--description <string>`: Set description text; overrides YAML or inferred description.
- Default Values:
  - Type: `Article` if not specified.
  - Schema: `https://schema.org` if not specified.
  - Author: Configurable default (e.g., `["Unknown Author"]`) or empty.
  - Publisher: Configurable default (e.g., `{"@type": "Organization", "name": "Unknown Publisher", "@id": "https://example.com"}`).
  - Date: Current date (e.g., `2025-06-01`) or empty.
  - Base URL: Configurable default (e.g., `https://www.iunera.com/blog/`) with `{{filename}}` for URL generation.
  - Keywords, Categories: Empty lists if not specified.
  - Description: First 200 characters of the article body if not specified.
- Empty Fields:
  - Fields like `keywords`, `categories`, `author`, `publisher`, `description`, and `date` can be omitted, resulting in empty or null values in JSON-LD where Schema.org allows (e.g., `keywords: []`, `author: null`).
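One way to picture how these options combine is a simple precedence chain: command-line options override YAML front matter, which overrides transformer defaults. The sketch below assumes that order for every field (the text states it explicitly only for `type`, `schema`, and `description`); `resolve_metadata` and `TRANSFORMER_DEFAULTS` are hypothetical names.

```python
import copy
from typing import Optional

# Illustrative defaults, mirroring the "Default Values" list above.
TRANSFORMER_DEFAULTS = {
    "type": "Article",
    "schema": "https://schema.org",
    "author": None,
    "publisher": None,
    "keywords": [],
    "categories": [],
}


def resolve_metadata(cli_options: dict, front_matter: dict,
                     defaults: Optional[dict] = None) -> dict:
    """Merge metadata sources; later sources win: defaults < YAML < CLI."""
    merged = copy.deepcopy(TRANSFORMER_DEFAULTS if defaults is None else defaults)
    # Values that are None count as "not specified" and do not override.
    merged.update({k: v for k, v in front_matter.items() if v is not None})
    merged.update({k: v for k, v in cli_options.items() if v is not None})
    return merged
```

For example, passing `{"type": "NewsArticle"}` as `cli_options` replaces a YAML `type: Article`, while an author given only in YAML is kept.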
The transformation converts non-annotated Markdown into JSON-LD by analyzing YAML front matter and document structure, or inferring minimal metadata for plain Markdown. Key components include:
- YAML Front Matter: Defines metadata such as type, schema, author(s), publisher, keywords, categories, description, and URL template. If absent, defaults or transformer options apply.
- Document Structure: Maps Markdown elements to JSON-LD:
  - H1 headers set the `name`.
  - First paragraph or YAML `description` sets the `description`.
  - Paragraphs, lists, and blockquotes form the `articleBody`.
  - H2 headers like `FAQ` or `Frequently Asked Questions` trigger a `FAQPage`.
- Table Extraction: Markdown tables are extracted and mapped to Schema.org `Table` or `Dataset` types. The transformer analyzes headers and content to classify tables as:
  - Pricing: Contains price-related terms (e.g., `price`, `$`, `€`) or plan names; mapped to `Table` with `PriceSpecification`.
  - Comparison: Includes comparison terms (e.g., `vs`, `feature`, `✓`, `✗`); mapped to `Table` with feature descriptions.
  - Specification: Contains technical terms (e.g., `spec`, `property`, `value`); mapped to `Table` with `Product` properties.
  - Listing: Lists items or organizations (e.g., `name`, `company`); mapped to `Table` with directory entries.
  - Dataset: Large tables with numeric data (e.g., `metric`, `count`); mapped to `Dataset` with statistical summaries.
- List Extraction: Ordered and unordered lists are mapped to `ItemList` with `ListItem` elements, capturing item text and position.
- Link Extraction: Non-annotated links are mapped to `WebPage`, `SoftwareSourceCode` (e.g., GitHub URLs), `VideoObject` (e.g., YouTube), `DigitalDocument` (e.g., PDFs), or `ImageObject` (e.g., images), included as `mentions`.
- Section Extraction: H2 and higher headers are mapped to `WebPageElement`, representing article sections with `name`, `text`, and `cssSelector`.
- FAQ Handling: FAQ sections are identified by an H2 header (`FAQ`, `Frequently Asked Questions`, case-insensitive, with optional trailing text) or `is_faq: true` in YAML for standalone FAQs. FAQs in articles generate a linked `FAQPage`, reusing article metadata.
- Extendable Types: A `---` separator defines new JSON-LD sections, each with its own type and schema, reusing main article metadata unless overridden.
- Metadata Reuse: Secondary entities (e.g., `FAQPage`, `Table`, `ItemList`) inherit metadata from the main article.
- Links: `mailto` links are transformed into `email` properties.
The transformer uses configuration options to ensure flexibility, producing valid JSON-LD for any Markdown input.
Below, we use an example article on License Token: Pioneering Fair Code and Combating Open Source Exploitation to illustrate the transformation, followed by variants showing default behavior and minimal Markdown.
---
type: Article
schema: https://schema.org
base_url: https://www.iunera.com/blog/
date: 2025-06-01
author:
  - Dr. Tim Frey
  - Christian Schmitt
publisher:
  name: Iunera
  url: https://www.iunera.com/
  id: https://www.iunera.com
  address:
    type: PostalAddress
    streetAddress: Altrottstraße 31
    addressLocality: Walldorf
    postalCode: 69190
  telephone: +49 6227 381350
slug: license-token-fair-code
keywords:
- open source
- fair code
- license token
- blockchain
- AI
categories:
- Technology
- Software Licensing
- Blockchain Innovation
description: Revolutionizing open source with the License Token model, tackling exploitation via blockchain.
---
# License Token: Pioneering Fair Code and Combating Open Source Exploitation
[Iunera](https://www.iunera.com/)@{Organization,name=Iunera,@id=#iunera} is revolutionizing open source software with the **License Token** model, specifically the [Open Compensation Token License (OCTL)](https://github.com/open-compensation-token-license/octl)@{CreativeWork,license=https://github.com/open-compensation-token-license/octl/blob/main/LICENSE.md,@id=#octl}. This article explores how OCTL tackles exploitation, promotes fair code principles, and leverages blockchain and AI for a sustainable developer ecosystem.
## The Open Source Exploitation Crisis
Open source software powers 90% of modern applications, from cloud platforms to AI models. Yet, a 2025 study reveals 70% of maintainers receive no financial support, despite corporate exploitation. This imbalance causes burnout, underfunded projects, and vulnerabilities like Heartbleed (2014). Without change, open source sustainability is at risk.
## OCTL License Comparison
| License Type | Revenue Model | Usage Tracking | Fair Compensation |
|--------------|---------------|----------------|-------------------|
| MIT/GPL | None | No | No |
| Commercial | Fixed Fee | Limited | Partial |
| OCTL | NFT Royalties | Blockchain | Yes |
## How OCTL Works
1. License Creation: Developers mint an OCTL NFT with royalty terms.
2. Code Usage: Enterprises register usage.
3. Distribution: Smart contracts distribute royalties.
## Frequently Asked Questions
### What is OCTL?
A blockchain-based license ensuring fair compensation via NFT royalties.
### How does OCTL prevent exploitation?
NFTs and smart contracts track usage, requiring royalties.

- YAML Parsing:
  - `type: Article` and `schema: https://schema.org` define a Schema.org `Article`.
  - `author: [Dr. Tim Frey, Christian Schmitt]` maps to multiple `Person` objects.
  - `publisher` generates nested JSON-LD with `@id: https://www.iunera.com`.
  - `base_url` and `slug` form the URL.
  - `keywords` and `categories` map to `keywords` and `about`.
  - `description` is taken from YAML.
- Main Article:
  - H1 sets `name`.
  - YAML `description` sets `description`.
  - Content forms `articleBody`.
- Table Extraction:
  - The table under `OCTL License Comparison` is classified as a comparison table (due to terms like `License Type`, `Fair Compensation`) and mapped to a `Table` with feature descriptions.
- List Extraction:
  - The ordered list under `How OCTL Works` is mapped to an `ItemList` with three `ListItem` elements.
- FAQ Section:
  - Triggered by `## Frequently Asked Questions`, generating a `FAQPage` with `Question` objects.
  - Reuses article metadata.
- Link Extraction:
  - Links like `[Iunera](https://www.iunera.com/)` and `[OCTL](https://github.com/.../octl)` are mapped to `WebPage` and `SoftwareSourceCode`, included as `mentions`.
- Section Extraction:
  - H2 headers (e.g., `The Open Source Exploitation Crisis`) are mapped to `WebPageElement` with `name`, `text`, and `cssSelector`.
- Output:
  - Multiple JSON-LD objects: `Article`, `FAQPage`, `Table`, `ItemList`, `WebPageElement`, and link entities.
# License Token: Pioneering Fair Code and Combating Open Source Exploitation
[... rest of the content identical to the first example, without FAQ or YAML ...]

- No YAML: Infers metadata:
  - `type`: `Article`.
  - `schema`: `https://schema.org`.
  - `name`: From H1.
  - `description`: First 200 characters of the article body.
  - `articleBody`: All content.
  - `url`: Default base URL with filename.
  - `author`, `publisher`, `date`, `keywords`, `categories`: Defaults or empty.
- Table and List Extraction: Tables and lists are still extracted and mapped to `Table` and `ItemList`.
- Section Extraction: H2 headers are mapped to `WebPageElement`.
- Link Extraction: Links are mapped to appropriate types.
- Output: `Article`, `Table`, `ItemList`, `WebPageElement`, and link entities.
To ensure generated Markdown articles produce compliant JSON-LD when processed by the markdown-to-structured-jsonld transformer, include the following directive in your article generation prompts:
Generate a Markdown article compliant with the markdown-to-structured-jsonld project’s Default Markdown to Structured JSON-LD Transformation. Include:
1. **YAML Front Matter**:
- `type`: A Schema.org type (e.g., `Article`, `NewsArticle`); omit to default to `Article`.
- `schema`: A schema URL (e.g., `https://schema.org`); omit to default to `https://schema.org`.
- `base_url`: Set to `https://www.iunera.com/blog/` for dynamic URL generation.
- `date`: Current date in ISO 8601 format (e.g., `2025-06-01`).
- `author`: A list of author names (e.g., `[Author One, Author Two]`) or JSON-LD.
- `publisher.name`: Publisher name (e.g., `Iunera`).
- `publisher.url`: Publisher URL (e.g., `https://www.iunera.com/`).
- `publisher.id`: Publisher identifier (e.g., `https://www.iunera.com`).
- `publisher.address.type`: `PostalAddress`.
- `publisher.address.streetAddress`, `publisher.address.addressLocality`, `publisher.address.postalCode`: Address details.
- `publisher.telephone`: Contact number.
- `slug`: A unique identifier for the article (e.g., `license-token-fair-code`).
- `keywords`: List of relevant keywords; optional.
- `categories`: List of categories as topics; optional.
- `description`: A 200-character summary of the article; optional.
2. **Content Structure**:
- Include an H1 header for the article title.
- Use standard Markdown for paragraphs, lists, tables, and links.
- Include a FAQ section with an H2 header named `FAQ` or `Frequently Asked Questions`.
- Use H3 headers for FAQ questions, followed by answer paragraphs.
- Optionally include tables for pricing, comparison, specifications, listings, or datasets.
- Optionally include ordered or unordered lists for procedures or items.
Ensure the Markdown is compatible with traditional renderers, uses standard syntax, and produces valid Schema.org JSON-LD when processed by the markdown-to-structured-jsonld transformer.
This prompt ensures that generated Markdown articles adhere to the transformation rules of the markdown-to-structured-jsonld project, producing valid JSON-LD output. It enforces a consistent structure with YAML front matter, article content, an FAQ section, and optional tables and lists, while allowing flexibility (e.g., omitting type, schema, keywords, categories, or description for defaults). The prompt guarantees compatibility with traditional Markdown renderers and supports semantic data extraction for NLWeb’s AI applications, even for minimal Markdown files.
- Purpose: Defines metadata for the main document and sections.
- Syntax:

      ---
      type: <string>
      schema: <url>
      base_url: <url_template>
      date: <ISO date>
      author: <string | [string, ...] | JSON-LD>
      publisher.<property>: <value>
      slug: <string>
      keywords: [<string>, ...]
      categories: [<string>, ...]
      description: <string>
      ---

  - `title`: Maps to `name`.
  - `type`: Schema.org type (e.g., `Article`, `FAQPage`); defaults to `Article`.
  - `schema`: Vocabulary URL (default `https://schema.org`).
  - `base_url`, `slug`: Form the URL; `slug` optional with transformer default.
  - `date`: Maps to `datePublished`; optional with transformer default.
  - `author`: String, list, or JSON-LD; lists map to multiple `Person` objects; optional.
  - `publisher.<property>`: Dot notation for JSON-LD; optional.
  - `keywords`: Maps to `keywords`; optional.
  - `categories`: Maps to `about`; optional.
  - `description`: Maps to `description`; optional, defaults to article body excerpt.
- Behavior:
  - Parsed as metadata for the main document.
  - Dot notation generates nested JSON-LD.
  - Metadata reused for secondary entities.
  - Ignored by non-YAML renderers.
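The dot-notation behavior can be illustrated with a short Python sketch. It assumes that a trailing `type` or `id` segment becomes the JSON-LD keyword `@type` or `@id` (the text above confirms this only for the publisher `@id`); `expand_dot_notation` is a hypothetical helper name.

```python
# Assumption: these plain keys are renamed to JSON-LD keywords.
JSON_LD_KEYWORDS = {"type": "@type", "id": "@id"}


def expand_dot_notation(flat: dict) -> dict:
    """Turn keys such as 'publisher.address.type' into nested dictionaries."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate objects on demand.
            node = node.setdefault(part, {})
        node[JSON_LD_KEYWORDS.get(leaf, leaf)] = value
    return nested
```

With this sketch, `publisher.name` and `publisher.address.postalCode` end up under one shared `publisher` object, with `address` nested inside it.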
- Default Type: `Article` (Schema.org)
- Trigger: No YAML `type` or specific structure.
- Properties:
  - `name`: YAML `title`, H1, or file name.
  - `description`: YAML `description`, first paragraph, or 200-character excerpt.
  - `url`: YAML `base_url` with `slug`, or default base URL with filename.
  - `@context`: YAML `schema` or `https://schema.org`.
  - `author`, `publisher`, `datePublished`, `keywords`, `about`: From YAML, transformer defaults, or empty.
- Type: `Article` or specified type
- Trigger: YAML `type` or H1 with paragraphs.
- Properties:
  - `name`: YAML `title` or H1.
  - `description`: YAML `description` or excerpt.
  - `articleBody`: Concatenated paragraphs, lists, blockquotes.
  - `datePublished`: YAML `date` or default.
  - `author`, `publisher`: From YAML, defaults, or empty.
  - `url`: Generated from `base_url` and `slug`.
  - `keywords`, `about`: From YAML or empty.
  - `hasPart`: Includes tables, lists, and sections.
  - `mentions`: Includes extracted links.
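The property mapping for the main article could look roughly like the following Python sketch. It assumes that the URL is a plain concatenation of `base_url` and `slug`, and that plain author names become `Person` objects with only a `name`; `build_url`, `to_persons`, and `build_article` are hypothetical helper names, and `publisher`, `hasPart`, and `mentions` are omitted for brevity.

```python
def build_url(base_url: str, slug: str) -> str:
    """Join base URL and slug with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + slug.strip("/")


def to_persons(author) -> list:
    """Normalize a string, a list of strings, or a JSON-LD dict into a list."""
    if not author:
        return []
    if isinstance(author, dict):
        return [author]  # already JSON-LD, pass through unchanged
    names = [author] if isinstance(author, str) else author
    return [{"@type": "Person", "name": name} for name in names]


def build_article(meta: dict, h1: str, article_body: str) -> dict:
    """Map parsed YAML metadata and document content to an Article object."""
    return {
        "@context": meta.get("schema", "https://schema.org"),
        "@type": meta.get("type", "Article"),
        "name": meta.get("title", h1),
        "description": meta.get("description", article_body[:200]),
        "url": build_url(meta["base_url"], meta["slug"]),
        "datePublished": meta.get("date"),
        "author": to_persons(meta.get("author")),
        "keywords": meta.get("keywords", []),
        "about": meta.get("categories", []),
        "articleBody": article_body,
    }
```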
- Type: `FAQPage`
- Trigger:
  - H2 header matching `FAQ` or `Frequently Asked Questions` (case-insensitive, optional trailing text).
  - YAML `is_faq: true` for standalone FAQs.
- Properties:
  - `name`: Article `name` with “FAQ: ” prefix, or YAML `title`.
  - `mainEntity`: `Question` objects from H3 headers and paragraphs.
  - `url`: Article URL or generated from `base_url`.
  - Reuses article metadata.
- Behavior: Generates a separate `FAQPage` JSON-LD object, linked via `mainEntityOfPage`.
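A minimal Python sketch of the FAQ trigger and question extraction follows. The `acceptedAnswer`/`Answer` shape is the standard Schema.org pattern for `Question`; representing the `mainEntityOfPage` link as an `@id` reference to the article URL is an assumption, and `extract_faq_page` is a hypothetical name.

```python
import re

# H2 header "FAQ" or "Frequently Asked Questions", case-insensitive,
# with optional trailing text.
FAQ_HEADING = re.compile(r"^##\s+(faq|frequently asked questions)\b.*$", re.IGNORECASE)


def extract_faq_page(markdown: str, article_name: str, url: str):
    """Return a FAQPage object, or None if the document has no FAQ section."""
    questions = []
    in_faq = False
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Every H2 either opens or closes the FAQ section.
            in_faq = bool(FAQ_HEADING.match(line))
            current = None
        elif in_faq and line.startswith("### "):
            current = {"@type": "Question", "name": line[4:].strip(),
                       "acceptedAnswer": {"@type": "Answer", "text": ""}}
            questions.append(current)
        elif in_faq and current is not None and line.strip():
            # Paragraph lines below an H3 are appended to the answer text.
            answer = current["acceptedAnswer"]
            answer["text"] = (answer["text"] + " " + line.strip()).strip()
    if not questions:
        return None
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "name": "FAQ: " + article_name,
        "url": url,
        "mainEntityOfPage": {"@id": url},  # assumed form of the link to the article
        "mainEntity": questions,
    }
```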
- Type: `Table` or `Dataset`
- Trigger: Markdown table syntax (`| ... |`).
- Properties:
  - `name`: Auto-generated (e.g., `Table 1`) or from preceding header.
  - `description`: Based on table type (pricing, comparison, etc.).
  - `text`: Summarized table content (headers and sample rows).
  - `about`: Contextual type (e.g., `PriceSpecification`, `Product`) for specific table types.
- Behavior: Included in `hasPart` for articles or as standalone entities.
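To make the table handling concrete, here is a rough Python sketch that parses a Markdown table and applies a keyword-based classification. The keyword lists are only the example terms named in this README, checked in a fixed order; the real classifier may weigh more signals (table size, numeric content, plan names). `parse_table` and `classify_table` are hypothetical names.

```python
import re


def parse_table(block: str):
    """Split a Markdown table into (headers, rows), skipping the separator row."""
    rows = []
    for line in block.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if all(re.fullmatch(r":?-+:?", c) for c in cells):
            continue  # the |---|---| separator row
        rows.append(cells)
    return rows[0], rows[1:]


# Example terms taken from the table categories described in this README.
TABLE_TERMS = [
    ("pricing", {"price", "$", "€"}),
    ("comparison", {"vs", "feature", "✓", "✗"}),
    ("specification", {"spec", "property", "value"}),
    ("listing", {"name", "company"}),
    ("dataset", {"metric", "count"}),
]


def classify_table(headers: list, rows: list) -> str:
    """Return the first category whose terms appear as whole tokens."""
    text = " ".join(headers + [c for row in rows for c in row]).lower()
    tokens = set(re.findall(r"[a-z]+|[$€✓✗]", text))
    for kind, terms in TABLE_TERMS:
        if tokens & terms:
            return kind
    return "generic"
```

Matching whole tokens instead of substrings avoids false hits such as `vs` inside `canvas`, at the cost of missing plurals like `prices`.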
- Type: `ItemList`
- Trigger: Ordered (`1.`) or unordered (`-`, `*`, `+`) lists.
- Properties:
  - `name`: Auto-generated (e.g., `Bulleted List 1`).
  - `itemListElement`: Array of `ListItem` with `name` and `position`.
  - `numberOfItems`: Count of items.
- Behavior: Included as separate entities.
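The list mapping is straightforward to sketch in Python. The function name `to_item_list` is hypothetical, and the sketch handles flat lists only (nested items are treated like top-level ones).

```python
import re

# Ordered ("1.") or unordered ("-", "*", "+") list markers.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+(.*)$")


def to_item_list(block: str, name: str) -> dict:
    """Map a Markdown list block to a Schema.org ItemList."""
    items = []
    for line in block.splitlines():
        match = LIST_ITEM.match(line)
        if match:
            items.append(match.group(1).strip())
    return {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "name": name,
        "numberOfItems": len(items),
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": text}
            for position, text in enumerate(items, start=1)
        ],
    }
```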
- Types: `WebPage`, `SoftwareSourceCode`, `VideoObject`, `DigitalDocument`, `ImageObject`
- Trigger: `[text](url)` syntax with HTTP URLs.
- Properties:
  - `name`: Link text.
  - `url`, `@id`: URL.
- Behavior: Included as `mentions`.
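A possible way to pick the link type is to inspect the host and file extension of the URL, as in this Python sketch. The host and extension lists are illustrative assumptions built from the examples in this README (GitHub, YouTube, PDFs, images); `classify_url` and `extract_mentions` are hypothetical names.

```python
import re
from urllib.parse import urlparse

# [text](url) with an HTTP or HTTPS URL.
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def classify_url(url: str) -> str:
    """Choose a Schema.org type from the URL's host and path."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path.lower()
    if host == "github.com":
        return "SoftwareSourceCode"
    if host in ("youtube.com", "youtu.be"):
        return "VideoObject"
    if path.endswith(".pdf"):
        return "DigitalDocument"
    if path.endswith(IMAGE_EXTENSIONS):
        return "ImageObject"
    return "WebPage"


def extract_mentions(markdown: str) -> list:
    """Collect every HTTP link as a typed entity for the `mentions` property."""
    return [
        {"@type": classify_url(url), "name": text, "url": url, "@id": url}
        for text, url in LINK.findall(markdown)
    ]
```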
- Type: `WebPageElement`
- Trigger: H2+ headers.
- Properties:
  - `name`: Header text.
  - `text`: Section content.
  - `cssSelector`: Generated ID.
- Behavior: Included in `hasPart`.
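Section extraction can be sketched as a single pass over the lines. The README does not specify how the `cssSelector` ID is generated; the sketch assumes a `#` followed by a lowercase, hyphen-separated form of the header text, similar to the anchors many Markdown renderers produce. `slugify` and `extract_sections` are hypothetical names.

```python
import re


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def extract_sections(markdown: str) -> list:
    """Map every H2+ header and the text below it to a WebPageElement."""
    sections = []
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"^#{2,6}\s+(.+)$", line)
        if heading:
            name = heading.group(1).strip()
            current = {"@type": "WebPageElement", "name": name,
                       "cssSelector": "#" + slugify(name), "text": ""}
            sections.append(current)
        elif line.startswith("# "):
            current = None  # an H1 ends the current section
        elif current is not None and line.strip():
            current["text"] = (current["text"] + " " + line.strip()).strip()
    return sections
```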
- Trigger: `---` separator followed by YAML metadata (e.g., `type`, `schema`).
- Properties:
  - Defined by YAML metadata in the section.
  - Reuses main document metadata unless overridden.
- Behavior: Generates a new JSON-LD object.
- Renders normally in traditional Markdown renderers.
- YAML and `---` are ignored by non-YAML renderers.
- H2/H3 headers, tables, and lists are standard Markdown.
document ::= yaml_front_matter? content (section_separator yaml_front_matter? content)*
yaml_front_matter ::= "---" NL yaml_content "---" NL
yaml_content ::= (yaml_key_value NL)*
yaml_key_value ::= key ":" WS (string | json_ld | yaml_array)
key ::= "title" | "type" | "schema" | "base_url" | "date" | "author" | "publisher" | [A-Za-z]+ ("." [A-Za-z]+)* | "slug" | "keywords" | "categories" | "is_faq" | "description"
yaml_array ::= "- " string (NL "- " string)*
json_ld ::= "{" json_key_value ("," json_key_value)* "}"
json_key_value ::= "\"" key "\"" ":" (string | url | json_ld | reference)
content ::= (line | heading | paragraph | list | table | link)*
heading ::= ("#" | "##" | "###") WS string NL
paragraph ::= string NL
list ::= ("-" | "*" | "+" | [0-9]+ ".") WS string NL
table ::= table_row (NL table_row)*
table_row ::= "|" (WS string WS "|")+
link ::= "[" string "]" "(" url ")"
section_separator ::= "---" NL
string ::= [^\n]+
url ::= ("http://" | "https://" | "mailto:") [^ \n]+
reference ::= "#" [A-Za-z0-9_-]+
WS ::= [ \t]*
NL ::= "\n"
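As an illustration of the `document` and `yaml_front_matter` rules, the following Python sketch splits a document into front matter and content and parses the flat subset of the YAML rules (`key: value` pairs and `- ` arrays). It deliberately ignores nested keys and inline JSON-LD, which a real implementation would delegate to a YAML library; `split_front_matter` and `parse_flat_yaml` are hypothetical names.

```python
def split_front_matter(document: str):
    """Return (yaml_text, content); yaml_text is None without front matter."""
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, document
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return "\n".join(lines[1:index]), "\n".join(lines[index + 1:])
    return None, document  # opening "---" without a closing one


def parse_flat_yaml(yaml_text: str) -> dict:
    """Parse flat 'key: value' lines and '- item' arrays (no nesting)."""
    data: dict = {}
    current_key = None
    for raw in yaml_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and current_key is not None:
            # First array item replaces the empty string stored for the key.
            if not isinstance(data[current_key], list):
                data[current_key] = []
            data[current_key].append(line[2:].strip())
            continue
        # Split on the first colon only, so URLs in values stay intact.
        key, _, value = line.partition(":")
        current_key = key.strip()
        data[current_key] = value.strip()
    return data
```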
- Initialize:
  - Set `defaultContext` to `https://schema.org`.
  - Set `defaultType` to `Article`.
  - Initialize `jsonLdList` for JSON-LD objects.
  - Set `baseUrl` (configurable, e.g., `https://www.iunera.com`).
- Parse YAML:
  - Extract `title`, `type` (default `Article`), `schema` (default `https://schema.org`), `base_url`, `date`, `author`, `publisher`, `slug`, `keywords`, `categories`, `is_faq`, `description`.
  - Parse dot notation into nested JSON-LD.
  - Parse `author` as string, list, or JSON-LD.
  - Replace `slug` placeholder or use default.
- Detect Structure:
  - Identify H1 for `name` or use file name.
  - Extract YAML `description`, first paragraph, or 200-character excerpt for `description`.
  - Concatenate paragraphs, lists, blockquotes for `articleBody`.
  - Check for H2 headers matching `FAQ` or `Frequently Asked Questions`.
  - Identify tables, lists, and links.
- Infer Types:
  - Use YAML `type` or `defaultType`.
  - Trigger `FAQPage` with H2 FAQ headers or `is_faq: true`.
  - Map tables to `Table` or `Dataset` based on content analysis.
  - Map lists to `ItemList`.
  - Map links to appropriate types.
  - Map sections to `WebPageElement`.
  - Parse new sections after `---` with their own `type` and `schema`.
- Map Properties:
  - Common: `name`, `description`, `url`, `keywords`, `about`.
  - Type-specific: `articleBody`, `mainEntity`, `hasPart`, `mentions`.
  - Set `@context` from `schema` or `defaultContext`.
  - Add tables to `hasPart` or as standalone `Table`/`Dataset`.
  - Add lists as `ItemList`.
  - Add links as `mentions`.
  - Add sections to `hasPart` as `WebPageElement`.
  - Reuse metadata or use defaults for secondary entities.
- Output:
  - HTML: Render Markdown, embed JSON-LD in `<script>`.
  - JSON-LD: Array of objects with multiple `@context`.
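The HTML output step can be sketched as follows, assuming the Markdown has already been rendered to HTML by some renderer. `embed_json_ld` is a hypothetical helper; the `application/ld+json` script type is the standard way to embed JSON-LD in a page.

```python
import json


def embed_json_ld(html_body: str, json_ld_objects: list) -> str:
    """Append the JSON-LD objects to rendered HTML inside a script element."""
    payload = json.dumps(json_ld_objects, ensure_ascii=False, indent=2)
    # Escape "</" so that text such as "</script>" inside a string value
    # cannot end the script element early; "\/" is a valid JSON escape.
    payload = payload.replace("</", "<\\/")
    return (f"{html_body}\n"
            f'<script type="application/ld+json">\n{payload}\n</script>')
```

A consumer can read the data back by extracting the script content and passing it to a JSON parser, which restores the escaped slashes.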
We choose fair code, fair work, fair payment, open collaboration.
Licensed under the OPEN COMPENSATION TOKEN LICENSE (the "License").
You may not use this file except in compliance with the License.
You may obtain a copy of the License at
[https://github.com/open-compensation-token-license/license/blob/main/LICENSE.md](https://github.com/open-compensation-token-license/license/blob/main/LICENSE.md)
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
See the License for the specific language governing permissions and
limitations under the License.
@octl.sid: 5fecd757-5fec-d757-d757-00005fb33b80
@octl.sid: x-octl-sid:5fecd757-5fec-d757-d757-00005fb33b80
- Why did we choose the OCTL as alternative to the BSD 3-Clause License?
- Why we do NOT apply Apache 2.0 License?
This project is licensed under the Open Compensation Token License (OCTL), with the unique project identifier
x-octl-sid:5fecd757-5fec-d757-d757-00005fb33b80. The OCTL enables blockchain-based licensing and royalty distribution via NFTs. View the license token
at https://www.license-token.com/license/new-procurement/x-octl-sid%3A5fecd757-5fec-d757-d757-00005fb33b80.
See the LICENSE file or OCTL license text for details. For OCTL
compliance, ensure contributions are registered with the project’s x-octl-sid using the license token link.