About mellea

LinkML schema describing the Mellea codebase architecture and public data models.

Status

Schema version: commit df6d0fd
Coverage target: core library (mellea/) and CLI (cli/)
Generation: auto-derived from live Mellea Python sources by linkml/scripts/schema_to_linkml.py
Test fixtures: 62 valid fixtures auto-derived from live sources by linkml/scripts/gen_test_fixtures.py, plus 4 hand-curated counter-examples, all under linkml/tests/data/
SSSOM overlay: the schema-agnostic injector linkml/scripts/apply_sssom_overlay.py merges CURIE alignments from src/mellea/mappings/*.sssom.tsv into the generated YAML's exact_mappings / close_mappings / broad_mappings / narrow_mappings / related_mappings slots on classes, enums, types, slots, and individual permissible values
Validated by: just lint (0 errors), just gen-project, just test (62 pytest fixtures pass; linkml-run-examples accepts the valid set and rejects all counter-examples)

What the schema models

Subset	Classes
`core_runtime`	`BackendSpec`, `FormatterSpec`, `ContextSpec`, `SessionSpec`, `ComponentSpec`, `RequirementSpec`, `SamplingStrategySpec`, `MethodSpec`
`interface_surface`	`RepositoryCatalog`, `CliCommandSpec`, `ApiModelSpec`, `ApiFieldSpec`, `ModelIdentifierSpec`, `IntrinsicAdapterSpec`
`observability`	`PluginSpec`, `HookPayloadSpec`, `TelemetryMetricSpec`

Source-derived elements

These schema elements are re-extracted from Python sources on every regeneration, so no hand-editing is required when the upstream code changes:

Schema element	Source of truth
`PluginModeEnum`	`mellea/plugins/types.py::PluginMode`
`HookTypeEnum`	`mellea/plugins/types.py::HookType`
`AdapterTypeEnum`	`mellea/backends/adapters/catalog.py::AdapterType`
`BackendFamilyEnum`	`mellea/backends/*.py` filenames
`coverage_inventory`	AST scan of `mellea/` + `cli/` (per-package counts)
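The generator itself is described in linkml/scripts/README.md. The snippet below is a rough, hypothetical sketch of how an enum can be re-extracted from Python source without importing it. It uses the standard-library ast module. The function names, the `ExampleMode` sample class, and the "one member per module file" rule for backend families are all illustrative assumptions. They do not describe how schema_to_linkml.py actually behaves.

```python
import ast
from pathlib import PurePath
from typing import Iterable


def extract_enum_members(source: str, class_name: str) -> dict[str, object]:
    """Return {member_name: value} for literal assignments in `class_name`.

    The source is parsed, never imported, so runtime dependencies are not needed.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            members: dict[str, object] = {}
            for stmt in node.body:
                # Only plain `NAME = <literal>` statements are treated as members.
                if (
                    isinstance(stmt, ast.Assign)
                    and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Name)
                ):
                    try:
                        members[stmt.targets[0].id] = ast.literal_eval(stmt.value)
                    except (ValueError, TypeError):
                        continue  # e.g. `auto()` or other non-literal values
            return members
    raise LookupError(f"class {class_name!r} not found")


def backend_families(filenames: Iterable[str]) -> list[str]:
    """Hypothetical rule: one upper-cased member per public .py module."""
    paths = (PurePath(f) for f in filenames)
    return sorted(
        p.stem.upper() for p in paths if p.suffix == ".py" and not p.stem.startswith("_")
    )


# Hypothetical stand-in for a module such as mellea/plugins/types.py.
SAMPLE = '''
from enum import Enum

class ExampleMode(str, Enum):
    FAST = "fast"
    AUDIT = "audit"
'''

print(extract_enum_members(SAMPLE, "ExampleMode"))  # {'FAST': 'fast', 'AUDIT': 'audit'}
print(backend_families(["__init__.py", "ollama.py", "openai.py"]))  # ['OLLAMA', 'OPENAI']
```

Because the source is only parsed, an import-time failure in an optional backend dependency cannot break the extraction.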

Project automation: just commands

All schema and artifact generation, validation, and testing are managed via just recipes. Run these from the linkml/ workspace:

# Regenerate the LinkML schema YAML from live Mellea Python sources
just gen-linkml

# Regenerate test fixtures from live sources
just gen-fixtures

# Apply SSSOM mappings to the generated schema YAML
just apply-sssom-overlay

# Convenience: refresh schema, mappings, and fixtures
just regen-all

# Full pipeline: regenerate, validate, and test
just regen-and-test

# Generate project files (Python, Java, TypeScript, OWL)
just gen-project

# Generate extended/experimental project files (optional)
just gen-project-extended

# Generate documentation
just gen-doc

# Run all tests
just test

# Lint the schema
just lint

# Build docs and run test server
just testdoc

# Install project dependencies
just install

# Update project template and LinkML packages
just update

# Clean all generated files
just clean

See the project.justfile and justfile for full details, including advanced recipes for migrations, deployment, and internal maintenance.

The SSSOM overlay (just apply-sssom-overlay) is idempotent: re-running it on a clean tree produces no further changes. The subject-side CURIE prefix is taken from each schema's own default_prefix, so the same script can be reused on downstream LinkML schemas without modification (override it with --subject-prefix if needed).
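The sketch below shows one way such a merge can be idempotent. It assumes the schema is held as a simplified plain dict and that rows are already parsed into (subject, slot, object) triples. A CURIE is appended only if it is not already present, so a second pass reports zero additions. The function names and the dict layout are hypothetical and are not the interface of apply_sssom_overlay.py.

```python
from typing import Iterable, Optional


def local_name(subject_id: str, subject_prefix: str) -> str:
    """Strip the schema's own prefix, e.g. 'mellea:BackendSpec' -> 'BackendSpec'."""
    prefix, sep, local = subject_id.partition(":")
    if not sep or prefix != subject_prefix:
        raise ValueError(f"{subject_id!r} is not a {subject_prefix}: CURIE")
    return local


def apply_rows(
    schema: dict,
    rows: Iterable[tuple[str, str, str]],
    subject_prefix: Optional[str] = None,
) -> int:
    """Merge (subject_id, mapping_slot, object_id) rows; return how many were new."""
    # An explicit subject_prefix plays the role of --subject-prefix;
    # otherwise fall back to the schema's own default_prefix.
    prefix = subject_prefix or schema["default_prefix"]
    added = 0
    for subject_id, slot, object_id in rows:
        element = schema["classes"][local_name(subject_id, prefix)]
        values = element.setdefault(slot, [])
        if object_id not in values:  # set-like append keeps re-runs idempotent
            values.append(object_id)
            added += 1
    return added


schema = {"default_prefix": "mellea", "classes": {"RequirementSpec": {}}}
rows = [("mellea:RequirementSpec", "close_mappings", "gist_linkml:Requirement")]
print(apply_rows(schema, rows))  # 1
print(apply_rows(schema, rows))  # 0
```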

See linkml/scripts/README.md for generator details and the process for adding new auto-derived enums.

Coverage inventory (as of commit df6d0fd)

Package	Total	Breakdown
`cli`	86	CLASS=42, DATACLASS=5, ENUM=4, PYDANTIC_MODEL=20, TYPED_DICT=15
`mellea/backends`	28	CLASS=22, DATACLASS=2, ENUM=1, MIXIN=1, PYDANTIC_MODEL=2
`mellea/core`	27	CLASS=20, DATACLASS=5, ENUM=1, PROTOCOL=1
`mellea/formatters`	57	CLASS=39, ENUM=1, MIXIN=1, PYDANTIC_MODEL=16
`mellea/helpers`	6	CLASS=2, ENUM=1, PYDANTIC_MODEL=1, TYPED_DICT=2
`mellea/plugins`	32	CLASS=28, DATACLASS=2, ENUM=2
`mellea/stdlib`	69	CLASS=50, DATACLASS=10, ENUM=1, PROTOCOL=2, PYDANTIC_MODEL=4, TYPED_DICT=2
`mellea/telemetry`	11	CLASS=11
Total enums	11	(across all packages)

Inventory counts are regenerated automatically and stored as the coverage_inventory annotation on the schema header.
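The per-kind breakdown implies some rule for labelling each class definition. The sketch below shows one plausible heuristic that works purely on the AST, using decorator and base-class names. Several details are assumptions: the precedence order, the `Mixin` name-suffix rule, and the function names. The real scan may therefore classify edge cases differently.

```python
import ast
from collections import Counter


def _simple_name(expr: ast.expr) -> str:
    """Reduce `dataclass`, `dataclasses.dataclass(...)` or `Protocol[T]` to a bare name."""
    if isinstance(expr, ast.Call):
        expr = expr.func
    if isinstance(expr, ast.Subscript):
        expr = expr.value
    if isinstance(expr, ast.Attribute):
        return expr.attr
    if isinstance(expr, ast.Name):
        return expr.id
    return ""


def classify(node: ast.ClassDef) -> str:
    """Map one class definition to a coverage kind (first matching rule wins)."""
    bases = {_simple_name(b) for b in node.bases}
    decorators = {_simple_name(d) for d in node.decorator_list}
    if "dataclass" in decorators:
        return "DATACLASS"
    if bases & {"Enum", "IntEnum", "StrEnum", "Flag"}:
        return "ENUM"
    if "BaseModel" in bases:
        return "PYDANTIC_MODEL"
    if "TypedDict" in bases:
        return "TYPED_DICT"
    if "Protocol" in bases:
        return "PROTOCOL"
    if node.name.endswith("Mixin"):
        return "MIXIN"
    return "CLASS"


def inventory(source: str) -> Counter:
    """Count class kinds in one module; per-package totals are sums of these."""
    tree = ast.parse(source)
    return Counter(classify(n) for n in ast.walk(tree) if isinstance(n, ast.ClassDef))


SAMPLE = '''
from dataclasses import dataclass
from enum import Enum
from typing import Protocol, TypedDict
from pydantic import BaseModel

@dataclass(frozen=True)
class Point: ...
class Color(str, Enum): ...
class Settings(BaseModel): ...
class Options(TypedDict, total=False): ...
class Runnable(Protocol): ...
class LoggingMixin: ...
class Plain: ...
'''

print(sorted(inventory(SAMPLE).items()))  # each of the seven kinds counted once
```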

Validation fixtures

Fixture pattern	Count	Source
`BackendSpec-<family>.yaml`	7	`mellea/backends/*.py`
`IntrinsicAdapterSpec-<name>.yaml`	13	`mellea/backends/adapters/catalog.py`
`ModelIdentifierSpec-<const>.yaml`	42	`mellea/backends/model_ids.py`
Counter-examples (invalid)	4	hand-curated

All valid fixtures load through linkml_runtime.yaml_loader; all counter-examples are rejected by linkml-run-examples.
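In the table above, each fixture file name puts its target class before the first hyphen. The sketch below assumes that convention holds for every valid fixture. It tallies fixtures per class and reports any class whose count differs from the table. The helper names are hypothetical, and this check is not part of linkml-run-examples.

```python
from collections import Counter
from pathlib import PurePath
from typing import Iterable

# Expected valid-fixture counts, copied from the Validation fixtures table.
EXPECTED = {"BackendSpec": 7, "IntrinsicAdapterSpec": 13, "ModelIdentifierSpec": 42}


def target_class(filename: str) -> str:
    """'BackendSpec-ollama.yaml' -> 'BackendSpec'."""
    cls, sep, _ = PurePath(filename).stem.partition("-")
    if not sep:
        raise ValueError(f"{filename!r} does not follow <Class>-<name>.yaml")
    return cls


def count_mismatches(filenames: Iterable[str]) -> dict[str, tuple[int, int]]:
    """Return {class: (expected, found)} for every class whose count differs."""
    found = Counter(target_class(f) for f in filenames if f.endswith(".yaml"))
    return {
        cls: (EXPECTED.get(cls, 0), found.get(cls, 0))
        for cls in EXPECTED.keys() | found.keys()
        if EXPECTED.get(cls, 0) != found.get(cls, 0)
    }
```

To run the check, pass it the file names of the valid fixtures (not the counter-examples). An empty dict means the tree matches the table.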

Cross-schema mappings

Curated SSSOM/TSV alignments to seven downstream schemas live under linkml/src/mellea/mappings/.

These files contain LLM-curated (semapv:LLMBasedMatching) mappings. They parse cleanly with the sssom package from PyPI, use real Mellea CURIEs validated against linkml/src/mellea/schema/mellea.yaml, and carry a per-row rationale in the comment column, including name-collision notes such as RequirementSpec vs nexus:Requirement.
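The sketch below shows the kind of row-level checks described above using only the standard library. It skips the `#`-prefixed metadata block, reads the TSV body, and flags four kinds of problem row:

- a required SSSOM column is missing;
- the subject is outside the mellea: prefix;
- the subject is not in a known set;
- the comment column is empty.

Several pieces are assumptions: the function names, the `known_subjects` set (which in practice would be derived from mellea.yaml), and the sample row's comment text. The project itself relies on the sssom package for parsing.

```python
import csv
import io

# Columns the SSSOM specification requires on every mapping row.
REQUIRED = ("subject_id", "predicate_id", "object_id", "mapping_justification")


def read_sssom_rows(text: str) -> list[dict[str, str]]:
    """Drop the '#'-prefixed YAML metadata block and parse the TSV body."""
    body = "\n".join(line for line in text.splitlines() if not line.startswith("#"))
    return list(csv.DictReader(io.StringIO(body), delimiter="\t"))


def check_rows(
    rows: list[dict[str, str]],
    known_subjects: frozenset[str] = frozenset(),
    subject_prefix: str = "mellea",
) -> list[str]:
    """Return human-readable problems; an empty list means every row passed."""
    problems = []
    for i, row in enumerate(rows, start=1):
        missing = [col for col in REQUIRED if not row.get(col)]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
        subject = row.get("subject_id") or ""
        if not subject.startswith(subject_prefix + ":"):
            problems.append(f"row {i}: subject {subject!r} is not a {subject_prefix}: CURIE")
        elif known_subjects and subject not in known_subjects:
            problems.append(f"row {i}: unknown subject {subject!r}")
        if not (row.get("comment") or "").strip():
            problems.append(f"row {i}: no rationale in the comment column")
    return problems


SAMPLE = "\n".join([
    "# curie_map: (metadata elided in this sketch)",
    "\t".join(["subject_id", "predicate_id", "object_id", "mapping_justification", "comment"]),
    "\t".join(["mellea:AdapterTypeEnum", "skos:exactMatch", "nexus:AdapterType",
               "semapv:LLMBasedMatching", "illustrative rationale"]),
])

print(check_rows(read_sssom_rows(SAMPLE), frozenset({"mellea:AdapterTypeEnum"})))  # []
```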

The four flagship alignments are the densest. They target schemas whose domain genuinely overlaps with the Mellea runtime:

- the ai-atlas-nexus AI Risk Ontology;
- the Model Context Protocol;
- SPDX (AI package / SBOM);
- gist_linkml, the LinkML port of the Semantic Arts gist upper ontology, which anchors Mellea's architectural classes against general-purpose top-level concepts.

The remaining three (ISO 27001, MITRE ATT&CK, FINOS CDM event-position) are smaller, deliberately conservative cross-domain alignments centred on observability, audit, and structural analogues.

gist prefix note: the gist mapping uses the local prefix gist_linkml: (https://w3id.org/lmodel/gist/) to avoid collision with the upstream Semantic Arts namespace (gist_upstream: https://w3id.org/semanticarts/ns/ontology/gist/).
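A prefix map shows why this collision matters. A CURIE expands by joining the namespace IRI bound to its prefix with the local name, and one map can bind a given prefix name to only one IRI. The sketch below uses the two namespaces quoted in the note. `expand` is a hypothetical helper, not a LinkML API.

```python
# Prefix map using the two namespaces from the gist prefix note.
PREFIXES = {
    "gist_linkml": "https://w3id.org/lmodel/gist/",
    "gist_upstream": "https://w3id.org/semanticarts/ns/ontology/gist/",
}


def expand(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:local' to a full IRI; unknown prefixes are an error."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no namespace bound for {curie!r}")
    return prefixes[prefix] + local


print(expand("gist_linkml:Requirement"))    # https://w3id.org/lmodel/gist/Requirement
print(expand("gist_upstream:Requirement"))  # https://w3id.org/semanticarts/ns/ontology/gist/Requirement
```

Suppose both namespaces were registered under a single gist: prefix. The second binding would replace the first, and every gist: CURIE would silently resolve to just one of them.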

Mapping set	exact	close	broad	narrow	related	Total	Strongest alignment
`mellea-to-ai-atlas-nexus`	1	9	–	1	25	36	`AdapterTypeEnum` <-> `nexus:AdapterType` (exact)
`mellea-to-mcp`	–	8	–	–	12	20	`ComponentCategoryEnum.MESSAGE` <-> `mcp:PromptMessage`
`mellea-to-gist`	–	12	1	–	7	20	`RequirementSpec` <-> `gist_linkml:Requirement` (closeMatch)
`mellea-to-spdx`	–	6	–	–	8	14	`ModelIdentifierSpec` / `IntrinsicAdapterSpec` <-> `spdx:AIPackage`
`mellea-to-iso27001`	–	2	–	–	10	12	`TelemetryMetricSpec` <-> `iso27001:MonitoringItem`
`mellea-to-attack`	–	–	–	–	9	9	`TelemetryMetricSpec` <-> `attack:DataSource`
`mellea-to-cdm_event_position`	–	–	–	–	6	6	`RepositoryCatalog` <-> `common_domain_model:Portfolio` (structural)
Total	1	37	1	1	77	117
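The Total row can be re-derived from the per-set counts. The sketch below transcribes the table into a dict, leaving out the '–' cells as absent keys. It then sums the counts by predicate and by mapping set. In a real pipeline the per-set counts would come from tallying the predicate_id values in each TSV. This snippet does not do that tallying.

```python
PREDICATES = ("exact", "close", "broad", "narrow", "related")

# Per-set counts transcribed from the mapping table ('–' cells omitted).
COUNTS = {
    "mellea-to-ai-atlas-nexus": {"exact": 1, "close": 9, "narrow": 1, "related": 25},
    "mellea-to-mcp": {"close": 8, "related": 12},
    "mellea-to-gist": {"close": 12, "broad": 1, "related": 7},
    "mellea-to-spdx": {"close": 6, "related": 8},
    "mellea-to-iso27001": {"close": 2, "related": 10},
    "mellea-to-attack": {"related": 9},
    "mellea-to-cdm_event_position": {"related": 6},
}


def column_totals(counts: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum each predicate column across all mapping sets."""
    return {p: sum(row.get(p, 0) for row in counts.values()) for p in PREDICATES}


def row_totals(counts: dict[str, dict[str, int]]) -> dict[str, int]:
    """Total mappings per set (the table's Total column)."""
    return {name: sum(row.values()) for name, row in counts.items()}


print(column_totals(COUNTS))  # {'exact': 1, 'close': 37, 'broad': 1, 'narrow': 1, 'related': 77}
print(sum(row_totals(COUNTS).values()))  # 117
```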

Coverage reflects genuine domain overlap, not row inflation. Mellea models an AI runtime / code architecture, so:

ai-atlas-nexus (AI governance ontology) aligns on adapter type, provider, model, intrinsic, component, requirement, and lifecycle hooks.
MCP (AI runtime protocol) aligns cleanly on stdlib component categories - INSTRUCTION <-> Prompt, MESSAGE <-> PromptMessage / SamplingMessage, TOOL_MESSAGE <-> ToolUseContent / ToolResultContent, DOCUMENT <-> Resource.
gist_linkml (Semantic Arts upper ontology) aligns mellea's architectural classes against general-purpose top-level concepts - RequirementSpec <-> Requirement, ComponentSpec <-> Component, ModelIdentifierSpec <-> ID, RepositoryCatalog <-> Collection, PythonPackage <-> Composite, CliCommandSpec <-> TaskTemplate, BackendSpec / FormatterSpec / ContextSpec / ApiModelSpec <-> Specification. Validated against gist_linkml (gist 14.1.0).
SPDX 3 aligns on AI model provenance via AIPackage, on repository packaging via Sbom / Package, and on plugins via Extension.
ISO 27001 aligns on observability/audit primitives - TelemetryMetricSpec <-> MonitoringItem (closeMatch), PluginModeEnum.AUDIT <-> InternalAudit.
MITRE ATT&CK aligns weakly on detection/observability - TelemetryMetricSpec <-> DataSource, HookPayloadSpec <-> DataComponent.
FINOS CDM event-position has essentially no domain overlap; only low-confidence structural aggregation analogues are recorded.

Object CURIEs are validated against each target schema under src/mellea/mappings/<target>/docs/schema/. Run just apply-sssom-overlay after editing any TSV. The schema-agnostic apply_sssom_overlay.py script then does three things:

- it projects every row into the generated schema YAML as native LinkML mapping slots;
- it registers each target prefix from the TSV's curie_map;
- it lands permissible-value subjects (<prefix>:EnumName.PV_NAME, e.g. mellea:BackendFamilyEnum.HUGGINGFACE) on the matching permissible value rather than on the enum body.
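The sketch below illustrates the routing step under two assumptions. First, rows use the SKOS predicates that correspond to LinkML's five mapping slots. Second, a dot in the subject's local name separates the enum from the permissible value. `route` and `Target` are hypothetical names, and the predicate in the usage line is illustrative. Rows with other predicates, such as OWL ones, are outside this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# LinkML's mapping slots and the SKOS predicates they correspond to.
SLOT_FOR_PREDICATE = {
    "skos:exactMatch": "exact_mappings",
    "skos:closeMatch": "close_mappings",
    "skos:broadMatch": "broad_mappings",
    "skos:narrowMatch": "narrow_mappings",
    "skos:relatedMatch": "related_mappings",
}


@dataclass(frozen=True)
class Target:
    element: str                      # class, enum, type, or slot name
    permissible_value: Optional[str]  # set only for EnumName.PV_NAME subjects
    slot: str                         # e.g. "close_mappings"


def route(subject_id: str, predicate_id: str, subject_prefix: str = "mellea") -> Target:
    """Decide which schema element (and permissible value) a row should land on."""
    prefix, sep, local = subject_id.partition(":")
    if not sep or prefix != subject_prefix:
        raise ValueError(f"{subject_id!r} is not a {subject_prefix}: CURIE")
    element, _, pv = local.partition(".")
    return Target(element, pv or None, SLOT_FOR_PREDICATE[predicate_id])


print(route("mellea:BackendFamilyEnum.HUGGINGFACE", "skos:relatedMatch"))
# Target(element='BackendFamilyEnum', permissible_value='HUGGINGFACE', slot='related_mappings')
```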

Continuous integration

Two GitHub Actions workflows operate against the linkml/ subdirectory (both set defaults.run.working-directory: linkml and cache against linkml/uv.lock):

Workflow	Triggers	Job
`.github/workflows/linkml-main.yaml`	`push: [main]`, `pull_request`	`just test` on the 3.11–3.14 Python matrix
`.github/workflows/linkml-deploy-docs.yaml`	`push: [main]`, `workflow_dispatch`	`just gen-doc` + `mkdocs gh-deploy`

The deploy workflow pushes to the gh-pages branch via git, so contents: write is the only permission it requires; the Pages and OIDC (id-token) permissions are not used.

Known gaps

Stylistic linkml-lint warnings (naming conventions on permissible values, missing per-slot descriptions) - non-blocking; addressable in a follow-up.
mellea/templates/ (Jinja templates) is intentionally excluded - no Python declarations to capture.
Per-component instance data (concrete BackendSpec / ComponentSpec records) is not yet emitted; the schema currently defines the shape of such instances. A follow-up generator pass can populate them.
Generator output under linkml/project/ (gen-sqla, gen-pandera, gen-namespaces, …) is excluded from ruff via force-exclude + extend-exclude in the [tool.ruff] block of the root pyproject.toml. Fixing these files in place is pointless because just gen-project regenerates and overwrites them; the style noise originates in the upstream LinkML generator templates.