About mellea
LinkML schema describing the Mellea codebase architecture and public data models.
Status
- Schema version: commit df6d0fd
- Coverage target: core library (mellea/) and CLI (cli/)
- Generation: auto-derived from live Mellea Python sources by linkml/scripts/schema_to_linkml.py
- Test fixtures: auto-derived from live sources by linkml/scripts/gen_test_fixtures.py - 62 valid + 4 counter-examples under linkml/tests/data/
- SSSOM overlay: schema-agnostic injector linkml/scripts/apply_sssom_overlay.py merges CURIE alignments from src/mellea/mappings/*.sssom.tsv into the generated YAML's exact_mappings / close_mappings / broad_mappings / narrow_mappings / related_mappings slots (classes, enums, types, slots, and per-permissible-value)
- Validated by: just lint (0 errors), just gen-project, just test (62 pytest fixtures pass; linkml-run-examples accepts valid set and rejects all counter-examples)
What the schema models
| Subset | Classes |
|---|---|
| core_runtime | BackendSpec, FormatterSpec, ContextSpec, SessionSpec, ComponentSpec, RequirementSpec, SamplingStrategySpec, MethodSpec |
| interface_surface | RepositoryCatalog, CliCommandSpec, ApiModelSpec, ApiFieldSpec, ModelIdentifierSpec, IntrinsicAdapterSpec |
| observability | PluginSpec, HookPayloadSpec, TelemetryMetricSpec |
Source-derived elements
These schema elements are re-extracted from Python sources on every regeneration - no hand-editing required when upstream changes:
| Schema element | Source of truth |
|---|---|
| PluginModeEnum | mellea/plugins/types.py::PluginMode |
| HookTypeEnum | mellea/plugins/types.py::HookType |
| AdapterTypeEnum | mellea/backends/adapters/catalog.py::AdapterType |
| BackendFamilyEnum | mellea/backends/*.py filenames |
| coverage_inventory | AST scan of mellea/ + cli/ (per-package counts) |
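As a rough illustration of how an enum such as PluginModeEnum could be re-extracted from Python source, here is a minimal sketch using the standard-library ast module. It is not the code in schema_to_linkml.py; the function name extract_enum_members and the inline example source are assumptions made for this example, and ENFORCE is a made-up member (only AUDIT is taken from this README).

```python
import ast


def extract_enum_members(source: str, class_name: str) -> list[str]:
    """Return the names assigned at class level in `class_name`, in source order."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            members = []
            for stmt in node.body:
                # Plain `NAME = value` assignments are treated as enum members.
                if isinstance(stmt, ast.Assign):
                    for target in stmt.targets:
                        if isinstance(target, ast.Name) and not target.id.startswith("_"):
                            members.append(target.id)
            return members
    raise LookupError(f"class {class_name!r} not found")


# Hypothetical stand-in for a module such as mellea/plugins/types.py.
# ENFORCE is invented for illustration; it is not taken from the real source.
EXAMPLE_SOURCE = '''
from enum import Enum

class PluginMode(Enum):
    ENFORCE = "enforce"
    AUDIT = "audit"
'''

members = extract_enum_members(EXAMPLE_SOURCE, "PluginMode")
print(members)
```

Because the source is parsed rather than imported, a scan like this does not need the target package's dependencies to be installed.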
Project automation: just commands
All schema and artifact generation, validation, and testing is managed via just recipes. Run these from the linkml/ workspace:
# Regenerate the LinkML schema YAML from live Mellea Python sources
just gen-linkml
# Regenerate test fixtures from live sources
just gen-fixtures
# Apply SSSOM mappings to the generated schema YAML
just apply-sssom-overlay
# Convenience: refresh schema, mappings, and fixtures
just regen-all
# Full pipeline: regenerate, validate, and test
just regen-and-test
# Generate project files (Python, Java, TypeScript, OWL)
just gen-project
# Generate extended/experimental project files (optional)
just gen-project-extended
# Generate documentation
just gen-doc
# Run all tests
just test
# Lint the schema
just lint
# Build docs and run test server
just testdoc
# Install project dependencies
just install
# Update project template and LinkML packages
just update
# Clean all generated files
just clean
See the project.justfile and justfile for full details, including advanced recipes for migrations, deployment, and internal maintenance.
The overlay is idempotent: re-running on a clean tree produces no further changes. The subject-side CURIE prefix is taken from each schema's own default_prefix, so the same script is reusable for downstream LinkML schemas without modification (override with --subject-prefix if needed).
See linkml/scripts/README.md for generator details and the process for adding new auto-derived enums.
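The idempotence and default-prefix behaviour described above can be sketched on a plain dictionary. This is an illustrative sketch only, not the logic of apply_sssom_overlay.py: the helper names resolve_subject_prefix and add_mapping are hypothetical, and the schema is reduced to a minimal dict shaped like LinkML YAML.

```python
from typing import Optional


def resolve_subject_prefix(schema: dict, override: Optional[str] = None) -> str:
    """Use the explicit override if given, else the schema's own default_prefix."""
    return override or schema["default_prefix"]


def add_mapping(schema: dict, class_name: str, slot: str, curie: str) -> bool:
    """Add `curie` to a class-level mapping slot; return True only if the schema changed."""
    element = schema["classes"][class_name]
    if curie in element.get(slot, []):
        return False  # already present: a re-run is a no-op
    element.setdefault(slot, []).append(curie)
    element[slot].sort()  # stable ordering keeps regenerated YAML diff-free
    return True


# Minimal stand-in for the generated schema YAML after loading.
schema = {"default_prefix": "mellea", "classes": {"RequirementSpec": {}}}

prefix = resolve_subject_prefix(schema)
first_run = add_mapping(schema, "RequirementSpec", "close_mappings", "gist_linkml:Requirement")
second_run = add_mapping(schema, "RequirementSpec", "close_mappings", "gist_linkml:Requirement")
print(prefix, first_run, second_run)
```

Returning a changed flag is one simple way to let a caller verify that a second pass over a clean tree made no edits.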
Coverage inventory (as of commit df6d0fd)
| Package | Total | Breakdown |
|---|---|---|
| cli | 86 | CLASS=42, DATACLASS=5, ENUM=4, PYDANTIC_MODEL=20, TYPED_DICT=15 |
| mellea/backends | 28 | CLASS=22, DATACLASS=2, ENUM=1, MIXIN=1, PYDANTIC_MODEL=2 |
| mellea/core | 27 | CLASS=20, DATACLASS=5, ENUM=1, PROTOCOL=1 |
| mellea/formatters | 57 | CLASS=39, ENUM=1, MIXIN=1, PYDANTIC_MODEL=16 |
| mellea/helpers | 6 | CLASS=2, ENUM=1, PYDANTIC_MODEL=1, TYPED_DICT=2 |
| mellea/plugins | 32 | CLASS=28, DATACLASS=2, ENUM=2 |
| mellea/stdlib | 69 | CLASS=50, DATACLASS=10, ENUM=1, PROTOCOL=2, PYDANTIC_MODEL=4, TYPED_DICT=2 |
| mellea/telemetry | 11 | CLASS=11 |
| Total enums | 11 | (across all packages) |
Inventory counts are regenerated automatically and stored as the
coverage_inventory annotation on the schema header.
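A per-package breakdown like the one above could be produced by classifying each class definition found in an AST scan. The sketch below is an assumption about how such a scan might work, not the project's generator: the classification rules (for example, detecting MIXIN by a name suffix) and the function names classify and inventory are hypothetical.

```python
import ast
from collections import Counter


def _names(nodes) -> set[str]:
    """Collect simple names from base-class or decorator expressions."""
    found = set()
    for node in nodes:
        if isinstance(node, ast.Call):  # e.g. @dataclass(frozen=True)
            node = node.func
        if isinstance(node, ast.Name):
            found.add(node.id)
        elif isinstance(node, ast.Attribute):  # e.g. enum.Enum
            found.add(node.attr)
    return found


def classify(node: ast.ClassDef) -> str:
    """Assign one inventory category to a class definition (heuristic rules)."""
    bases = _names(node.bases)
    if "dataclass" in _names(node.decorator_list):
        return "DATACLASS"
    if bases & {"Enum", "IntEnum", "StrEnum"}:
        return "ENUM"
    if "BaseModel" in bases:
        return "PYDANTIC_MODEL"
    if "TypedDict" in bases:
        return "TYPED_DICT"
    if "Protocol" in bases:
        return "PROTOCOL"
    if node.name.endswith("Mixin"):  # assumption: mixins are identified by name
        return "MIXIN"
    return "CLASS"


def inventory(source: str) -> Counter:
    """Count class definitions in `source` by category."""
    tree = ast.parse(source)
    return Counter(classify(n) for n in ast.walk(tree) if isinstance(n, ast.ClassDef))


# Hypothetical module contents used only to exercise the classifier.
EXAMPLE_SOURCE = '''
from dataclasses import dataclass
from enum import Enum
from typing import TypedDict

@dataclass
class Point:
    x: int

class Colour(Enum):
    RED = 1

class Options(TypedDict):
    name: str

class Plain:
    pass
'''

counts = inventory(EXAMPLE_SOURCE)
print(dict(counts))
```

The order of the checks matters: a dataclass that also subclasses another type is counted once, under the first rule it matches.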
Validation fixtures
| Fixture pattern | Count | Source |
|---|---|---|
| BackendSpec-<family>.yaml | 7 | mellea/backends/*.py |
| IntrinsicAdapterSpec-<name>.yaml | 13 | mellea/backends/adapters/catalog.py |
| ModelIdentifierSpec-<const>.yaml | 42 | mellea/backends/model_ids.py |
| Counter-examples (invalid) | 4 | hand-curated |
All valid fixtures load through linkml_runtime.yaml_loader; all
counter-examples are rejected by linkml-run-examples.
Cross-schema mappings
Curated SSSOM/TSV alignments to seven downstream schemas live under
linkml/src/mellea/mappings/.
These files contain LLM-curated (semapv:LLMBasedMatching) mappings, parse cleanly with the sssom package from PyPI, use real Mellea CURIEs validated against linkml/src/mellea/schema/mellea.yaml, and carry a per-row rationale (with name-collision notes such as RequirementSpec vs nexus:Requirement) in the comment column.
The flagship alignments are the densest and target schemas with genuine domain overlap with the Mellea runtime: the ai-atlas-nexus AI Risk Ontology, the Model Context Protocol, SPDX (AI-package / SBOM), and gist_linkml (Semantic Arts upper ontology, LinkML port - anchors mellea's architectural classes against general-purpose top-level concepts).
The remaining three (ISO 27001, MITRE ATT&CK, FINOS CDM event-position) are provided as smaller, honest cross-domain alignments centred on observability, audit, and structural analogues.
gist prefix note: the gist mapping uses the local prefix gist_linkml: (https://w3id.org/lmodel/gist/) to avoid collision with the upstream Semantic Arts namespace (gist_upstream: https://w3id.org/semanticarts/ns/ontology/gist/).
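To make the prefix note concrete, the following minimal sketch expands the same local name under both prefixes. The two namespace IRIs are the ones quoted above; the expand helper is a hypothetical function written for this example, not part of any SSSOM or LinkML library.

```python
# Prefix map using the two namespaces quoted in the note above.
PREFIXES = {
    "gist_linkml": "https://w3id.org/lmodel/gist/",
    "gist_upstream": "https://w3id.org/semanticarts/ns/ontology/gist/",
}


def expand(curie: str, prefixes: dict) -> str:
    """Expand a CURIE of the form prefix:local into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


local_iri = expand("gist_linkml:Requirement", PREFIXES)
upstream_iri = expand("gist_upstream:Requirement", PREFIXES)
print(local_iri)
print(upstream_iri)
```

With distinct prefixes, the same local name resolves to two different IRIs, so a mapping row is unambiguous about which schema it targets.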
| Mapping set | exact | close | broad | narrow | related | Total | Strongest alignment |
|---|---|---|---|---|---|---|---|
| mellea-to-ai-atlas-nexus | 1 | 9 | – | 1 | 25 | 36 | AdapterTypeEnum <-> nexus:AdapterType (exact) |
| mellea-to-mcp | – | 8 | – | – | 12 | 20 | ComponentCategoryEnum.MESSAGE <-> mcp:PromptMessage |
| mellea-to-gist | – | 12 | 1 | – | 7 | 20 | RequirementSpec <-> gist_linkml:Requirement (closeMatch) |
| mellea-to-spdx | – | 6 | – | – | 8 | 14 | ModelIdentifierSpec / IntrinsicAdapterSpec <-> spdx:AIPackage |
| mellea-to-iso27001 | – | 2 | – | – | 10 | 12 | TelemetryMetricSpec <-> iso27001:MonitoringItem |
| mellea-to-attack | – | – | – | – | 9 | 9 | TelemetryMetricSpec <-> attack:DataSource |
| mellea-to-cdm_event_position | – | – | – | – | 6 | 6 | RepositoryCatalog <-> common_domain_model:Portfolio (structural) |
| Total | 1 | 37 | 1 | 1 | 77 | 117 | |
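The per-predicate counts in the table above could be recomputed from a mapping file along the following lines. This is a minimal sketch using only the standard library rather than the sssom package; it assumes the usual SSSOM/TSV layout (optional '#'-prefixed metadata lines followed by a tab-separated table with a predicate_id column). The example rows are illustrative: only the AdapterTypeEnum row comes from this README, and the example: objects and the curie_map IRI are made up.

```python
import csv
import io
from collections import Counter


def count_predicates(tsv_text: str) -> Counter:
    """Count rows per predicate_id, skipping '#'-prefixed metadata lines."""
    data_lines = [
        line for line in tsv_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t")
    return Counter(row["predicate_id"] for row in reader)


# Illustrative file contents; the example: CURIEs and the IRI are hypothetical.
EXAMPLE_TSV = "\n".join([
    "# curie_map:",
    "#   mellea: https://example.org/mellea/",
    "subject_id\tpredicate_id\tobject_id",
    "mellea:AdapterTypeEnum\tskos:exactMatch\tnexus:AdapterType",
    "mellea:PluginSpec\tskos:relatedMatch\texample:ThingA",
    "mellea:HookPayloadSpec\tskos:relatedMatch\texample:ThingB",
])

tally = count_predicates(EXAMPLE_TSV)
print(dict(tally))
```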
Coverage reflects genuine domain overlap, not row inflation. Mellea models an AI runtime / code architecture, so:
- ai-atlas-nexus (AI governance ontology) aligns on adapter type, provider, model, intrinsic, component, requirement, and lifecycle hooks.
- MCP (AI runtime protocol) aligns cleanly on stdlib component categories - INSTRUCTION <-> Prompt, MESSAGE <-> PromptMessage / SamplingMessage, TOOL_MESSAGE <-> ToolUseContent / ToolResultContent, DOCUMENT <-> Resource.
- gist_linkml (Semantic Arts upper ontology) aligns mellea's architectural classes against general-purpose top-level concepts - RequirementSpec <-> Requirement, ComponentSpec <-> Component, ModelIdentifierSpec <-> ID, RepositoryCatalog <-> Collection, PythonPackage <-> Composite, CliCommandSpec <-> TaskTemplate, BackendSpec / FormatterSpec / ContextSpec / ApiModelSpec <-> Specification. Validated against gist_linkml (gist 14.1.0).
- SPDX 3 aligns on AI model provenance via AIPackage, on repository packaging via Sbom / Package, and on plugins via Extension.
- ISO 27001 aligns on observability/audit primitives - TelemetryMetricSpec <-> MonitoringItem (closeMatch), PluginModeEnum.AUDIT <-> InternalAudit.
- MITRE ATT&CK aligns weakly on detection/observability - TelemetryMetricSpec <-> DataSource, HookPayloadSpec <-> DataComponent.
- FINOS CDM event-position has essentially no domain overlap; only structural aggregation analogues at low confidence are recorded.
Object CURIEs are validated against each target schema under
src/mellea/mappings/<target>/docs/schema/. Run just apply-sssom-overlay after editing any TSV: the schema-agnostic
apply_sssom_overlay.py script projects every row into the generated schema YAML as native LinkML mapping slots, registers each target prefix from the TSV's curie_map, and lands permissible-value subjects (<prefix>:EnumName.PV_NAME, e.g. mellea:BackendFamilyEnum.HUGGINGFACE) on the matching permissible value rather than on the enum body.
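The routing of permissible-value subjects described above can be sketched as follows. This is a hypothetical illustration, not the code in apply_sssom_overlay.py: parse_subject and mapping_target are names invented here, the schema is a minimal dict shaped like LinkML YAML, and example:Thing is a made-up object CURIE.

```python
from typing import NamedTuple, Optional


class Subject(NamedTuple):
    prefix: str
    element: str
    permissible_value: Optional[str]


def parse_subject(curie: str) -> Subject:
    """Split prefix:Element or prefix:EnumName.PV_NAME into its parts."""
    prefix, sep, local = curie.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"not a CURIE: {curie!r}")
    element, dot, pv = local.partition(".")
    return Subject(prefix, element, pv if dot else None)


def mapping_target(schema: dict, subject: Subject) -> dict:
    """Return the dict (element or permissible value) that should receive mapping slots."""
    if subject.permissible_value is not None:
        enum = schema["enums"][subject.element]
        return enum["permissible_values"][subject.permissible_value]
    for section in ("classes", "enums", "types", "slots"):
        if subject.element in schema.get(section, {}):
            return schema[section][subject.element]
    raise KeyError(subject.element)


# Minimal stand-in for the generated schema.
schema = {
    "enums": {"BackendFamilyEnum": {"permissible_values": {"HUGGINGFACE": {}}}},
}

subject = parse_subject("mellea:BackendFamilyEnum.HUGGINGFACE")
target = mapping_target(schema, subject)
# example:Thing is a hypothetical object CURIE used only for illustration.
target.setdefault("related_mappings", []).append("example:Thing")
print(subject)
```

After this runs, the mapping sits under the HUGGINGFACE permissible value, and the enum body itself carries no mapping slot.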
Continuous integration
Two GitHub Actions workflows operate against the linkml/ subdirectory (both set defaults.run.working-directory: linkml and cache against linkml/uv.lock):
| Workflow | Triggers | Job |
|---|---|---|
| .github/workflows/linkml-main.yaml | push: [main], pull_request | just test on the 3.11–3.14 Python matrix |
| .github/workflows/linkml-deploy-docs.yaml | push: [main], workflow_dispatch | just gen-doc + mkdocs gh-deploy |
The deploy workflow pushes to the gh-pages branch via git, so only
contents: write is required - Pages / OIDC permissions are unused.
Known gaps
- Stylistic linkml-lint warnings (naming conventions on permissible values, missing per-slot descriptions) - non-blocking; addressable in a follow-up.
- mellea/templates/ (Jinja templates) is intentionally excluded - no Python declarations to capture.
- Per-component instance data (concrete BackendSpec / ComponentSpec records) is not yet emitted; the schema currently defines the shape of such instances. A follow-up generator pass can populate them.
- Generator output under linkml/project/ (gen-sqla, gen-pandera, gen-namespaces, …) is excluded from ruff via force-exclude + extend-exclude in the root pyproject.toml [tool.ruff] block. In-place fixes are pointless because just gen-project regenerates and clobbers them - upstream linkml templates are the source of the style noise.