
About mellea

LinkML schema describing the Mellea codebase architecture and public data models.

Status

  • Schema version: commit df6d0fd
  • Coverage target: core library (mellea/) and CLI (cli/)
  • Generation: auto-derived from live Mellea Python sources by linkml/scripts/schema_to_linkml.py
  • Test fixtures: auto-derived from live sources by linkml/scripts/gen_test_fixtures.py
  • 62 valid + 4 counter-examples under linkml/tests/data/
  • SSSOM overlay: the schema-agnostic injector linkml/scripts/apply_sssom_overlay.py merges CURIE alignments from src/mellea/mappings/*.sssom.tsv into the generated YAML's exact_mappings / close_mappings / broad_mappings / narrow_mappings / related_mappings slots (on classes, enums, types, slots, and individual permissible values)
  • Validated by: just lint (0 errors), just gen-project, just test (62 pytest fixtures pass; linkml-run-examples accepts valid set and rejects all counter-examples)

What the schema models

| Subset | Classes |
| --- | --- |
| core_runtime | BackendSpec, FormatterSpec, ContextSpec, SessionSpec, ComponentSpec, RequirementSpec, SamplingStrategySpec, MethodSpec |
| interface_surface | RepositoryCatalog, CliCommandSpec, ApiModelSpec, ApiFieldSpec, ModelIdentifierSpec, IntrinsicAdapterSpec |
| observability | PluginSpec, HookPayloadSpec, TelemetryMetricSpec |

Source-derived elements

These schema elements are re-extracted from the Python sources on every regeneration, so no hand-editing is required when upstream code changes:

| Schema element | Source of truth |
| --- | --- |
| PluginModeEnum | mellea/plugins/types.py::PluginMode |
| HookTypeEnum | mellea/plugins/types.py::HookType |
| AdapterTypeEnum | mellea/backends/adapters/catalog.py::AdapterType |
| BackendFamilyEnum | mellea/backends/*.py filenames |
| coverage_inventory | AST scan of mellea/ + cli/ (per-package counts) |
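As a rough illustration of what "re-extracted from Python sources" can mean, the sketch below derives enum permissible values in the two ways the table lists: from the members of an Enum class found by an AST scan, and from module filenames. It is a minimal stdlib-only sketch, not the actual logic of linkml/scripts/schema_to_linkml.py. The helper names, the sample enum source, the file list, and the rule that skips underscore-prefixed modules are all hypothetical; the real generator presumably also filters out non-backend modules.

```python
"""Sketch: derive LinkML-style enums from Python sources (hypothetical helpers)."""
import ast
from pathlib import PurePosixPath


def extract_enum_members(source: str, class_name: str) -> list[str]:
    """Return the member names assigned in the body of `class_name`, in order."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            members = []
            for stmt in node.body:
                # Only plain `NAME = value` assignments count as members;
                # docstrings and methods are skipped.
                if isinstance(stmt, ast.Assign):
                    for target in stmt.targets:
                        if isinstance(target, ast.Name) and not target.id.startswith("_"):
                            members.append(target.id)
            return members
    raise LookupError(f"class {class_name!r} not found")


def families_from_filenames(paths: list[str]) -> list[str]:
    """One upper-cased value per *.py module; underscore-prefixed modules are skipped (assumption)."""
    stems = {PurePosixPath(p).stem for p in paths if p.endswith(".py")}
    return sorted(stem.upper() for stem in stems if not stem.startswith("_"))


def to_linkml_enum(name: str, values: list[str], source: str) -> dict:
    """Build a LinkML-style enum definition as a plain dict."""
    return {
        name: {
            "description": f"Auto-derived from {source}",
            "permissible_values": {value: {} for value in values},
        }
    }


# Hypothetical input; this is NOT Mellea's actual PluginMode definition.
EXAMPLE_SOURCE = '''
from enum import Enum

class ExampleMode(str, Enum):
    """Illustrative enum."""
    ALPHA = "alpha"
    BETA = "beta"

    def label(self) -> str:
        return self.value
'''

if __name__ == "__main__":
    print(extract_enum_members(EXAMPLE_SOURCE, "ExampleMode"))
    files = ["mellea/backends/__init__.py", "mellea/backends/ollama.py", "mellea/backends/huggingface.py"]
    print(to_linkml_enum("BackendFamilyEnum", families_from_filenames(files), "mellea/backends/*.py"))
```

Because the values are recomputed from the sources on each run, adding or renaming a member upstream changes the enum on the next regeneration without any edit to the YAML.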

Project automation: just commands

All schema and artifact generation, validation, and testing are managed via just recipes. Run these from the linkml/ workspace:

# Regenerate the LinkML schema YAML from live Mellea Python sources
just gen-linkml

# Regenerate test fixtures from live sources
just gen-fixtures

# Apply SSSOM mappings to the generated schema YAML
just apply-sssom-overlay

# Convenience: refresh schema, mappings, and fixtures
just regen-all

# Full pipeline: regenerate, validate, and test
just regen-and-test

# Generate project files (Python, Java, TypeScript, OWL)
just gen-project

# Generate extended/experimental project files (optional)
just gen-project-extended

# Generate documentation
just gen-doc

# Run all tests
just test

# Lint the schema
just lint

# Build docs and run a local test server
just testdoc

# Install project dependencies
just install

# Update project template and LinkML packages
just update

# Clean all generated files
just clean

See the project.justfile and justfile for full details, including advanced recipes for migrations, deployment, and internal maintenance.

The SSSOM overlay is idempotent: re-running it on a clean tree produces no further changes. The subject-side CURIE prefix is taken from each schema's own default_prefix, so the same script can be reused for downstream LinkML schemas without modification (override with --subject-prefix if needed).
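A minimal sketch of how an idempotent, prefix-driven overlay can work, assuming each SSSOM row uses a SKOS predicate and the schema has already been loaded into a plain dict. The predicate-to-slot table, the function name apply_overlay, and the in-memory schema shape are illustrative assumptions, not the internals of apply_sssom_overlay.py. Idempotence comes from skipping any object CURIE already present in the target slot, so a second run adds nothing.

```python
"""Sketch: merge SSSOM rows into a LinkML schema dict (hypothetical helper)."""
import copy

# Assumed mapping from SSSOM predicate to LinkML mapping slot.
PREDICATE_TO_SLOT = {
    "skos:exactMatch": "exact_mappings",
    "skos:closeMatch": "close_mappings",
    "skos:broadMatch": "broad_mappings",
    "skos:narrowMatch": "narrow_mappings",
    "skos:relatedMatch": "related_mappings",
}
SECTIONS = ("classes", "enums", "slots", "types")


def apply_overlay(schema: dict, rows: list[dict], subject_prefix: str = "") -> dict:
    """Return a copy of `schema` with each row's object CURIE added to the matching mapping slot."""
    result = copy.deepcopy(schema)
    # The subject prefix defaults to the schema's own default_prefix.
    prefix = subject_prefix or result["default_prefix"]
    for row in rows:
        subj_prefix, _, local = row["subject_id"].partition(":")
        if subj_prefix != prefix:
            continue  # row describes some other schema's element
        slot = PREDICATE_TO_SLOT[row["predicate_id"]]
        # This sketch only handles whole-element subjects (no Enum.PV_NAME routing).
        for section in SECTIONS:
            element = result.get(section, {}).get(local)
            if element is not None:
                values = element.setdefault(slot, [])
                # Membership check makes re-runs a no-op.
                if row["object_id"] not in values:
                    values.append(row["object_id"])
                break
    return result


if __name__ == "__main__":
    schema = {"default_prefix": "mellea", "classes": {"RequirementSpec": {}}}
    rows = [{"subject_id": "mellea:RequirementSpec",
             "predicate_id": "skos:closeMatch",
             "object_id": "gist_linkml:Requirement"}]
    once = apply_overlay(schema, rows)
    assert apply_overlay(once, rows) == once  # second run changes nothing
    print(once["classes"]["RequirementSpec"])
```

Reading the prefix from default_prefix (rather than hard-coding mellea) is what lets the same logic serve other LinkML schemas unchanged.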

See linkml/scripts/README.md for generator details and the process for adding new auto-derived enums.

Coverage inventory (as of commit df6d0fd)

| Package | Total | Breakdown |
| --- | --- | --- |
| cli | 86 | CLASS=42, DATACLASS=5, ENUM=4, PYDANTIC_MODEL=20, TYPED_DICT=15 |
| mellea/backends | 28 | CLASS=22, DATACLASS=2, ENUM=1, MIXIN=1, PYDANTIC_MODEL=2 |
| mellea/core | 27 | CLASS=20, DATACLASS=5, ENUM=1, PROTOCOL=1 |
| mellea/formatters | 57 | CLASS=39, ENUM=1, MIXIN=1, PYDANTIC_MODEL=16 |
| mellea/helpers | 6 | CLASS=2, ENUM=1, PYDANTIC_MODEL=1, TYPED_DICT=2 |
| mellea/plugins | 32 | CLASS=28, DATACLASS=2, ENUM=2 |
| mellea/stdlib | 69 | CLASS=50, DATACLASS=10, ENUM=1, PROTOCOL=2, PYDANTIC_MODEL=4, TYPED_DICT=2 |
| mellea/telemetry | 11 | CLASS=11 |
| Total enums | 11 | (across all packages) |

Inventory counts are regenerated automatically and stored as the coverage_inventory annotation on the schema header.
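The sketch below shows one way such a per-package breakdown could be computed with Python's ast module: classify each class declaration by decorator, base-class name, or naming convention, then count. The classification rules (for example, treating a name ending in Mixin as MIXIN, or recognising pydantic models only by a direct BaseModel base) are heuristic assumptions, not necessarily what the real generator does; in particular, subclasses of project-local models would be counted as plain CLASS here.

```python
"""Sketch: classify class declarations and format an inventory breakdown (heuristic)."""
import ast
from collections import Counter

# Base-class names mapped to inventory kinds (assumed heuristics).
BASE_KINDS = {
    "Enum": "ENUM", "StrEnum": "ENUM", "IntEnum": "ENUM", "Flag": "ENUM", "IntFlag": "ENUM",
    "Protocol": "PROTOCOL",
    "BaseModel": "PYDANTIC_MODEL",
    "TypedDict": "TYPED_DICT",
}


def _name(node: ast.expr) -> str:
    """Best-effort simple name for a base-class or decorator expression."""
    if isinstance(node, ast.Call):
        return _name(node.func)      # @dataclass(frozen=True)
    if isinstance(node, ast.Subscript):
        return _name(node.value)     # Protocol[T], Generic[T]
    if isinstance(node, ast.Attribute):
        return node.attr             # dataclasses.dataclass, enum.Enum
    if isinstance(node, ast.Name):
        return node.id
    return ""


def classify(node: ast.ClassDef) -> str:
    if any(_name(d) == "dataclass" for d in node.decorator_list):
        return "DATACLASS"
    for base in node.bases:
        kind = BASE_KINDS.get(_name(base))
        if kind:
            return kind
    if node.name.endswith("Mixin"):
        return "MIXIN"
    return "CLASS"


def inventory(source: str) -> Counter:
    """Count every class declaration (including nested ones) by kind."""
    tree = ast.parse(source)
    return Counter(classify(n) for n in ast.walk(tree) if isinstance(n, ast.ClassDef))


def breakdown(counts: Counter) -> str:
    """Format counts as 'KIND=n' pairs sorted by kind, matching the table's style."""
    return ", ".join(f"{kind}={n}" for kind, n in sorted(counts.items()))


# Hypothetical module source; it is only parsed, never imported.
SAMPLE_SOURCE = '''
from dataclasses import dataclass
from enum import Enum
from typing import Protocol, TypedDict
from pydantic import BaseModel

class Plain: ...

@dataclass
class Point:
    x: int

class Color(Enum):
    RED = 1

class Runnable(Protocol):
    def run(self) -> None: ...

class Payload(TypedDict):
    name: str

class Config(BaseModel):
    name: str

class LoggingMixin: ...
'''

if __name__ == "__main__":
    print(breakdown(inventory(SAMPLE_SOURCE)))
```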

Validation fixtures

| Fixture pattern | Count | Source |
| --- | --- | --- |
| BackendSpec-<family>.yaml | 7 | mellea/backends/*.py |
| IntrinsicAdapterSpec-<name>.yaml | 13 | mellea/backends/adapters/catalog.py |
| ModelIdentifierSpec-<const>.yaml | 42 | mellea/backends/model_ids.py |
| Counter-examples (invalid) | 4 | hand-curated |

All valid fixtures load through linkml_runtime.yaml_loader; all counter-examples are rejected by linkml-run-examples.
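The pass/fail contract described above (every valid fixture passes, every counter-example fails) can be sketched independently of the LinkML tooling. This sketch assumes the target class is the filename prefix before the first hyphen, as in the ClassName-<suffix>.yaml patterns in the table, and substitutes a hypothetical required-slot validator (the slot name "name" is invented) for the real LinkML validation.

```python
"""Sketch: valid-vs-counter-example fixture check (hypothetical validator)."""
from typing import Callable

# A validator takes (target class, fixture data) and returns a list of error messages.
Validator = Callable[[str, dict], list[str]]


def target_class(filename: str) -> str:
    """Infer the target class from the filename prefix, e.g. 'BackendSpec-ollama.yaml' -> 'BackendSpec'."""
    stem = filename.rsplit("/", 1)[-1].removesuffix(".yaml")
    return stem.split("-", 1)[0]


def check_fixtures(valid: dict[str, dict], invalid: dict[str, dict], validate: Validator) -> list[str]:
    """Return problems: valid fixtures that fail, or counter-examples that pass."""
    problems = []
    for name, data in valid.items():
        errors = validate(target_class(name), data)
        if errors:
            problems.append(f"{name}: expected valid, got {errors}")
    for name, data in invalid.items():
        if not validate(target_class(name), data):
            problems.append(f"{name}: counter-example was accepted")
    return problems


# Hypothetical required slots; NOT the real BackendSpec definition.
REQUIRED = {"BackendSpec": ["name"]}


def required_slots_validator(cls: str, data: dict) -> list[str]:
    return [f"missing required slot {slot!r}" for slot in REQUIRED.get(cls, []) if slot not in data]


if __name__ == "__main__":
    valid = {"BackendSpec-ollama.yaml": {"name": "ollama"}}
    invalid = {"BackendSpec-no-name.yaml": {}}
    print(check_fixtures(valid, invalid, required_slots_validator) or "all fixtures behave as expected")
```

Treating an accepted counter-example as a failure is what keeps the invalid set meaningful: a schema change that loosens a constraint shows up as a test failure rather than passing silently.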

Cross-schema mappings

Curated SSSOM/TSV alignments to seven downstream schemas live under linkml/src/mellea/mappings/.

These files contain LLM-curated (semapv:LLMBasedMatching) mappings, parse cleanly with the sssom package from PyPI, use real Mellea CURIEs validated against linkml/src/mellea/schema/mellea.yaml, and carry a per-row rationale in the comment column, including name-collision notes such as RequirementSpec vs nexus:Requirement.

The flagship alignments are the densest; they target schemas with genuine domain overlap with the Mellea runtime: the ai-atlas-nexus AI Risk Ontology, the Model Context Protocol, SPDX (AI-package / SBOM), and gist_linkml (the LinkML port of the Semantic Arts gist upper ontology, which anchors mellea's architectural classes against general-purpose top-level concepts).

The remaining three (ISO 27001, MITRE ATT&CK, FINOS CDM event-position) are smaller, deliberately conservative cross-domain alignments centred on observability, audit, and structural analogues.
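The per-row guarantees described for these files (subject CURIEs that exist in mellea.yaml, a rationale in the comment column) lend themselves to a simple audit. The sketch below is a stdlib-only illustration that assumes the schema has been loaded into a dict and that permissible-value subjects are written as EnumName.PV_NAME; the helper names are hypothetical and this is not the project's actual validation code.

```python
"""Sketch: audit SSSOM rows against the schema's element names (hypothetical helpers)."""


def known_elements(schema: dict) -> set[str]:
    """Names of classes, enums, slots, types, plus 'EnumName.PV_NAME' for each permissible value."""
    names: set[str] = set()
    for section in ("classes", "enums", "slots", "types"):
        names.update(schema.get(section, {}))  # adds the element names (dict keys)
    for enum_name, enum_def in schema.get("enums", {}).items():
        for pv in (enum_def or {}).get("permissible_values", {}):
            names.add(f"{enum_name}.{pv}")
    return names


def audit_rows(rows: list[dict], known: set[str], prefix: str = "mellea") -> list[str]:
    """Flag rows whose subject is not a known schema element or that lack a rationale comment."""
    issues = []
    for n, row in enumerate(rows, start=1):
        subj_prefix, _, local = row["subject_id"].partition(":")
        if subj_prefix != prefix or local not in known:
            issues.append(f"row {n}: unknown subject {row['subject_id']}")
        if not row.get("comment", "").strip():
            issues.append(f"row {n}: missing rationale comment")
    return issues


if __name__ == "__main__":
    schema = {"classes": {"RequirementSpec": {}},
              "enums": {"BackendFamilyEnum": {"permissible_values": {"HUGGINGFACE": {}}}}}
    rows = [{"subject_id": "mellea:RequirmentSpec", "comment": "typo in subject"}]
    print(audit_rows(rows, known_elements(schema)))
```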

gist prefix note: the gist mapping uses the local prefix gist_linkml: (https://w3id.org/lmodel/gist/) to avoid collision with the upstream Semantic Arts namespace (gist_upstream: https://w3id.org/semanticarts/ns/ontology/gist/).
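The collision the note avoids is easy to see in terms of CURIE expansion: a prefix can be bound to only one namespace IRI, so a single shared gist: prefix would make CURIEs from one side expand to the other side's IRIs. The helpers below are a hypothetical minimal sketch (not part of the Mellea scripts) that refuses to rebind a prefix and expands CURIEs by plain concatenation; the two IRIs are the ones given in the note.

```python
"""Sketch: distinct prefixes keep the two gist namespaces apart (hypothetical helpers)."""


def register_prefix(prefixes: dict[str, str], prefix: str, iri: str) -> None:
    """Bind prefix -> iri, refusing to silently rebind an existing prefix to a different IRI."""
    existing = prefixes.get(prefix)
    if existing is not None and existing != iri:
        raise ValueError(f"prefix {prefix!r} already bound to {existing}")
    prefixes[prefix] = iri


def expand(curie: str, prefixes: dict[str, str]) -> str:
    """Expand a CURIE by concatenating the bound namespace IRI and the local part."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unbound or malformed CURIE: {curie}")
    return prefixes[prefix] + local


prefixes: dict[str, str] = {}
register_prefix(prefixes, "gist_linkml", "https://w3id.org/lmodel/gist/")
register_prefix(prefixes, "gist_upstream", "https://w3id.org/semanticarts/ns/ontology/gist/")

if __name__ == "__main__":
    # Same local name, two different IRIs: no ambiguity.
    print(expand("gist_linkml:Requirement", prefixes))
    print(expand("gist_upstream:Requirement", prefixes))
```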

Mapping set exact close broad narrow related Total Strongest alignment
mellea-to-ai-atlas-nexus 1 9 1 25 36 AdapterTypeEnum <-> nexus:AdapterType (exact)
mellea-to-mcp 8 12 20 ComponentCategoryEnum.MESSAGE <-> mcp:PromptMessage
mellea-to-gist 12 1 7 20 RequirementSpec <-> gist_linkml:Requirement (closeMatch)
mellea-to-spdx 6 8 14 ModelIdentifierSpec / IntrinsicAdapterSpec <-> spdx:AIPackage
mellea-to-iso27001 2 10 12 TelemetryMetricSpec <-> iso27001:MonitoringItem
mellea-to-attack 9 9 TelemetryMetricSpec <-> attack:DataSource
mellea-to-cdm_event_position 6 6 RepositoryCatalog <-> common_domain_model:Portfolio (structural)
Total 1 37 1 1 77 117
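Each row of the table is a count of SSSOM rows per predicate. A minimal stdlib sketch of that tally is shown below, assuming the usual SSSOM/TSV layout (a '#'-prefixed metadata block, then a tab-separated header containing predicate_id) and SKOS predicates. The two sample rows reuse alignments named in this README but are combined here only for illustration; they are not an excerpt from a real mapping file.

```python
"""Sketch: tally an SSSOM/TSV file by predicate (hypothetical helper)."""
import csv
from collections import Counter

# Assumed predicate -> table column mapping.
COLUMNS = {
    "skos:exactMatch": "exact",
    "skos:closeMatch": "close",
    "skos:broadMatch": "broad",
    "skos:narrowMatch": "narrow",
    "skos:relatedMatch": "related",
}


def tally(tsv_text: str) -> dict[str, int]:
    """Count rows per mapping column, plus a total."""
    # SSSOM/TSV keeps its YAML metadata block (e.g. curie_map) on '#'-prefixed lines.
    body = [line for line in tsv_text.splitlines() if line and not line.startswith("#")]
    counts = Counter(COLUMNS[row["predicate_id"]] for row in csv.DictReader(body, delimiter="\t"))
    result = {column: counts.get(column, 0) for column in COLUMNS.values()}
    result["total"] = sum(counts.values())
    return result


# Illustrative sample, not a real mapping file.
SAMPLE = (
    "# curie_map:\n"
    "#   skos: http://www.w3.org/2004/02/skos/core#\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "mellea:AdapterTypeEnum\tskos:exactMatch\tnexus:AdapterType\tsemapv:LLMBasedMatching\n"
    "mellea:RequirementSpec\tskos:closeMatch\tgist_linkml:Requirement\tsemapv:LLMBasedMatching\n"
)

if __name__ == "__main__":
    print(tally(SAMPLE))
```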

Coverage reflects genuine domain overlap, not row inflation. Mellea models an AI runtime / code architecture, so:

  • ai-atlas-nexus (AI governance ontology) aligns on adapter type, provider, model, intrinsic, component, requirement, and lifecycle hooks.
  • MCP (AI runtime protocol) aligns cleanly on stdlib component categories - INSTRUCTION <-> Prompt, MESSAGE <-> PromptMessage / SamplingMessage, TOOL_MESSAGE <-> ToolUseContent / ToolResultContent, DOCUMENT <-> Resource.
  • gist_linkml (Semantic Arts upper ontology) aligns mellea's architectural classes against general-purpose top-level concepts - RequirementSpec <-> Requirement, ComponentSpec <-> Component, ModelIdentifierSpec <-> ID, RepositoryCatalog <-> Collection, PythonPackage <-> Composite, CliCommandSpec <-> TaskTemplate, BackendSpec / FormatterSpec / ContextSpec / ApiModelSpec <-> Specification. Validated against gist_linkml (gist 14.1.0).
  • SPDX 3 aligns on AI model provenance via AIPackage, on repository packaging via Sbom / Package, and on plugins via Extension.
  • ISO 27001 aligns on observability/audit primitives - TelemetryMetricSpec <-> MonitoringItem (closeMatch), PluginModeEnum.AUDIT <-> InternalAudit.
  • MITRE ATT&CK aligns weakly on detection/observability - TelemetryMetricSpec <-> DataSource, HookPayloadSpec <-> DataComponent.
  • FINOS CDM event-position has essentially no domain overlap; only structural aggregation analogues at low confidence are recorded.

Object CURIEs are validated against each target schema under src/mellea/mappings/<target>/docs/schema/. Run just apply-sssom-overlay after editing any TSV: the schema-agnostic apply_sssom_overlay.py script projects every row into the generated schema YAML as native LinkML mapping slots, registers each target prefix from the TSV's curie_map, and attaches permissible-value subjects (<prefix>:EnumName.PV_NAME, e.g. mellea:BackendFamilyEnum.HUGGINGFACE) to the matching permissible value rather than to the enum body.
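The routing rule for permissible-value subjects can be sketched as follows: split the local part on the first dot, look the element up by name, and descend into permissible_values when a PV name is present. This is a hypothetical stdlib sketch assuming a dict-shaped schema with dict-valued permissible values and SKOS predicates; the object CURIE example:HuggingFaceProvider and its https://example.org/ prefix are invented for illustration.

```python
"""Sketch: route SSSOM subjects to schema elements or permissible values (hypothetical helpers)."""

# Assumed mapping from SSSOM predicate to LinkML mapping slot.
PREDICATE_TO_SLOT = {
    "skos:exactMatch": "exact_mappings",
    "skos:closeMatch": "close_mappings",
    "skos:broadMatch": "broad_mappings",
    "skos:narrowMatch": "narrow_mappings",
    "skos:relatedMatch": "related_mappings",
}


def resolve_subject(schema: dict, curie: str, prefix: str) -> dict:
    """Return the dict that should receive mappings for `curie`."""
    subj_prefix, _, local = curie.partition(":")
    if subj_prefix != prefix:
        raise ValueError(f"{curie} is not in the {prefix}: namespace")
    name, dot, pv = local.partition(".")
    for section in ("classes", "enums", "slots", "types"):
        element = schema.get(section, {}).get(name)
        if element is not None:
            # 'EnumName.PV_NAME' lands on the permissible value, not the enum body.
            return element["permissible_values"][pv] if dot else element
    raise KeyError(f"no schema element named {name!r}")


def add_row(schema: dict, row: dict, curie_map: dict[str, str]) -> None:
    """Register the TSV's prefixes, then append the row's object CURIE to the resolved target."""
    for p, iri in curie_map.items():
        schema.setdefault("prefixes", {}).setdefault(p, iri)  # existing bindings win
    target = resolve_subject(schema, row["subject_id"], schema["default_prefix"])
    values = target.setdefault(PREDICATE_TO_SLOT[row["predicate_id"]], [])
    if row["object_id"] not in values:
        values.append(row["object_id"])


if __name__ == "__main__":
    schema = {"default_prefix": "mellea",
              "enums": {"BackendFamilyEnum": {"permissible_values": {"HUGGINGFACE": {}}}}}
    add_row(schema,
            {"subject_id": "mellea:BackendFamilyEnum.HUGGINGFACE",
             "predicate_id": "skos:relatedMatch",
             "object_id": "example:HuggingFaceProvider"},
            {"example": "https://example.org/"})
    print(schema["enums"]["BackendFamilyEnum"])
```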

Continuous integration

Two GitHub Actions workflows operate against the linkml/ subdirectory (both set defaults.run.working-directory: linkml and cache against linkml/uv.lock):

| Workflow | Triggers | Job |
| --- | --- | --- |
| .github/workflows/linkml-main.yaml | push: [main], pull_request | just test on the 3.11–3.14 Python matrix |
| .github/workflows/linkml-deploy-docs.yaml | push: [main], workflow_dispatch | just gen-doc + mkdocs gh-deploy |

The deploy workflow pushes to the gh-pages branch via git, so only contents: write is required - Pages / OIDC permissions are unused.

Known gaps

  • Stylistic linkml-lint warnings (naming conventions on permissible values, missing per-slot descriptions) - non-blocking; addressable in a follow-up.
  • mellea/templates/ (Jinja templates) is intentionally excluded - no Python declarations to capture.
  • Per-component instance data (concrete BackendSpec / ComponentSpec records) is not yet emitted; the schema currently defines the shape of such instances. A follow-up generator pass can populate them.
  • Generator output under linkml/project/ (gen-sqla, gen-pandera, gen-namespaces, …) is excluded from ruff via force-exclude + extend-exclude in the root pyproject.toml [tool.ruff] block. In-place fixes are pointless because just gen-project regenerates and clobbers them - upstream linkml templates are the source of the style noise.