
The open-source platform for data lineage, metadata collection, and job observability.
Marquez is a scalable metadata server and visualization platform for aggregating, storing, and visualizing metadata about data production and consumption. As the reference implementation of the OpenLineage standard, it maintains a complete history of dataset evolution and job execution, backed by a relational store (PostgreSQL) and exposed through a RESTful API for metadata ingestion and retrieval.

Marquez tracks job runs, versions of both code and data schemas, and the physical location of datasets. That record lets data engineers automate impact analysis and root-cause identification across polyglot data stacks, including decentralized data mesh architectures. Its design philosophy centers on late-binding metadata, so it integrates with orchestrators such as Apache Airflow and execution engines such as Spark without coupling to any one of them. As an LF AI & Data project, it benefits from a neutral governance model, supporting its longevity and interoperability as the data and AI lifecycle management market evolves.
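The ingestion side of that REST API accepts OpenLineage run events. A minimal sketch of building such an event in Python — the top-level field names follow the OpenLineage spec, while the namespace, job name, and producer URI here are made up for illustration:

```python
import json
import uuid
from datetime import datetime, timezone

def make_run_event(job_namespace: str, job_name: str, output_dataset: str,
                   event_type: str = "COMPLETE") -> dict:
    """Build a minimal OpenLineage RunEvent, the payload Marquez ingests
    on its lineage endpoint (POST /api/v1/lineage)."""
    return {
        "eventType": event_type,                          # START / COMPLETE / FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},              # unique per job run
        "job": {"namespace": job_namespace, "name": job_name},
        "outputs": [{"namespace": job_namespace, "name": output_dataset}],
        # producer URI is illustrative, not a real endpoint
        "producer": "https://example.com/pipelines/orders-etl",
    }

event = make_run_event("food_delivery", "etl_orders", "public.orders")
print(json.dumps(event, indent=2))
```

A real integration would rarely hand-build this dict; the OpenLineage client libraries for Airflow and Spark emit equivalent events automatically.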
Native support for the OpenLineage spec, ensuring consistent metadata collection across Spark, Airflow, and Flink.
Tracks both when a change happened in the source system and when it was recorded in Marquez.
Detects and records changes in dataset schemas across every job run.
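One way to picture that schema tracking: compare the field list recorded for one run against the next. A toy diff, assuming fields are `(name, type)` records in the shape of OpenLineage's schema facet:

```python
def diff_schema(old_fields: list, new_fields: list) -> dict:
    """Compare two schema-facet field lists and report what changed
    between consecutive job runs."""
    old = {f["name"]: f["type"] for f in old_fields}
    new = {f["name"]: f["type"] for f in new_fields}
    return {
        "added":   sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }

v1 = [{"name": "id", "type": "INTEGER"}, {"name": "total", "type": "INTEGER"}]
v2 = [{"name": "id", "type": "INTEGER"}, {"name": "total", "type": "DECIMAL"},
      {"name": "currency", "type": "VARCHAR"}]
print(diff_schema(v1, v2))
# {'added': ['currency'], 'removed': [], 'retyped': ['total']}
```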
A React-based UI that allows users to traverse complex dependency trees and zoom into specific job nodes.
Allows attaching custom facets (JSON metadata) to job runs, such as data quality scores or resource usage.
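A facet is just a named JSON object attached to part of a lineage event. A sketch of attaching a hypothetical data-quality facet to a run — the facet name and metric fields are invented for illustration; only the underscore-prefixed keys follow OpenLineage's facet conventions:

```python
def attach_run_facet(run: dict, name: str, payload: dict) -> dict:
    """Attach a custom facet to the `run` section of a lineage event."""
    facets = run.setdefault("facets", {})
    facets[name] = {
        # the two underscore keys are OpenLineage facet conventions;
        # these URLs are placeholders
        "_producer": "https://example.com/quality-checker",
        "_schemaURL": "https://example.com/schemas/DataQualityFacet.json",
        **payload,
    }
    return run

run = {"runId": "00000000-0000-0000-0000-000000000001"}
attach_run_facet(run, "dataQuality", {"rowCount": 120_000, "nullRatio": 0.002})
```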
Connects job and dataset nodes across different organizational boundaries and namespaces.
Dual-API approach for both high-throughput ingestion and complex, nested metadata queries.
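On the query side, Marquez addresses graph nodes with ids of the form `dataset:<namespace>:<name>`. A sketch of building a lineage lookup against its read endpoint — the base URL and names below are assumptions for illustration:

```python
from urllib.parse import urlencode

def lineage_query_url(base: str, namespace: str, dataset: str, depth: int = 3) -> str:
    """Build a URL for Marquez's lineage read endpoint (GET /api/v1/lineage),
    which returns the graph around a single node up to `depth` hops."""
    node_id = f"dataset:{namespace}:{dataset}"
    return f"{base}/api/v1/lineage?{urlencode({'nodeId': node_id, 'depth': depth})}"

url = lineage_query_url("http://localhost:5000", "food_delivery", "public.orders")
print(url)
```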
Engineers spend hours manually tracing logs to find why a production table is empty.
Regulatory requirements demand proof of where PII data originates and flows.
Changing a column type in a core database breaks 50 downstream dashboards.
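The impact-analysis answer to the last two problems amounts to a walk of the lineage graph: start at the changed dataset and collect everything downstream. A minimal sketch over an in-memory adjacency map — a real deployment would read these edges from Marquez's lineage API rather than hard-code them:

```python
from collections import deque

def downstream_impact(edges: dict, start: str) -> list:
    """Breadth-first walk from a changed dataset to every downstream
    job, table, and dashboard that consumes it (directly or not)."""
    seen, queue, impacted = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                impacted.append(nxt)
                queue.append(nxt)
    return impacted

# toy lineage: a core table feeds a job, which feeds a warehouse
# dimension, which feeds two dashboards (all names illustrative)
edges = {
    "db.core.users":    ["job.enrich_users"],
    "job.enrich_users": ["dw.users_dim"],
    "dw.users_dim":     ["dash.signups", "dash.retention"],
}
print(downstream_impact(edges, "db.core.users"))
# ['job.enrich_users', 'dw.users_dim', 'dash.signups', 'dash.retention']
```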