Kedro

The open-source Python framework for reproducible, maintainable, and modular data science code.
Kedro is an open-source Python framework designed to help data scientists and engineers create production-ready data pipelines. Originally developed by McKinsey's QuantumBlack and now part of the LF AI & Data Foundation, Kedro addresses the 'notebook-to-production' gap by enforcing software engineering best practices (modularity, separation of concerns, and versioning) within the data science workflow. Its architecture centers on a 'Data Catalog', which abstracts data access, and a 'Pipeline' structure composed of 'Nodes' (pure Python functions). This decoupling lets teams swap data sources or execution environments without rewriting core logic.

As of 2026, Kedro remains a standard choice for enterprise-grade data orchestration where governance, auditability, and team collaboration are paramount. It integrates cleanly with modern stack components such as MLflow, Great Expectations, and Airflow, and provides a standardized, Cookiecutter-based project structure that speeds onboarding and scales out to distributed computing environments such as Apache Spark and Dask.
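A minimal sketch of that node-and-pipeline structure (the functions and dataset names such as 'raw_companies' are hypothetical placeholders for Data Catalog entries):

```python
from kedro.pipeline import node, pipeline


def preprocess(raw):
    # A node wraps a pure Python function; no I/O happens here.
    return raw.dropna()


def count_rows(table):
    # Placeholder downstream step for illustration.
    return {"n_rows": len(table)}


# The string inputs/outputs are resolved against the Data Catalog at run
# time, so storage details can change without touching this logic.
data_pipeline = pipeline(
    [
        node(preprocess, inputs="raw_companies", outputs="clean_companies"),
        node(count_rows, inputs="clean_companies", outputs="row_summary"),
    ]
)
```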
Data Catalog: Separates code logic from data storage details using a YAML-based registry of datasets.
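A corresponding catalog.yml sketch (file paths and dataset names are hypothetical; note that older Kedro releases spell the dataset classes with 'DataSet' capitalization):

```yaml
# conf/base/catalog.yml
raw_companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv

clean_companies:
  type: pandas.ParquetDataset
  filepath: data/02_intermediate/companies.parquet

# Datasets not listed here (e.g. row_summary) default to in-memory storage.
```

Swapping CSV for a database or a cloud bucket is a one-line change here; the pipeline code above stays untouched.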
Kedro-Viz: An interactive tool (launched with 'kedro viz') for visualizing the data pipeline, showing the relationships between nodes and datasets.
Transcoding: The ability to save and load the same dataset in multiple formats, or with different libraries, automatically.
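Transcoding is expressed in the catalog with an '@' suffix; in this sketch (hypothetical path), Spark writes the file and pandas reads the same file back, while lineage treats both entries as a single dataset:

```yaml
my_dataframe@spark:
  type: spark.SparkDataset
  filepath: data/02_intermediate/data.parquet
  file_format: parquet

my_dataframe@pandas:
  type: pandas.ParquetDataset
  filepath: data/02_intermediate/data.parquet
```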
Hooks: An extensible plugin system that lets users inject custom behavior into the Kedro lifecycle (e.g., after a node runs).
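A minimal Hook sketch (the class name is arbitrary; registration happens via the HOOKS tuple in the project's settings.py):

```python
from kedro.framework.hooks import hook_impl


class NodeTimingHooks:
    @hook_impl
    def after_node_run(self, node, outputs):
        # Called by Kedro after each node finishes. Hooks are pluggy-based,
        # so a method may declare any subset of the spec's arguments.
        print(f"{node.name} produced: {sorted(outputs)}")


# In settings.py:
# HOOKS = (NodeTimingHooks(),)
```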
Modular pipelines: Namespace-isolated pipelines that can be reused across different projects or within different parts of the same project.
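A sketch of namespace reuse (dataset and namespace names are hypothetical; in recent Kedro versions the pipeline() factory accepts a namespace argument that prefixes every node and dataset name):

```python
from kedro.pipeline import node, pipeline


def fit(features):
    # Placeholder training step for illustration.
    return {"trained_on": len(features)}


base = pipeline([node(fit, inputs="features", outputs="model")])

# Two isolated copies: datasets resolve to "germany.features"/"germany.model"
# and "usa.features"/"usa.model" respectively.
germany = pipeline(base, namespace="germany")
usa = pipeline(base, namespace="usa")
combined = germany + usa
```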
Micro-packaging: Capability to package specific pipelines as standalone Python packages for distribution (via the 'kedro micropkg' commands).
Deployment: Templates for deploying Kedro pipelines to Airflow, Kubeflow, or AWS Batch.
Moving messy, non-linear Jupyter notebooks into a robust, repeatable production environment.
Registry Updated: 2/7/2026
Run 'kedro run' to execute and validate the full pipeline.
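Common invocations, as a sketch (the pipeline and environment names are hypothetical):

```sh
kedro run                        # execute the default pipeline
kedro run --pipeline=training    # run a specific registered pipeline
kedro run --env=staging          # load configuration from conf/staging
```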
Operating pipelines that need to transition data from on-premises SQL servers to cloud-native storage such as Azure Data Lake.
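Such a transition can be expressed entirely in the catalog; in this sketch the table, connection names, and 'abfs://' path are hypothetical (Kedro datasets resolve cloud paths through fsspec, and connection strings live in credentials.yml):

```yaml
legacy_orders:
  type: pandas.SQLTableDataset
  table_name: orders
  credentials: onprem_sql        # 'con' connection string in credentials.yml

orders_in_lake:
  type: pandas.ParquetDataset
  filepath: abfs://datalake/orders/orders.parquet
  credentials: azure_storage
```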
Providing full data lineage and audit trails for credit risk models.