Kedro's command line interface¶
Kedro's command line interface (CLI) is used to give commands to Kedro via a terminal shell (such as the Terminal app on macOS, or cmd.exe or PowerShell on Windows). You need the CLI to set up a new Kedro project and to run it.
Autocompletion (optional)¶
If you are using macOS or Linux, you can set up your shell to autocomplete kedro commands. If you don't know the type of shell you are using, first type the following:
echo $0
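This prints the name of the shell in use, for example bash or zsh.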
If you use Bash, add the following to your ~/.bashrc (or just run it on the command line):
eval "$(_KEDRO_COMPLETE=bash_source kedro)"
If you use Zsh, add the following to ~/.zshrc:
eval "$(_KEDRO_COMPLETE=zsh_source kedro)"
If you use Fish, add the following to ~/.config/fish/completions/kedro.fish:
eval (env _KEDRO_COMPLETE=fish_source kedro)
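Restart your terminal, or re-source your shell configuration, for the completions to take effect. For example, on Bash:
source ~/.bashrc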
Invoke Kedro CLI from Python (optional)¶
You can invoke the Kedro CLI as a Python module:
python -m kedro
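This dispatches to the same CLI, so any command works this way; for example:
python -m kedro --help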
Kedro commands¶
Here is a list of Kedro CLI commands, as a shortcut to the descriptions below. Project-specific commands are called from within a project directory and apply to that particular project. Global commands can be run anywhere and don't apply to any particular project:
Global Kedro commands¶
kedro¶
Usage:
kedro [OPTIONS] COMMAND [ARGS]...
Options:
-h, --help Show this message and exit.
kedro new¶
Create a new kedro project.
Usage:
kedro new [OPTIONS]
Options:
-v, --verbose See extensive logging and error stack traces.
-c, --config PATH Non-interactive mode, using a configuration
yaml file. This file must supply the keys
required by the template's prompts.yml. When
not using a starter, these are
`project_name`, `repo_name` and
`python_package`.
-s, --starter TEXT Specify the starter template to use when
creating the project. This can be the path to
a local directory, a URL to a remote VCS
repository supported by `cookiecutter` or one
of the aliases listed in ``kedro starter
list``.
--checkout TEXT An optional tag, branch or commit to checkout
in the starter repository.
--directory TEXT An optional directory inside the repository
where the starter resides.
-n, --name TEXT The name of your new Kedro project.
-t, --tools TEXT Select which tools you'd like to include. By
default, none are included.
Tools
1) Linting: Provides a basic linting setup
with Ruff
2) Testing: Provides basic testing setup with
pytest
3) Custom Logging: Provides more logging
options
4) Documentation: Basic documentation setup
with Sphinx
5) Data Structure: Provides a directory
structure for storing data
6) PySpark: Provides set up configuration for
working with PySpark
Example usage:
kedro new --tools=lint,test,log,docs,data,pyspark (or any subset of these options)
kedro new --tools=all
kedro new --tools=none
For more information on using tools, see
https://docs.kedro.org/en/stable/starters/new_project_tools.html
-e, --example TEXT Enter y to enable, n to disable the example
pipeline.
-tc, --telemetry [yes|no|y|n] Allow or disallow Kedro to collect usage
analytics. We cannot see or store
information contained in a Kedro project.
Opt in with "yes" and out with "no".
-h, --help Show this message and exit.
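For example, a minimal configuration file for a non-interactive kedro new without a starter needs only the three keys named above (the values here are hypothetical):
cat > new_project_config.yml <<'EOF'
project_name: My Project
repo_name: my-project
python_package: my_project
EOF
kedro new --config=new_project_config.yml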
kedro starter¶
Commands for working with project starters.
Usage:
kedro starter [OPTIONS] COMMAND [ARGS]...
Options:
-h, --help Show this message and exit.
kedro starter list¶
List all official project starters available.
Usage:
kedro starter list [OPTIONS]
Options:
-h, --help Show this message and exit.
Customise or override project-specific Kedro commands¶
Note
All project-related CLI commands should be run from the project’s root directory.
Kedro's command line interface (CLI) allows you to associate a set of commands and dependencies with a target, which you can then execute from inside the project directory.
The commands a project supports are specified on the framework side. If you want to customise any of the Kedro commands, you can do this either by adding a file called cli.py or by injecting commands into it via the plugin framework. Find the template for the cli.py file below.
Project Kedro commands¶
kedro¶
Usage:
kedro [OPTIONS] COMMAND [ARGS]...
Options:
-h, --help Show this message and exit.
kedro catalog¶
Commands for working with catalog.
Usage:
kedro catalog [OPTIONS] COMMAND [ARGS]...
Options:
-h, --help Show this message and exit.
kedro catalog describe-datasets¶
Describe datasets used in the specified pipelines, grouped by type.
This command provides a structured overview of datasets used in the selected pipelines, categorising them into three groups:
- datasets: Datasets explicitly defined in the catalog.
- factories: Datasets resolved from dataset factory patterns.
- defaults: Datasets that do not match any pattern or explicit definition.
Usage:
kedro catalog describe-datasets [OPTIONS]
Options:
-e, --env TEXT Kedro configuration environment name. Defaults to
`local`.
-p, --pipeline TEXT Name of the modular pipeline to run. If not set, the
project pipeline is run by default.
-h, --help Show this message and exit.
kedro catalog list-patterns¶
List all dataset factory patterns in the catalog, ranked by priority.
This method retrieves all dataset factory patterns defined in the catalog, ordered by the priority in which they are matched.
Usage:
kedro catalog list-patterns [OPTIONS]
Options:
-e, --env TEXT Kedro configuration environment name. Defaults to `local`.
-h, --help Show this message and exit.
kedro catalog resolve-patterns¶
Resolve dataset factory patterns against pipeline datasets.
This method resolves dataset factory patterns for datasets used in the specified pipelines. It includes datasets explicitly defined in the catalog as well as those resolved from dataset factory patterns.
Usage:
kedro catalog resolve-patterns [OPTIONS]
Options:
-e, --env TEXT Kedro configuration environment name. Defaults to
`local`.
-p, --pipeline TEXT Name of the modular pipeline to run. If not set, the
project pipeline is run by default.
-h, --help Show this message and exit.
kedro ipython¶
Open IPython with project specific variables loaded.
Usage:
kedro ipython [OPTIONS] [ARGS]...
Options:
-v, --verbose See extensive logging and error stack traces.
-e, --env TEXT Kedro configuration environment name. Defaults to `local`.
kedro jupyter¶
Open Jupyter Notebook / Lab with project specific variables loaded.
Usage:
kedro jupyter [OPTIONS] COMMAND [ARGS]...
Options:
-h, --help Show this message and exit.
kedro jupyter lab¶
Open Jupyter Lab with project specific variables loaded.
Usage:
kedro jupyter lab [OPTIONS] [ARGS]...
Options:
-v, --verbose See extensive logging and error stack traces.
-e, --env TEXT Kedro configuration environment name. Defaults to `local`.
kedro jupyter notebook¶
Open Jupyter Notebook with project specific variables loaded.
Usage:
kedro jupyter notebook [OPTIONS] [ARGS]...
Options:
-v, --verbose See extensive logging and error stack traces.
-e, --env TEXT Kedro configuration environment name. Defaults to `local`.
kedro jupyter setup¶
Initialise the Jupyter Kernel for a kedro project.
Usage:
kedro jupyter setup [OPTIONS] [ARGS]...
Options:
-v, --verbose See extensive logging and error stack traces.
kedro package¶
Package the project as a Python wheel.
Usage:
kedro package [OPTIONS]
Options:
-h, --help Show this message and exit.
kedro pipeline¶
Commands for working with pipelines.
Usage:
kedro pipeline [OPTIONS] COMMAND [ARGS]...
Options:
-h, --help Show this message and exit.
kedro pipeline create¶
Create a new modular pipeline by providing a name.
Usage:
kedro pipeline create [OPTIONS] NAME
Options:
-v, --verbose See extensive logging and error stack traces.
--skip-config Skip creation of config files for the new
pipeline(s).
-t, --template DIRECTORY Path to cookiecutter template to use for
pipeline(s). Will override any local templates.
-e, --env TEXT Environment to create pipeline configuration in.
Defaults to `base`.
-h, --help Show this message and exit.
kedro pipeline delete¶
Delete a modular pipeline by providing a name.
Usage:
kedro pipeline delete [OPTIONS] NAME
Options:
-v, --verbose See extensive logging and error stack traces.
-e, --env TEXT Environment to delete pipeline configuration from. Defaults
to 'base'.
-y, --yes Confirm deletion of pipeline non-interactively.
-h, --help Show this message and exit.
kedro registry¶
Commands for working with registered pipelines.
Usage:
kedro registry [OPTIONS] COMMAND [ARGS]...
Options:
-h, --help Show this message and exit.
kedro registry describe¶
Describe a registered pipeline by providing a pipeline name.
Defaults to the __default__ pipeline.
Usage:
kedro registry describe [OPTIONS] [NAME]
Options:
-v, --verbose See extensive logging and error stack traces.
-h, --help Show this message and exit.
kedro registry list¶
List all pipelines defined in your pipeline_registry.py file.
Usage:
kedro registry list [OPTIONS]
Options:
-h, --help Show this message and exit.
kedro run¶
Run the pipeline.
Usage:
kedro run [OPTIONS]
Options:
--from-inputs TEXT A list of dataset names which should be used as a
starting point.
--to-outputs TEXT A list of dataset names which should be used as
an end point.
--from-nodes TEXT A list of node names which should be used as a
starting point.
--to-nodes TEXT A list of node names which should be used as an
end point.
-n, --nodes TEXT Run only nodes with specified names.
-r, --runner TEXT Specify a runner that you want to run the
pipeline with. Available runners:
'SequentialRunner', 'ParallelRunner' and
'ThreadRunner'.
--async Load and save node inputs and outputs
asynchronously with threads. If not specified,
load and save datasets synchronously.
-e, --env TEXT Kedro configuration environment name. Defaults to
`local`.
-t, --tags TEXT Construct the pipeline using only nodes which
have this tag attached. The option can be used
multiple times, which results in a pipeline
constructed from nodes having any of those tags.
-lv, --load-versions TEXT Specify a particular dataset version (timestamp)
for loading.
-p, --pipeline TEXT Name of the registered pipeline to run. If not
set, the '__default__' pipeline is run.
-ns, --namespaces TEXT Run only node namespaces with specified names.
-c, --config FILE Specify a YAML configuration file to load the run
command arguments from. If command line arguments
are provided, they will override the loaded ones.
--conf-source TEXT Path of a directory where project configuration
is stored.
--params TEXT Specify extra parameters that you want to pass to
the context initialiser. Items must be separated
by comma, keys - by colon or equals sign,
example: param1=value1,param2=value2. Each
parameter is split by the first comma, so
parameter values are allowed to contain colons,
parameter keys are not. To pass a nested
dictionary as parameter, separate keys by '.',
example: param_group.param1:value1.
--only-missing-outputs Run only nodes with missing outputs. If all
outputs of a node exist and are persisted, skip
the node execution.
-h, --help Show this message and exit.
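As an illustration, the following run sets one top-level and one nested parameter using the separators described above (the parameter names are hypothetical):
kedro run --params=param1=value1,param_group.param1:value1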
Project setup¶
Install all package dependencies¶
The following runs pip to install all package dependencies specified in requirements.txt:
pip install -r requirements.txt
For further information, see the documentation on installing project-specific dependencies.
Run the project¶
Call the run() method of the KedroSession defined in kedro.framework.session.
kedro run
KedroContext can be extended in run.py (src/<package_name>/run.py). In order to use the extended KedroContext, you need to set context_path in the pyproject.toml configuration file.
Modifying a kedro run¶
Kedro has options to modify pipeline runs. Below is a list of CLI arguments supported out of the box. Note that the names inside angular brackets (<>) are placeholders, and you should replace these values with the names of relevant nodes, datasets, envs, etc. in your project.
CLI command | Description |
---|---|
kedro run --from-inputs=<dataset_name1>,<dataset_name2> | A list of dataset names which should be used as a starting point |
kedro run --to-outputs=<dataset_name1>,<dataset_name2> | A list of dataset names which should be used as an end point |
kedro run --from-nodes=<node_name1>,<node_name2> | A list of node names which should be used as a starting point |
kedro run --to-nodes=<node_name1>,<node_name2> | A list of node names which should be used as an end point |
kedro run --nodes=<node_name1>,<node_name2> | Run only nodes with specified names |
kedro run --runner=<runner_name> | Run the pipeline with a specific runner |
kedro run --async | Load and save node inputs and outputs asynchronously with threads |
kedro run --env=<env_name> | Run the pipeline in the <env_name> environment. Defaults to local if not provided |
kedro run --tags=<tag_name1>,<tag_name2> | Run only nodes which have any of these tags attached |
kedro run --load-versions=<dataset_name>:YYYY-MM-DDThh.mm.ss.sssZ | Specify particular dataset versions (timestamp) for loading |
kedro run --pipeline=<pipeline_name> | Run the whole pipeline by its name |
kedro run --namespaces=<namespace> | Run only nodes with the specified namespace |
kedro run --config=<config_file_name>.yml | Specify all command line options in a named YAML configuration file |
kedro run --conf-source=<path_to_config_directory> | Specify a new source directory for configuration files |
kedro run --conf-source=<path_to_compressed_file> | Only possible when using the OmegaConfigLoader. Specify a compressed config file in zip or tar format |
kedro run --params=<param_key1>=<value1>,<param_key2>=<value2> | Does a parametrised run with {"param_key1": "value1", "param_key2": "value2"}. These will take precedence over parameters defined in the conf directory. Additionally, dot (.) syntax can be used to address nested keys like parent.child:value |
kedro run --only-missing-outputs | Run the nodes required to produce missing persistent outputs. If a node's persistent outputs already exist, the node and its upstream dependencies (if not needed for other missing outputs) will be skipped |
You can also combine these options together, so the following command runs all the nodes from split to predict and report:
kedro run --from-nodes=split --to-nodes=predict,report
This functionality is extended to the kedro run --config=config.yml command, which allows you to specify run commands in a configuration file.
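For example, a run configuration file can be written and used as below. This is a minimal sketch assuming a top-level run key whose entries mirror the CLI option names; the pipeline and tag names are hypothetical:
cat > config.yml <<'EOF'
run:
  pipeline: data_science
  tags: split,train
  env: local
EOF
kedro run --config=config.yml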
A parametrised run is best used for dynamic parameters, i.e. running the same pipeline with different inputs; for static parameters that do not change, we recommend following the Kedro project setup methodology.
Deploy the project¶
The following packages your application as one .whl file within the dist/ folder of your project. It packages the project configuration separately in a tar.gz file:
kedro package
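The resulting wheel can then be installed like any other Python package; the file name below is hypothetical and depends on your package name and version:
pip install dist/my_project-0.1-py3-none-any.whl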
See the Python documentation for further information about packaging.
Project quality¶
Project development¶
Modular pipelines¶
Create a new modular pipeline in your project¶
kedro pipeline create <pipeline_name>
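As a sketch, for a hypothetical pipeline named data_processing this generates files along these lines (the exact layout can vary between Kedro versions):
- src/<package_name>/pipelines/data_processing/__init__.py
- src/<package_name>/pipelines/data_processing/nodes.py
- src/<package_name>/pipelines/data_processing/pipeline.py
- conf/base/parameters_data_processing.yml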
Delete a modular pipeline¶
The following command deletes all the files related to a modular pipeline in your Kedro project.
kedro pipeline delete <pipeline_name>
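Pass -y to confirm the deletion non-interactively, for example in a CI job (the pipeline name is hypothetical):
kedro pipeline delete data_processing -y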
Registered pipelines¶
Describe a registered pipeline¶
kedro registry describe <pipeline_name>
If no pipeline name is provided, the command describes the __default__ pipeline.
List all registered pipelines in your project¶
kedro registry list
Data Catalog¶
List all datasets used in the specified pipelines¶
This command lists all datasets used in the specified pipeline(s), grouped by how they are defined.
- datasets: Explicitly defined in catalog.yml
- factories: Resolved using dataset factory patterns
- defaults: Handled by user catch-all or default runtime patterns
kedro catalog describe-datasets
The command also accepts an optional --pipeline argument that allows you to specify the pipeline name(s) (comma-separated values) in order to filter datasets used only by those named pipeline(s). For example:
kedro catalog describe-datasets --pipeline=ds,de
Note
If no pipelines are specified, the __default__ pipeline is used.
Resolve dataset factories in the catalog¶
This command resolves datasets used in the pipeline against all dataset patterns, returning their full catalog configuration. It includes datasets explicitly defined in the catalog as well as those resolved from dataset factory patterns.
kedro catalog resolve-patterns
The command also accepts an optional --pipeline argument that allows you to specify the pipeline name(s) (comma-separated values).
kedro catalog resolve-patterns --pipeline=ds,de
Note
If no pipelines are specified, the __default__ pipeline is used.
List all dataset factory patterns defined in the catalog ordered by priority¶
kedro catalog list-patterns
The output includes a list of any dataset factories in the catalog, ranked by the priority in which they are matched.
Notebooks¶
To start a Jupyter Notebook:
kedro jupyter notebook
To start JupyterLab:
kedro jupyter lab
To start an IPython shell:
kedro ipython
The Kedro IPython extension makes the following variables available in your IPython or Jupyter session:
- catalog (type kedro.io.DataCatalog): Data Catalog instance that contains all defined datasets; this is a shortcut for context.catalog
- context (type kedro.framework.context.KedroContext): Kedro project context that provides access to Kedro's library components
- pipelines (type dict[str, Pipeline]): Pipelines defined in your pipeline registry
- session (type kedro.framework.session.session.KedroSession): Kedro session that orchestrates a pipeline run
To reload these variables (e.g. if you updated catalog.yml) use the %reload_kedro line magic, which can also be used to see the error message if any of the variables above are undefined.
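For example, you can start a session and inspect a dataset straight away (the dataset name below is hypothetical):
kedro ipython
# then, inside the IPython session:
#   catalog.load("companies")
#   %reload_kedro  # re-run after editing catalog.yml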