AbstractVersionedDataset
kedro.io.AbstractVersionedDataset
AbstractVersionedDataset(filepath, version, exists_function=None, glob_function=None)
Bases: AbstractDataset[_DI, _DO], ABC
AbstractVersionedDataset is the base class for all versioned dataset implementations.
All datasets that implement versioning should extend this abstract class and implement the methods marked as abstract.
Example: ::
    >>> from pathlib import Path, PurePosixPath
    >>> import pandas as pd
    >>> from kedro.io import AbstractVersionedDataset
    >>>
    >>>
    >>> class MyOwnDataset(AbstractVersionedDataset):
    >>>     def __init__(self, filepath, version, param1, param2=True):
    >>>         super().__init__(PurePosixPath(filepath), version)
    >>>         self._param1 = param1
    >>>         self._param2 = param2
    >>>
    >>>     def load(self) -> pd.DataFrame:
    >>>         load_path = self._get_load_path()
    >>>         return pd.read_csv(load_path)
    >>>
    >>>     def save(self, df: pd.DataFrame) -> None:
    >>>         save_path = self._get_save_path()
    >>>         df.to_csv(str(save_path))
    >>>
    >>>     def _exists(self) -> bool:
    >>>         path = self._get_load_path()
    >>>         return Path(path.as_posix()).exists()
    >>>
    >>>     def _describe(self):
    >>>         return dict(version=self._version, param1=self._param1, param2=self._param2)
Example catalog.yml specification: ::
    my_dataset:
      type: <path-to-my-own-dataset>.MyOwnDataset
      filepath: data/01_raw/my_data.csv
      versioned: true
      param1: <param1-value> # param1 is a required argument
      # param2 will be True by default
Parameters:
- filepath (PurePosixPath) – Filepath in POSIX format to a file.
- version (Version | None) – If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, save version will be autogenerated.
- exists_function (Callable[[str], bool] | None, default: None) – Function that is used for determining whether a path exists in a filesystem.
- glob_function (Callable[[str], list[str]] | None, default: None) – Function that is used for finding all paths in a filesystem, which match a given pattern.
Source code in kedro/io/core.py
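The callable signatures in the parameter list can be illustrated with a small, standard-library-only sketch. Everything here is hypothetical scaffolding: `STORED_PATHS`, `memory_exists`, and `memory_glob` are illustrative names, the `Version` namedtuple is a stand-in carrying only the `load` and `save` attributes described above (the real class is `kedro.io.core.Version`), and the stored path layout is an assumption.

```python
from collections import namedtuple
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Stand-in with the two attributes the parameter list describes;
# the real class lives in kedro.io.core.
Version = namedtuple("Version", ["load", "save"])

# Hypothetical in-memory "filesystem": a set of stored paths (layout assumed).
STORED_PATHS = {
    "data/01_raw/my_data.csv/2024-01-01T00.00.00.000Z/my_data.csv",
    "data/01_raw/my_data.csv/2024-02-01T00.00.00.000Z/my_data.csv",
}


def memory_exists(path: str) -> bool:
    """Callable[[str], bool]: report whether a path is stored."""
    return path in STORED_PATHS


def memory_glob(pattern: str) -> list[str]:
    """Callable[[str], list[str]]: list stored paths matching a pattern."""
    return sorted(p for p in STORED_PATHS if fnmatch(p, pattern))


# Arguments a subclass could forward to AbstractVersionedDataset.__init__.
init_kwargs = {
    "filepath": PurePosixPath("data/01_raw/my_data.csv"),
    # load=None: load the latest version; save=None: autogenerate on save.
    "version": Version(load=None, save=None),
    "exists_function": memory_exists,
    "glob_function": memory_glob,
}
```

Supplying custom callables like these is one way to point a versioned dataset at a storage backend other than the local filesystem.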
_fetch_latest_load_version
_fetch_latest_load_version()
Source code in kedro/io/core.py
_fetch_latest_save_version
_fetch_latest_save_version()
Generate and cache the current save version.
Source code in kedro/io/core.py
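The generate-and-cache behaviour can be pictured with the following minimal sketch. It is not Kedro's implementation: `SaveVersionCache` and `generate_timestamp` are hypothetical names, and the timestamp format is an assumption modelled on Kedro-style version strings.

```python
from datetime import datetime, timezone
from typing import Optional


def generate_timestamp() -> str:
    """Return a UTC timestamp string with millisecond precision (assumed format)."""
    now = datetime.now(tz=timezone.utc)
    # Convert microseconds to a zero-padded, three-digit millisecond field.
    millis = now.microsecond // 1000
    return now.strftime("%Y-%m-%dT%H.%M.%S.") + f"{millis:03d}Z"


class SaveVersionCache:
    """Hypothetical helper: generate a save version once, then reuse it."""

    def __init__(self) -> None:
        self._cached: Optional[str] = None

    def fetch(self) -> str:
        # Generate on first use; later calls return the cached value.
        if self._cached is None:
            self._cached = generate_timestamp()
        return self._cached

    def clear(self) -> None:
        """Forget the cached version so the next fetch generates a new one."""
        self._cached = None
```

Caching matters because a single save may ask for the version more than once, and each request needs to see the same value.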
_get_load_path
_get_load_path()
Source code in kedro/io/core.py
_get_save_path
_get_save_path()
Source code in kedro/io/core.py
_get_versioned_path
_get_versioned_path(version)
Source code in kedro/io/core.py
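As a rough illustration of what a versioned path can look like, the sketch below assumes the layout described in Kedro's versioning documentation, where each version gets a directory named after it under the dataset's filepath. The function `versioned_path` is a hypothetical stand-alone helper, not the method itself.

```python
from pathlib import PurePosixPath


def versioned_path(filepath: PurePosixPath, version: str) -> PurePosixPath:
    """Place the file inside a directory named after the version (assumed layout)."""
    return filepath / version / filepath.name


path = versioned_path(
    PurePosixPath("data/01_raw/my_data.csv"), "2024-01-01T00.00.00.000Z"
)
print(path)  # data/01_raw/my_data.csv/2024-01-01T00.00.00.000Z/my_data.csv
```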
_release
_release()
Source code in kedro/io/core.py
_save_wrapper (classmethod)
_save_wrapper(save_func)
Decorate save_func with logging and error handling code.
Source code in kedro/io/core.py
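A decorator that adds logging and error handling around a save function might look like the sketch below. This is a generic illustration of the pattern, not the library's code: `save_wrapper`, the local `DatasetError` stand-in, and `ListDataset` are all hypothetical names.

```python
import functools
import logging

logger = logging.getLogger(__name__)


class DatasetError(Exception):
    """Stand-in for the DatasetError raised by Kedro datasets."""


def save_wrapper(save_func):
    """Wrap a save method with logging and error translation."""

    @functools.wraps(save_func)
    def save(self, data):
        logger.debug("Saving data to %s", type(self).__name__)
        try:
            save_func(self, data)
        except DatasetError:
            # Already the expected error type: pass it through unchanged.
            raise
        except Exception as exc:
            # Translate any other failure, keeping the original as the cause.
            message = f"Failed while saving data to {type(self).__name__}: {exc}"
            raise DatasetError(message) from exc

    return save


class ListDataset:
    """Hypothetical dataset that only accepts lists."""

    def __init__(self) -> None:
        self.data = None

    @save_wrapper
    def save(self, data) -> None:
        if not isinstance(data, list):
            raise TypeError("expected a list")
        self.data = data
```

With this wrapping, callers only need to handle one exception type, while the original error stays available through `__cause__`.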
exists
exists()
Checks whether a dataset's output already exists by calling the provided _exists() method.
Returns:
- bool – Flag indicating whether the output already exists.
Raises:
- DatasetError – When the underlying exists method raises an error.
Source code in kedro/io/core.py
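The delegation described here, where a public `exists()` calls the subclass-provided `_exists()` and reports failures as `DatasetError`, can be sketched as follows. The classes `ExistsSketch`, `PresentDataset`, and `BrokenDataset`, and the local `DatasetError`, are hypothetical stand-ins for illustration only.

```python
class DatasetError(Exception):
    """Stand-in for the DatasetError raised by Kedro datasets."""


class ExistsSketch:
    """Hypothetical base class showing the exists()/_exists() split."""

    def _exists(self) -> bool:
        raise NotImplementedError

    def exists(self) -> bool:
        try:
            return self._exists()
        except Exception as exc:
            # Report any underlying failure as a single error type.
            message = f"Failed during exists check for {type(self).__name__}: {exc}"
            raise DatasetError(message) from exc


class PresentDataset(ExistsSketch):
    def _exists(self) -> bool:
        return True


class BrokenDataset(ExistsSketch):
    def _exists(self) -> bool:
        raise OSError("storage unreachable")
```

Here `PresentDataset().exists()` returns `True`, while `BrokenDataset().exists()` raises `DatasetError` with the `OSError` attached as its cause.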
resolve_load_version
resolve_load_version()
Compute the version the dataset should be loaded with.
Source code in kedro/io/core.py
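One way to picture load-version resolution is sketched below: a pinned load version is used as-is, otherwise the latest available version is looked up through a glob function. This is a simplified model under stated assumptions, not Kedro's implementation: the function signature, the `filepath/<version>/<filename>` layout, the use of lexicographic ordering to find the latest version, and `FileNotFoundError` as the failure signal are all illustrative choices.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Callable, Optional


def resolve_load_version(
    filepath: PurePosixPath,
    pinned: Optional[str],
    glob_function: Callable[[str], list],
) -> str:
    """Return the pinned version if given, else the latest one found."""
    if pinned is not None:
        return pinned
    pattern = str(filepath / "*" / filepath.name)
    # Timestamp-style version names sort chronologically as plain strings.
    candidates = sorted(glob_function(pattern), reverse=True)
    if not candidates:
        raise FileNotFoundError(f"No versions found for {filepath}")
    # The version is the name of the directory that holds the file.
    return PurePosixPath(candidates[0]).parent.name


# Hypothetical stored paths and a matching in-memory glob function.
stored = [
    "data/01_raw/my_data.csv/2024-01-01T00.00.00.000Z/my_data.csv",
    "data/01_raw/my_data.csv/2024-02-01T00.00.00.000Z/my_data.csv",
]


def glob_stored(pattern: str) -> list:
    return [p for p in stored if fnmatch(p, pattern)]


latest = resolve_load_version(PurePosixPath("data/01_raw/my_data.csv"), None, glob_stored)
print(latest)  # 2024-02-01T00.00.00.000Z
```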
resolve_save_version
resolve_save_version()
Compute the version the dataset should be saved with.
Source code in kedro/io/core.py