filter.py
Functions for filtering data by data contracts.
- scribe_data.cli.contracts.filter.filter_contract_metadata(contract_file: Path) -> dict[str, Any]
Extract and filter metadata from a language-specific data contract file.
- Parameters:
- contract_file : Path
Path to the YAML contract file for a specific language.
- Returns:
- Dict[str, Any]
A structured dictionary containing filtered metadata with keys:
- ‘nouns’: {‘numbers’: […], ‘genders’: […]}
- ‘verbs’: {‘conjugations’: […]}
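To make the return shape concrete, here is a minimal sketch of reducing an already-parsed contract to the documented structure. The helper name `sketch_contract_metadata` and the input layout (top-level `numbers`, `genders`, and `conjugations` keys) are assumptions made for illustration; the real contract schema and the YAML loading step are not shown here.

```python
from typing import Any


def sketch_contract_metadata(contract: dict[str, Any]) -> dict[str, Any]:
    """Reduce a parsed contract to the documented metadata shape.

    Hypothetical helper: the input keys used here are an assumed layout,
    not the actual contract schema.
    """
    return {
        "nouns": {
            "numbers": list(contract.get("numbers", [])),
            "genders": list(contract.get("genders", [])),
        },
        "verbs": {
            "conjugations": list(contract.get("conjugations", [])),
        },
    }


# Illustrative input only; any key outside the documented ones is dropped.
example_contract = {
    "numbers": ["singular", "plural"],
    "genders": ["feminine", "masculine"],
    "conjugations": ["infinitive", "presentParticiple"],
    "unrelated": "dropped",
}
metadata = sketch_contract_metadata(example_contract)
```

Only the `nouns` and `verbs` sections survive, which matches the two data types accepted by `filter_exported_data`.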
- scribe_data.cli.contracts.filter.filter_exported_data(input_file: Path, contract_metadata: dict[str, Any], data_type: str) -> dict[str, Any]
Filter exported language data based on contract metadata requirements.
This function processes JSON export files, keeping only the data forms specified in the corresponding language contract.
- Parameters:
- input_file : Path
Path to the input JSON file with exported language data.
- contract_metadata : Dict[str, Any]
Metadata from the language’s contract file.
- data_type : str
Type of data to filter (‘nouns’ or ‘verbs’).
- Returns:
- Dict[str, Any]
Filtered dictionary of lexemes, containing only specified forms. Preserves ‘lastModified’ and ‘lexemeID’ for each lexeme.
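The filtering step described above can be sketched as follows. This is not the library's implementation: the helper name `sketch_filter_lexemes`, the assumption that each lexeme entry is a flat mapping of form name to value, and the sample data are all hypothetical. It works on an in-memory dictionary rather than reading `input_file`.

```python
from typing import Any

# Keys the documentation says are preserved for every lexeme.
ALWAYS_KEPT = ("lastModified", "lexemeID")


def sketch_filter_lexemes(
    lexemes: dict[str, dict[str, Any]],
    contract_metadata: dict[str, Any],
    data_type: str,
) -> dict[str, dict[str, Any]]:
    """Keep only the forms named in the contract for the given data type.

    Hypothetical helper: assumes each entry maps form names to values.
    """
    # Collect every form name listed under this data type in the contract.
    wanted: set[str] = set()
    for forms in contract_metadata.get(data_type, {}).values():
        wanted.update(forms)

    filtered = {}
    for key, entry in lexemes.items():
        filtered[key] = {
            name: value
            for name, value in entry.items()
            if name in wanted or name in ALWAYS_KEPT
        }
    return filtered


# Illustrative data only.
sample_lexemes = {
    "L1": {
        "lastModified": "2024-01-01",
        "lexemeID": "L1",
        "singular": "Haus",
        "plural": "Häuser",
        "diminutive": "Häuschen",
    }
}
sample_metadata = {"nouns": {"numbers": ["singular", "plural"], "genders": []}}
filtered_nouns = sketch_filter_lexemes(sample_lexemes, sample_metadata, "nouns")
```

In this sample the `diminutive` form is dropped because the contract does not list it, while `lastModified` and `lexemeID` are kept regardless of the contract.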
- scribe_data.cli.contracts.filter.export_data_filtered_by_contracts(contracts_dir: Path, input_dir: Path, output_dir: Path) -> None
Export contract-filtered data to a new directory with a standardized structure.
This function processes data contracts for all languages, filtering and exporting data that meets the specified contract requirements.
- Parameters:
- contracts_dir : Path
Directory containing the contracts to filter with. Defaults to DEFAULT_DATA_CONTRACTS_DIR.
- input_dir : Path
Directory containing original JSON export data. Defaults to DEFAULT_JSON_EXPORT_DIR.
- output_dir : Path
Directory to export filtered contract data. Defaults to scribe_data_filtered_* based on the data type.
- Returns:
- None
Prints information on the data that has been filtered.
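As a rough illustration of the directory-level export, the sketch below walks an input tree, filters each file, and writes the result to a mirrored output tree. Everything here is an assumption for illustration: the helper name `sketch_export_filtered`, the `<language>/<data_type>.json` layout, and the single `wanted_forms` mapping standing in for per-language contracts. The real function reads contracts from `contracts_dir` and may organize its output differently.

```python
import json
import tempfile
from pathlib import Path


def sketch_export_filtered(
    input_dir: Path, output_dir: Path, wanted_forms: dict[str, set[str]]
) -> list[Path]:
    """Write filtered copies of <language>/<data_type>.json files.

    Hypothetical helper: the directory layout and the shared wanted_forms
    mapping are assumptions, not the library's behavior.
    """
    written = []
    for source in sorted(input_dir.glob("*/*.json")):
        data_type = source.stem
        if data_type not in wanted_forms:
            continue  # no contract information for this data type
        keep = wanted_forms[data_type] | {"lastModified", "lexemeID"}
        lexemes = json.loads(source.read_text(encoding="utf-8"))
        filtered = {
            key: {name: value for name, value in entry.items() if name in keep}
            for key, entry in lexemes.items()
        }
        # Mirror the language subdirectory under the output directory.
        target = output_dir / source.parent.name / source.name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(
            json.dumps(filtered, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(target)
        print(f"Filtered {len(filtered)} {data_type} entries for {source.parent.name}")
    return written


# Small demonstration in a temporary directory with illustrative data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source_file = root / "in" / "german" / "nouns.json"
    source_file.parent.mkdir(parents=True)
    source_file.write_text(
        json.dumps(
            {
                "L1": {
                    "lexemeID": "L1",
                    "lastModified": "2024-01-01",
                    "singular": "Haus",
                    "diminutive": "Häuschen",
                }
            }
        ),
        encoding="utf-8",
    )
    written = sketch_export_filtered(root / "in", root / "out", {"nouns": {"singular"}})
    written_names = [path.name for path in written]
    exported = json.loads(written[0].read_text(encoding="utf-8"))
```

Like the documented function, the sketch reports what it filtered by printing a line per file; returning the written paths is an addition made only so the demonstration can inspect the output.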