check_project_metadata.py
Check the Scribe-Data metadata files to make sure that all information is included.
Examples
>>> python3 src/scribe_data/check/check_project_metadata.py
- scribe_data.check.check_project_metadata.get_available_languages() -> dict[str, dict[str, list[str]]]
Get available languages from the data extraction folder.
- Returns:
- dict[str, List[str]]
A dictionary keyed by language name, where each value lists that language's sub-languages (if any).
- scribe_data.check.check_project_metadata.get_missing_languages(reference_languages: dict, target_languages: dict) -> list[str]
Compare two language dictionaries and return a list of the languages and sub-languages that exist in the target but are missing from the reference.
- Parameters:
- reference_languages : dict
A dictionary of languages from the reference source.
- target_languages : dict
A dictionary of languages from the target source to check for missing entries.
- Returns:
- List[str]
A list of languages and sub-languages that are in target_languages but not in reference_languages.
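The comparison described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the helper name `find_missing`, the nested `sub_languages` key, and the "parent - sub" entry format are all assumptions made for the example.

```python
def find_missing(reference: dict, target: dict) -> list[str]:
    """Return entries present in `target` but absent from `reference`.

    Assumed shape (hypothetical): {language: {"sub_languages": {name: {...}}}}
    """
    missing = []
    for language, details in target.items():
        sub_languages = (details or {}).get("sub_languages", {})

        if language not in reference:
            # The whole language is absent: report its sub-languages if it
            # has any, otherwise report the language itself.
            if sub_languages:
                missing.extend(f"{language} - {sub}" for sub in sub_languages)
            else:
                missing.append(language)
            continue

        # The language exists in both: compare sub-languages only.
        ref_subs = (reference[language] or {}).get("sub_languages", {})
        missing.extend(
            f"{language} - {sub}" for sub in sub_languages if sub not in ref_subs
        )
    return missing


reference = {"english": {}, "norwegian": {"sub_languages": {"bokmål": {}}}}
target = {
    "english": {},
    "norwegian": {"sub_languages": {"bokmål": {}, "nynorsk": {}}},
    "latin": {},
}
print(find_missing(reference, target))
```

Calling the helper twice with the arguments swapped would give the two-way comparison between the query folders and the metadata file.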
- scribe_data.check.check_project_metadata.validate_language_properties(languages_dict: dict) -> dict
Validate the presence of ‘qid’ and ‘iso’ properties for each language and its sub-languages.
- Parameters:
- languages_dict : dict
A dictionary where each key is a language, and the value is another dictionary containing details about the language. If the language has sub-languages, they are stored under the ‘sub_languages’ key.
- Returns:
- dict
A dictionary with two lists:
- “missing_qids”: Languages or sub-languages missing the ‘qid’ property.
- “missing_isos”: Languages or sub-languages missing the ‘iso’ property.
Each entry in these lists has the format “parent_language - sub_language” for a sub-language, or simply “parent_language” for a parent language.
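As a rough sketch of the validation and its return format, assuming that a parent language with sub-languages is checked only through those sub-languages (the source does not say whether the parent entry is checked too). The helper name `validate_properties` and the sample values are illustrative, not taken from the project.

```python
def validate_properties(languages: dict) -> dict:
    """Collect languages and sub-languages lacking 'qid' or 'iso'."""
    result = {"missing_qids": [], "missing_isos": []}

    def check(name: str, details: dict) -> None:
        if "qid" not in details:
            result["missing_qids"].append(name)
        if "iso" not in details:
            result["missing_isos"].append(name)

    for language, details in languages.items():
        sub_languages = details.get("sub_languages")
        if sub_languages:
            # Assumption: only the sub-languages carry qid/iso in this case.
            for sub, sub_details in sub_languages.items():
                check(f"{language} - {sub}", sub_details)
        else:
            check(language, details)
    return result


# Illustrative metadata; the property values are placeholders.
metadata = {
    "english": {"iso": "en", "qid": "Q1860"},
    "norwegian": {
        "sub_languages": {
            "bokmål": {"iso": "nb"},
            "nynorsk": {"qid": "Q25164"},
        }
    },
    "latin": {"qid": "Q397"},
}
print(validate_properties(metadata))
```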
- scribe_data.check.check_project_metadata.check_language_metadata() -> None
Validate language metadata by performing various checks.
- Raises:
- SystemExit:
If any missing languages or properties are found, the function exits the script with a status code of 1.
Notes
Checks include:
- Ensures that all languages listed in queries are present in language_metadata.yaml, and vice versa.
- Checks that each language in language_metadata.yaml has the required properties:
  - ‘qid’ (a unique identifier)
  - ‘iso’ (ISO language code)
This function helps identify missing languages or missing properties, ensuring data consistency across both sources.
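The overall flow (compare both sources, check required properties, exit with status 1 on any problem) might look like the sketch below. It is a simplified stand-in under stated assumptions: the function name `run_metadata_checks`, its parameters, and the message wording are hypothetical, and sub-languages are ignored to keep the example short.

```python
import sys


def run_metadata_checks(query_languages: dict, metadata_languages: dict) -> None:
    """Exit with status 1 if the sources disagree or properties are missing."""
    problems = []

    # Languages must appear in both sources.
    for name in sorted(set(query_languages) - set(metadata_languages)):
        problems.append(f"{name}: has queries but is missing from the metadata")
    for name in sorted(set(metadata_languages) - set(query_languages)):
        problems.append(f"{name}: is in the metadata but has no queries")

    # Every metadata entry needs both required properties.
    for name, details in metadata_languages.items():
        for prop in ("qid", "iso"):
            if prop not in details:
                problems.append(f"{name}: missing '{prop}'")

    if problems:
        print("\n".join(problems))
        sys.exit(1)  # raises SystemExit, so CI treats the check as failed

    print("All language metadata checks passed.")
```

Because `sys.exit(1)` raises `SystemExit`, a caller or test can catch the exception and inspect its `code` attribute instead of letting the process terminate.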