check_project_metadata.py


Check the Scribe-Data metadata files to make sure that all information is included.

Examples

>>> python3 src/scribe_data/check/check_project_metadata.py
scribe_data.check.check_project_metadata.get_available_languages() -> dict[str, list[str]]

Get available languages from the data extraction folder.

Returns:
dict[str, list[str]]

A dictionary with the language name as the key and a list of its sub-languages (if available) as the value.
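
As a rough illustration of the return shape, the sketch below walks a directory tree and collects sub-language folders per language. The folder layout, the DATA_TYPE_DIRS set, and the function names are assumptions made for this example; they are not taken from the Scribe-Data source.

```python
import tempfile
from pathlib import Path

# Hypothetical data-type folder names; the real project may use others.
DATA_TYPE_DIRS = {"nouns", "verbs"}


def sketch_available_languages(root: Path) -> dict[str, list[str]]:
    """Map each language directory under root to its sub-language directories."""
    languages: dict[str, list[str]] = {}
    for lang_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumption: any child directory that is not a data-type folder
        # is treated as a sub-language.
        sub_languages = sorted(
            child.name.lower()
            for child in lang_dir.iterdir()
            if child.is_dir() and child.name.lower() not in DATA_TYPE_DIRS
        )
        languages[lang_dir.name.lower()] = sub_languages
    return languages


def demo_available_languages() -> dict[str, list[str]]:
    """Build a throwaway folder tree and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "english" / "nouns").mkdir(parents=True)
        (root / "hindustani" / "hindi" / "verbs").mkdir(parents=True)
        (root / "hindustani" / "urdu" / "verbs").mkdir(parents=True)
        return sketch_available_languages(root)
```

A language without sub-languages maps to an empty list in this sketch.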

scribe_data.check.check_project_metadata.get_missing_languages(reference_languages: dict, target_languages: dict) -> list[str]

Compare two language dictionaries and return a list of languages and sub-languages that exist in target_languages but not in reference_languages.

Parameters:
reference_languages : dict

A dictionary of languages from the reference source.

target_languages : dict

A dictionary of languages from the target source to check for missing entries.

Returns:
list[str]

A list of languages and sub-languages that are in target_languages but not in reference_languages.
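
The comparison can be sketched as follows. This is a minimal illustration, assuming each language maps to a details dictionary with an optional 'sub_languages' key and that missing sub-languages are reported as "parent/sub". Both the input shape and the output label format are assumptions, and sketch_missing_languages is a hypothetical name.

```python
def sketch_missing_languages(reference: dict, target: dict) -> list[str]:
    """Return entries found in target but absent from reference."""
    missing: list[str] = []
    for lang, details in target.items():
        target_subs = details.get("sub_languages", {})
        if lang not in reference:
            # The whole language is missing: report each sub-language,
            # or the language itself when it has none.
            if target_subs:
                missing.extend(f"{lang}/{sub}" for sub in target_subs)
            else:
                missing.append(lang)
            continue
        # The language exists in both: compare sub-languages only.
        reference_subs = reference[lang].get("sub_languages", {})
        missing.extend(
            f"{lang}/{sub}" for sub in target_subs if sub not in reference_subs
        )
    return missing


reference = {"english": {}, "hindustani": {"sub_languages": {"hindi": {}}}}
target = {
    "english": {},
    "french": {},
    "hindustani": {"sub_languages": {"hindi": {}, "urdu": {}}},
}
missing_example = sketch_missing_languages(reference, target)
# missing_example == ["french", "hindustani/urdu"]
```

Swapping the two arguments gives the check in the opposite direction.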

scribe_data.check.check_project_metadata.validate_language_properties(languages_dict: dict) -> dict

Validate the presence of ‘qid’ and ‘iso’ properties for each language and its sub-languages.

Parameters:
languages_dict : dict

A dictionary where each key is a language, and the value is another dictionary containing details about the language. If the language has sub-languages, they are stored under the ‘sub_languages’ key.

Returns:
dict: A dictionary with two lists:
  • “missing_qids”: Languages or sub-languages missing the ‘qid’ property.

  • “missing_isos”: Languages or sub-languages missing the ‘iso’ property.

Each entry in these lists is in the format “parent_language - sub_language” for sub-languages, or simply “parent_language” for the parent languages.
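
A minimal sketch of this validation, using the entry format described above. It assumes that a language with sub-languages carries its 'qid' and 'iso' on the sub-languages rather than on the parent; that choice, the function name, and the placeholder values in the sample data are assumptions for illustration only.

```python
def sketch_validate_properties(languages: dict) -> dict:
    """Collect languages and sub-languages lacking 'qid' or 'iso'."""
    missing_qids: list[str] = []
    missing_isos: list[str] = []

    def check(label: str, details: dict) -> None:
        if "qid" not in details:
            missing_qids.append(label)
        if "iso" not in details:
            missing_isos.append(label)

    for lang, details in languages.items():
        sub_languages = details.get("sub_languages")
        if sub_languages:
            # Assumption: parents that only group sub-languages are not
            # required to carry their own properties.
            for sub, sub_details in sub_languages.items():
                check(f"{lang} - {sub}", sub_details)
        else:
            check(lang, details)

    return {"missing_qids": missing_qids, "missing_isos": missing_isos}


# Sample data; the 'qid' values for the sub-languages are placeholders.
sample = {
    "english": {"iso": "en", "qid": "Q1860"},
    "french": {"iso": "fr"},
    "hindustani": {
        "sub_languages": {
            "hindi": {"iso": "hi", "qid": "Q-placeholder"},
            "urdu": {"qid": "Q-placeholder"},
        }
    },
}
report = sketch_validate_properties(sample)
# report == {"missing_qids": ["french"], "missing_isos": ["hindustani - urdu"]}
```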

scribe_data.check.check_project_metadata.check_language_metadata()

Validate language metadata by performing various checks.

Raises:
SystemExit:

If any missing languages or properties are found, the function exits the script with a status code of 1.

Notes

Checks include:

  1. Ensures that all languages listed in language_data_extraction are present in language_metadata.json, and vice versa.

  2. Checks if each language in language_metadata.json has the required properties:
    • ‘qid’ (a unique identifier)

    • ‘iso’ (ISO language code)

This function helps identify missing languages or missing properties, ensuring data consistency across both sources.
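
The exit behavior described above can be sketched as a small reporting step. This is a hedged outline, not the project's implementation: sketch_report_and_exit, its parameters, and the message wording are all hypothetical, and the inputs are assumed to be the results of the two-way language comparison and the property validation.

```python
import sys


def sketch_report_and_exit(
    missing_in_metadata: list[str],
    missing_in_extraction: list[str],
    property_report: dict,
) -> None:
    """Print every problem found and exit with status 1 if there are any."""
    problems: list[str] = []
    if missing_in_metadata:
        problems.append(
            "Missing from language_metadata.json: " + ", ".join(missing_in_metadata)
        )
    if missing_in_extraction:
        problems.append(
            "Missing from language_data_extraction: " + ", ".join(missing_in_extraction)
        )
    for key in ("missing_qids", "missing_isos"):
        if property_report.get(key):
            problems.append(f"{key}: " + ", ".join(property_report[key]))

    if problems:
        print("\n".join(problems))
        sys.exit(1)  # A non-zero status lets CI treat the check as failed.
    print("All language metadata checks passed.")
```

Collecting all problems before exiting means a single run reports every inconsistency rather than stopping at the first one.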