check_project_structure.py

Check the structure of Scribe-Data to make sure that all files are correctly named and included.

Examples

>>> python3 src/scribe_data/check/check_project_structure.py

scribe_data.check.check_project_structure.check_for_sparql_files(folder_path: str, data_type: str, language: str, subdir: str | None, missing_queries: list) -> bool

Check if a data-type folder contains at least one .sparql file.

Parameters:
folder_path : str

The path to the data-type folder.

data_type : str

The name of the data type being checked.

language : str

The name of the language being processed.

subdir : str or None

The name of the sub-directory (for languages with sub-dialects), or None.

missing_queries : list

A list to which missing SPARQL query files will be appended.

Returns:
bool

True if at least one .sparql file is found, False otherwise.
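
As a rough illustration of the check described above, the following minimal sketch scans a folder for any .sparql file and records the folder as missing a query when none is found. The function name has_sparql_file and the format of the entries appended to missing_queries are assumptions made for this example, not the actual Scribe-Data implementation.

```python
from pathlib import Path
from typing import Optional


def has_sparql_file(
    folder_path: str,
    data_type: str,
    language: str,
    subdir: Optional[str],
    missing_queries: list,
) -> bool:
    """Return True if folder_path holds at least one .sparql file (hypothetical helper)."""
    folder = Path(folder_path)
    if folder.is_dir() and any(
        p.is_file() and p.suffix == ".sparql" for p in folder.iterdir()
    ):
        return True

    # Assumption: the "language[/subdir]/data_type" entry format is invented
    # for this sketch; the real function may record something different.
    location = f"{language}/{subdir}" if subdir else language
    missing_queries.append(f"{location}/{data_type}")
    return False
```

The caller owns the missing_queries list, so one list can collect results across many folders and be reported once at the end of a run.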

scribe_data.check.check_project_structure.check_data_type_folders(path: str, language: str, subdir: str | None, errors: list, missing_folders: list, missing_queries: list) -> None

Validate the contents of data type folders within a language directory.

Parameters:
path : str

The path to the directory containing data type folders.

language : str

The name of the language being processed.

subdir : str or None

The name of the sub-directory (for languages with sub-dialects), or None.

errors : list

A list to which error messages will be appended.

missing_folders : list

A list to which missing folders will be appended.

missing_queries : list

A list to which missing SPARQL query files will be appended.

Notes

This function checks each data type folder for the presence of expected files and reports any unexpected files. It allows for multiple SPARQL query files, a format Python file, and a queried JSON file for each data type.

The function checks for the following valid files in each data type folder:
  • Files starting with query_ and ending with .sparql

  • A format_{data_type}.py file

  • A {data_type}_queried.json file

Validation is skipped for the emoji_keywords data type folder.

Any files not matching these patterns (except __init__.py) are reported as unexpected.
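
The file-name rules above can be sketched as a small filter. This is an illustrative example only: the helper name unexpected_files is hypothetical, and it works on a plain list of file names rather than on the file system, which may differ from how the real function walks each folder.

```python
from typing import List


def unexpected_files(file_names: List[str], data_type: str) -> List[str]:
    """Return the names that match none of the documented patterns (hypothetical helper)."""
    # The emoji_keywords folder is exempt from validation.
    if data_type == "emoji_keywords":
        return []

    # Exact names allowed in every data type folder.
    allowed_exact = {
        f"format_{data_type}.py",
        f"{data_type}_queried.json",
        "__init__.py",
    }

    def is_query_file(name: str) -> bool:
        # Any number of query_*.sparql files is permitted.
        return name.startswith("query_") and name.endswith(".sparql")

    return [
        name
        for name in file_names
        if name not in allowed_exact and not is_query_file(name)
    ]
```

For a nouns folder containing query_nouns.sparql, format_nouns.py, nouns_queried.json, __init__.py and notes.txt, this sketch returns only notes.txt.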

scribe_data.check.check_project_structure.check_project_structure() -> None

Validate that all directories follow the expected project structure and check for unexpected files and directories.

Notes

Also validate SPARQL query file names in data_type folders and SUBDIRECTORIES.