generate_query.py

Generate SPARQL queries for missing lexeme forms.

scribe_data.check.check_missing_forms.generate_query.get_available_filename(base_path: str) → str

Find the next available filename, incrementing a counter if the file already exists.

Parameters:
base_path : str

Base path for the query file.

Returns:
str

Available filename that doesn’t conflict with existing files.

Examples

If no files exist:
  • Returns query_{data_type}.sparql

If query_{data_type}.sparql exists:
  • Renames existing query_{data_type}.sparql to query_{data_type}_1.sparql

  • Returns query_{data_type}_2.sparql

If last file is query_{data_type}_N.sparql:
  • Returns query_{data_type}_(N+1).sparql
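
The naming rules above can be sketched as follows. This is a minimal illustration written from the documented behavior only, not the library's actual implementation; the function name next_available_filename and the choice to check numbered files before the unnumbered one are assumptions.

```python
import re
import tempfile
from pathlib import Path


def next_available_filename(base_path: str) -> str:
    """Hypothetical re-creation of the documented naming rules."""
    base = Path(base_path)
    parent, stem, suffix = base.parent, base.stem, base.suffix

    # Collect the N of every existing "<stem>_N<suffix>" file.
    pattern = re.compile(rf"^{re.escape(stem)}_(\d+){re.escape(suffix)}$")
    numbered = []
    if parent.is_dir():
        for path in parent.iterdir():
            match = pattern.match(path.name)
            if match:
                numbered.append(int(match.group(1)))

    # Case 3: numbered files exist, so continue the sequence with N + 1.
    if numbered:
        return str(parent / f"{stem}_{max(numbered) + 1}{suffix}")

    # Case 2: only the unnumbered file exists, so rename it to _1 and return _2.
    if base.exists():
        base.rename(parent / f"{stem}_1{suffix}")
        return str(parent / f"{stem}_2{suffix}")

    # Case 1: nothing exists yet, so the base name is free.
    return str(base)


# Walk through the first two cases in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "query_nouns.sparql"
    first = next_available_filename(str(target))   # no files exist yet
    Path(first).write_text("# placeholder query\n")
    second = next_available_filename(str(target))  # base file now exists
    print(Path(first).name, Path(second).name)
```

Note that the second case has a side effect: the existing unnumbered file is renamed, so callers holding its old path would need to refresh it.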

scribe_data.check.check_missing_forms.generate_query.generate_query(missing_features: dict, query_dir: Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/scribe-data/checkouts/latest/src/scribe_data/wikidata/queries_all_data'), sub_lang_iso_code: str | None = '') → str | None

Generate SPARQL queries for missing lexeme forms.

Parameters:
missing_features : dict

Dictionary containing missing features by language and data type. Format: {language_qid: {data_type_qid: [[form_qids]]}}.

query_dir : Path, optional

Directory where query files should be saved. If None, uses default queries directory.

sub_lang_iso_code : str, optional

The two-letter ISO code of the sub-language, if one is provided.

Returns:
str | None

Path to the generated query file.

Notes

  • Generates a single query file combining all forms for a given language and data type combination.

  • Query files are named incrementally if duplicates exist.

  • Creates necessary directories if they don’t exist.
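
To make the missing_features format concrete, here is a small sketch of the nested structure and how it can be traversed. The QIDs are illustrative Wikidata identifiers (English, noun, singular, plural) and the helper iter_form_combinations is hypothetical; neither is taken from the Scribe-Data source.

```python
from typing import Dict, Iterator, List, Tuple

# {language_qid: {data_type_qid: [[form_qids]]}}
MissingFeatures = Dict[str, Dict[str, List[List[str]]]]

# Illustrative input: English (Q1860) nouns (Q1084) that lack
# a singular (Q110786) form and a plural (Q146786) form.
missing_features: MissingFeatures = {
    "Q1860": {
        "Q1084": [["Q110786"], ["Q146786"]],
    },
}


def iter_form_combinations(
    features: MissingFeatures,
) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (language_qid, data_type_qid, form_qids) for each inner list."""
    for language_qid, by_data_type in features.items():
        for data_type_qid, combinations in by_data_type.items():
            for form_qids in combinations:
                yield language_qid, data_type_qid, form_qids


for language_qid, data_type_qid, form_qids in iter_form_combinations(missing_features):
    print(language_qid, data_type_qid, form_qids)
```

Each inner list holds the feature QIDs that together describe one form, so a form such as "nominative singular" would appear as a single two-element list rather than as two separate entries.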