get_forms.py


Get forms from Wikidata SPARQL query files.

scribe_data.check.check_missing_forms.get_forms.parse_sparql_files() -> dict

Read and parse all SPARQL query files to extract form information.

Returns:
dict

Accumulated forms for each language and lexical category. Format: {language: {lexical_category: [forms]}}.

Notes

Recursively searches the WIKIDATA_QUERIES_ALL_DATA_DIR directory for .sparql files and accumulates the form information found in each file.
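The traversal and accumulation described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: `accumulate_forms` is a hypothetical name, the directory is passed in explicitly rather than read from WIKIDATA_QUERIES_ALL_DATA_DIR, and the per-file parser is supplied as a callable that is assumed to return the {language: {lexical_category: [forms]}} shape documented here.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable


def accumulate_forms(queries_dir: Path, parse: Callable[[str], dict]) -> dict:
    """Merge the result of `parse` for every .sparql file under queries_dir.

    Hypothetical helper: `parse` stands in for a per-file parser that
    returns {language: {lexical_category: [forms]}}.
    """
    result = defaultdict(lambda: defaultdict(list))

    # rglob walks the directory tree recursively; sorting keeps the
    # accumulation order deterministic across platforms.
    for sparql_file in sorted(queries_dir.rglob("*.sparql")):
        parsed = parse(sparql_file.read_text(encoding="utf-8"))

        for language, categories in parsed.items():
            for category, forms in categories.items():
                for form in forms:
                    # Skip forms already recorded for this language/category.
                    if form not in result[language][category]:
                        result[language][category].append(form)

    # Convert the nested defaultdicts back to plain dicts for the caller.
    return {language: dict(cats) for language, cats in result.items()}
```

Deduplicating while merging is a design choice of this sketch; whether the real function keeps or drops repeated forms is not stated in this page.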

scribe_data.check.check_missing_forms.get_forms.parse_sparql_query(query_text: str) -> dict

Parse a SPARQL query to extract lexical categories and features.

Parameters:
query_text : str

Content of the SPARQL query file.

Returns:
dict

Dictionary containing parsed information. Format: {language: {lexical_category: [forms]}}.

Notes

Extracts:

- Language QID
- Lexical category QID
- Grammatical features from OPTIONAL blocks
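As a rough illustration of the three extractions listed in the notes, the sketch below uses regular expressions. It is an assumption-laden example, not the library's code: `sketch_parse_sparql_query` is a hypothetical name, the `dct:language` and `wikibase:lexicalCategory` patterns assume the query follows the usual Wikidata lexeme conventions, OPTIONAL blocks are assumed not to contain nested braces, and the result is keyed by raw QIDs (the real function may map them to readable names).

```python
import re

# Patterns assume the common Wikidata lexeme query layout.
LANGUAGE_RE = re.compile(r"dct:language\s+wd:(Q\d+)")
CATEGORY_RE = re.compile(r"wikibase:lexicalCategory\s+wd:(Q\d+)")
# Non-greedy match up to the first closing brace; nested braces are not handled.
OPTIONAL_RE = re.compile(r"OPTIONAL\s*\{(.*?)\}", re.DOTALL)
QID_RE = re.compile(r"wd:(Q\d+)")


def sketch_parse_sparql_query(query_text: str) -> dict:
    """Return {language_qid: {category_qid: [[feature_qids], ...]}}.

    Hypothetical sketch: each inner list holds the grammatical feature
    QIDs found in one OPTIONAL block.
    """
    language = LANGUAGE_RE.search(query_text)
    category = CATEGORY_RE.search(query_text)
    if language is None or category is None:
        # Without both QIDs there is nothing to key the forms on.
        return {}

    forms = []
    for block in OPTIONAL_RE.findall(query_text):
        features = QID_RE.findall(block)
        if features:
            forms.append(features)

    return {language.group(1): {category.group(1): forms}}
```

A regex pass like this is only adequate for queries with a flat, predictable layout; a query with nested groups inside OPTIONAL would need a real SPARQL parser.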