wikidata_lexeme_dump.py


Functions for downloading Wikidata lexeme dumps.

scribe_data.cli.download.wikidata_lexeme_dump.parse_date(date_string: str) → date | None

Parse a date string into a datetime.date object (formats: YYYYMMDD, YYYY/MM/DD, YYYY-MM-DD).

Parameters:
date_string : str

The date string to be parsed.

Returns:
datetime.date

Parsed date object if the format is valid.

None

If the date format is invalid.
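The parsing rule above (try each accepted format, return None if none matches) can be sketched with the standard library's datetime.strptime. This is a minimal illustration of the documented behaviour, not the module's actual code. The name parse_date_sketch is hypothetical, and trying the formats in the listed order is an assumption.

```python
from datetime import date, datetime
from typing import Optional

# The three formats listed in the parse_date docstring, tried in order.
DATE_FORMATS = ("%Y%m%d", "%Y/%m/%d", "%Y-%m-%d")


def parse_date_sketch(date_string: str) -> Optional[date]:
    """Hypothetical stand-in for parse_date: return a date, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(date_string, fmt).date()
        except ValueError:
            # This format did not match; try the next one.
            continue
    # No accepted format matched, so the input is treated as invalid.
    return None


print(parse_date_sketch("20240115"))    # 2024-01-15
print(parse_date_sketch("2024/01/15"))  # 2024-01-15
print(parse_date_sketch("15-01-2024"))  # None
```

Because strptime validates the calendar as well as the layout, an input such as "2024-13-01" also yields None rather than raising.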

scribe_data.cli.download.wikidata_lexeme_dump.available_closest_lexeme_dump_file(target_entity: str, other_old_dumps: list, check_wd_dump_exists: Callable[[str], str | None]) → str | None

Find the closest available dump file based on the target date.

Parameters:
target_entity : str

The target date for which the dump is requested (format: YYYY/MM/DD or similar).

other_old_dumps : list

List of available dump folders as strings.

check_wd_dump_exists : Callable[[str], str | None]

A function that checks whether the dump file for a given date exists, returning None if it does not.

Returns:
str

The date of the closest available dump file, as a string.

None

If no suitable dump is found.
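One way to read "closest available dump" is: sort the date-named folders by their distance from the target date, then return the first one that the existence check accepts. The sketch below follows that reading. The names closest_dump_sketch, _to_date and fake_check are hypothetical. Measuring "closest" as the smallest absolute difference in days, rather than, say, the nearest earlier dump, is an assumption and not confirmed by the docstring.

```python
from datetime import date, datetime
from typing import Callable, List, Optional


def _to_date(value: str) -> Optional[date]:
    # Parse the same formats as parse_date; non-date names such as "latest/" give None.
    for fmt in ("%Y%m%d", "%Y/%m/%d", "%Y-%m-%d"):
        try:
            return datetime.strptime(value.strip("/"), fmt).date()
        except ValueError:
            continue
    return None


def closest_dump_sketch(
    target: str,
    folders: List[str],
    check_exists: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Hypothetical: return the dated folder nearest to `target` whose dump exists."""
    target_date = _to_date(target)
    if target_date is None:
        return None
    # Pair each date-like folder name with its parsed date.
    dated = []
    for name in folders:
        parsed = _to_date(name)
        if parsed is not None:
            dated.append((parsed, name.strip("/")))
    # Assumption: "closest" means the smallest absolute distance in days.
    dated.sort(key=lambda pair: abs((pair[0] - target_date).days))
    # Walk outward from the nearest date until the callback confirms a dump exists.
    for _, name in dated:
        if check_exists(name) is not None:
            return name
    return None


folders = ["20240101/", "20240201/", "20240301/", "latest/"]
available = {"20240101", "20240301"}


def fake_check(name: str) -> Optional[str]:
    # Stand-in for check_wd_dump_exists: pretend only some folders contain a dump.
    return f"wikidata-{name}-lexemes.json.bz2" if name in available else None


print(closest_dump_sketch("2024/02/05", folders, fake_check))  # 20240301
```

In the example, 20240201 is the nearest folder (4 days away) but has no dump, so the sketch falls back to 20240301 (25 days away) before 20240101 (35 days away).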

scribe_data.cli.download.wikidata_lexeme_dump.download_wd_lexeme_dump(target_entity: str = 'latest-lexemes') → str | None

Download a Wikimedia lexeme dump based on the specified target entity or date.

Parameters:
target_entity : str, optional

The target dump to download. Defaults to “latest-lexemes”.
  • If “latest-lexemes”, downloads the latest dump.

  • If a valid date (e.g., YYYYMMDD), attempts to download the dump for that date.

Returns:
str

The URL of the requested or closest available dump.

None

If no suitable dump is found or the request fails.
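The URL-building half of this function can be sketched without any network access. The base directory and file-name patterns below are assumptions modelled on the public layout of dumps.wikimedia.org (a latest-lexemes.json.bz2 file and per-date wikidata-YYYYMMDD-lexemes.json.bz2 files); they are not taken from the module's source. The name lexeme_dump_url_sketch is hypothetical, and the sketch omits the HTTP request and the fallback to the closest available dump that the real function documents.

```python
from datetime import datetime
from typing import Optional

# Assumption: public Wikidata entity dumps live under this directory.
BASE_URL = "https://dumps.wikimedia.org/wikidatawiki/entities"


def lexeme_dump_url_sketch(target_entity: str = "latest-lexemes") -> Optional[str]:
    """Hypothetical: build a lexeme dump URL without any network access."""
    if target_entity == "latest-lexemes":
        return f"{BASE_URL}/latest-lexemes.json.bz2"
    # Normalize YYYY/MM/DD and YYYY-MM-DD to YYYYMMDD.
    digits = target_entity.replace("/", "").replace("-", "")
    if len(digits) != 8:
        return None
    try:
        # Reject strings that are not real calendar dates.
        datetime.strptime(digits, "%Y%m%d")
    except ValueError:
        return None
    # Assumption: dated dumps follow the <date>/wikidata-<date>-lexemes.json.bz2 pattern.
    return f"{BASE_URL}/{digits}/wikidata-{digits}-lexemes.json.bz2"


print(lexeme_dump_url_sketch())
print(lexeme_dump_url_sketch("2024-10-02"))
```

A real implementation would follow this with a request to confirm the URL exists, falling back to something like available_closest_lexeme_dump_file when it does not.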

scribe_data.cli.download.wikidata_lexeme_dump.wd_lexeme_dump_download_wrapper(dump_snapshot: str | None = None, output_dir: Path | None = PosixPath('scribe_data_wikidata_dumps_export'), default: bool = False) → Path | bool | None

Download Wikidata lexeme dumps given user preferences.

Parameters:
dump_snapshot : str, optional

Date string in YYYYMMDD format that selects a specific dump.

output_dir : Path, optional

Directory in which to save the downloaded file. Defaults to the ‘scribe_data_wikidata_dumps_export’ directory.

default : bool, optional

If True, skips the user confirmation prompt. Defaults to False.

Returns:
Path or None
  • If successful and a dump is downloaded, returns the file path to the downloaded dump.

  • If an existing usable dump is detected, returns the path to the existing dump.

  • Returns None if the user chooses not to proceed with the download or no valid dump URL is found.
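The three return cases listed above can be outlined as a small decision flow: reuse a dump already on disk, otherwise ask the user (unless default is True), then download. The sketch below is hypothetical. The names find_existing_dump, ask_user, wrapper_flow_sketch and fake_download are illustrative only. Treating any *.json.bz2 file in output_dir as "usable", and injecting the download and confirmation steps as callables, are assumptions made so the flow can run without network access or a terminal.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def find_existing_dump(output_dir: Path) -> Optional[Path]:
    """Hypothetical: return the newest *.json.bz2 file in output_dir, if any."""
    if not output_dir.is_dir():
        return None
    candidates = sorted(output_dir.glob("*.json.bz2"), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None


def ask_user(message: str) -> bool:
    # Interactive yes/no prompt; only used when default is False.
    return input(message).strip().lower() in {"y", "yes"}


def wrapper_flow_sketch(
    output_dir: Path,
    download: Callable[[Path], Path],
    default: bool = False,
    confirm: Callable[[str], bool] = ask_user,
) -> Optional[Path]:
    """Hypothetical outline of the documented return cases of the wrapper."""
    existing = find_existing_dump(output_dir)
    if existing is not None:
        # Second Returns bullet: a usable dump is already on disk, so reuse it.
        return existing
    if not default and not confirm("Download the latest lexeme dump? [y/N] "):
        # Third Returns bullet: the user declined the download.
        return None
    output_dir.mkdir(parents=True, exist_ok=True)
    # First Returns bullet: download and return the new file path.
    return download(output_dir / "latest-lexemes.json.bz2")


def fake_download(dest: Path) -> Path:
    # Stand-in for the real network download: write an empty file.
    dest.write_bytes(b"")
    return dest


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "scribe_data_wikidata_dumps_export"
    declined = wrapper_flow_sketch(out, fake_download, confirm=lambda msg: False)
    downloaded = wrapper_flow_sketch(out, fake_download, default=True)
    reused = wrapper_flow_sketch(out, fake_download, confirm=lambda msg: False)
    print(declined, downloaded == reused)  # None True
```

Passing default=True mirrors the documented behaviour of skipping the confirmation prompt. The third call returns the file written by the second without prompting, which corresponds to the "existing usable dump" case.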