wikidata_lexeme_dump.py
Functions for downloading Wikidata lexeme dumps.
- scribe_data.cli.download.wikidata_lexeme_dump.parse_date(date_string: str) → date | None
Parse a date string into a datetime.date object (formats: YYYYMMDD, YYYY/MM/DD, YYYY-MM-DD).
- Parameters:
- date_string : str
The date string to be parsed.
- Returns:
- datetime.date
Parsed date object if the format is valid.
- None
If the date format is invalid.
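A minimal sketch of how the three documented formats could be tried in turn. The name `parse_date_sketch` and the use of `datetime.strptime` are assumptions for illustration; the actual implementation may validate input differently.

```python
from datetime import date, datetime
from typing import Optional

# Formats listed in the description above: YYYYMMDD, YYYY/MM/DD, YYYY-MM-DD.
SUPPORTED_FORMATS = ("%Y%m%d", "%Y/%m/%d", "%Y-%m-%d")


def parse_date_sketch(date_string: str) -> Optional[date]:
    """Hypothetical stand-in for parse_date: try each format, else return None."""
    for fmt in SUPPORTED_FORMATS:
        try:
            return datetime.strptime(date_string, fmt).date()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None
```

Returning `None` rather than raising lets a caller decide how to report an invalid date.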
- scribe_data.cli.download.wikidata_lexeme_dump.available_closest_lexeme_dump_file(target_entity: str, other_old_dumps: list, check_wd_dump_exists: Callable[[str], str | None]) → str | None
Find the closest available dump file based on the target date.
- Parameters:
- target_entity : str
The target date for which the dump is requested (format: YYYY/MM/DD or similar).
- other_old_dumps : list
List of available dump folders as strings.
- check_wd_dump_exists : function
A function to validate if the dump file exists.
- Returns:
- str
The closest available dump file date (as a string).
- None
If no suitable dump is found.
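One way such a closest-date search could work is sketched below. Everything here is an assumption for illustration: the name `closest_dump_sketch`, folder names in YYYYMMDD form with an optional trailing slash, "closest" meaning the smallest absolute gap in days, and ties going to the first candidate seen. The real function may search in one direction only or break ties differently.

```python
from datetime import datetime
from typing import Callable, List, Optional


def closest_dump_sketch(
    target: str,
    folders: List[str],
    exists: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Hypothetical: pick the folder date nearest to target that passes exists()."""
    target_date = datetime.strptime(target, "%Y%m%d").date()
    best_name, best_gap = None, None
    for folder in folders:
        name = folder.strip("/")
        try:
            folder_date = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # Skip entries that are not dated folders, e.g. "latest".
        if exists(name) is None:
            continue  # The callback reports that no dump file exists for this date.
        gap = abs((folder_date - target_date).days)
        if best_gap is None or gap < best_gap:
            best_name, best_gap = name, gap
    return best_name
```

Passing the existence check in as a callback, as the documented signature does, keeps the search logic free of network calls and easy to test with a stub.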
- scribe_data.cli.download.wikidata_lexeme_dump.download_wd_lexeme_dump(target_entity: str = 'latest-lexemes') → str | None
Download a Wikimedia lexeme dump based on the specified target entity or date.
- Parameters:
- target_entity : str, optional
The target dump to download. Defaults to “latest-lexemes”.
- If “latest-lexemes”, downloads the latest dump.
- If a valid date (e.g., YYYYMMDD), attempts to download the dump for that date.
- Returns:
- str
The URL of the requested or closest available dump.
- None
If no suitable dump is found or the request fails.
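The two cases of `target_entity` can be illustrated with a small offline sketch. The helper `candidate_dump_url` is hypothetical, and the URL layout under `dumps.wikimedia.org/wikidatawiki/entities` is an assumption about where the dumps are hosted. Unlike the documented function, this sketch makes no HTTP request and does not fall back to the closest available dump.

```python
import re
from typing import Optional

# Assumed base location of Wikidata entity dumps; not taken from this module's docs.
BASE_URL = "https://dumps.wikimedia.org/wikidatawiki/entities"


def candidate_dump_url(target_entity: str = "latest-lexemes") -> Optional[str]:
    """Hypothetical: map a target entity to the URL that would be tried first."""
    if target_entity == "latest-lexemes":
        return f"{BASE_URL}/latest-lexemes.json.bz2"
    # Normalize YYYY/MM/DD and YYYY-MM-DD to YYYYMMDD.
    digits = re.sub(r"[/-]", "", target_entity)
    if re.fullmatch(r"\d{8}", digits):
        return f"{BASE_URL}/{digits}/wikidata-{digits}-lexemes.json.bz2"
    return None  # Neither the latest marker nor a recognizable date.
```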
- scribe_data.cli.download.wikidata_lexeme_dump.wd_lexeme_dump_download_wrapper(dump_snapshot: str | None = None, output_dir: Path | None = PosixPath('scribe_data_wikidata_dumps_export'), default: bool = False) → Path | bool | None
Download Wikidata lexeme dumps given user preferences.
- Parameters:
- dump_snapshot : str
Optional date string in YYYYMMDD format for specific dumps.
- output_dir : Path
Optional directory path for the downloaded file. Defaults to ‘scribe_data_wikidata_dumps_export’ directory.
- default : bool, optional
If True, skips the user confirmation prompt. Defaults to False.
- Returns:
- Path or None
If successful and a dump is downloaded, returns the file path to the downloaded dump.
If an existing usable dump is detected, returns the path to the existing dump.
Returns None if the user chooses not to proceed with the download or no valid dump URL is found.
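Because the signature allows `Path`, `bool`, or `None`, a caller may want to branch on the result type. The helper below is a hypothetical sketch of such handling, not part of the module. The meaning of a `bool` return is not described in the Returns section above, so the sketch only reports it.

```python
from pathlib import Path
from typing import Union


def describe_wrapper_result(result: Union[Path, bool, None]) -> str:
    """Hypothetical: summarize a value returned by the download wrapper."""
    if isinstance(result, Path):
        # Either a freshly downloaded dump or an existing usable one.
        return f"dump available at {result}"
    if result is None:
        return "no dump: download declined or no valid dump URL found"
    # A bool appears in the signature but is undocumented here; report it as-is.
    return f"wrapper returned the flag {result!r}"
```

In real use, the argument would be the value returned by `wd_lexeme_dump_download_wrapper(...)`.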