cli/
====

`View code on Github `_

Scribe-Data provides a command-line interface (CLI) for efficient interaction with its language data functionality.

.. toctree::
    :maxdepth: 2

    contracts/index
    convert/index
    download/index
    interactive/index
    list/index
    total/index

.. toctree::
    :maxdepth: 1

    cli_utils
    get

Usage
-----

The basic syntax for using the Scribe-Data CLI is:

.. code-block:: bash

    scribe-data [global_options] command [command_options]

Global Options
--------------

- ``-h, --help``: Show this help message and exit.
- ``-v, --version``: Show the version of Scribe-Data.
- ``-u, --upgrade``: Upgrade the Scribe-Data CLI.

Commands
--------

The Scribe-Data CLI supports the following commands:

1. ``list`` (alias: ``l``)
2. ``get`` (alias: ``g``)
3. ``total`` (alias: ``t``)
4. ``convert`` (alias: ``c``)
5. ``download`` (alias: ``d``)
6. ``interactive`` (alias: ``i``)

Note: For all language arguments, if the language is more than one word, then the argument value needs to be passed with double quotes around it.

For example:

.. code-block:: bash

    scribe-data total --language German --data-type nouns
    scribe-data total --language "Hindi Hindustani" --data-type nouns

List Command
~~~~~~~~~~~~

List languages, data types and combinations of each that Scribe-Data can be used for.

Usage
^^^^^

.. code-block:: bash

    scribe-data list [arguments]

Options
^^^^^^^

- ``-lang, --language [LANGUAGE]``: List options for all or given languages.
- ``-dt, --data-type [DATA_TYPE]``: List options for all or given data types.
- ``-a, --all ALL``: List all languages and data types.

Example output
^^^^^^^^^^^^^^

The ``scribe-data list`` command (also accessible via ``scribe-data list -a``) displays both the available languages and data types.

.. code-block:: text

    $ scribe-data list
    Language     ISO   QID
    ==========================
    English      en    Q1860
    ...
    Available data types: All languages
    ===================================
    adjectives
    adverbs
    conjunctions
    emoji-keywords
    nouns
    personal-pronouns
    postpositions
    prepositions
    pronouns
    proper-nouns
    verbs

.. code-block:: text

    $ scribe-data list --language
    Language     ISO   QID
    ==========================
    English      en    Q1860
    ...

.. code-block:: text

    $ scribe-data list -dt
    Available data types: All languages
    ===================================
    adjectives
    adverbs
    conjunctions
    emoji-keywords
    nouns
    personal-pronouns
    postpositions
    prepositions
    pronouns
    proper-nouns
    verbs

Get Command
~~~~~~~~~~~

Get data from Wikidata or Wiktionary for the given languages and data types.

Usage
^^^^^

.. code-block:: bash

    scribe-data get [arguments]

Options
^^^^^^^

- ``-lang, --language LANGUAGE``: The language(s) to get.
- ``-dt, --data-type DATA_TYPE``: The data type(s) to get.
- ``-od, --output-dir OUTPUT_DIR``: The output directory path for results.
- ``-ot, --output-type {json,csv,tsv}``: The output file type.
- ``-ope, --outputs-per-entry OUTPUTS_PER_ENTRY``: How many outputs should be generated per data entry.
- ``-o, --overwrite``: Whether to overwrite existing files (default: False).
- ``-a, --all``: Get all languages and data types. Can be combined with ``-dt`` to get all languages for a specific data type, or with ``-lang`` to get all data types for a specific language.
- ``-i, --interactive``: Run in interactive mode.
- ``-ic, --identifier-case``: The case format for identifiers in the output data (default: camel).
- ``-wtp, --wiktionary-project WIKTIONARY_PROJECT``: The Wiktionary project to extract translations from (e.g. ``enwiktionary`` for English Wiktionary).

Examples
^^^^^^^^

.. code-block:: bash

    $ scribe-data get --all
    Getting data for all languages and all data types...

.. code-block:: bash

    $ scribe-data get --all -dt nouns
    Getting all nouns for all languages...

.. code-block:: bash

    $ scribe-data get --all -lang English
    Getting all data types for English...

.. code-block:: bash

    $ scribe-data get -lang English --data-type verbs -od ~/path/for/output
    Getting and formatting English verbs
    Data updated: 100%|████████████████████████| 1/1 [00:XY<00:00, XY.Zs/process]

To extract Wiktionary translations for a specific language:

.. code-block:: bash

    $ scribe-data get -dt translations -lang de -wtp enwiktionary

To extract Wiktionary translations for all supported languages:

.. code-block:: bash

    $ scribe-data get -dt translations -wtp enwiktionary

To retrieve data using lexeme dumps, use the following command:

.. code-block:: bash

    $ scribe-data get -lang german -dt nouns -wdp

**Example Output:**

.. code-block:: text

    Languages to process: German
    Data types to process: nouns
    Existing dump files found:
      - scribe_data_wikidata_dumps_export/latest-lexemes.json.bz2
    ? Do you want to: (Use arrow keys)
     » Delete existing dumps
       Skip download
       Use existing latest dump
       Download new version

**Instructions:**

1. Use the arrow keys to navigate through the options.
2. Press **Enter** to confirm your selection.

**Options Explained:**

- **Delete existing dumps**: Removes the existing dump files before downloading new ones.
- **Skip download**: Skips the download process.
- **Use existing latest dump**: Processes the existing dump file without downloading a new version.
- **Download new version**: Downloads the latest version of the lexeme dump.

**Note:** Ensure you have sufficient disk space and a stable internet connection if downloading a new version.

**If No Existing Dump Files Are Found:**

1. If no existing dump files are found, the command will display the following message:

   .. code-block:: text

       No existing dump files found. Downloading new version...

2. The command will then proceed to download the latest dump file:

   .. code-block:: text

       Downloading dump to scribe_data_wikidata_dumps_export\latest-lexemes.json.bz2...
       scribe_data_wikidata_dumps_export\latest-lexemes.json.bz2: 100%|███████████████████| 370M/370M [04:20<00:00, 1.42MiB/s]
       Wikidata lexeme dump download completed successfully!

Behavior and Output
^^^^^^^^^^^^^^^^^^^

1. The command will first check for existing data:

   .. code-block:: text

       Updating data for language(s): English; data type(s): verbs
       Data updated:   0%|

2. If existing files are found, you'll be prompted to choose an option:

   .. code-block:: text

       Existing file(s) found for English verbs:

       1. verbs.json

       Choose an option:
       1. Overwrite existing data (press 'o')
       2. Skip process (press anything else)
       Enter your choice:

3. After making a selection, the get process begins:

   .. code-block:: text

       Getting and formatting English verbs
       Data updated: 100%|████████████████████████| 1/1 [00:XY<00:00, XY.Zs/process]

4. If no data is found, you'll see a warning:

   .. code-block:: text

       No data found for language 'english' and data type '['verbs']'.
       Warning: No data file found for 'English' ['verbs']. The command must not have worked.

Notes
^^^^^

1. The data type can be specified with ``--data-type`` or ``-dt``.
2. The command creates timestamped JSON files by default, even if no data is found.
3. If multiple files exist, you'll be given options to manage them (keep existing, overwrite, keep both, or cancel).
4. The process may take some time, especially for large datasets.

Troubleshooting
^^^^^^^^^^^^^^^

- If you receive a "No data found" warning, check your internet connection and verify that the language and data type are correctly specified.
- If you're having issues with file paths, remember to use quotes around paths with spaces.
- If the command seems to hang at 0% or 100%, be patient as the process can take several minutes depending on the dataset size and your internet connection.

Total Command
~~~~~~~~~~~~~

Check Wikidata for the total available data for the given languages and data types.

Usage
^^^^^

.. code-block:: bash

    scribe-data total [arguments]

Options
^^^^^^^

- ``-lang, --language LANGUAGE``: The language(s) to check totals for. Can be a language name or QID.
- ``-dt, --data-type DATA_TYPE``: The data type(s) to check totals for.
- ``-a, --all``: Get totals for all languages and data types.

Examples
^^^^^^^^

1. Get totals for all languages and data types:

   .. code-block:: text

       $ scribe-data total --all
       Total lexemes for all languages and data types:
       =================================================
       Language     Data Type     Total Wikidata Lexemes
       =================================================
       English      nouns         123,456
                    verbs         234,567
       ...

2. Get totals for all data types in English:

   .. code-block:: text

       $ scribe-data total --language English
       Returning total counts for English data types...
       Language          Data Type               Total Wikidata Lexemes
       ================================================================
       English           adjectives              12,345
                         adverbs                 23,456
                         nouns                   34,567
       ...

3. Get totals using a Wikidata QID:

   .. code-block:: text

       $ scribe-data total --language Q1860
       Wikidata QID Q1860 passed. Checking all data types.
       Language          Data Type               Total Wikidata Lexemes
       ================================================================
       Q1860             adjectives              12,345
                         adverbs                 23,456
                         articles                30
                         conjunctions            40
                         nouns                   56,789
                         personal pronouns       60
       ...

4. Get totals for a specific language and data type combination:

   .. code-block:: text

       $ scribe-data total --language English -dt nouns
       Language: English
       Data type: nouns
       Total number of lexemes: 12,345

5. Get totals for a specific QID and data type combination:

   .. code-block:: text

       $ scribe-data total --language Q1860 -dt verbs
       Language: Q1860
       Data type: verbs
       Total number of lexemes: 23,456

Download Command
~~~~~~~~~~~~~~~~

Download Wikidata lexeme dumps or Wiktionary dumps for offline data extraction.

Usage
^^^^^

.. code-block:: bash

    scribe-data download

Options
^^^^^^^

- ``--wiktionary-dump``: Download a Wiktionary dump instead of a Wikidata lexeme dump.
- ``-lang, --language LANGUAGE``: The language edition of Wiktionary to download (e.g. ``de`` for German Wiktionary). Defaults to English Wiktionary (``enwiktionary``) when omitted.

Examples
^^^^^^^^

Download the English Wiktionary dump:

.. code-block:: bash

    scribe-data download --wiktionary-dump

Download a specific language's Wiktionary dump:

.. code-block:: bash

    scribe-data download --wiktionary-dump --language de

Behavior and Output
^^^^^^^^^^^^^^^^^^^

- **If Existing Dump Files Are Found:**

  1. If existing dump files are found, the command will display the following message:

     .. code-block:: text

         Existing dump files found:
           - scribe_data_wikidata_dumps_export/latest-lexemes.json.bz2

  2. The command will prompt the user with options to choose from:

     .. code-block:: text

         ? Do you want to: (Use arrow keys)
          » Delete existing dumps
            Skip download
            Use existing latest dump
            Download new version

- **If Downloading New Version:**

  1. If the user chooses to proceed with the download, the dump will be downloaded to the specified directory:

     .. code-block:: text

         Downloading dump to scribe_data_wikidata_dumps_export\latest-lexemes.json.bz2...
         scribe_data_wikidata_dumps_export\latest-lexemes.json.bz2: 100%|███████████████████| 370M/370M [04:20<00:00, 1.42MiB/s]
         Wikidata lexeme dump download completed successfully!

Convert Command
~~~~~~~~~~~~~~~

Convert data returned by Scribe-Data to different file types, including SQLite databases for multiple languages and data types.

Usage
^^^^^

.. code-block:: bash

    scribe-data convert [arguments]

Options
^^^^^^^

- ``-f, --file FILE``: The file to convert to a new type.
- ``-lang, --language LANGUAGE``: The language(s) to convert (for SQLite conversion).
- ``-dt, --data-type DATA_TYPE``: The data type(s) to convert (for SQLite conversion).
- ``-ko, --keep-original``: Whether to keep the file to be converted (default: True).
- ``-ot, --output-type {json,csv,tsv,sqlite}``: The output file type.

Examples
^^^^^^^^

1. **Convert multiple languages and data types to SQLite:**

   .. code-block:: bash

       $ scribe-data convert -lang english french -dt nouns verbs -ot sqlite
       Creating/Updating SQLite databases for the following languages: English, French
       Updating only the following tables: nouns, verbs
       Databases created:   0%|          | 0/2 [00:00