Scribe-Data

Wikidata and Wiktionary language data extraction

Installation

Scribe-Data is available on PyPI and can be installed with uv or pip:

# Using uv (recommended - fast, Rust-based installer):
uv pip install scribe-data

# Or using pip:
pip install scribe-data

The latest development version can also be installed from the source code on GitHub, working from a local clone of the repository:

# With uv (recommended):
uv sync --all-groups  # install all dependencies
source .venv/bin/activate  # activate venv (macOS/Linux)
# .venv\Scripts\activate  # activate venv (Windows)

# Or with pip:
python -m venv .venv  # create virtual environment
source .venv/bin/activate  # activate venv (macOS/Linux)
# .venv\Scripts\activate  # activate venv (Windows)
pip install -e .
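
After either install route, it can be useful to confirm that the package is visible in the active environment. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of Scribe-Data, and the distribution name "scribe-data" is taken from the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "scribe-data"):
    """Return the installed version string, or None if the distribution is absent.

    Hypothetical helper for illustration; it is not provided by Scribe-Data.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found:
        print(f"scribe-data {found}")
    else:
        print("scribe-data is not installed in this environment")
```

Run it with the virtual environment activated; an editable install (pip install -e .) should be reported in the same way as a release from PyPI.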

To use the Scribe-Data CLI, run variations of the following commands in your terminal:

scribe-data -h  # view the CLI options
scribe-data [command] [arguments]

Available Commands

  • list (l): List languages, data types and combinations of each that Scribe-Data can be used for.

  • get (g): Get data from Wikidata and other sources for the given languages and data types.

  • total (t): Check Wikidata for the total available data for the given languages and data types.

  • convert (c): Convert data returned by Scribe-Data to different file types.

  • download (d): Download Wikidata lexeme or Wiktionary dumps.

  • interactive (i): Run in interactive mode.

  • export_contracts (ec): Export Scribe-Data contracts to a local directory.

  • check_contracts (cc): Check the data in a Scribe-Data export directory to see that all needed language data is included.

  • filter_data (fd): Filter exported Scribe-Data data based on provided data contract values.
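
When calling the CLI from a script, the short aliases listed above can be expanded to their full command names before building the argument list. The sketch below is one possible way to do that, assuming the scribe-data [command] [arguments] form shown earlier; build_argv and the ALIASES table are hypothetical helpers written for this example, not part of the Scribe-Data API, and no command-specific arguments are assumed.

```python
import shutil
import subprocess

# Short aliases and full command names, copied from the command list above.
ALIASES = {
    "l": "list",
    "g": "get",
    "t": "total",
    "c": "convert",
    "d": "download",
    "i": "interactive",
    "ec": "export_contracts",
    "cc": "check_contracts",
    "fd": "filter_data",
}
COMMANDS = set(ALIASES.values())


def build_argv(command, *arguments):
    """Expand an alias and return an argv list of the form
    ["scribe-data", command, *arguments].

    Hypothetical helper for illustration; arguments are passed through unchanged.
    """
    name = ALIASES.get(command, command)
    if name not in COMMANDS:
        raise ValueError(f"unknown Scribe-Data command: {command!r}")
    return ["scribe-data", name, *arguments]


if __name__ == "__main__":
    print(build_argv("g"))
    # Only invoke the real CLI if it is on PATH; -h is the documented help flag.
    if shutil.which("scribe-data"):
        subprocess.run(["scribe-data", "-h"], check=False)
```

Passing the argument list to subprocess.run as a list, rather than as a single shell string, avoids quoting problems when arguments contain spaces.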

Contents

Contributing

Project Indices