Wikidata and Wiktionary language data extraction

Installation

Scribe-Data can be installed with uv or pip:

# Using uv (recommended - fast, Rust-based installer):
uv pip install scribe-data

# Or using pip:
pip install scribe-data

The latest development version can also be installed from the source code on GitHub. Run the following from the root of a local clone of the repository:

# With uv (recommended):
uv sync --all-groups  # install all dependencies
source .venv/bin/activate  # activate venv (macOS/Linux)
# .venv\Scripts\activate  # activate venv (Windows)

# Or with pip:
python -m venv .venv  # create virtual environment
source .venv/bin/activate  # activate venv (macOS/Linux)
# .venv\Scripts\activate  # activate venv (Windows)
pip install -e .

To use the Scribe-Data CLI, run variations of the following commands in your terminal:

scribe-data -h  # view the cli options
scribe-data [command] [arguments]
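
To confirm from a script that the installation succeeded, a minimal Python sketch can look for the `scribe-data` executable on the PATH and capture its help text. The helper name `scribe_data_help` is hypothetical and not part of Scribe-Data; only the `scribe-data -h` invocation comes from the usage shown above.

```python
import shutil
import subprocess
from typing import Optional


def scribe_data_help() -> Optional[str]:
    """Return the output of `scribe-data -h`, or None if the CLI is not on PATH.

    Hypothetical helper for illustration; not part of the Scribe-Data package.
    """
    executable = shutil.which("scribe-data")
    if executable is None:
        return None
    # check=False: a non-zero exit code is reported through the output, not raised.
    result = subprocess.run(
        [executable, "-h"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    help_text = scribe_data_help()
    if help_text is None:
        print("scribe-data was not found on PATH; is the virtual environment active?")
    else:
        print(help_text)
```

If the function returns None after a source install, the most likely cause is that the virtual environment created above has not been activated in the current shell.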

Available Commands

list (l): List languages, data types and combinations of each that Scribe-Data can be used for.
get (g): Get data from Wikidata and other sources for the given languages and data types.
total (t): Check Wikidata for the total available data for the given languages and data types.
convert (c): Convert data returned by Scribe-Data to different file types.
download (d): Download Wikidata lexeme or Wiktionary dumps.
interactive (i): Run in interactive mode.
export_contracts (ec): Export Scribe-Data contracts to a local directory.
check_contracts (cc): Check the data in a Scribe-Data export directory to see that all needed language data is included.
filter_data (fd): Filter exported Scribe-Data data based on provided data contract values.
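
Each command above has a short alias, given in parentheses. As an illustration, the sketch below records those pairs in a dictionary and builds the argument list for a call. The names `COMMAND_ALIASES` and `build_argv` are hypothetical helpers for this example, not part of the Scribe-Data API, and the example assumes each subcommand accepts `-h` in the same way the top-level command does.

```python
from typing import List

# Alias -> full command name, copied from the command list above.
COMMAND_ALIASES = {
    "l": "list",
    "g": "get",
    "t": "total",
    "c": "convert",
    "d": "download",
    "i": "interactive",
    "ec": "export_contracts",
    "cc": "check_contracts",
    "fd": "filter_data",
}


def build_argv(command: str, *args: str) -> List[str]:
    """Expand an alias to its full command name and return an argv list.

    Hypothetical helper: it validates only the command name, not the arguments.
    """
    full_name = COMMAND_ALIASES.get(command, command)
    if full_name not in COMMAND_ALIASES.values():
        raise ValueError(f"Unknown Scribe-Data command: {command!r}")
    return ["scribe-data", full_name, *args]


if __name__ == "__main__":
    print(build_argv("t", "-h"))  # ['scribe-data', 'total', '-h']
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues when arguments contain spaces.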

Contents

Scribe-Data
- utils.py
- check/
- cli/
- unicode/
- wikidata/
- wiktionary/

Project Indices

Index