84 / 100 · B

Popular and well-maintained. A little polish away from elite status.

🤗 The largest hub of ready-to-use datasets for AI models, with fast, easy-to-use, and efficient data manipulation tools

Python · 21,620 stars · Apache-2.0 · updated 4d ago
Documentation (README, setup, examples, license): 89
Engineering (tests, CI, linting, lockfiles): 84
Project health (description, activity, stars, deps): 78
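
The page does not say how the headline score is derived from the three category scores. As a rough sanity check only, and assuming equal weights (an assumption, not something stated here), the unweighted mean of the category scores rounds to the same value as the headline:

```python
# Category scores as shown on this page.
category_scores = {
    "Documentation": 89,
    "Engineering": 84,
    "Project health": 78,
}

# Assumption: every category carries the same weight.
# The real weighting used by the scorer is not documented on this page.
overall = round(sum(category_scores.values()) / len(category_scores))

print(overall)  # prints 84
```

The match may be coincidental, so treat this as a consistency check rather than a description of the scoring formula.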

What to fix first

The highest-impact improvements for this repo.

  1. CI/CD
    Engineering · Info

    Add `tsc --noEmit`, `mypy`, or `cargo check` to catch type errors before they merge.

  2. CI/CD
    Engineering · Info

    Upload coverage to Codecov, Coveralls, or report it with `--coverage` flags.

  3. README
    Documentation · Warning

    Add a GIF, screenshot, or logo image. It is the fastest way to show what your project does.
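
For a Python repository, `mypy` is the relevant choice among the type checkers named in the first item. The sketch below illustrates the kind of mistake a type checker can catch before merge; the function names are hypothetical and are not taken from this repository.

```python
from typing import Optional


def first_label(labels: list[str]) -> Optional[str]:
    """Return the first label, or None when the list is empty."""
    return labels[0] if labels else None


def label_length(labels: list[str]) -> int:
    """Return the length of the first label, or 0 when there is none."""
    label = first_label(labels)
    # Without this guard, a type checker such as mypy rejects len(label)
    # because label may be None; at runtime the unguarded version would
    # only fail when the input list is empty.
    if label is None:
        return 0
    return len(label)


print(label_length(["train", "test"]))  # prints 5
print(label_length([]))  # prints 0
```

The point of running the check in CI is that the empty-list case is flagged statically, even if no test happens to exercise it.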

Detailed breakdown

Documentation

89
  • README: 80
    • README is present.
    • README is well structured with multiple sections.
    • No screenshots or images in the README (−20 pts). Add a GIF, screenshot, or logo image. It is the fastest way to show what your project does.
    • README has code examples.
    • README links to a live demo or deployed app.
    • README includes status badges.
  • Install and run instructions: 90
    • README documents how to install the project.
    • README documents how to run the project.
    • If your project uses environment variables, add a .env.example listing all required variables so contributors know what to set up (+10 pts).
  • License: 100
    • Licensed under Apache-2.0.
  • Contributing guide: 95
    • Contributing guide is detailed and thorough.
    • Contributing guide includes setup/install instructions.
    • Contributing guide describes code style expectations.
    • Contributing guide lacks a testing section (−8 pts). Show contributors how to run the test suite (e.g. npm test, pytest, cargo test).
    • Contributing guide describes the PR/review workflow.
    • Contributing guide includes code examples.
    • Code of conduct present.

Engineering

84
  • Tests: 100
    • Test files detected (tests).
    • Pytest configured via [tool.pytest.ini_options] in pyproject.toml with test files present.
  • CI/CD: 100
    • CI is configured (.github/workflows/build_documentation.yml).
    • CI workflow runs tests.
    • CI runs on pull requests, not just on pushes to main.
    • CI workflow runs a lint or format check.
    • Optional: add type checking to CI. Add `tsc --noEmit`, `mypy`, or `cargo check` to catch type errors before they merge.
    • Optional: report test coverage in CI. Upload coverage to Codecov, Coveralls, or report it with `--coverage` flags.
    • CI tests across multiple environments or versions.
  • Linting and formatting: 60
    • pyproject.toml configures a Python formatter or linter (ruff/black).
    • No [tool.mypy] in pyproject.toml (−20 pts vs having both ruff and mypy). Install mypy and add a [tool.mypy] section to pyproject.toml for type checking.
  • Reproducibility: 0
    • No dependency lockfile found (−70 pts). Commit a lockfile (package-lock.json, poetry.lock, uv.lock, etc.) so installs produce the same result everywhere.
    • No Dockerfile or runtime version pin found. Adding one earns +10 pts. Add a Dockerfile, .nvmrc, or .python-version to pin the runtime version and make the environment reproducible.
    • No Dependabot config (adding it earns up to +20 pts). Add .github/dependabot.yml with at least one package-ecosystem entry so dependencies are updated automatically.
  • Issue and PR templates: 100
    • Issue or PR templates present.
    • Security policy present.
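
The three Reproducibility findings above come down to checking whether certain files exist. The sketch below is a hypothetical re-creation of that check, not the scorer's actual implementation. It uses only the file names mentioned in the findings (the "etc." in the lockfile list means the real check likely covers more), and it runs against the 24 root entries listed on this page.

```python
from typing import Iterable

# File names taken from the findings above; the real scorer may accept more.
LOCKFILES = {"package-lock.json", "poetry.lock", "uv.lock"}
RUNTIME_PINS = {"Dockerfile", ".nvmrc", ".python-version"}
DEPENDABOT = ".github/dependabot.yml"


def reproducibility_gaps(paths: Iterable[str]) -> list[str]:
    """Return which reproducibility aids are missing from the given paths."""
    present = set(paths)
    gaps = []
    if not present & LOCKFILES:
        gaps.append("lockfile")
    if not present & RUNTIME_PINS:
        gaps.append("runtime pin")
    if DEPENDABOT not in present:
        gaps.append("Dependabot config")
    return gaps


# Root entries as listed under "Repository files" on this page.
root_entries = [
    ".dvc", ".github", "benchmarks", "docs", "notebooks", "src",
    "templates", "tests", "utils", ".dvcignore", ".gitignore",
    ".pre-commit-config.yaml", ".zenodo.json", "ADD_NEW_DATASET.md",
    "AUTHORS", "CITATION.cff", "CODE_OF_CONDUCT.md", "CONTRIBUTING.md",
    "LICENSE", "Makefile", "pyproject.toml", "README.md", "SECURITY.md",
    "setup.py",
]

print(reproducibility_gaps(root_entries))
# prints ['lockfile', 'runtime pin', 'Dependabot config']
```

Adding any one file from each group (for example `uv.lock`, `.python-version`, and `.github/dependabot.yml`) would clear the corresponding gap in this sketch.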

Project health

78
  • Dependency manifest: 55
    • Dependency manifest found (pyproject.toml).
  • Repository metadata: 70
    • Repository has a description.
    • Primary language detected: Python.
  • Activity: 100
    • Actively maintained (pushed within the last month).
    • 21,620 stars.
  • Housekeeping: 100
    • .gitignore present.

Repository files: 24 root entries
  • .dvc
    Good: .gitignore present.
  • .github
    Good: CI is configured (.github/workflows/build_documentation.yml).
    Good: Issue or PR templates present.
  • benchmarks
  • docs
  • notebooks
  • src
  • templates
  • tests
    Good: Test files detected (tests).
  • utils
  • .dvcignore
  • .gitignore
  • .pre-commit-config.yaml
  • .zenodo.json
  • ADD_NEW_DATASET.md
  • AUTHORS
  • CITATION.cff
  • CODE_OF_CONDUCT.md
    Good: Code of conduct present.
  • CONTRIBUTING.md
    Good: Contributing guide is detailed and thorough.
    Good: Contributing guide includes setup/install instructions.
    Good: Contributing guide describes code style expectations.
    Info: Contributing guide lacks a testing section (−8 pts). Fix: Show contributors how to run the test suite (e.g. npm test, pytest, cargo test).
    Good: Contributing guide describes the PR/review workflow.
    Good: Contributing guide includes code examples.
  • LICENSE
    Good: Licensed under Apache-2.0.
  • Makefile
  • pyproject.toml
    Good: Dependency manifest found (pyproject.toml).
  • README.md
    Good: README is present.
    Good: README is well structured with multiple sections.
    Warning: No screenshots or images in the README (−20 pts). Fix: Add a GIF, screenshot, or logo image. It is the fastest way to show what your project does.
    Good: README has code examples.
    Good: README links to a live demo or deployed app.
    Good: README includes status badges.
    Good: README documents how to install the project.
    Good: README documents how to run the project.
  • SECURITY.md
    Good: Security policy present.
  • setup.py