84/100 · B
Popular and well-maintained. A little polish away from elite status.
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
Python · 21,620 stars · Apache-2.0 · updated 4d ago
Documentation: README, setup, examples, license
Engineering: Tests, CI, linting, lockfiles
Project health: Description, activity, stars, deps
What to fix first
The highest-impact improvements for this repo.
- 1 · CI/CD · Engineering · Info
Add `tsc --noEmit`, `mypy`, or `cargo check` to catch type errors before they merge.
- 2 · CI/CD · Engineering · Info
Upload coverage to Codecov, Coveralls, or report it with `--coverage` flags.
- 3 · README · Documentation · Warning
Add a GIF, screenshot, or logo image. It is the fastest way to show what your project does.
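For a Python project such as this one, the type-checking suggestion in item 1 would mean running `mypy`. The sketch below illustrates the kind of mistake a static checker catches before merge. The function `row_count` is hypothetical and is not taken from the repository.

```python
from typing import Sequence


def row_count(batches: Sequence[Sequence[int]]) -> int:
    """Return the total number of rows across all batches."""
    return sum(len(batch) for batch in batches)


# Correct usage: a list of lists matches the annotation.
total = row_count([[1, 2, 3], [4, 5]])

# A static checker such as mypy would reject the call below, because an
# int is not a Sequence[Sequence[int]]. Without the check, the mistake
# only surfaces at runtime as a TypeError.
# row_count(5)
```

Running the checker in CI moves this class of failure from runtime to review time.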
Detailed breakdown
Documentation: 89
- README: 80
- README is present.
- README is well structured with multiple sections.
- No screenshots or images in the README (−20 pts). Add a GIF, screenshot, or logo image. It is the fastest way to show what your project does.
- README has code examples.
- README links to a live demo or deployed app.
- README includes status badges.
- Install and run instructions: 90
- README documents how to install the project.
- README documents how to run the project.
- If your project uses environment variables, add a .env.example listing all required ones so contributors know what to set up (+10 pts).
- License: 100
- Licensed under Apache-2.0.
- Contributing guide: 95
- Contributing guide is detailed and thorough.
- Contributing guide includes setup/install instructions.
- Contributing guide describes code style expectations.
- Contributing guide lacks a testing section (−8 pts). Show contributors how to run the test suite (e.g. npm test, pytest, cargo test).
- Contributing guide describes the PR/review workflow.
- Contributing guide includes code examples.
- Code of conduct present.
Engineering: 84
- Tests: 100
- Test files detected (tests).
- Pytest configured via [tool.pytest.ini_options] in pyproject.toml with test files present.
- CI/CD: 100
- CI is configured (.github/workflows/build_documentation.yml).
- CI workflow runs tests.
- CI runs on pull requests, not just on pushes to main.
- CI workflow runs a lint or format check.
- Optional: add type checking to CI. Add `tsc --noEmit`, `mypy`, or `cargo check` to catch type errors before they merge.
- Optional: report test coverage in CI. Upload coverage to Codecov, Coveralls, or report it with `--coverage` flags.
- CI tests across multiple environments or versions.
- Linting and formatting: 60
- pyproject.toml configures a Python formatter or linter (ruff/black).
- No [tool.mypy] in pyproject.toml (−20 pts vs having both ruff and mypy). Install mypy and add a [tool.mypy] section to pyproject.toml for type checking.
- Reproducibility: 0
- No dependency lockfile found (−70 pts). Commit a lockfile (package-lock.json, poetry.lock, uv.lock, etc.) so installs produce the same result everywhere.
- No Dockerfile or runtime version pin found. Adding one earns +10 pts. Add a Dockerfile, .nvmrc, or .python-version to pin the runtime version and make the environment reproducible.
- No Dependabot config (adding it earns up to +20 pts). Add .github/dependabot.yml with at least one package-ecosystem entry so dependencies are updated automatically.
- Issue and PR templates: 100
- Issue or PR templates present.
- Security policy present.
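The reproducibility findings above come down to whether a few well-known files exist in the repository. A minimal sketch of such a check follows. The file names are the ones the report suggests. The function name `reproducibility_report` and the grouping into three flags are assumptions for illustration, not the scoring tool's actual logic.

```python
from pathlib import Path

# File names are taken from the report's suggestions; the grouping into
# three categories is an assumption made for this sketch.
LOCKFILES = ("package-lock.json", "poetry.lock", "uv.lock")
RUNTIME_PINS = ("Dockerfile", ".nvmrc", ".python-version")
DEPENDABOT = ".github/dependabot.yml"


def reproducibility_report(repo_root: str) -> dict[str, bool]:
    """Report which reproducibility aids are present under repo_root."""
    root = Path(repo_root)
    return {
        "lockfile": any((root / name).is_file() for name in LOCKFILES),
        "runtime_pin": any((root / name).is_file() for name in RUNTIME_PINS),
        "dependabot": (root / DEPENDABOT).is_file(),
    }
```

Calling `reproducibility_report(".")` from the repository root returns one boolean per category, which makes it easy to see which of the three suggestions is still open.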
Project health: 78
- Dependency manifest: 55
- Dependency manifest found (pyproject.toml).
- Repository metadata: 70
- Repository has a description.
- Primary language detected: Python.
- Activity: 100
- Actively maintained (pushed within the last month).
- 21,620 stars.
- Housekeeping: 100
- .gitignore present.
Repository files: 24 root entries
- .dvc
  - Good: .gitignore present.
- .github
  - Good: CI is configured (.github/workflows/build_documentation.yml).
  - Good: Issue or PR templates present.
- benchmarks
- docs
- notebooks
- src
- templates
- tests
  - Good: Test files detected (tests).
- utils
- .dvcignore
- .gitignore
- .pre-commit-config.yaml
- .zenodo.json
- ADD_NEW_DATASET.md
- AUTHORS
- CITATION.cff
- CODE_OF_CONDUCT.md
  - Good: Code of conduct present.
- CONTRIBUTING.md
  - Good: Contributing guide is detailed and thorough.
  - Good: Contributing guide includes setup/install instructions.
  - Good: Contributing guide describes code style expectations.
  - Info: Contributing guide lacks a testing section (−8 pts).
  - Fix: Show contributors how to run the test suite (e.g. npm test, pytest, cargo test).
  - Good: Contributing guide describes the PR/review workflow.
  - Good: Contributing guide includes code examples.
- LICENSE
  - Good: Licensed under Apache-2.0.
- Makefile
- pyproject.toml
  - Good: Dependency manifest found (pyproject.toml).
- README.md
  - Good: README is present.
  - Good: README is well structured with multiple sections.
  - Warning: No screenshots or images in the README (−20 pts).
  - Fix: Add a GIF, screenshot, or logo image. It is the fastest way to show what your project does.
  - Good: README has code examples.
  - Good: README links to a live demo or deployed app.
  - Good: README includes status badges.
  - Good: README documents how to install the project.
  - Good: README documents how to run the project.
- SECURITY.md
  - Good: Security policy present.
- setup.py