This project is a GitHub scraper that uses Puppeteer to extract information about GitHub organizations and their repositories. It collects data such as organization details, top languages, and repository information.
- Clone the repository:

      git clone https://github.com/ranbot-ai/github-scraper.git
      cd github-scraper

- Install the dependencies:

      npm install

- Set the `ORG_NAME` environment variable to the GitHub organization you want to scrape and run the scraper:

      env ORG_NAME=ranbot-ai npx ts-node scraper.ts
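The `ORG_NAME` lookup could be handled along these lines. This is a minimal sketch, not the code in `scraper.ts`: the helper name `resolveOrgUrl` is hypothetical, and it assumes the scraper starts from the organization's profile page at `https://github.com/<org>`.

```typescript
// Hypothetical helper (not taken from scraper.ts): turn the ORG_NAME
// environment variable into the URL of the organization's profile page.
function resolveOrgUrl(env: Record<string, string | undefined>): string {
  const org = env.ORG_NAME?.trim();
  if (!org) {
    // Fail early with a clear message instead of scraping a wrong URL.
    throw new Error("ORG_NAME is not set; e.g. run with ORG_NAME=ranbot-ai");
  }
  // Assumption: the organization page lives directly under github.com.
  return `https://github.com/${encodeURIComponent(org)}`;
}
```

In a Node.js script this would be called as `resolveOrgUrl(process.env)`. Passing the environment in as an argument keeps the function easy to test without touching the real environment.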
The scraper:

- Extracts organization information including name, description, top languages, employee count, website, and social links.
- Scrapes repository data such as name, link, description, stars, forks, and pull requests.
- Handles pagination to scrape multiple pages of repositories.
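The repository fields and the pagination behavior listed above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the `RepoInfo` shape, `parseCount`, `FetchPage`, and `scrapeAllPages` are hypothetical names, the page fetcher is injected so the loop does not depend on Puppeteer, and the stop condition (an empty page) is an assumption about how the last page is detected.

```typescript
// Hypothetical shape of one scraped repository, mirroring the listed fields.
interface RepoInfo {
  name: string;
  link: string;
  description: string;
  stars: number;
  forks: number;
  pullRequests: number;
}

// Convert a displayed count such as "1.2k" or "3,456" into a number.
// Returns 0 when the text is not a recognizable count.
function parseCount(text: string): number {
  const cleaned = text.trim().toLowerCase().replace(/,/g, "");
  const match = /^(\d+(?:\.\d+)?)([km]?)$/.exec(cleaned);
  if (!match) return 0;
  const [, digits = "0", suffix = ""] = match;
  const multiplier = suffix === "k" ? 1000 : suffix === "m" ? 1000000 : 1;
  return Math.round(parseFloat(digits) * multiplier);
}

// Hypothetical page fetcher: given a 1-based page number, return the
// repositories found on that page (for example via a Puppeteer page).
type FetchPage = (pageNumber: number) => Promise<RepoInfo[]>;

// Request pages in order until one comes back empty or maxPages is reached.
async function scrapeAllPages(
  fetchPage: FetchPage,
  maxPages = 50
): Promise<RepoInfo[]> {
  const all: RepoInfo[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const repos = await fetchPage(page);
    if (repos.length === 0) break; // assumed end-of-list signal
    all.push(...repos);
  }
  return all;
}
```

The `maxPages` cap is a safety limit so that a fetcher that never returns an empty page cannot loop forever.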
Contributions are welcome! Please fork the repository and submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License.