Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 

README.md

�️ Python Web Scraper

A powerful and flexible web scraping tool built with Python that can discover, categorize, and download various types of digital files from websites.


📂 Project Structure

web-scrapper/
│
├── main.py                      # Main application entry point
├── script.py                    # Alternative/legacy script
├── requirements.txt             # Python dependencies
├── README.md                    # Project documentation
└── utils/
    ├── config.py               # Configuration settings
    ├── scraper.py              # Core scraping functionality
    ├── downloader.py           # File download utilities
    ├── scraper_utils.py        # Helper utilities
    └── utils.py                # Additional utilities

� Features

  • Multi-format Support: Scrapes various file types (documents, images, videos, audio, etc.)
  • Smart Categorization: Automatically categorizes files by type
  • Domain Filtering: Optional filtering by domain
  • Progress Tracking: Visual progress bars for downloads
  • Logging: Comprehensive logging for debugging and monitoring
  • Safe Downloads: Validates URLs and handles errors gracefully

🛠️ Setup Instructions

1. Navigate to the Web Scraper Directory

cd web-scrapper

2. Install Dependencies

pip install -r requirements.txt

3. Run the Application

python main.py

💻 Usage

  1. Enter Target URL: Provide the website URL to scrape
  2. Set Domain Filter (Optional): Filter results by specific domain
  3. Select Files: Choose which files to download from the discovered list
  4. Confirm Download: Review selections and start the download process

📋 Supported File Types

  • Documents: PDF, DOC, DOCX, TXT, etc.
  • Images: JPG, PNG, GIF, SVG, etc.
  • Videos: MP4, AVI, MOV, etc.
  • Audio: MP3, WAV, FLAC, etc.
  • Archives: ZIP, RAR, TAR, etc.
  • And many more...

⚙️ Configuration

Edit utils/config.py to customize:

  • File type categories
  • Save directory paths
  • Download settings
  • Logging preferences

📝 Notes

  • Downloads are saved to a downloads/ folder (automatically created)
  • Log files track all scraping activities
  • Respects robots.txt and implements polite scraping practices
  • Always ensure you have permission to scrape target websites

📝 Professional and Clear README.md

# 🌐 Python Web Scraper

A simple, efficient, and easy-to-use Python web scraper built to extract digital files (images, videos, documents, etc.) from URLs on websites.

---

## ✨ Features

-**Simple URL scraping** for multiple file types.
- 📁 **Automatically organizes downloads** into categories: Images, Videos, Audio, Documents, and Others.
- 🛠️ **User-friendly prompts** for easy navigation.
- 📈 **Progress bars** to visually track downloads.
- 🔒 **Secure**: Automatically sanitizes filenames and URLs.

---

## 📌 Requirements

- Python 3.8 or higher
- Libraries: `requests`, `beautifulsoup4`, `tqdm`

```bash
pip install requests beautifulsoup4 tqdm
```

🚀 How to Use

  1. Clone the repository:
git clone <repository-url>
cd <repository-name>
  1. Run the main script:
python main.py
  1. Follow the on-screen prompts to input URLs and select files to download.

📜 Legal Disclaimer

⚠️ Important: This tool is created strictly for educational and legitimate uses only. The developer does not condone or endorse any form of illegal activities or piracy. Users are solely responsible for their own actions and should comply with local laws, website terms of service, and copyright regulations.

By using this tool, you agree to abide by the legal and ethical standards applicable in your jurisdiction.


🛡️ Contributing

Contributions are welcome! Please submit pull requests or open issues to improve this project.


📞 Contact

For further inquiries, please open an issue or submit your questions through the repository.


🌟 Happy scraping! 🌟