Name	Name	Last commit message	Last commit date
parent directory ..
utils	utils
README.md	README.md
main.py	main.py
requirements.txt	requirements.txt
script.py	script.py

�️ Python Web Scraper

A powerful and flexible web scraping tool built with Python that can discover, categorize, and download various types of digital files from websites.

📂 Project Structure

web-scrapper/
│
├── main.py                      # Main application entry point
├── script.py                    # Alternative/legacy script
├── requirements.txt             # Python dependencies
├── README.md                    # Project documentation
└── utils/
    ├── config.py               # Configuration settings
    ├── scraper.py              # Core scraping functionality
    ├── downloader.py           # File download utilities
    ├── scraper_utils.py        # Helper utilities
    └── utils.py                # Additional utilities

� Features

Multi-format Support: Scrapes various file types (documents, images, videos, audio, etc.)
Smart Categorization: Automatically categorizes files by type
Domain Filtering: Optional filtering by domain
Progress Tracking: Visual progress bars for downloads
Logging: Comprehensive logging for debugging and monitoring
Safe Downloads: Validates URLs and handles errors gracefully

🛠️ Setup Instructions

1. Navigate to the Web Scraper Directory

cd web-scrapper

2. Install Dependencies

pip install -r requirements.txt

3. Run the Application

python main.py

💻 Usage

Enter Target URL: Provide the website URL to scrape
Set Domain Filter (Optional): Filter results by specific domain
Select Files: Choose which files to download from the discovered list
Confirm Download: Review selections and start the download process

📋 Supported File Types

Documents: PDF, DOC, DOCX, TXT, etc.
Images: JPG, PNG, GIF, SVG, etc.
Videos: MP4, AVI, MOV, etc.
Audio: MP3, WAV, FLAC, etc.
Archives: ZIP, RAR, TAR, etc.
And many more...

⚙️ Configuration

Edit utils/config.py to customize:

File type categories
Save directory paths
Download settings
Logging preferences

📝 Notes

Downloads are saved to a downloads/ folder (automatically created)
Log files track all scraping activities
Respects robots.txt and implements polite scraping practices
Always ensure you have permission to scrape target websites

📝 Professional and Clear README.md

# 🌐 Python Web Scraper

A simple, efficient, and easy-to-use Python web scraper built to extract digital files (images, videos, documents, etc.) from URLs on websites.

---

## ✨ Features

- ✅ **Simple URL scraping** for multiple file types.
- 📁 **Automatically organizes downloads** into categories: Images, Videos, Audio, Documents, and Others.
- 🛠️ **User-friendly prompts** for easy navigation.
- 📈 **Progress bars** to visually track downloads.
- 🔒 **Secure**: Automatically sanitizes filenames and URLs.

---

## 📌 Requirements

- Python 3.8 or higher
- Libraries: `requests`, `beautifulsoup4`, `tqdm`

```bash
pip install requests beautifulsoup4 tqdm
```

🚀 How to Use

Clone the repository:

git clone <repository-url>
cd <repository-name>

Run the main script:

python main.py

Follow the on-screen prompts to input URLs and select files to download.

📜 Legal Disclaimer

⚠️ Important: This tool is created strictly for educational and legitimate uses only. The developer does not condone or endorse any form of illegal activities or piracy. Users are solely responsible for their own actions and should comply with local laws, website terms of service, and copyright regulations.

By using this tool, you agree to abide by the legal and ethical standards applicable in your jurisdiction.

🛡️ Contributing

Contributions are welcome! Please submit pull requests or open issues to improve this project.

📞 Contact

For further inquiries, please open an issue or submit your questions through the repository.

🌟 Happy scraping! 🌟

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

�️ Python Web Scraper

📂 Project Structure

� Features

🛠️ Setup Instructions

1. Navigate to the Web Scraper Directory

2. Install Dependencies

3. Run the Application

💻 Usage

📋 Supported File Types

⚙️ Configuration

📝 Notes

📝 Professional and Clear README.md

🚀 How to Use

📜 Legal Disclaimer

🛡️ Contributing

📞 Contact

FilesExpand file tree

web-scrapper

Directory actions

More options

Directory actions

More options

Latest commit

History

web-scrapper

Folders and files

parent directory

README.md

�️ Python Web Scraper

📂 Project Structure

� Features

🛠️ Setup Instructions

1. Navigate to the Web Scraper Directory

2. Install Dependencies

3. Run the Application

💻 Usage

📋 Supported File Types

⚙️ Configuration

📝 Notes

📝 Professional and Clear README.md

🚀 How to Use

📜 Legal Disclaimer

🛡️ Contributing

📞 Contact