Download Your Favorite Videos with Python: A Simple Web Scraping Guide

Introduction:

Are you tired of manually searching and downloading your favorite videos from websites? If so, Python has your back! In this blog post, we’ll introduce a simple Python script that helps you download MP4 files from a website and save them to a local directory. We’ll use the requests and BeautifulSoup libraries for web scraping and downloading files.

Please note that web scraping and downloading files may violate some websites’ terms of service. Make sure you have permission to scrape a website and follow its robots.txt rules.
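If you want to check robots.txt programmatically rather than by eye, Python's standard library can evaluate a robots.txt body against a URL. Here is a minimal sketch (the `allowed` helper and the example rules are my own, not part of the script below):

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, page_url, user_agent='*'):
    """Return True if the given robots.txt text permits fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)

# Hypothetical rules: everything allowed except /private/
rules = "User-agent: *\nDisallow: /private/"
print(allowed(rules, 'https://example.com/download/file.mp4'))  # True
print(allowed(rules, 'https://example.com/private/file.mp4'))   # False
```

In practice you would fetch `https://<site>/robots.txt` first and pass its text to the helper.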

Getting Started:

First, make sure you have Python installed on your system. If not, download and install it from https://www.python.org/downloads/.

Next, install the required libraries using pip:

pip install requests beautifulsoup4

The Python Script:

Here’s the complete Python script that extracts the MP4 links from an Internet Archive page (an episode of The Office) and saves the files to a “videos” directory:

import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, unquote

url = 'https://archive.org/download/the-office-us-2005-s-09-e-23-finale-1080p-blu-ray-x-265-silence'
base_url = 'https://ia601506.us.archive.org/32/items/the-office-us-2005-s-09-e-23-finale-1080p-blu-ray-x-265-silence/'

response = requests.get(url)
content = response.content

soup = BeautifulSoup(content, 'html.parser')
table_rows = soup.find_all('tr')

mp4_links = []

for row in table_rows:
    link = row.find('a')
    if link and link.get('href', '').endswith('.mp4'):
        mp4_links.append(link['href'])

video_dir = 'videos'
if not os.path.exists(video_dir):
    os.makedirs(video_dir)

for link in mp4_links:
    decoded_link = unquote(link)
    filename = decoded_link.split('/')[-1]
    download_url = urljoin(base_url, link)

    print(f'Downloading {filename}...')
    response = requests.get(download_url, stream=True)

    with open(os.path.join(video_dir, filename), 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)

    print(f'{filename} downloaded.')

Running the Script:

Save the script to a file, e.g., download_videos.py, and run it using the following command:

python download_videos.py

The script will download all the MP4 files found on the website and save them in a directory named “videos”.
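One easy improvement to the download loop above: skip files that are already on disk, and fail fast on HTTP errors so a 404 page never gets saved as an .mp4. A sketch (the helper name download_if_missing is my own):

```python
import os
import requests

def download_if_missing(download_url, dest_path, chunk_size=8192):
    """Stream download_url to dest_path, skipping files that already exist."""
    if os.path.exists(dest_path):
        print(f'{dest_path} already exists, skipping.')
        return False
    response = requests.get(download_url, stream=True, timeout=60)
    response.raise_for_status()  # raise on 404/403 instead of writing the error body
    with open(dest_path, 'wb') as file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            file.write(chunk)
    return True
```

Dropping this into the loop makes the script safe to re-run after an interrupted download session.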

Advanced Version:

To speed things up, you can fetch the files concurrently using asyncio and aiohttp (install aiohttp first with pip install aiohttp):

import os
import aiohttp
import asyncio
from bs4 import BeautifulSoup
from urllib.parse import urljoin, unquote
from concurrent.futures import ThreadPoolExecutor

url = 'https://archive.org/download/the-office-us-2005-s-09-e-23-finale-1080p-blu-ray-x-265-silence'
base_url = 'https://ia601506.us.archive.org/32/items/the-office-us-2005-s-09-e-23-finale-1080p-blu-ray-x-265-silence/'

async def get_links():
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            content = await response.text()

    soup = BeautifulSoup(content, 'html.parser')
    table_rows = soup.find_all('tr')

    mp4_links = []

    for row in table_rows:
        link = row.find('a')
        if link and link.get('href', '').endswith('.mp4'):
            mp4_links.append(link['href'])

    return mp4_links

async def download_file(link, base_url, video_dir):
    decoded_link = unquote(link)
    filename = decoded_link.split('/')[-1]
    download_url = urljoin(base_url, link)

    print(f'Downloading {filename}...')
    async with aiohttp.ClientSession() as session:
        async with session.get(download_url) as response:
            with open(os.path.join(video_dir, filename), 'wb') as file:
                while True:
                    chunk = await response.content.read(8192)
                    if not chunk:
                        break
                    file.write(chunk)

    print(f'{filename} downloaded.')

async def main():
    video_dir = 'videos'
    if not os.path.exists(video_dir):
        os.makedirs(video_dir)

    mp4_links = await get_links()

    await asyncio.gather(*[download_file(link, base_url, video_dir) for link in mp4_links])

if __name__ == '__main__':
    asyncio.run(main())
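One caveat: asyncio.gather starts every download at once, which can hammer the server. A common refinement is to cap concurrency with asyncio.Semaphore. Here is a sketch (download_all and its download_coro parameter are my own names, not part of the script above):

```python
import asyncio

async def download_all(download_coro, links, limit=4):
    """Run download_coro(link) for every link, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(link):
        async with semaphore:
            await download_coro(link)

    await asyncio.gather(*[bounded(link) for link in links])

# With the script above, main() could call:
#     await download_all(lambda link: download_file(link, base_url, video_dir), mp4_links)
```

Four concurrent downloads is a polite default; raise the limit only if the host clearly tolerates it.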

Conclusion:

With this simple Python script, you can easily download your favorite videos from websites without the hassle of searching and downloading them manually. As always, remember to respect website owners’ terms of service and follow their robots.txt rules. Happy downloading!

Zeren