Tasks — click to preview output

TASK 01

Dawn Headlines

dawn.com

TASK 02

PSX Indices

psx.com.pk

TASK 03

QS Rankings

topuniversities.com

TASK 04

Daraz Listings

daraz.pk

TASK 05

Goodreads Books

goodreads.com

Live output preview

CSV output

Scraper code

Files generated

dawn_top30_headlines.csv — 30 rows × 4 cols

#	Headline	Category	Timestamp	URL
1	Pakistan raises interest rate amid IMF talks	Economy	2025-09-14 08:12	dawn.com/…
2	Karachi heatwave prompts emergency measures	Pakistan	2025-09-14 07:45	dawn.com/…
3	CPEC Phase II projects resume after delay	Business	2025-09-14 06:30	dawn.com/…
… 27 more rows

psx_indices.csv + psx_mainboard.csv

Index	Current	Change	Change %	Volume
KSE-100	78,432.10	+312.4	+0.40%	245M
KSE-30	26,891.55	+98.2	+0.37%	112M
KMI-30	51,204.80	-142.0	-0.28%	89M

qs_top50.csv + qs_by_country_top15.csv + qs_by_region.csv

Rank	University	Country	Score	Research
1	MIT	USA	100.0	99.8
2	University of Cambridge	UK	99.2	98.9
3	Stanford University	USA	98.7	99.1
… 47 more rows

daraz_iphone15_listings.csv — 20 rows × 6 cols

#	Title	Price	Seller	Delivery
1	Apple iPhone 15 Pro 128GB — Non Approved — Used	Rs. 199,999	Ratings 2	COD
2	Apple iPhone 15 Pro Max 6.7″ PTA Approved	Rs. 467,842	No Ratings	COD
3	Apple iPhone 15 Pro 256GB — Non Approved — Used	Rs. 209,999	Ratings 1	COD
4	Apple iPhone 15 128GB 6.1″ Factory Locked	Rs. 239,999	Help Center	COD
5	Apple iPhone 15 Pro Max 256GB PTA Approved	Rs. 455,195	No Ratings	COD
… 15 more rows

goodreads_books.csv + goodreads_books_partial.csv

#	Title	Author	Rating	Reviews
1	The Hitchhiker's Guide to the Galaxy	Douglas Adams	4.22	532K
2	Dune	Frank Herbert	4.25	889K
3	1984	George Orwell	4.19	4.1M

# task1_dawn.py — Dawn top 30 headlines
import requests, csv
from bs4 import BeautifulSoup

url = "https://www.dawn.com"
soup = BeautifulSoup(requests.get(url).text, "html.parser")
headlines = soup.select("article.story h2 a")[:30]

with open("dawn_top30_headlines.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Headline", "URL", "Category"])
    for h in headlines:
        w.writerow([h.text.strip(), h["href"]])

# task2_PSX.py — PSX indices & mainboard
import requests, csv, json
from bs4 import BeautifulSoup

url = "https://www.psx.com.pk/market-summary/"
soup = BeautifulSoup(requests.get(url).text, "html.parser")
rows = soup.select("table.indices tr")

data = []
for row in rows[1:]:
    cols = [td.text.strip() for td in row.select("td")]
    if cols: data.append(cols)

# task3_QSWR.py — QS World Rankings top 50
from selenium import webdriver
from selenium.webdriver.common.by import By
import csv, time

driver = webdriver.Chrome()
driver.get("https://www.topuniversities.com/world-university-rankings")
time.sleep(3)
rows = driver.find_elements(By.CSS_SELECTOR, "div.uni-row")

# task4_Daraz.py — iPhone 15 listings scraper
from selenium import webdriver
from selenium.webdriver.common.by import By
import csv, time

driver = webdriver.Chrome()
driver.get("https://www.daraz.pk/iphone-15/")
time.sleep(3)

products = driver.find_elements(By.CSS_SELECTOR, "div[data-qa-locator]")
with open("daraz_iphone15_listings.csv", "w") as f:
    w = csv.writer(f)
    w.writerow(["Title","Price","Seller","Ratings","DeliveryOptions","ProductURL"])

# task5_GoodReads.py — top books scraper
import requests, csv
from bs4 import BeautifulSoup

url = "https://www.goodreads.com/list/show/1.Best_Books_Ever"
headers = {"User-Agent": "Mozilla/5.0"}
soup = BeautifulSoup(requests.get(url, headers=headers).text, "html.parser")
books = soup.select("tr[itemtype]")

Web-Scraping-Python-Project/ ├── task1_dawn/ │ ├── task1_dawn.py │ ├── export_task1_reports.py │ ├── dawn_top30_headlines.csv │ ├── dawn_top30_headlines.json │ └── reports/ ← 2 PDF + 2 DOCX exports ├── task2_psx/ │ ├── task2_PSX.py │ ├── export_task2_reports.py │ ├── psx_indices.csv + psx_mainboard.csv │ └── reports/ ← 4 PDF + 4 DOCX exports ├── task3_qswr/ │ ├── task3_QSWR.py │ ├── export_task3_reports.py │ ├── qs_top50.csv + qs_by_country_top15.csv + qs_by_region.csv │ └── reports/ ← 5 PDF exports ├── task4_daraz/ │ ├── task4_Daraz.py │ ├── export_task4_reports.py │ ├── daraz_iphone15_listings.csv │ └── reports/ ← 3 PDF exports ├── task5_goodreads/ │ ├── task5_GoodReads.py │ ├── export_task5_reports.py │ ├── goodreads_books.csv + goodreads_books_partial.csv │ └── reports/ ← 1 PDF export ├── main.py ├── requirements.txt └── Intro to the Project

Tech stack

Py

Python 3

Core language

BS

BeautifulSoup4

HTML parsing

Se

Selenium

JS-rendered sites

Rq

Requests

HTTP client

Pd

ReportLab

PDF generation

Quick setup

$ git clone https://github.com/RAZAAli901/Web-Scraping-Python-Project.git

$ pip install -r requirements.txt

$ python task1_dawn/task1_dawn.py

Web Scraping Python Project