Python 3.x · BeautifulSoup4 · Selenium · CSV / JSON · 5 Tasks Complete

# Web Scraping Python Project

A structured collection of real-world scrapers targeting Dawn News, PSX, QS Rankings, Daraz, and Goodreads, with automated report exports.

- **5** scrapers
- **8+** CSV/JSON outputs
- **10+** PDF reports
- **100%** Python
## Tasks

| Task | Scraper | Source site |
|------|---------|-------------|
| 01 | Dawn Headlines | dawn.com |
| 02 | PSX Indices | psx.com.pk |
| 03 | QS Rankings | topuniversities.com |
| 04 | Daraz Listings | daraz.pk |
| 05 | Goodreads Books | goodreads.com |
## Sample output (Task 01)
`dawn_top30_headlines.csv` (30 rows × 4 cols):

| # | Headline | Category | Timestamp | URL |
|---|----------|----------|-----------|-----|
| 1 | Pakistan raises interest rate amid IMF talks | Economy | 2025-09-14 08:12 | dawn.com/… |
| 2 | Karachi heatwave prompts emergency measures | Pakistan | 2025-09-14 07:45 | dawn.com/… |
| 3 | CPEC Phase II projects resume after delay | Business | 2025-09-14 06:30 | dawn.com/… |

… 27 more rows
```python
# task1_dawn.py — Dawn top 30 headlines
import csv

import requests
from bs4 import BeautifulSoup

url = "https://www.dawn.com"
soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

# Anchors inside story articles hold the headline text and link.
headlines = soup.select("article.story h2 a")[:30]

with open("dawn_top30_headlines.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f)
    # Header must match the fields written below; the repo's full script
    # also fills the Category and Timestamp columns shown in the sample CSV.
    w.writerow(["Headline", "URL"])
    for h in headlines:
        w.writerow([h.text.strip(), h["href"]])
```
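The project also ships a JSON version of the same data (`dawn_top30_headlines.json`). A minimal sketch of such a CSV-to-JSON conversion, assuming the CSV has already been written; the function name and code are illustrative, not the repo's actual export script:

```python
# csv_to_json.py — illustrative CSV-to-JSON conversion (not the repo's script)
import csv
import json

def csv_to_json(csv_path: str, json_path: str) -> int:
    """Read csv_path as dict rows, dump them as a JSON array, return row count."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)
    return len(rows)

# Example: csv_to_json("dawn_top30_headlines.csv", "dawn_top30_headlines.json")
```

Using `csv.DictReader` keeps the CSV header names as JSON keys, so the export stays self-describing.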
## Project structure

```
Web-Scraping-Python-Project/
├── task1_dawn/
│   ├── task1_dawn.py
│   ├── export_task1_reports.py
│   ├── dawn_top30_headlines.csv
│   ├── dawn_top30_headlines.json
│   └── reports/          ← 2 PDF + 2 DOCX exports
├── task2_psx/
│   ├── task2_PSX.py
│   ├── export_task2_reports.py
│   ├── psx_indices.csv + psx_mainboard.csv
│   └── reports/          ← 4 PDF + 4 DOCX exports
├── task3_qswr/
│   ├── task3_QSWR.py
│   ├── export_task3_reports.py
│   ├── qs_top50.csv + qs_by_country_top15.csv + qs_by_region.csv
│   └── reports/          ← 5 PDF exports
├── task4_daraz/
│   ├── task4_Daraz.py
│   ├── export_task4_reports.py
│   ├── daraz_iphone15_listings.csv
│   └── reports/          ← 3 PDF exports
├── task5_goodreads/
│   ├── task5_GoodReads.py
│   ├── export_task5_reports.py
│   ├── goodreads_books.csv + goodreads_books_partial.csv
│   └── reports/          ← 1 PDF export
├── main.py
├── requirements.txt
└── Intro to the Project
```
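The tree lists a top-level `main.py`, presumably an entry point that runs the five tasks. One way such a dispatcher could look; the script paths come from the tree, but the code itself is an assumption about `main.py`, not its actual contents:

```python
# run_all.py — hypothetical task dispatcher (the repo's main.py may differ)
import subprocess
import sys
from pathlib import Path

TASK_SCRIPTS = [  # relative paths as listed in the project tree
    "task1_dawn/task1_dawn.py",
    "task2_psx/task2_PSX.py",
    "task3_qswr/task3_QSWR.py",
    "task4_daraz/task4_Daraz.py",
    "task5_goodreads/task5_GoodReads.py",
]

def run_tasks(root):
    """Run each task script that exists under root; return the ones launched."""
    launched = []
    for rel in TASK_SCRIPTS:
        script = Path(root) / rel
        if script.exists():
            # Run inside the task folder so CSV/JSON outputs land next to the script.
            subprocess.run([sys.executable, script.name],
                           check=True, cwd=script.parent)
            launched.append(rel)
    return launched

if __name__ == "__main__":
    run_tasks(Path(__file__).parent)
```

Skipping scripts that are missing (rather than crashing) lets a partial checkout still run the tasks it has.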
## Tech stack

| Tool | Role |
|------|------|
| Python 3 | Core language |
| BeautifulSoup4 | HTML parsing |
| Selenium | JS-rendered sites |
| Requests | HTTP client |
| ReportLab | PDF generation |
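To make the `article.story h2 a` selector from the Task 01 snippet concrete, here is how it behaves on a small invented HTML fragment (the markup below is illustrative; dawn.com's real page structure may differ):

```python
# selector_demo.py — behaviour of the "article.story h2 a" CSS selector
from bs4 import BeautifulSoup

html = """
<article class="story"><h2><a href="https://www.dawn.com/news/1">Headline one</a></h2></article>
<article class="story"><h2><a href="https://www.dawn.com/news/2">Headline two</a></h2></article>
<div class="ad"><h2><a href="#">Not a story</a></h2></div>
"""

soup = BeautifulSoup(html, "html.parser")
# Only anchors nested under <article class="story"> → <h2> match; the ad is skipped.
links = soup.select("article.story h2 a")
print([(a.text.strip(), a["href"]) for a in links])
```

This is why the ad `<div>` never reaches the CSV: the selector scopes matching to story articles rather than grabbing every link on the page.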
## Quick setup

```shell
git clone https://github.com/RAZAAli901/Web-Scraping-Python-Project.git
cd Web-Scraping-Python-Project
pip install -r requirements.txt
python task1_dawn/task1_dawn.py
```