# Research on Python Web Crawlers

## 1. Fundamentals

### 1.1 Basic Principles of HTTP

### 1.2 Web Page Basics

### 1.3 Basic Principles of Web Crawlers

### 1.4 Sessions and Cookies

### 1.5 Multi-Way Acceleration and Multithreading

---

## 2. Basic Crawler Libraries

### 2.1 Basic Usage of the Requests Library

### 2.2 Regular Expressions

### 2.3 PyQuery, a Powerful Parsing Tool for Crawlers

### 2.4 Efficient Storage with MongoDB

### 2.5 A Basic Hands-On Case with Requests + PyQuery + PyMongo

---

## 3. Crawling in Multiple Forms

### 3.1 Ajax Case Study

### 3.2 Selenium Case Study

### 3.3 Asynchronous Crawler Case Study with aiohttp

### 3.4 Pyppeteer Case Study

---

## 4. Countering Anti-Crawling Measures

### 4.1 Proxies and Proxy Pools

### 4.2 CAPTCHA Cracking

### 4.3 Simulated Login

### 4.4 JavaScript Reverse Engineering

---

## 5. App Crawling

### 5.1 Using the Packet Capture Tool Charles

### 5.2 mitmproxy, a Tool for Real-Time Processing

### 5.3 Using Appium

### 5.4 The Automation Tool airtest

### 5.5 Xposed

### 5.6 App Reverse Engineering

---

## 6. Intelligent Parsing

### 6.1 Techniques

### 6.2 Tools

### 6.3 Algorithms

### 6.4 Implementation

---

## 7. The Scrapy Framework

### 7.1 Scrapy Basics

### 7.2 Using Spiders

### 7.3 Using Middleware

### 7.4 Using Item Pipelines

### 7.5 Handling Dynamic Pages

### 7.6 Scrapy-Redis

### 7.7 Scrapyd, a Deployment Tool

### 7.8 Integrating Scrapy with Docker

### 7.9 Integrating Scrapy with Kubernetes for Scheduled Crawling