## Website
> Built by the organization
* GitHub Pages (outside China): https://ailearning.apachecn.org
* Gitee Pages (within China): https://apachecn.gitee.io/ailearning
> Third-party mirrors
Address A: xxx (leave a message and we will add it)
## Download
### Docker
```
docker pull apachecn0/ailearning
docker run -tid -p {port}:80 apachecn0/ailearning
# Visit http://localhost:{port} to view the documentation
```
### PYPI
```
pip install apachecn-ailearning
apachecn-ailearning {port}
# Visit http://localhost:{port} to view the documentation
```
### NPM
```
npm install -g ailearning
ailearning {port}
# Visit http://localhost:{port} to view the documentation
```
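## About the organization
* For cooperation or infringement concerns, contact: `[email protected]`
* **We are not an official Apache organization, institution, or group; we are simply enthusiasts of the Apache technology stack (and AI).**
* **ApacheCN study group: 724187166**
> Once a new technology starts rolling, if you're not part of the steamroller, you're part of the road. (Stewart Brand)
# Roadmap
* Beginners only need steps 1 => 2 => 3 to become an expert!
* Intermediate supplement - resource library:
> Supplementary material
* Toutiao video collection:
* Algorithm practice problems:
* Interviews and job hunting:
* Machine Learning in Action:
* NLP tutorial videos:
* **Reference for commonly used AI functions**:
## 1. Machine Learning - Basics
> Supported versions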
| Version | Supported |
| ------- | ------------------ |
| 3.6.x | :x: |
| 2.7.x | :white_check_mark: |
Notes:
- Machine Learning in Action: this material is for study only. Please use Python 2.7.x (only part of the code has been adapted for 3.6.x).
### Overview
* Source material: Machine Learning in Action (personal notes on the book)
* Unified data location:
* Baidu Cloud archive:
* Book download:
* Machine learning download:
* Deep learning data:
* Recommender system data:
* Video sites: Youku / bilibili / AcFun / NetEase Cloud Classroom, all playable online (links are listed at the bottom).
* -- Recommended: [红色石头 (Redstone)](https://github.com/RedstoneWill): [Notes on Hsuan-Tien Lin's machine learning course, National Taiwan University](https://github.com/apachecn/ntu-hsuantienlin-ml)
* -- Recommended: [Machine Learning Notes](https://feisky.xyz/machine-learning): https://feisky.xyz/machine-learning
### Study documents
| 模å | ç« è | ç±»å | è´è´£äºº(GitHub) | QQ |
| --- | --- | --- | --- | --- |
| æºå¨å¦ä¹ 宿 | [第 1 ç« : æºå¨å¦ä¹ åºç¡](docs/ml/1.md) | ä»ç» | [@æ¯çº¢å¨](https://github.com/ElmaDavies) | 1306014226 |
| æºå¨å¦ä¹ 宿 | [第 2 ç« : KNN è¿é»ç®æ³](docs/ml/2.md) | åç±» | [@尤永æ±](https://github.com/youyj521) | 279393323 |
| æºå¨å¦ä¹ 宿 | [第 3 ç« : å³çæ ](docs/ml/3.md) | åç±» | [@æ¯æ¶](https://github.com/jingwangfei) | 844300439 |
| æºå¨å¦ä¹ 宿 | [第 4 ç« : æ´ç´ è´å¶æ¯](docs/ml/4.md) | åç±» | [@wnma3mz](https://github.com/wnma3mz)
[@åæ](https://github.com/kailian) | 1003324213
244970749 |
| æºå¨å¦ä¹ 宿 | [第 5 ç« : Logisticåå½](docs/ml/5.md) | åç±» | [@å¾®å
åå°](https://github.com/DataMonk2017) | 529925688 |
| æºå¨å¦ä¹ 宿 | [第 6 ç« : SVM æ¯æåéæº](docs/ml/6.md) | åç±» | [@ç德红](https://github.com/VPrincekin) | 934969547 |
| ç½ä¸ç»åå
容 | [第 7 ç« : éææ¹æ³ï¼éæºæ£®æå AdaBoostï¼](docs/ml/7.md) | åç±» | [@çå»](https://github.com/jiangzhonglian) | 529815144 |
| æºå¨å¦ä¹ 宿 | [第 8 ç« : åå½](docs/ml/8.md) | åå½ | [@å¾®å
åå°](https://github.com/DataMonk2017) | 529925688 |
| æºå¨å¦ä¹ 宿 | [第 9 ç« : æ åå½](docs/ml/9.md) | åå½ | [@å¾®å
åå°](https://github.com/DataMonk2017) | 529925688 |
| æºå¨å¦ä¹ 宿 | [第 10 ç« : K-Means èç±»](docs/ml/10.md) | èç±» | [@徿æ¸
](https://github.com/xuzhaoqing) | 827106588 |
| æºå¨å¦ä¹ 宿 | [第 11 ç« : å©ç¨ Apriori ç®æ³è¿è¡å
³èåæ](docs/ml/11.md) | é¢ç¹é¡¹é | [@åæµ·é£](https://github.com/WindZQ) | 1049498972 |
| æºå¨å¦ä¹ 宿 | [第 12 ç« : FP-growth 髿åç°é¢ç¹é¡¹é](docs/ml/12.md) | é¢ç¹é¡¹é | [@ç¨å¨](https://github.com/mikechengwei) | 842725815 |
| æºå¨å¦ä¹ 宿 | [第 13 ç« : å©ç¨ PCA æ¥ç®åæ°æ®](docs/ml/13.md) | å·¥å
· | [@å»ç«å¨](https://github.com/lljuan330) | 835670618 |
| æºå¨å¦ä¹ 宿 | [第 14 ç« : å©ç¨ SVD æ¥ç®åæ°æ®](docs/ml/14.md) | å·¥å
· | [@å¼ ä¿ç](https://github.com/marsjhao) | 714974242 |
| æºå¨å¦ä¹ 宿 | [第 15 ç« : å¤§æ°æ®ä¸ MapReduce](docs/ml/15.md) | å·¥å
· | [@wnma3mz](https://github.com/wnma3mz) | 1003324213 |
| Ml项ç®å®æ | [第 16 ç« : æ¨èç³»ç»ï¼å·²è¿ç§»ï¼](docs/ml/16.md) | é¡¹ç® | [æ¨èç³»ç»ï¼è¿ç§»åå°åï¼](https://github.com/apachecn/RecommenderSystems) | |
| ç¬¬ä¸æçæ»ç» | [2017-04-08: ç¬¬ä¸æçæ»ç»](docs/report/2017-04-08.md) | æ»ç» | æ»ç» | 529815144 |
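### Online videos
> [Zhihu Q&A: How should I get started with machine learning?](https://www.zhihu.com/question/20691338/answer/248678328)
Of course I know the very first sentence will draw fire: people with formal training will spit in disdain, call me an idiot, and ask how I dare comment on Andrew Ng's videos.
I also know that some people simply cannot follow Andrew Ng's videos: the mysterious mathematical derivations, the lectures delivered in English with that enigmatic smile. Didn't I go through exactly the same thing? It may have hurt me more than it hurt you, because I had bookmarked more than 10 machine learning video courses online, plus home-grown Chinese tutorials such as 7月 and 小象, and I struggled to follow any of them. Then one day a senior algorithm analyst at Baidu told me: "Machine Learning in Action" is pretty good and easy to follow, why not give it a try?
I gave it a try. Luckily my Python basics and debugging skills were decent, so I stepped through essentially all of the code, and much of the lofty "theory + derivation" turned, in my eyes, into a handful of "add, subtract, multiply, divide + loops". Isn't that exactly the introductory tutorial a programmer like me wants?
Many programmers say machine learning is too damn hard to learn. Yes, it really is damn hard, and I think the hardest part is this: apart from "Machine Learning in Action", there is no book whose author is willing to explain things from a programmer's coding perspective!!
Over the past few days the GitHub repository gained 300 stars and 200 people joined the group, and the numbers keep growing. I suspect many of you feel the same way.
Many would-be beginners are talked into bookmarking, bookmarking, and bookmarking again, yet end up learning nothing; they become "resource collectors". Perhaps what a beginner really needs is the [MachineLearning learning roadmap](https://docs.apachecn.org/map). That's right, I can give you a copy, because we also recorded our own learning process on video. Our level is of course limited, but for getting a beginner started it is absolutely fine. If you still can't do it afterwards, I lose!!
> How should I watch the videos?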

1. Formal theoretical background: study Andrew Ng's videos (Ng's videos are unquestionably authoritative).
2. Strong coding skills: watch our [Machine Learning in Action - teaching edition](https://space.bilibili.com/97678687/#!/channel/detail?cid=22486).
3. Weak coding skills: watch our [Machine Learning in Action - discussion edition](https://space.bilibili.com/97678687/#!/channel/detail?cid=13045). For the theory, though, use the theory part of the teaching edition: the discussion edition has too much chatter, but it does walk through the code line by line. Mix and match according to your own needs.
> [Free] Math tutorial videos - Khan Academy, introductory series
* Recommended by [@于振梓](): Khan Academy on NetEase Open Courses
| Probability | Statistics | Linear Algebra |
| - | - | - |
| [Khan Academy (Probability)](http://open.163.com/special/Khan/probability.html) | [Khan Academy (Statistics)](http://open.163.com/special/Khan/khstatistics.html) | [Khan Academy (Linear Algebra)](http://open.163.com/special/Khan/linearalgebra.html) |
> Machine learning videos - ApacheCN teaching edition
| | |
| - | - |
| AcFun | Bilibili |
| Youku | NetEase Cloud Classroom |
> [Free] Machine/deep learning videos - Andrew Ng
| Machine Learning | Deep Learning |
| - | - |
| [Andrew Ng: Machine Learning](http://study.163.com/course/courseMain.htm?courseId=1004570029) | [Neural Networks and Deep Learning](http://mooc.study.163.com/course/2001281002?tid=2001392029) |
## 2. Deep Learning
> Supported versions
| Version | Supported |
| ------- | ------------------ |
| 3.6.x | :white_check_mark: |
| 2.7.x | :x: |
### Getting started
1. [Backpropagation](/docs/dl/反向传递.md): https://www.cnblogs.com/charlotte77/p/5629865.html
2. [How CNNs work](/docs/dl/CNN原理.md): http://www.cnblogs.com/charlotte77/p/7759802.html
3. [How RNNs work](/docs/dl/RNN原理.md): https://blog.csdn.net/qq_39422642/article/details/78676567
4. [How LSTMs work](/docs/dl/LSTM原理.md): https://blog.csdn.net/weixin_42111770/article/details/80900575
### PyTorch - Tutorials
-- To be updated
### TensorFlow 2.0 - Tutorials
-- To be updated
> Directory structure:
* [Installation guide](docs/TensorFlow2.x/安装指南.md)
* [Keras quick start](docs/TensorFlow2.x/Keras快速入门.md)
* [Project 1: Movie review sentiment classification](docs/TensorFlow2.x/实战项目_1_电影情感分类.md)
* [Project 2: Car fuel efficiency](docs/TensorFlow2.x/实战项目_2_汽车燃油效率.md)
* [Project 3: Optimization, overfitting and underfitting](docs/TensorFlow2.x/实战项目_3_优化_过拟合和欠拟合.md)
* [Project 4: Automatic generation of classical Chinese poetry](docs/TensorFlow2.x/实战项目_4_古诗词自动生成.md)
Segmentation (tokenization)
Part-of-speech tagging
Named entity recognition
Syntactic parsing
WordNet can be viewed as a dictionary of synonyms
Stemming and lemmatization
* https://www.biaodianfu.com/nltk.html/amp
TensorFlow 2.0 learning resource
* https://github.com/lyhue1991/eat_tensorflow2_in_30_days
## 3. Natural Language Processing
> Supported versions
| Version | Supported |
| ------- | ------------------ |
| 3.6.x | :white_check_mark: |
| 2.7.x | :x: |
How my feelings changed over the course of learning!!!
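```
Since I started studying NLP, I have noticed some typical differences between China and abroad:
1. Attitudes toward resources are completely opposite:
  1) In China: it feels as if conferences are held for prestige and showing off; there is no substance, only token PPT overviews (no offense to anyone present).
  2) Abroad: it feels as if the goal is to push NLP forward; people share all kinds of substantive material and concrete implementations. (Especially: Natural Language Processing with Python)
2. Paper implementations:
  1) Plenty of impressive-sounding papers get implemented, yet I still have not seen a single decent GitHub project! (Maybe my search skills are lacking; I never found one.)
  2) Abroad: I won't give examples, I can't understand them!
3. Open-source frameworks
  1) Foreign open-source frameworks: tensorflow/pytorch docs + tutorials + videos (officially provided)
  2) Chinese open-source frameworks: um, I really can't name one! Yet the bragging is no less than abroad! (MXNet has many Chinese contributors, but it cannot be counted as a Chinese open-source framework. The Chinese tutorial "Dive into Deep Learning", based on MXNet (http://zh.d2l.ai & https://discuss.gluon.ai/t/topic/753), was taught and recorded by Mu Li and Aston Zhang and has been publicly released (docs + season-one tutorial + videos).)
Every time I dig deeper I have to get over the firewall; every time I dig deeper I have to use Google; every time I hear people in China say how great HIT, iFlytek, USTC, Baidu, and Alibaba are, the material still has to be found abroad!
Sometimes I really resent it! I really look down a little on the technical environment in my own country!
Of course, thanks to the many Chinese bloggers, especially for the introductory demos and basic concepts. [My grasp of the deeper material is limited; I did not understand it.]
```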

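* **[Before you start] Must-know basics**:
* **[Beginner tutorial] Highly recommended: Natural Language Processing with PyTorch**:
* Natural Language Processing with Python, 2nd edition:
* A comprehensive NLP knowledge map compiled by [liuhuanyong](https://github.com/liuhuanyong):
* Open source - collection of word-vector libraries: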
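### 1. Use cases (Baidu open course)
> Part 1: Introduction
* 1.) [Introduction to natural language processing](/docs/nlp/1.自然语言处理入门介绍.md)
> Part 2: Machine translation
* 2.) [Machine translation](/docs/nlp/2.机器翻译.md)
> Part 3: Document analysis
* 3.1.) [Document analysis - content overview](/docs/nlp/3.1.篇章分析-内容概述.md)
* 3.2.) [Document analysis - content tagging](/docs/nlp/3.2.篇章分析-内容标签.md)
* 3.3.) [Document analysis - sentiment analysis](/docs/nlp/3.3.篇章分析-情感分析.md)
* 3.4.) [Document analysis - automatic summarization](/docs/nlp/3.4.篇章分析-自动摘要.md)
> Part 4: UNIT - language understanding and interaction technology
* 4.) [UNIT - language understanding and interaction technology](/docs/nlp/4.UNIT-语言理解与交互技术.md)
### Application areas
#### Chinese word segmentation:
* Build a DAG (directed acyclic graph) of all candidate words in the sentence
* Search it with dynamic programming, combining forward and backward passes (weights accumulated in one direction, output read in the other) to find the maximum-probability path through the DAG
* Train an HMM + Viterbi model on an SBME-tagged corpus to handle out-of-vocabulary words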
#### 1. Text classification
Text classification means assigning labels to sentences or documents, for example spam filtering for email and sentiment analysis.
Below are some good text classification datasets for beginners.
1. [Reuters Newswire topic classification](http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html) (Reuters-21578). A collection of news documents that appeared on Reuters in 1987, indexed by category. [See also RCV1, RCV2 and TRC2](http://trec.nist.gov/data/reuters/reuters.html).
2. [IMDB movie review sentiment classification (Stanford)](http://ai.stanford.edu/~amaas/data/sentiment). A collection of movie reviews from imdb.com labeled with positive or negative sentiment.
3. [Newsgroup movie review sentiment classification (Cornell)](http://www.cs.cornell.edu/people/pabo/movie-review-data/). A collection of movie reviews from imdb.com labeled with positive or negative sentiment.
For more information, see the post:
[Datasets for single-label text categorization](http://ana.cachopo.org/datasets-for-single-label-text-categorization).
> Sentiment analysis
Competition: https://www.kaggle.com/c/word2vec-nlp-tutorial
* Approach 1 (0.86): word counts + Naive Bayes
* Approach 2 (0.94): LDA + a classifier (knn / decision tree / logistic regression / svm / xgboost / random forest)
    * a) The decision tree did not perform well; it is not well suited to continuous features like these
    * b) Tuning the parameters to 200 topics preserved the most information (when computing topics)
* Approach 3 (0.72): word2vec + CNN
* Honestly: without a good machine you cannot tune your way to a good result (: runs away
**Model performance is evaluated with AUC**
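To make approach 1 and the AUC metric concrete, here is a small self-contained sketch. It is not the code behind the scores above: the reviews are made-up toy data, the function names are hypothetical, and the classifier is a plain multinomial Naive Bayes with add-one smoothing written from scratch so that no third-party library is assumed.

```python
import math
from collections import Counter

# Made-up toy reviews (1 = positive, 0 = negative); not data from the competition.
TRAIN = [
    ("a great and moving film", 1),
    ("great acting and a great story", 1),
    ("boring plot and bad acting", 0),
    ("a bad and boring film", 0),
]
TEST = [("a great story", 1), ("moving film", 1), ("bad plot", 0), ("boring acting", 0)]


def train_nb(samples):
    """Count words per class (the 'word count' features) and documents per class."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in samples:
        counts[label].update(text.split())
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab


def positive_score(text, model):
    """Log-odds of the positive class under Naive Bayes with add-one smoothing."""
    counts, docs, vocab = model
    total = {c: sum(counts[c].values()) for c in (0, 1)}
    log_odds = math.log(docs[1] / docs[0])
    for word in text.split():
        if word in vocab:  # ignore words never seen in training
            p1 = (counts[1][word] + 1) / (total[1] + len(vocab))
            p0 = (counts[0][word] + 1) / (total[0] + len(vocab))
            log_odds += math.log(p1 / p0)
    return log_odds


def auc(labels, scores):
    """AUC: chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


model = train_nb(TRAIN)
test_scores = [positive_score(text, model) for text, _ in TEST]
test_auc = auc([label for _, label in TEST], test_scores)
```

AUC only looks at how the scores rank positives against negatives, so it does not depend on choosing a classification threshold, which is one reason it suits a competition like this one.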
#### 2. Language modeling
Language modeling involves building a statistical model that predicts the next word in a sentence, or the next character in a word. It is a precursor task for tasks such as speech recognition and machine translation.
Below are some good language modeling datasets for beginners.
1. [Project Gutenberg](https://www.gutenberg.org/), a large collection of free books that can be retrieved in plain text for various languages.
2. There are also more formal corpora that have been well studied, for example:
    [Brown University Standard Corpus of Present-Day American English](https://en.wikipedia.org/wiki/Brown_Corpus). A large sample of English words.
    [Google 1 Billion Word Corpus](https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark).
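As a minimal illustration of "predicting the next word", the sketch below builds a bigram model by counting word pairs. The three-sentence corpus is made up for the example and the function names are hypothetical; a real model would be trained on a corpus such as the ones listed above and would need smoothing for unseen pairs.

```python
from collections import Counter, defaultdict

# Made-up toy corpus, only for illustration.
CORPUS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]


def train_bigram(sentences):
    """Count how often each word follows each previous word."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += 1
    return model


def next_word_probs(model, prev):
    """Maximum-likelihood estimate of P(next | prev); empty dict if prev is unseen."""
    counts = model.get(prev.lower(), Counter())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}


def predict_next(model, prev):
    """Most likely next word, or None when the previous word was never seen."""
    probs = next_word_probs(model, prev)
    return max(probs, key=probs.get) if probs else None


bigram_model = train_bigram(CORPUS)
```

With this corpus, `predict_next(bigram_model, "the")` returns `"cat"`, because "cat" follows "the" in 2 of the 6 occurrences of "the".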
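> New word discovery
* New word discovery for Chinese word segmentation
* Python 3: new word discovery for Chinese word segmentation using mutual information and left/right information entropy
> Sentence similarity
* Project: https://www.kaggle.com/c/quora-question-pairs
* Solution: word2vec + Bi-GRU
> Text error correction
* bi-gram + levenshtein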
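One way to combine the two ingredients named above, sketched under assumptions: Levenshtein distance proposes vocabulary words close to the misspelled token, and bigram counts with the previous word break ties between them. The tiny English corpus, the `max_dist` threshold, and the function names are all hypothetical choices for this example, not the project's actual pipeline.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


# Made-up toy corpus used for both the vocabulary and the bigram counts.
CORPUS = "i like green tea . i like black tea . we drink green tea".split()
VOCAB = set(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))


def correct(prev_word, word, max_dist=2):
    """Replace an unknown word with the closest vocabulary word.

    Candidates are ranked by edit distance first, then by how often they
    follow prev_word in the corpus, then alphabetically for determinism.
    """
    if word in VOCAB:
        return word
    candidates = [w for w in VOCAB if levenshtein(word, w) <= max_dist]
    if not candidates:
        return word
    return min(candidates, key=lambda w: (levenshtein(word, w), -BIGRAMS[(prev_word, w)], w))
```

For the input `correct("green", "tae")`, both "tea" and "we" are two edits away from "tae"; the bigram count for ("green", "tea") is what selects "tea".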
#### 3. Image captioning
Image captioning is the task of generating a textual description for a given image.
Below are some good image captioning datasets for beginners.
1. [Common Objects in Context (COCO)](http://mscoco.org/dataset/#overview). A collection of more than 120,000 images with descriptions.
2. [Flickr 8K](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html). A collection of 8,000 described images taken from flickr.com.
3. [Flickr 30K](http://shannon.cs.illinois.edu/DenotationGraph/). A collection of 30,000 described images taken from flickr.com.
For more, see the post:
[Exploring image captioning datasets, 2016](http://sidgan.me/technical/2016/01/09/Exploring-Datasets)
#### 4. Machine translation
Machine translation is the task of translating text from one language into another.
Below are some good machine translation datasets for beginners.
1. [Aligned Hansards of the 36th Parliament of Canada](https://www.isi.edu/natural-language/download/hansard/). Pairs of English and French sentences.
2. [European Parliament Proceedings Parallel Corpus 1996-2011](http://www.statmt.org/europarl/). Sentence pairs for a set of European languages.
Many standard datasets are used for the annual machine translation challenges; see:
[Statistical Machine Translation](http://www.statmt.org/)
> Machine translation
* Encoder + Decoder (Attention)
* Reference example: http://pytorch.apachecn.org/cn/tutorials/intermediate/seq2seq_translation_tutorial.html
#### 5. Question answering
Question answering is a task in which a sentence or text sample is provided, a question is asked about it, and the question must be answered.
Below are some good question answering datasets for beginners.
1. [Stanford Question Answering Dataset (SQuAD)](https://rajpurkar.github.io/SQuAD-explorer/). Answering questions about Wikipedia articles.
2. [DeepMind Question Answering Corpus](https://github.com/deepmind/rc-data). Answering questions about news articles from the Daily Mail.
3. [Amazon question/answer data](http://jmcauley.ucsd.edu/data/amazon/qa/). Answering questions about Amazon products.
For more information, see the post:
[Datasets: How can I get a corpus of a question-answering website like Quora or Yahoo Answers or Stack Overflow for analyzing answer quality?](https://www.quora.com/Datasets-How-can-I-get-corpus-of-a-question-answering-website-like-Quora-or-Yahoo-Answers-or-Stack-Overflow-for-analyzing-answer-quality)
#### 6. Speech recognition
Speech recognition is the task of converting audio of spoken language into human-readable text.
Below are some good speech recognition datasets for beginners.
1. [TIMIT Acoustic-Phonetic Continuous Speech Corpus](https://catalog.ldc.upenn.edu/LDC93S1). Not free, but listed because it is so widely used. Spoken American English with associated transcriptions.
2. [VoxForge](http://voxforge.org/). A project to build an open-source database for speech recognition.
3. [LibriSpeech ASR corpus](http://www.openslr.org/12/). A large collection of English audiobooks gathered from LibriVox.
#### 7. Document summarization
Document summarization is the task of creating a short, meaningful description of a larger document.
Below are some good document summarization datasets for beginners.
1. [Legal Case Reports dataset](https://archive.ics.uci.edu/ml/datasets/Legal+Case+Reports). A collection of 4,000 legal cases and their summaries.
2. [TIPSTER Text Summarization Evaluation Conference corpus](http://www-nlpir.nist.gov/related_projects/tipster_summac/cmp_lg.html). A collection of nearly 200 documents and their summaries.
3. [The AQUAINT Corpus of English News Text](https://catalog.ldc.upenn.edu/LDC2002T31). Not free, but widely used. A corpus of news articles.
For more information:
[Document Understanding Conference (DUC) tasks](http://www-nlpir.nist.gov/projects/duc/data.html).
[Where can I find good datasets for text summarization?](https://www.quora.com/Where-can-I-find-good-data-sets-for-text-summarization)
> Named entity recognition
* Bi-LSTM CRF
* Reference example: http://pytorch.apachecn.org/cn/tutorials/beginner/nlp/advanced_tutorial.html
* Recommended reading on CRF: https://www.jianshu.com/p/55755fc649b1
> Text summarization
* **Extractive**
* word2vec + textrank
* Recommended reading on word2vec: https://www.zhihu.com/question/44832436/answer/266068967
* Recommended reading on textrank: https://blog.csdn.net/BaiHuaXiu123/article/details/77847232
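The extractive approach can be sketched as TextRank over a sentence graph. Note the substitution: to stay dependency-free, this example scores sentence similarity by word overlap instead of word2vec vectors, so it shows only the ranking half of "word2vec + textrank". The sample sentences, parameter defaults, and function names are assumptions made for illustration.

```python
import math


def similarity(s1, s2):
    """Word-overlap similarity (stand-in for a word2vec-based similarity)."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if len(w1) < 2 or len(w2) < 2:
        return 0.0  # avoids dividing by log(1) = 0 for one-word sentences
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))


def textrank(sentences, damping=0.85, iterations=50):
    """PageRank-style scores on a graph whose edges are sentence similarities."""
    n = len(sentences)
    weights = [
        [similarity(a, b) if i != j else 0.0 for j, b in enumerate(sentences)]
        for i, a in enumerate(sentences)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Made-up example document: three related sentences and one unrelated one.
DOC = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the mouse hid under the mat",
    "stock prices fell sharply today",
]
```

Sentences that share vocabulary with many others accumulate score from their neighbors, while a sentence with no overlap keeps only the small base score. Swapping `similarity` for a cosine similarity over averaged word2vec vectors would bring the sketch closer to the combination named above.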
## Graph computing [updated gradually]
* Dataset: [data/nlp/graph](data/nlp/graph)
* Study material: spark graphX实战.pdf [the file is too large to host here; please search for it on Baidu]
## Knowledge graphs
* For knowledge graphs, the only source I trust is [SimmerChan](https://www.zhihu.com/people/simmerchan): [Knowledge Graph: Giving AI a Brain](https://zhuanlan.zhihu.com/knowledgegraph)
* Honestly, I grew up on this blogger's posts. The writing explains deep ideas in simple terms. I like it a lot, so I am sharing it with everyone, and I hope you like it too.
### Further reading
If you want to go deeper, this section lists additional datasets.
1. [Text datasets used in research, on Wikipedia](https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research#Text_data)
2. [Datasets: What are the major text corpora used by computational linguists and natural language processing researchers?](https://www.quora.com/Datasets-What-are-the-major-text-corpora-used-by-computational-linguists-and-natural-language-processing-researchers-and-what-are-the-characteristics-biases-of-each-corpus)
3. [Stanford statistical natural language processing corpora](https://nlp.stanford.edu/links/statnlp.html#Corpora)
4. [Alphabetical list of NLP datasets](https://github.com/niderhoff/nlp-datasets)
5. [NLTK corpora](http://www.nltk.org/nltk_data/)
6. [Open data for deep learning on DL4J](https://deeplearning4j.org/opendata)
7. [NLP datasets](https://github.com/caesar0301/awesome-public-datasets#natural-language)
8. Open datasets from China: https://bosonnlp.com/dev/resource
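## Contributors
* Contributors / maintainers / group administrators:
**New contributors are always welcome**
## Disclaimer - [for study and reference only]
* ApacheCN translated this book purely for learning purposes and out of personal interest
* ApacheCN reserves the right of attribution and other related rights for this version of the translation
## **License**
* Each project's own license takes precedence.
* Projects under the ApacheCN account that have no license are all treated as [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.zh).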
---
## Sources:
* [Competition collection platform]: https://github.com/iphysresearch/DataSciComp
* https://github.com/pbharrin/machinelearninginaction
* https://machinelearningmastery.com/datasets-natural-language-processing
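## Acknowledgments
Recently a group member happened to send me a link, and I found that the project had been highly praised and enthusiastically promoted by some well-known names.
Our thanks to:
* [QbitAI (量子位)](https://www.zhihu.com/org/liang-zi-wei-48):
* AI Frontier Lectures (人工智能前沿讲习):
## Sponsor us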