labunix's blog

labunix's Lab Unix

Installing numpy, pandas, matplotlib, and scikit-learn

■Installing numpy, pandas, matplotlib, and scikit-learn

$ python3 -V
Python 3.11.2

$ python3 -m venv data-analysis
$ source data-analysis/bin/activate
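
A quick way to confirm that the activation took effect is to ask the interpreter itself. This is only a minimal sketch; the function name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("prefix     :", sys.prefix)
    print("base prefix:", sys.base_prefix)
    print("inside venv:", in_virtualenv())
```

Run with the python3 of the activated environment, the two prefixes should differ.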

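■Detour: make package dependencies inspectable
 pip's dependency resolution is on the lenient side, so install pipdeptree to see the dependency tree for yourself.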

$ pip3 install pipdeptree
$ pipdeptree 
pipdeptree==2.23.1
├── packaging [required: >=23.1, installed: 24.1]
└── pip [required: >=23.1.2, installed: 24.2]
setuptools==66.1.1
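
The requirement strings that pipdeptree prints come from each distribution's metadata, which the standard library can also read. A rough sketch, assuming only importlib.metadata; the helper name direct_requirements is hypothetical, and the raw strings may include extras markers that pipdeptree filters out.

```python
from importlib import metadata


def direct_requirements(dist_name):
    """Return the declared requirement strings of an installed distribution.

    Returns None when the distribution is not installed, and an empty list
    when it is installed but declares no dependencies.
    """
    try:
        declared = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # metadata.requires() itself returns None when nothing is declared.
    return list(declared or [])


if __name__ == "__main__":
    for name in ("pipdeptree", "setuptools"):
        print(name, "->", direct_requirements(name))
```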

■Install numpy

$ pip3 install numpy
$ pipdeptree
numpy==2.1.0
pipdeptree==2.23.1
├── packaging [required: >=23.1, installed: 24.1]
└── pip [required: >=23.1.2, installed: 24.2]
setuptools==66.1.1

■Install pandas
 pandas depends on the numpy version (>=1.23.2 here).

$ pip install pandas
$ pipdeptree
pandas==2.2.2
├── numpy [required: >=1.23.2, installed: 2.1.0]
├── python-dateutil [required: >=2.8.2, installed: 2.9.0.post0]
│   └── six [required: >=1.5, installed: 1.16.0]
├── pytz [required: >=2020.1, installed: 2024.1]
└── tzdata [required: >=2022.7, installed: 2024.1]
pipdeptree==2.23.1
├── packaging [required: >=23.1, installed: 24.1]
└── pip [required: >=23.1.2, installed: 24.2]
setuptools==66.1.1
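
The "required: >=1.23.2, installed: 2.1.0" check has to compare versions as numbers, not as strings. A simplified sketch follows; it only looks at the leading numeric release parts and is not a full PEP 440 implementation, and both function names are made up for illustration.

```python
import re


def release_tuple(version: str) -> tuple:
    """Turn '2.9.0.post0' into (2, 9, 0); only the leading numeric parts are kept."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Check an 'installed >= minimum' constraint on the numeric release parts."""
    have, need = release_tuple(installed), release_tuple(minimum)
    # Pad with zeros so that (2, 1) and (2, 1, 0) compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    print(meets_minimum("2.1.0", "1.23.2"))  # numpy installed vs. pandas' minimum
    print(meets_minimum("1.9", "1.23"))      # a plain string comparison gets this one wrong
```

In practice the packaging library that pipdeptree already depends on is the better tool; this only shows the idea.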

■Install matplotlib
 matplotlib depends on the numpy version (>=1.23), both directly and through contourpy.

$ pip install matplotlib
$ pipdeptree
matplotlib==3.9.2
├── contourpy [required: >=1.0.1, installed: 1.3.0]
│   └── numpy [required: >=1.23, installed: 2.1.0]
├── cycler [required: >=0.10, installed: 0.12.1]
├── fonttools [required: >=4.22.0, installed: 4.53.1]
├── kiwisolver [required: >=1.3.1, installed: 1.4.5]
├── numpy [required: >=1.23, installed: 2.1.0]
├── packaging [required: >=20.0, installed: 24.1]
├── pillow [required: >=8, installed: 10.4.0]
├── pyparsing [required: >=2.3.1, installed: 3.1.4]
└── python-dateutil [required: >=2.7, installed: 2.9.0.post0]
    └── six [required: >=1.5, installed: 1.16.0]
pandas==2.2.2
├── numpy [required: >=1.23.2, installed: 2.1.0]
├── python-dateutil [required: >=2.8.2, installed: 2.9.0.post0]
│   └── six [required: >=1.5, installed: 1.16.0]
├── pytz [required: >=2020.1, installed: 2024.1]
└── tzdata [required: >=2022.7, installed: 2024.1]
pipdeptree==2.23.1
├── packaging [required: >=23.1, installed: 24.1]
└── pip [required: >=23.1.2, installed: 24.2]
setuptools==66.1.1

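■Install scikit-learn
 scikit-learn depends on the numpy version (>=1.19.5), and it also pulls in scipy, the only package here that puts an upper bound on numpy (<2.3).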

$ pip install scikit-learn
$ pipdeptree
matplotlib==3.9.2
├── contourpy [required: >=1.0.1, installed: 1.3.0]
│   └── numpy [required: >=1.23, installed: 2.1.0]
├── cycler [required: >=0.10, installed: 0.12.1]
├── fonttools [required: >=4.22.0, installed: 4.53.1]
├── kiwisolver [required: >=1.3.1, installed: 1.4.5]
├── numpy [required: >=1.23, installed: 2.1.0]
├── packaging [required: >=20.0, installed: 24.1]
├── pillow [required: >=8, installed: 10.4.0]
├── pyparsing [required: >=2.3.1, installed: 3.1.4]
└── python-dateutil [required: >=2.7, installed: 2.9.0.post0]
    └── six [required: >=1.5, installed: 1.16.0]
pandas==2.2.2
├── numpy [required: >=1.23.2, installed: 2.1.0]
├── python-dateutil [required: >=2.8.2, installed: 2.9.0.post0]
│   └── six [required: >=1.5, installed: 1.16.0]
├── pytz [required: >=2020.1, installed: 2024.1]
└── tzdata [required: >=2022.7, installed: 2024.1]
pipdeptree==2.23.1
├── packaging [required: >=23.1, installed: 24.1]
└── pip [required: >=23.1.2, installed: 24.2]
scikit-learn==1.5.1
├── joblib [required: >=1.2.0, installed: 1.4.2]
├── numpy [required: >=1.19.5, installed: 2.1.0]
├── scipy [required: >=1.6.0, installed: 1.14.1]
│   └── numpy [required: >=1.23.5,<2.3, installed: 2.1.0]
└── threadpoolctl [required: >=3.1.0, installed: 3.5.0]
setuptools==66.1.1
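
With five packages constraining numpy at once, the range that satisfies all of them is the intersection of the individual ranges. A small sketch of that calculation, using the constraints copied from the output above; it only understands ">=" and "<" clauses, and the names are illustrative.

```python
def parse(version):
    """'1.23' -> (1, 23, 0); pad to three parts so tuples compare consistently."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


# Constraints on numpy, copied from the pipdeptree output.
NUMPY_CONSTRAINTS = {
    "pandas": ">=1.23.2",
    "matplotlib": ">=1.23",
    "contourpy": ">=1.23",
    "scikit-learn": ">=1.19.5",
    "scipy": ">=1.23.5,<2.3",
}


def allowed_range(constraints):
    """Return (highest lower bound, lowest exclusive upper bound or None)."""
    lower, upper = None, None
    for spec in constraints.values():
        for clause in (c.strip() for c in spec.split(",")):
            if clause.startswith(">="):
                bound = parse(clause[2:])
                lower = bound if lower is None else max(lower, bound)
            elif clause.startswith("<") and not clause.startswith("<="):
                bound = parse(clause[1:])
                upper = bound if upper is None else min(upper, bound)
            else:
                raise ValueError(f"unsupported clause: {clause!r}")
    return lower, upper


def is_allowed(version, constraints):
    """Check a version against the combined range."""
    lower, upper = allowed_range(constraints)
    candidate = parse(version)
    above = lower is None or candidate >= lower
    below = upper is None or candidate < upper
    return above and below


if __name__ == "__main__":
    print(allowed_range(NUMPY_CONSTRAINTS))
    print(is_allowed("2.1.0", NUMPY_CONSTRAINTS))
```

So the effective lower bound comes from scipy (1.23.5), not from scikit-learn's own looser >=1.19.5.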

■Find out why a particular package is installed

$ pipdeptree --reverse --packages numpy
numpy==2.1.0
├── contourpy==1.3.0 [requires: numpy>=1.23]
│   └── matplotlib==3.9.2 [requires: contourpy>=1.0.1]
├── matplotlib==3.9.2 [requires: numpy>=1.23]
├── pandas==2.2.2 [requires: numpy>=1.23.2]
├── scikit-learn==1.5.1 [requires: numpy>=1.19.5]
└── scipy==1.14.1 [requires: numpy>=1.23.5,<2.3]
    └── scikit-learn==1.5.1 [requires: scipy>=1.6.0]

$ pipdeptree --reverse --packages six
six==1.16.0
└── python-dateutil==2.9.0.post0 [requires: six>=1.5]
    ├── matplotlib==3.9.2 [requires: python-dateutil>=2.7]
    └── pandas==2.2.2 [requires: python-dateutil>=2.8.2]
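
The reverse view is just the forward dependency graph with its edges flipped. A sketch of that idea, with the forward edges typed in by hand from the trees in this post; this is not how pipdeptree itself is implemented.

```python
from collections import defaultdict

# Forward edges (package -> direct dependencies), copied from the trees in this post.
FORWARD = {
    "matplotlib": ["contourpy", "cycler", "fonttools", "kiwisolver", "numpy",
                   "packaging", "pillow", "pyparsing", "python-dateutil"],
    "contourpy": ["numpy"],
    "pandas": ["numpy", "python-dateutil", "pytz", "tzdata"],
    "python-dateutil": ["six"],
    "scikit-learn": ["joblib", "numpy", "scipy", "threadpoolctl"],
    "scipy": ["numpy"],
    "pipdeptree": ["packaging", "pip"],
}


def invert(forward):
    """Build dependency -> sorted list of packages that require it directly."""
    reverse = defaultdict(set)
    for package, deps in forward.items():
        for dep in deps:
            reverse[dep].add(package)
    return {dep: sorted(parents) for dep, parents in reverse.items()}


def all_dependents(name, reverse):
    """Collect every package that requires `name`, directly or transitively."""
    seen, stack = set(), [name]
    while stack:
        for parent in reverse.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    reverse = invert(FORWARD)
    print(reverse["numpy"])
    print(sorted(all_dependents("six", reverse)))
```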

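■Visualize the dependencies (as DOT language output)
 pip's graphviz package is only the Python binding that pipdeptree uses for --graph-output; formats other than dot also need the Graphviz dot executable.

$ pip install graphviz
$ pipdeptree --graph-output dot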
digraph {
	"python-dateutil" -> six [label=">=1.5"]
	"python-dateutil" [label="python-dateutil\n2.9.0.post0"]
	"scikit-learn" -> joblib [label=">=1.2.0"]
	"scikit-learn" -> numpy [label=">=1.19.5"]
	"scikit-learn" -> scipy [label=">=1.6.0"]
	"scikit-learn" -> threadpoolctl [label=">=3.1.0"]
	"scikit-learn" [label="scikit-learn\n1.5.1"]
	contourpy -> numpy [label=">=1.23"]
	contourpy [label="contourpy\n1.3.0"]
	cycler [label="cycler\n0.12.1"]
	fonttools [label="fonttools\n4.53.1"]
	graphviz [label="graphviz\n0.20.3"]
	joblib [label="joblib\n1.4.2"]
	kiwisolver [label="kiwisolver\n1.4.5"]
	matplotlib -> "python-dateutil" [label=">=2.7"]
	matplotlib -> contourpy [label=">=1.0.1"]
	matplotlib -> cycler [label=">=0.10"]
	matplotlib -> fonttools [label=">=4.22.0"]
	matplotlib -> kiwisolver [label=">=1.3.1"]
	matplotlib -> numpy [label=">=1.23"]
	matplotlib -> packaging [label=">=20.0"]
	matplotlib -> pillow [label=">=8"]
	matplotlib -> pyparsing [label=">=2.3.1"]
	matplotlib [label="matplotlib\n3.9.2"]
	numpy [label="numpy\n2.1.0"]
	packaging [label="packaging\n24.1"]
	pandas -> "python-dateutil" [label=">=2.8.2"]
	pandas -> numpy [label=">=1.23.2"]
	pandas -> pytz [label=">=2020.1"]
	pandas -> tzdata [label=">=2022.7"]
	pandas [label="pandas\n2.2.2"]
	pillow [label="pillow\n10.4.0"]
	pip [label="pip\n24.2"]
	pipdeptree -> packaging [label=">=23.1"]
	pipdeptree -> pip [label=">=23.1.2"]
	pipdeptree [label="pipdeptree\n2.23.1"]
	pyparsing [label="pyparsing\n3.1.4"]
	pytz [label="pytz\n2024.1"]
	scipy -> numpy [label=">=1.23.5,<2.3"]
	scipy [label="scipy\n1.14.1"]
	setuptools [label="setuptools\n66.1.1"]
	six [label="six\n1.16.0"]
	threadpoolctl [label="threadpoolctl\n3.5.0"]
	tzdata [label="tzdata\n2024.1"]
}
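
The DOT text above has a simple shape: one line per edge, labelled with the version constraint, and one line per node, labelled with name and version. A hand-rolled sketch that produces the same layout from plain dictionaries; the function names are made up, and it does not use the graphviz package.

```python
import re


def dot_id(name):
    """Quote a name unless it is already a plain DOT identifier."""
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        return name
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(versions, edges):
    """versions: {name: version}; edges: {(package, dependency): constraint}."""
    lines = ["digraph {"]
    for (src, dst), spec in sorted(edges.items()):
        lines.append(f'\t{dot_id(src)} -> {dot_id(dst)} [label="{spec}"]')
    for name, version in sorted(versions.items()):
        # "\\n" writes a literal backslash-n, which DOT renders as a line break.
        lines.append(f'\t{dot_id(name)} [label="{name}\\n{version}"]')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(
        {"python-dateutil": "2.9.0.post0", "six": "1.16.0"},
        {("python-dateutil", "six"): ">=1.5"},
    ))
```

Names containing a hyphen, such as python-dateutil and scikit-learn, have to be quoted, which is why only those appear in quotes in the output above.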

■Convert directly to SVG

$ pipdeptree --graph-output svg > data-analysis.svg

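■PNG and PDF output work the same way, but it is better to edit the DOT source and then convert it to SVG
 (in the diff, data-analysis.dot is the edited DOT output and data-analysis.dot.org is the unedited copy)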

$ sdiff -ls -w 160 data-analysis.dot{,.org}
	"scikit-learn" [label="scikit-learn\n1.5.1" ,shape = box]	      |		"scikit-learn" [label="scikit-learn\n1.5.1"]
	matplotlib [label="matplotlib\n3.9.2" ,shape = box]		      |		matplotlib [label="matplotlib\n3.9.2"]
	numpy [label="numpy\n2.1.0" ,shape = box]			      |		numpy [label="numpy\n2.1.0"]
	pandas [label="pandas\n2.2.2" ,shape = box]			      |		pandas [label="pandas\n2.2.2"]
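
The same edit can be scripted instead of done by hand. A sketch that appends shape=box to the node lines of chosen packages; it assumes the one-statement-per-line layout that pipdeptree printed above, and box_nodes is a hypothetical helper name.

```python
import re

# A node line looks like:  name [label="..."]  (the name is optionally quoted).
# Edge lines have " -> " after the first name, so they never match.
NODE_LINE = re.compile(r'^\s*("?)([^"\s]+)\1 \[label=".*"\]$')


def box_nodes(dot_text, names):
    """Return dot_text with ', shape=box' added to the listed node definitions."""
    wanted = set(names)
    result = []
    for line in dot_text.splitlines():
        match = NODE_LINE.match(line)
        if match and match.group(2) in wanted:
            # Drop the closing bracket and re-add it after the new attribute.
            line = line[:-1] + ", shape=box]"
        result.append(line)
    return "\n".join(result)


if __name__ == "__main__":
    sample = "\n".join([
        "digraph {",
        '\tpandas -> numpy [label=">=1.23.2"]',
        '\tnumpy [label="numpy\\n2.1.0"]',
        '\tpandas [label="pandas\\n2.2.2"]',
        "}",
    ])
    print(box_nodes(sample, ["numpy", "pandas"]))
```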

■Convert to SVG and PNG with the dot command

$ dot -V
dot - graphviz version 2.43.0 (0)

$ dot -Tsvg data-analysis.dot -o data-analysis.svg
$ dot -Tpng data-analysis.dot -o data-analysis.png
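
If several formats are needed every time the DOT file changes, the two commands can be wrapped in a few lines of Python. A sketch only: it shells out to the same dot executable with the same -T and -o options as above, and the helper names are made up.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(source, fmt):
    """Build the argument list for: dot -T<fmt> <source> -o <source with new suffix>."""
    target = Path(source).with_suffix("." + fmt)
    return ["dot", f"-T{fmt}", str(source), "-o", str(target)]


def render(source, formats=("svg", "png")):
    """Run dot once per requested format, stopping on the first failure."""
    if shutil.which("dot") is None:
        raise RuntimeError("the Graphviz dot executable was not found on PATH")
    for fmt in formats:
        subprocess.run(dot_command(source, fmt), check=True)


if __name__ == "__main__":
    print(dot_command("data-analysis.dot", "svg"))
```

Calling render("data-analysis.dot") would then produce both files in one go.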


■Save the current state so it can be reproduced

$ pip freeze > requirements.txt

■To reproduce it, run the following

$ pip install -r requirements.txt
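
After restoring, it can be worth checking that the environment really matches the pins. A rough sketch that reads only the simple name==version lines that pip freeze writes here; other requirement syntax is skipped, and both function names are illustrative.

```python
from importlib import metadata


def parse_pins(text):
    """Extract {name: version} from 'name==version' lines, ignoring comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # blank lines, comments, and anything that is not a plain pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def mismatches(pins):
    """Return {name: (pinned, installed or None)} for every pin that is not met."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        print(mismatches(parse_pins(handle.read())) or "environment matches requirements.txt")
```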

■Leave the venv environment

$ deactivate