Articles for the one-month period starting 2023-11-01
Problem statement: nlp100.github.io. Overview: Transfer learning from BERT. This breaks continuity with the implementations so far in this chapter, but I use the Trainer from the Transformers library. import os import datasets import evaluate import numpy as np import pandas as pd fro…
Problem statement: nlp100.github.io. Overview: Tune some of the parameters. # ref: https://www.shoeisha.co.jp/book/detail/9784798157184 import re from collections import defaultdict import joblib import pandas as pd import torch from gen…
Problem statement: nlp100.github.io. Overview: Train the RNN with stochastic gradient descent. The same as NLP 100 Exercise 2020 "82. Training with stochastic gradient descent" - u++の備忘録. # ref: https://www.shoeisha.co.jp/book/detail/9784798157184 import re from coll…
Problem statement: nlp100.github.io. Overview: Implement a CNN. For the implementation, I reused part of the sample code from the book 『現場で使える！PyTorch開発入門 深層学習モデルの作成とアプリケーションへの実装』 (Shoeisha). # ref: https://www.shoeisha.co.jp/book/detail…
Problem statement: nlp100.github.io. Overview: Make the RNN bidirectional. Concretely, set bidirectional=True and double the hidden_size of the layer that follows. For the implementation, the book 『現場で使える！PyTorch開発入門 深層学習モデルの作成とアプリケーションへの実装』 (Shoeisha)…
Problem statement: nlp100.github.io. Overview: Initialize the word embeddings with the pretrained word vectors from the Google News dataset, then train. For the implementation, the sample code from the book 『現場で使える！PyTorch開発入門 深層学習モデルの作成とアプリケーションへの実装』 (Shoeisha)…
Problem statement: nlp100.github.io. Overview: Add mini-batch processing. For the implementation, I reused part of the sample code from the book 『現場で使える！PyTorch開発入門 深層学習モデルの作成とアプリケーションへの実装』 (Shoeisha). # ref: https://www.shoeisha.co.…
Problem statement: nlp100.github.io. Overview: Add training with stochastic gradient descent. For the implementation, I reused part of the sample code from the book 『現場で使える！PyTorch開発入門 深層学習モデルの作成とアプリケーションへの実装』 (Shoeisha). # ref: https://ww…
Problem statement: nlp100.github.io. Overview: Implement an RNN. For the implementation, I reused part of the sample code from the book 『現場で使える！PyTorch開発入門 深層学習モデルの作成とアプリケーションへの実装』 (Shoeisha). import re from collections import defaultdict…
Problem statement: nlp100.github.io. Overview: Implemented straightforwardly, exactly as instructed. from collections import defaultdict import joblib import pandas as pd def text2id(text): return [word2token[word] for word in text.split()] X_train = pd.read_table('ch06/train…
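As a rough illustration of the word-to-ID mapping behind the text2id function in the excerpt above, here is a minimal sketch. It assumes the task's rule that words appearing at least twice get IDs 1, 2, ... in descending order of frequency and every other word gets 0. The helper build_word2token, its min_count parameter, and the toy sentences are hypothetical, and text2id takes the mapping as an argument here rather than reading a global.

```python
from collections import Counter, defaultdict


def build_word2token(texts, min_count=2):
    """Map frequent words to 1, 2, ... by descending frequency; others to 0."""
    counts = Counter(word for text in texts for word in text.split())
    word2token = defaultdict(int)  # unseen or rare words fall back to ID 0
    rank = 1
    for word, count in counts.most_common():
        if count < min_count:
            break  # most_common() is sorted, so the rest are rarer still
        word2token[word] = rank
        rank += 1
    return word2token


def text2id(text, word2token):
    """Convert a whitespace-tokenized text into a list of word IDs."""
    return [word2token[word] for word in text.split()]


# Toy corpus, made up for illustration.
texts = ["the cat sat", "the dog sat", "the end"]
word2token = build_word2token(texts)
ids = text2id("the cat sat", word2token)
print(ids)
```

Using a defaultdict(int) keeps the lookup in text2id free of explicit checks for unknown words, at the cost of adding each unknown word to the dictionary on first lookup.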
Problem statement: nlp100.github.io. Overview: The network is changed to 3 layers. import joblib import matplotlib.pyplot as plt import numpy as np import torch from torch import nn, optim from torch.utils.data import DataLoader, TensorDataset from …
Problem statement: nlp100.github.io. Overview: Transfer to the GPU with .to('cuda:0'). import joblib import matplotlib.pyplot as plt import numpy as np import torch from torch import nn, optim from torch.utils.data import DataLoader, TensorDataset from tq…
Problem statement: nlp100.github.io. Overview: Added mini-batching. import joblib import matplotlib.pyplot as plt import numpy as np import torch from torch import nn, optim from torch.utils.data import DataLoader, TensorDataset from tqd…
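The excerpt above imports DataLoader and TensorDataset for this step. To show what mini-batching does conceptually, here is a dependency-free sketch, not the post's implementation. The function iter_minibatches, its parameters, and the toy data are hypothetical.

```python
import random


def iter_minibatches(features, labels, batch_size, shuffle=True, seed=0):
    """Yield (features, labels) chunks of at most batch_size examples."""
    indices = list(range(len(features)))
    if shuffle:
        # Shuffle once per pass so each epoch can see a different order.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield [features[i] for i in batch], [labels[i] for i in batch]


# Toy data, made up for illustration: 5 examples with 1 feature each.
X = [[0.1], [0.2], [0.3], [0.4], [0.5]]
y = [0, 1, 0, 1, 0]
batch_sizes = [len(xb) for xb, _ in iter_minibatches(X, y, batch_size=2)]
print(batch_sizes)
```

The last batch is smaller when the number of examples is not a multiple of the batch size.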
Problem statement: nlp100.github.io. Overview: Save the model after every epoch. import joblib import matplotlib.pyplot as plt import numpy as np import torch from torch import nn, optim X_train = joblib.load('ch08/X_train.joblib') y_train = joblib.l…
Problem statement: nlp100.github.io. Overview: Plot the loss and the accuracy. import joblib import matplotlib.pyplot as plt import numpy as np import torch from torch import nn, optim X_train = joblib.load('ch08/X_train.joblib') y_train = joblib.load(…
Problem statement: nlp100.github.io. Overview: Compute the accuracy. import joblib import numpy as np import torch from torch import nn, optim X_train = joblib.load('ch08/X_train.joblib') y_train = joblib.load('ch08/y_train.joblib') X_train = torch.fro…
Problem statement: nlp100.github.io. Overview: Train for 100 epochs with stochastic gradient descent. import joblib import numpy as np import torch from torch import nn, optim X_train = joblib.load('ch08/X_train.joblib') y_train = joblib.load('ch08/y_train.jobli…
Problem statement: nlp100.github.io. Overview: Compute the loss and the gradient. import joblib import numpy as np import torch import torch.nn as nn X_train = joblib.load('ch08/X_train.joblib') y_train = joblib.load('ch08/y_train.joblib') X_train = torch.from_…
Problem statement: nlp100.github.io. Overview: Define a single-layer neural network and make predictions. import joblib import numpy as np import torch import torch.nn as nn X_train = joblib.load('ch08/X_train.joblib') X_train = torch.from_numpy(X_train.astype(…
Problem statement: nlp100.github.io. Overview: Generate the features known as SWEM *1. import joblib import numpy as np import pandas as pd from gensim.models import KeyedVectors from tqdm import tqdm def culcSwem(row): global model swem = [model[w] if …
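The culcSwem function in the excerpt above is cut off, so the following is only a sketch of one common SWEM variant, average pooling, and not a reconstruction of the post's code. It assumes that words missing from the vocabulary contribute zero vectors. The function swem_average and the toy vectors are hypothetical.

```python
def swem_average(text, vectors, dim):
    """Average the word vectors of a text (SWEM with average pooling).

    Assumption: out-of-vocabulary words are treated as zero vectors.
    """
    rows = [vectors.get(word, [0.0] * dim) for word in text.split()]
    if not rows:
        return [0.0] * dim  # empty text maps to the zero vector
    return [sum(column) / len(rows) for column in zip(*rows)]


# Toy 2-dimensional vectors, made up for illustration.
toy_vectors = {"stocks": [1.0, 0.0], "rise": [0.0, 1.0]}
feature = swem_average("stocks rise sharply", toy_vectors, dim=2)
print(feature)
```

The result has the same dimensionality as a single word vector, so texts of any length become fixed-size inputs for a classifier.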
Problem statement: nlp100.github.io. Overview: Apart from visualizing with t-SNE, this is the same as NLP 100 Exercise 2020 "67. k-means clustering" - u++の備忘録. import matplotlib.pyplot as plt import numpy as np import pandas as pd from gensim.models import Ke…
Problem statement: nlp100.github.io. Overview: Apart from running and visualizing hierarchical clustering with Ward's method, this is the same as NLP 100 Exercise 2020 "67. k-means clustering" - u++の備忘録. import matplotlib.pyplot as plt import numpy as np import pandas a…
Problem statement: nlp100.github.io. Overview: First, get the country names from questions-words.txt. Then extract the word vectors for those country names and run k-means clustering with k=5 clusters. import numpy as np import pandas as pd from gensim.models …
Problem statement: nlp100.github.io. Overview: A combination of NLP 100 Exercise 2020 "61. Word similarity" - u++の備忘録 and NLP 100 Exercise 2020 "64. Experiments on the analogy data" - u++の備忘録. import numpy as np import pandas as pd from gensim.m…
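Problem statement: nlp100.github.io. Overview: Load the data with pandas and compute the accuracy. There are several ways to do this; here I compare the two columns to get a bool value per row indicating whether they match, then count the True values with sum(). Dividing that count by the number of rows gives the accuracy as a proportion. import…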
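The counting approach described in the entry above can be sketched as follows. The column names answer and predicted and the four rows are hypothetical and do not come from the post's data.

```python
import pandas as pd

# Toy results table, made up for illustration.
df = pd.DataFrame({
    "answer": ["Greece", "Iraq", "Japan", "France"],
    "predicted": ["Greece", "Iran", "Japan", "France"],
})

# Element-wise comparison yields a bool Series, one value per row.
is_correct = df["answer"] == df["predicted"]

# True counts as 1 in sum(), so this is the number of matching rows.
accuracy = is_correct.sum() / len(df)
print(accuracy)  # 0.75
```

Calling is_correct.mean() gives the same value in one step, since the mean of a bool Series is the fraction of True values.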
Problem statement: nlp100.github.io. Overview: This only repeats the processing from NLP 100 Exercise 2020 "63. Analogy by additive compositionality" - u++の備忘録. A for loop would also work, but here I used pandas' progress_apply. Because it takes a long time, with tqdm the execution…
Problem statement: nlp100.github.io. Overview: Make use of the arguments*1 of most_similar. from gensim.models import KeyedVectors model = KeyedVectors.load_word2vec_format('ch07/GoogleNews-vectors-negative300.bin', binary=True) result = model.most_similar(pos…
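To illustrate the idea behind adding and subtracting word vectors in a similarity query, here is a self-contained toy sketch. It is a simplification and not gensim's actual implementation: it sums raw vectors and ranks the remaining words by cosine similarity. The analogy function, its parameters, and the 4-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, positive, negative, topn=1):
    """Rank words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, vectors[word])]
    for word in negative:
        query = [q - x for q, x in zip(query, vectors[word])]
    exclude = set(positive) | set(negative)  # never return the input words
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in exclude]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy vectors, made up for illustration.
# Axes: [is a country, is a capital, relates to Spain, relates to Greece]
toy = {
    "Spain": [1.0, 0.0, 1.0, 0.0],
    "Madrid": [0.0, 1.0, 1.0, 0.0],
    "Athens": [0.0, 1.0, 0.0, 1.0],
    "Greece": [1.0, 0.0, 0.0, 1.0],
    "Japan": [1.0, 0.0, 0.0, 0.0],
}
result = analogy(toy, positive=["Spain", "Athens"], negative=["Madrid"], topn=2)
print(result)
```

With these toy vectors, Spain - Madrid + Athens lands exactly on the vector for Greece, so Greece ranks first and Japan second.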
Problem statement: nlp100.github.io. Overview: Use most_similar *1. The topn argument specifies how many of the top results to return. It defaults to topn=10, but I deliberately specify it explicitly for clarity. from gensim.models import KeyedVectors model = …
Problem statement: nlp100.github.io. Overview: Compute the cosine similarity. from gensim.models import KeyedVectors model = KeyedVectors.load_word2vec_format('ch07/GoogleNews-vectors-negative300.bin', binary=True) print(model.similarity("United_States",…
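For reference, cosine similarity is the dot product of two vectors divided by the product of their norms. Below is a minimal standalone sketch on toy vectors; the function name and the example values are hypothetical and are not real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors, made up for illustration.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score close to 1 regardless of their length, and orthogonal vectors score 0.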
Problem statement: nlp100.github.io. Overview: This chapter deals with word vectors (word embeddings), which represent the meaning of a word as a real-valued vector. First, download the pretrained model and display a word vector. from gensim.models import KeyedVectors model = KeyedVec…