I'm Nakamura (po3rin), a software engineer on the AI and Machine Learning team in the M3 Engineering Group. I like search and Go.
At M3 we are currently working on applying bandit algorithms to information retrieval (IR), because they are quick and easy to implement. Along the way I looked into algorithms that apply multi-armed bandits to the Cascade Model, and this post introduces them together with Python implementations.
- Introduction
- Prerequisites
- What is the Cascade Model
- Multi-armed bandits for the Cascade Model
- Summary
- Reference
Introduction
Bandit algorithms are quick and easy to implement, need no training data, and support continuous testing and learning, which makes them a popular choice for online applications.
Applying a bandit algorithm to IR, however, takes a little extra care. One approach is to assume the Cascade Model and run the bandit algorithm on top of it.
For IR combined with multi-armed bandits, I recommend the ICTIR '17 tutorial [1], which is very instructive and covers many topics besides the Cascade Model. This post focuses on the algorithms introduced there.
https://dl.acm.org/doi/10.1145/3121050.3121108
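Prerequisites
This post assumes you already understand two basic multi-armed bandit algorithms: the Upper Confidence Bound (UCB) algorithm and Thompson Sampling (TS).
There are plenty of books and blog posts explaining UCB and TS, so please refer to those. My recommendation is the book 『ウェブ最適化ではじめる機械学習』 (a Japanese introduction to machine learning through web optimization), which also comes with Python implementations.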
What is the Cascade Model
The Cascade Model, proposed by Craswell et al. [2], assumes that the user scans the result list in order, starting from the top-ranked item. It adds a stronger assumption: if the user finds an item attractive they always click it, and none of the items ranked below it are examined. As we will see later, this model makes position bias easy to handle.
The overview figure illustrates this. In the example, the user clicks item 3, so items 4 and 5 are never examined.
The Cascade Model is simple, yet it explains the position bias in historical click data well, so many algorithms adopt it as their underlying model.
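From the set of all items $E = \{1, \dots, L\}$, we pick $K$ items and rank them to form $A = (a_1, \dots, a_K)$, which is the search result shown to the user.
Let $\bar{w}(e)$ be the probability that item $e$ is attractive to the user (called the attraction probability here). In this setting the attraction probability is the click-through rate.
The probability that $a_k$ is examined by the user is

$$\prod_{i=1}^{k-1} \left(1 - \bar{w}(a_i)\right)$$

so the probability that at least one item is clicked is

$$1 - \prod_{k=1}^{K} \left(1 - \bar{w}(a_k)\right)$$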
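As a small worked example of the two probabilities above, the following sketch computes the examination probability of each position and the probability of at least one click. The function names and the attraction probabilities are made up for illustration.

```python
from math import prod


def examination_probs(weights):
    """P(position k is examined) = product of (1 - w) over all earlier positions."""
    probs = []
    not_clicked_yet = 1.0
    for w in weights:
        probs.append(not_clicked_yet)
        not_clicked_yet *= 1.0 - w
    return probs


def click_any_prob(weights):
    """P(at least one click) = 1 - product of (1 - w) over the whole list."""
    return 1.0 - prod(1.0 - w for w in weights)


# Hypothetical attraction probabilities of a ranked list of three items.
weights = [0.1, 0.2, 0.3]
exam = examination_probs(weights)
p_click = click_any_prob(weights)
print(exam)     # roughly [1.0, 0.9, 0.72]
print(p_click)  # roughly 0.496
```

The first position is always examined, and each later position is examined only if every item above it failed to attract the user.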
Multi-armed bandits for the Cascade Model
This chapter introduces two algorithms: CascadeUCB1, a basic multi-armed bandit algorithm for the Cascade Model, and CascadeLinTS, which also takes item features into account.
Cascading Bandits
Consider applying a multi-armed bandit algorithm to the Cascade Model. The overview figure shows the interaction loop.
The agent picks the items to show the user from the full item list, receives the user's click as feedback, and builds the next list.
Kveton et al. [3], who were the first to apply multi-armed bandits to the Cascade Model, proposed CascadeUCB1 and CascadeKL-UCB. The two methods differ mainly in how they compute the upper confidence bound (UCB).
Both are based on UCB, which is widely used for multi-armed bandit problems.
Here we look at CascadeUCB1, which is the simpler one to implement.
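At step $t$, consider showing the user a list of $K$ items $A_t = (a_1^t, \dots, a_K^t)$. Let $C_t$ be the position of the clicked item, with $C_t = \infty$ when nothing is clicked. Then, for every item the user examined at step $t$, we learn the observed weight $w_t(a_k^t) = \mathbb{1}\{C_t = k\}$ for $k = 1, \dots, \min(C_t, K)$.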
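To make the feedback concrete, here is a minimal sketch of how a click position turns into observed weights. The function name `observed_weights` is hypothetical, and `None` is used here as a stand-in for "no click".

```python
def observed_weights(click_pos, K):
    """Observed weights for the examined positions of a list of K items.

    click_pos is the 1-indexed clicked position, or None when nothing was clicked.
    Positions after the click are not examined, so they yield no observation.
    """
    n_examined = K if click_pos is None else min(click_pos, K)
    return [1 if click_pos == k else 0 for k in range(1, n_examined + 1)]


clicked_third = observed_weights(3, K=5)   # items 1 and 2 skipped, item 3 clicked
no_click = observed_weights(None, K=5)     # all five items examined, none clicked
clicked_first = observed_weights(1, K=5)   # only item 1 examined
print(clicked_third, no_click, clicked_first)
```

When nothing is clicked, every shown item counts as examined with weight 0, which is why a missing click still carries information.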
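The items shown to the user are chosen to maximize the objective function below. This is the key point when applying a bandit algorithm under the Cascade Model. As explained in the Cascade Model section, the objective is the probability that at least one item is clicked.

$$f(A, w) = 1 - \prod_{k=1}^{K} \left(1 - w(a_k)\right)$$

In CascadeUCB1, the list returned online is determined by the UCB value $U_t(e)$, which drives exploration.

$$U_t(e) = \hat{w}_{T_{t-1}(e)}(e) + c_{t-1,\,T_{t-1}(e)}$$

Here $\hat{w}_s(e)$ is the average of $s$ observed weights of item $e$, and $T_t(e)$ is the number of times item $e$ has been observed up to step $t$. $c_{t,s}$ is the radius of the confidence interval around $\hat{w}_s(e)$ at step $t$:

$$c_{t,s} = \sqrt{\frac{1.5 \log t}{s}}$$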
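The effect of the confidence radius can be seen with a couple of numbers. The sketch below uses hypothetical means and observation counts; `confidence_radius` and `ucb` are illustrative names.

```python
import math


def confidence_radius(t, s):
    """c_{t,s} = sqrt(1.5 * log(t) / s)."""
    return math.sqrt(1.5 * math.log(t) / s)


def ucb(mean, t, s):
    """Upper confidence bound: empirical mean plus confidence radius."""
    return mean + confidence_radius(t, s)


t = 100
rarely_seen = ucb(mean=0.20, t=t, s=5)     # few observations, wide interval
often_seen = ucb(mean=0.25, t=t, s=500)    # many observations, narrow interval
print(rarely_seen, often_seen)
```

Even though the rarely observed item has the lower empirical mean, its UCB is higher, so it is still tried. That is the exploration behaviour the bound is meant to produce.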
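Finally, the list recommended to the user is determined from $U_t(e)$, which is updated by the feedback:

$$A_t = \arg\max_{A} f(A, U_t)$$

In practice, this means adding items to the recommended list in descending order of $U_t(e)$.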
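The claim that sorting is enough can be checked by brute force on a small example. This sketch assumes the values lie in [0, 1]; the list `U` is hypothetical.

```python
from itertools import combinations
from math import prod


def f(weights):
    """Probability of at least one click: 1 - product of (1 - w)."""
    return 1.0 - prod(1.0 - w for w in weights)


U = [0.3, 0.2, 0.25, 0.1, 0.24]  # hypothetical UCB values, one per item
K = 3

# Exhaustive search over every K-subset of items.
best_subset = max(combinations(range(len(U)), K), key=lambda A: f([U[e] for e in A]))

# Greedy choice: the K items with the largest values.
top_k = sorted(range(len(U)), key=lambda e: U[e], reverse=True)[:K]
print(best_subset, top_k)
```

Because `f` does not depend on the order within the list and grows with each individual weight, the exhaustive search and the sort pick the same items.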
The pseudocode for the steps so far is quoted here as a figure (pseudocode of CascadeUCB1).
Note that the pseudocode updates the average directly.
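Updating the average directly means keeping only the current mean and count, without storing past observations. A minimal sketch, with the hypothetical helper `update_mean`:

```python
def update_mean(mean, count, reward):
    """Fold one new observation into a running mean."""
    new_count = count + 1
    new_mean = (count * mean + reward) / new_count
    return new_mean, new_count


rewards = [1, 0, 0, 1, 1]  # hypothetical observed weights of one item
mean, count = 0.0, 0
for r in rewards:
    mean, count = update_mean(mean, count, r)
print(mean, count)
```

The result matches the ordinary average of the list, but the memory used stays constant per item.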
Implementing CascadeUCB1 in Python
This implementation uses Python 3.9 and the following modules.

    import math
    import random
    from abc import ABC, abstractmethod

    import pandas as pd
    import numpy as np
    from matplotlib import pyplot as plt
    from scipy.stats import bernoulli
    from tqdm import tqdm

We will compare against another algorithm later, so we define abstract classes to keep the implementations interchangeable.

    class Agent(ABC):
        @abstractmethod
        def get_list(self, k: int) -> list[int]:
            pass

        @abstractmethod
        def observe(self, a: list[int], click: int) -> None:
            pass


    class Env(ABC):
        @abstractmethod
        def click(self, A: list[int]) -> int:
            pass

        @abstractmethod
        def weights(self, A: list[int]) -> list[float]:
            pass

        @abstractmethod
        def optimal_weights(self, k: int) -> list[float]:
            pass
The Agent class works as follows. The get_list method generates the list the agent shows to the user, and observe updates the parameters from the user's feedback.
In the Env class, the click method returns the position of the clicked item, $C_t$, or -1 if nothing was clicked. The weights method returns the weights of the given list, and optimal_weights returns the weights of the ideal list. These two methods are used when computing regret. Regret is explained in detail later; it is the gap in cumulative reward, up to a given step, between the ideal policy and ours.
Next we prepare the environment to simulate. Following the Cascade Model, we implement CascadingModelEnv.

    class CascadingModelEnv(Env):
        def __init__(self, E: list[float]):
            self.E = E

        def click(self, A: list[int]) -> int:
            # Scan from the top; return the 1-indexed position of the first click.
            for i, item in enumerate(A):
                if self.E[item] > np.random.random():
                    return i + 1
            return -1

        def weights(self, A: list[int]) -> list[float]:
            return [self.E[i] for i in A]

        def optimal_weights(self, k: int) -> list[float]:
            # Use the environment's own probabilities, not a global E.
            return sorted(self.E, reverse=True)[:k]

CascadingModelEnv is initialized with the click probabilities of all items. The index in E serves as the item ID.
A quick check shows that the histogram looks as intended.

    ## test
    E = [0, 0.1, 0.2, 0.3]
    env = CascadingModelEnv(E)
    results = []
    for i in range(1000):
        results.append(env.click([1, 2, 3]))
    plt.xlabel("item_index")
    plt.ylabel("click_freq")
    plt.hist(results)
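What the histogram should look like can be worked out analytically. The sketch below computes the expected share of each return value for the same E and list; `click_position_probs` is a hypothetical helper, not part of the classes in this post.

```python
def click_position_probs(E, A):
    """Probability of each click position (1-indexed) and of no click (-1)."""
    probs = {}
    not_clicked_yet = 1.0
    for pos, item in enumerate(A, start=1):
        probs[pos] = not_clicked_yet * E[item]
        not_clicked_yet *= 1.0 - E[item]
    probs[-1] = not_clicked_yet
    return probs


E = [0, 0.1, 0.2, 0.3]
expected = click_position_probs(E, [1, 2, 3])
print(expected)  # roughly {1: 0.1, 2: 0.18, 3: 0.216, -1: 0.504}
```

With 1000 simulated clicks, the bars for positions 1, 2 and 3 should therefore sit near 100, 180 and 216, and about half of the samples should be -1.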
次ã«ã¨ã¼ã¸ã§ã³ããä½ãã¾ããããä¸ã§ç´¹ä»ããçä¼¼ã³ã¼ããæ·¡ã ã¨å®è£ ãã¾ããUCBè¨ç®ã®é¨åã¯CascadeUCB1ã使ã£ã¦ãããã¨ã«æ³¨æãã¾ãããã
class CascadeUCB1Agent(Agent): def __init__(self, E: list[float], p: float): self.t = 1 self.counts = [1 for _ in range(len(E))] self.weights = [bernoulli.rvs(p=p) for _ in range(len(E))] def ucb(self, e: int): return self.weights[e] + math.sqrt(1.5*math.log(self.t - 1)/self.counts[e]) def get_list(self, k: int) -> list[int]: self.t += 1 ucbs = [self.ucb(e) for e in range(len(E))] return sorted(range(len(ucbs)), key=lambda i: ucbs[i], reverse=True)[:k] def is_click(self, click: int, k: int) -> int: return 1 if click == k else 0 def observe(self, a: list[int], click : int) -> None: if click == -1: click = len(a) for i in range(min(len(a), click)): e = a[i] before_count = self.counts[e] self.counts[e] += 1 self.weights[e] = ( (before_count * self.weights[e]) + self.is_click(click, i+1) ) / self.counts[e]
We are now ready to run CascadeUCB1. The agent's policy can be evaluated by its cumulative regret. As in the paper [3], the regret at step $t$ is computed as

$$R(A_t, w_t) = f(A^*, w_t) - f(A_t, w_t)$$

where $A^*$ is the ideal list, that is, the set of $K$ items the user is most likely to click. To evaluate the agent implemented here, we define a regret function that computes this quantity.

    def f(weights: list[float]) -> float:
        # Probability that at least one item in the list is clicked.
        v = 1
        for w in weights:
            v *= (1 - w)
        return 1 - v


    def regret(optimal_weights: list[float], weights: list[float]) -> float:
        return f(optimal_weights) - f(weights)
With evaluation in place, we next prepare a function that runs the simulation.

    def simulate(agent: Agent, env: Env, k: int, steps: int) -> list[float]:
        optimal_weights = env.optimal_weights(k=k)
        cumulative_regret = 0
        regret_cumulative_history = []
        for i in tqdm(range(steps)):
            a = agent.get_list(k=k)
            click = env.click(a)
            agent.observe(a, click)

            setting_weights = env.weights(a)
            cumulative_regret += regret(optimal_weights, setting_weights)
            regret_cumulative_history.append(cumulative_regret)
        return regret_cumulative_history
The function returns the history of cumulative regret so that we can plot it afterwards and confirm the agent behaves correctly.
Let's run it in the simulation environment. The settings here are $L = 10$ and $K = 3$.

    E = [0.3, 0.2, 0.25, 0.1, 0.1, 0.24, 0.2, 0.1, 0.21, 0.1]
    env = CascadingModelEnv(E=E)
    agent = CascadeUCB1Agent(E=E, p=0.2)
    regret_cumulative_history = simulate(agent=agent, env=env, k=3, steps=100000)

    plt.xlabel("step t")
    plt.ylabel("Regret")
    plt.plot(regret_cumulative_history)

The result is a plot of cumulative regret against step t.
The regret converges, so the agent does seem to be generating lists with high click-through rates.
Linear Cascading Bandits
When the number of documents to explore is large, CascadeUCB1 is not practical for sites such as media sites where many new items appear every day, because it has to show every item in the dataset to users at least once. Zong et al. [4] therefore proposed an approach called linear cascading bandits, which assumes that an item's attraction probability is given by a linear function of the item's features.
Zong et al. propose two methods, CascadeLinTS and CascadeLinUCB. Here we follow CascadeLinTS, which performed better in the paper's evaluation.
CascadeLinTS is based on Thompson Sampling (TS) [5]. LinTS [6] extends TS to contextual bandits and can take features of items and the like into account. CascadeLinTS applies LinTS to the Cascade Model.
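The estimate of each item's attraction probability is defined as

$$\bar{w}(e) \approx x_e^{\top} \theta_t$$

Items are ranked in descending order of $x_e^{\top} \theta_t$, and the top ones form the list shown to the user.
Here $x_e \in \mathbb{R}^{d \times 1}$ is the $d$-dimensional feature vector of item $e$, and $\theta_t \in \mathbb{R}^{d \times 1}$ is a $d$-dimensional parameter vector. $\theta_t$ is shared across all items.
CascadeLinTS samples the parameter vector from a multivariate normal distribution:

$$\theta_t \sim \mathcal{N}\left(\bar{\theta}_{t-1},\ M_{t-1}^{-1}\right), \qquad \bar{\theta}_t = \sigma^{-2} M_t^{-1} B_t$$

$$M_t = I_d + \sigma^{-2} X_t^{\top} X_t, \qquad B_t = X_t^{\top} Y_t$$

The matrix $M_t$ and the vector $B_t$ have the usual form that appears when deriving the posterior parameters of a multivariate normal distribution. The full derivation is long, so please refer to other literature. My recommendation is the contextual bandit chapter of O'Reilly's 『ウェブ最適化ではじめる機械学習』. While deriving contextual bandits, that chapter introduces the multivariate normal distribution, derives the posterior parameters, and derives the update equations.
To add some detail, $X_t$ is the matrix whose rows are the feature vectors of all items observed up to step $t$, and $Y_t$ is the column vector of all attraction weights observed up to step $t$. $I_d$ is the $d \times d$ identity matrix. $\sigma$ is a parameter that controls the learning rate; the paper explains that ideally $\sigma^2$ should be the variance of the observation noise.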
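The quantities above can be computed in a few lines of NumPy. This is only a sketch on made-up data: the matrices `X` and `Y`, the dimension, and the seed are assumptions chosen for illustration.

```python
import numpy as np

d = 3
sigma = 1.0

# Hypothetical observations: each row of X is the feature vector of an examined item,
# and Y holds the matching observed weight (1 = clicked, 0 = not clicked).
X = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)
Y = np.array([[1.0], [0.0], [0.0], [1.0]])

M = np.eye(d) + sigma**-2 * X.T @ X            # posterior precision-like matrix
B = X.T @ Y
theta_bar = sigma**-2 * np.linalg.solve(M, B)  # posterior mean, shape (d, 1)

# One Thompson sample of the parameter vector.
rng = np.random.default_rng(0)
theta_t = rng.multivariate_normal(mean=theta_bar.ravel(), cov=np.linalg.inv(M))
print(theta_bar.ravel(), theta_t)
```

`np.linalg.solve` is used for the mean so that the inverse is only formed once, for the covariance of the sample.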
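So far we have seen that once $M_t$ and $B_t$ are known, $\theta_t$ can be sampled. What remains is to update $M_t$ and $B_t$ at every step.
For each examined item $e = a_k^t$, $M_t$ and $B_t$ are updated as follows.

$$M_t \leftarrow M_t + \sigma^{-2} x_e x_e^{\top}, \qquad B_t \leftarrow B_t + x_e \mathbb{1}\{C_t = k\}$$

The pseudocode is quoted from the paper [4] as a figure. Algorithm1 is the overall CascadeLinTS procedure, and Algorithm3 is the posterior parameter update.
From an implementation standpoint, computing the matrix inverse is expensive. In practice, $M_t^{-1}$ is therefore updated directly with the following formula.

$$M_t^{-1} \leftarrow M_t^{-1} - \frac{M_t^{-1} x_e x_e^{\top} M_t^{-1}}{x_e^{\top} M_t^{-1} x_e + \sigma^{2}}$$

This follows from the Woodbury formula. The Python implementation later in this post uses this update.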
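A quick numerical check can confirm that the rank-one update of the inverse agrees with inverting the updated matrix from scratch. The sketch below uses random binary feature vectors; `sherman_morrison_update`, the dimension, and the seed are illustrative assumptions.

```python
import numpy as np


def sherman_morrison_update(inv_M, x, sigma):
    """Return the inverse of (M + sigma^-2 * x x^T), given inv_M = M^-1 and x of shape (d, 1)."""
    numerator = inv_M @ x @ x.T @ inv_M
    denominator = (x.T @ inv_M @ x).item() + sigma**2
    return inv_M - numerator / denominator


rng = np.random.default_rng(0)
d, sigma = 4, 0.5
M = np.eye(d)
inv_M = np.eye(d)
for _ in range(5):
    x = rng.integers(0, 2, size=(d, 1)).astype(float)  # hypothetical multi-hot feature
    M = M + sigma**-2 * (x @ x.T)                      # explicit update of M
    inv_M = sherman_morrison_update(inv_M, x, sigma)   # direct update of M^-1

max_error = np.abs(inv_M - np.linalg.inv(M)).max()
print(max_error)
```

The direct update costs on the order of d squared operations per observation, whereas a fresh inversion costs on the order of d cubed, which is the motivation for updating the inverse in place.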
Python implementation of CascadeLinTS
To represent item feature vectors, this implementation assumes we are recommending articles on a tech blog site. Each article has one to three tags, and the probability of a click depends on the tags. We run CascadeLinTS with the tags as features.
First we generate the articles for the simulation. The true click-through rate of an article is set to a linear combination of the click-through rates of its tags, each sampled from a normal distribution.

    def gen_items(tags: dict[str, float], L: int, sigma=0.01) -> pd.DataFrame:
        tag_ids = list(tags.keys())
        weights = []
        features = []
        for i in range(L):
            # Each article gets one to three distinct tags.
            n = random.randint(1, 3)
            tag_samples = random.sample(tag_ids, n)
            w = 0
            one_hot = []
            for t in tag_ids:
                if t in tag_samples:
                    w += random.gauss(tags[t], sigma)
                    one_hot.append(1)
                else:
                    one_hot.append(0)
            vec = np.array(one_hot)
            features.append(vec.reshape(len(vec), 1))
            weights.append(w)

        df = pd.DataFrame({'id': list(range(len(weights))), 'weight': weights, 'feature': features})
        return df

Let's run gen_items.

    tags_with_weight = {
        'AWS': 0.03,
        'Docker': 0.2,
        'Elasticsearch': 0.15,
        'GCP': 0.08,
        'Git': 0.05,
        'NLP': 0.19,
        'Rust': 0.23,
        'Scala': 0.14,
        '機械学習': 0.15,
        '強化学習': 0.20
    }

    df = gen_items(tags=tags_with_weight, L=10)
    df.head(5)

The result is a table of item IDs, click-through rates, and feature vectors. Tags are multi-hot encoded. Following the paper, features are stored as $d \times 1$ column vectors.
The environment is the same as CascadingModelEnv, so we reuse it as is. Next we implement CascadeLinTSAgent, following the pseudocode.

    class CascadeLinTSAgent(Agent):
        def __init__(self, d: int, sigma: float, features: pd.DataFrame):
            self.sigma = sigma
            self.features = features
            self.InvM = np.eye(d)  # M^{-1}, starting from the identity
            self.B = np.zeros(d).reshape(d, 1)

        def get_list(self, k: int) -> list[int]:
            # Posterior mean: sigma^-2 * M^-1 * B
            before_theta = (self.sigma**-2) * self.InvM.dot(self.B)
            theta = np.random.multivariate_normal(mean=before_theta.ravel(), cov=self.InvM)
            # Score every item by x_e^T theta and keep the top k.
            weights = self.features['feature'].apply(lambda x: x.ravel().dot(theta)).to_list()
            return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]

        def is_click(self, click: int, k: int) -> int:
            return 1 if click == k else 0

        def observe(self, a: list[int], click: int) -> None:
            # With no click (-1), every shown item was examined and none is rewarded.
            examined = len(a) if click == -1 else min(len(a), click)
            for i in range(examined):
                e = a[i]
                x = self.features[self.features['id'] == e]['feature'].to_list()[0]
                # Rank-one update of M^{-1}; no explicit matrix inversion.
                self.InvM = self.InvM - (
                    self.InvM.dot(x).dot(x.T).dot(self.InvM)
                ) / (
                    x.T.dot(self.InvM).dot(x) + self.sigma**2
                )
                if self.is_click(click, i + 1):
                    self.B = self.B + x

Note that this implementation updates $M^{-1}$ directly.
Now let's run CascadeLinTSAgent. Following the paper's experiments, we use $K = 4$ recommended items and a feature dimension of $d = 10$. The total number of items $L$ is set to 16, 256, and 3000. We compare against the CascadeUCB1 agent implemented earlier.

    k = 4
    L = [16, 256, 3000]
    steps = 10000

Let's run it.

    tags_with_weight = {
        'AWS': 0.1,
        'Docker': 0.2,
        'Elasticsearch': 0.15,
        'GCP': 0.08,
        'Git': 0.05,
        'NLP': 0.23,
        'Rust': 0.3,
        'Scala': 0.18,
        '機械学習': 0.25,
        '強化学習': 0.2,
    }

    fig = plt.figure(figsize=(13, 4))
    fig.suptitle('The n-steps regret of CascadeUCB1, CascadeLinTS')

    for i, l in enumerate(L):
        df = gen_items(tags=tags_with_weight, L=l)
        E = df['weight'].to_list()
        env = CascadingModelEnv(E=E)

        cascadelints_agent = CascadeLinTSAgent(d=len(tags_with_weight), sigma=1, features=df.drop('weight', axis=1))
        cascadeucb_agent = CascadeUCB1Agent(E=E, p=0.2)

        cascadeucb1_regret_cumulative_history = simulate(agent=cascadeucb_agent, env=env, k=k, steps=steps)
        cascadelints_regret_cumulative_history = simulate(agent=cascadelints_agent, env=env, k=k, steps=steps)

        ax = fig.add_subplot(1, 3, i + 1)
        ax.set_title(f'L={l}, k=4')
        ax.plot(cascadeucb1_regret_cumulative_history, label='CascadeUCB1')
        ax.plot(cascadelints_regret_cumulative_history, label='CascadeLinTS')

    fig.legend(['CascadeUCB1', 'CascadeLinTS'], loc='upper center', borderaxespad=0.1, title="Algorithm", bbox_to_anchor=(0.5, -0.02), ncol=2)
    fig.tight_layout()
    plt.show()

The result is a figure with one panel per value of L, each plotting the cumulative regret of both algorithms.
CascadeLinTS, which uses item features, has lower regret. In particular, the larger L is, the larger the regret of CascadeUCB1 becomes, and it fails to converge.
For reference, the time taken for 10000 steps was as follows.

L | CascadeUCB1 | CascadeLinTS |
---|---|---|
16 | 0.00002 s | 15 s |
256 | 3 s | 24 s |
3000 | 28 s | 4 min 46 s |

My implementation may be partly to blame, but CascadeLinTS is clearly slower. When I looked into it, get_list in CascadeLinTS was taking on the order of 100 ms.
That said, even though CascadeLinTS is relatively slow, one step takes roughly 28 ms at $L = 3000$, which is fast enough to run online.
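One possible way to cut the time spent scoring items, sketched here as an assumption rather than something measured in this post, is to stack all feature vectors into a single matrix once and score every item with one matrix-vector product instead of a per-row `apply`. The function `top_k_items` and the random data are hypothetical.

```python
import numpy as np


def top_k_items(X: np.ndarray, theta: np.ndarray, k: int) -> list[int]:
    """Return the indices of the k rows of X with the largest scores X @ theta."""
    scores = X @ theta                          # one score per item, shape (L,)
    order = np.argsort(-scores, kind="stable")  # descending; ties keep original order
    return order[:k].tolist()


rng = np.random.default_rng(0)
L, d, k = 3000, 10, 4
X = rng.integers(0, 2, size=(L, d)).astype(float)  # hypothetical multi-hot features, one row per item
theta = rng.normal(size=d)                          # stands in for a sampled theta_t
top = top_k_items(X, theta, k)
print(top)
```

With the DataFrame used in this post, such a matrix could be built once in the constructor, for example with `np.hstack(df['feature'].to_list()).T`, and reused at every step. Whether this removes most of the overhead would need to be profiled.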
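Summary
This post introduced CascadeUCB1 and CascadeLinTS, two algorithms that apply multi-armed bandits to the Cascade Model.
There are also algorithms that go further and personalize under the Cascade Model by taking user features into account. If time permits, I will cover them with implementations in a future post.
We're hiring !!!
At M3 we are looking for engineers who want to advance healthcare by developing and improving our search and recommendation platform! Search and recommendation are actively discussed in-house every day, and we hold a reading group on information retrieval and recommendation papers every other week.
If you would like to hear a bit more, get in touch here! jobs.m3.com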
Reference
- [1] Dorota Glowacka. 2017. Bandit Algorithms in Interactive Information Retrieval. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR 2017, Amsterdam, The Netherlands, October 1-4, 2017, Jaap Kamps, Evangelos Kanoulas, Maarten de Rijke, Hui Fang, and Emine Yilmaz (Eds.). ACM, 327–328.
- [2] Craswell, N., O. Zoeter, M. Taylor, and B. Ramsey. 2008. "An experimental comparison of click position-bias models". In: Proceedings of the 2008 international conference on web search and data mining. ACM. 87–94.
- [3] Kveton, B., C. Szepesvari, Z. Wen, and A. Ashkan. 2015a. "Cascading Bandits: Learning to Rank in the Cascade Model". In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15). 767–776.
- [4] Zong, S., H. Ni, K. Sung, N. R. Ke, Z. Wen, and B. Kveton. 2016. "Cascading Bandits for Large-Scale Recommendation Problems". arXiv preprint arXiv:1603.05359 - Proc. UAI.
- [5] Thompson, W. R. 1933. "On the likelihood that one unknown probability exceeds another in view of the evidence of two samples". Biometrika. 25(3/4): 285–294.
- [6] Shipra Agrawal and Navin Goyal. Thompson sampling for contextual bandits with linear payoffs. In International Conference on Machine Learning, pages 127–135, 2013.