Last updated: Dec 18, 2025. It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek V3 and Llama 4 (2024-2025), one might be surprised at how structurally similar these models still are. Sure, positional embeddings have evolved from absolute to rotational (RoPE), Multi-Head Attention has largely given way to Grouped-Query Attention (GQA)…
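As a minimal sketch of the grouped-query attention the excerpt alludes to: several query heads share each key/value head, which shrinks the KV cache while leaving the attention math unchanged. Plain PyTorch with made-up shapes, an illustration of the general idea rather than any particular model's implementation.

```python
# Sketch of grouped-query attention (GQA); shapes and head counts are illustrative.
import torch
import torch.nn.functional as F

batch, seq_len, d_model = 2, 16, 256
n_q_heads, n_kv_heads = 8, 2            # 4 query heads share each key/value head
head_dim = d_model // n_q_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Each group of query heads attends against the same shared K/V head,
# so K and V are simply repeated along the head dimension before attention.
group_size = n_q_heads // n_kv_heads
k = k.repeat_interleave(group_size, dim=1)   # -> (batch, n_q_heads, seq, head_dim)
v = v.repeat_interleave(group_size, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # (batch, n_q_heads, seq_len, head_dim)
```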
On Context Engineering. In the LLM (large language model) world, the term "context engineering" has recently come into wide use. It also comes up a lot in discussions of AI agents, and it had been nagging at me for a while, so I tried to sort out my own thinking and am writing it up here. More than half of this is impression and personal opinion rather than rigorous academic definition, so please take it as one practical way of thinking about the term, based on how I actually use it day to day. From "prompt engineering" to "context engineering": I want to start from the questions "what is context engineering in the first place?" and "how is it different from prompt engineering?" Prompt engineering, put simply, can be pictured with the diagram below…
Precise Source Grounding: Maps every extraction to its exact location in the source text, enabling visual highlighting for easy traceability and verification.
Reliable Structured Outputs: Enforces a consistent output schema based on your few-shot examples, leveraging controlled generation in supported models like Gemini to guarantee robust, structured results.
Optimized for Long Documents: Overcomes…
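To make "source grounding" and "structured outputs" concrete, here is a rough sketch in plain Python. The Extraction class and ground helper are hypothetical stand-ins, not the library's actual API, and the model output is hard-coded to stand in for a schema-constrained generation.

```python
# Sketch of schema-shaped extraction with source grounding; illustrative only.
from dataclasses import dataclass

@dataclass
class Extraction:
    field: str   # which schema field this value fills
    value: str   # the extracted string
    start: int   # character offset where the value begins in the source
    end: int     # character offset where it ends

def ground(source: str, field: str, value: str) -> Extraction:
    """Map an extracted value back to its exact location in the source text."""
    start = source.find(value)
    if start == -1:
        raise ValueError(f"extraction {value!r} not found verbatim in source")
    return Extraction(field, value, start, start + len(value))

source = "Ada Lovelace published the first algorithm in 1843."
# Pretend these values came back from a model constrained to a JSON schema.
model_output = {"person": "Ada Lovelace", "year": "1843"}

grounded = [ground(source, k, v) for k, v in model_output.items()]
for e in grounded:
    assert source[e.start:e.end] == e.value   # grounding is directly verifiable
    print(e)
```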
2025-05-02 Stanford CS336 Language Modeling from Scratch: Demystifying the GPU, a complete guide to the optimization techniques leading up to Flash Attention. Note: this article is based on the "GPUs" lecture video from Stanford CS336 Language Modeling from Scratch, Spring 2025. Details about the lecture are available at https://stanford-cs336.github.io/spri... ; see https://stanford.io/ai for Stanford's online AI programs and https://online.stanford.edu/courses/c... for enrolling in the course. This article summarizes the lecture content in detail, but errors may have crept in through summarization and interpretation…
Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm of having a single general-purpose…
TL;DR: Agents need context to perform tasks. Context engineering is the art and science of filling the context window with just the right information at each step of an agent's trajectory. In this post, we break down some common strategies (write, select, compress, and isolate) for context engineering by reviewing various popular agents and papers. We then explain how LangGraph is designed to support…
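The four strategies are easy to picture as plain functions over an agent's message history. The sketch below is purely illustrative, generic Python with a hard-coded summary stand-in for an LLM call, not LangGraph's API.

```python
# Illustrative sketch of the write / select / compress / isolate strategies.
scratchpad: list[str] = []

def write(note: str) -> None:
    """Write: persist information outside the context window (scratchpad/memory)."""
    scratchpad.append(note)

def select(query: str, k: int = 3) -> list[str]:
    """Select: pull only the most relevant notes back into context (toy scoring)."""
    return sorted(scratchpad, key=lambda n: -sum(w in n for w in query.split()))[:k]

def compress(messages: list[str], keep_last: int = 4) -> list[str]:
    """Compress: summarize older turns, keep recent ones verbatim."""
    if len(messages) <= keep_last:
        return messages
    summary = "SUMMARY: " + " / ".join(m[:30] for m in messages[:-keep_last])
    return [summary] + messages[-keep_last:]

def isolate(task: str) -> list[str]:
    """Isolate: give a sub-agent its own fresh context with only what it needs."""
    return [f"You are a sub-agent. Task: {task}"] + select(task)
```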
Context Engineering for AI Agents: Lessons from Building Manus. At the very beginning of the Manus project, my team and I faced a key decision: should we train an end-to-end agentic model using open-source foundations, or build an agent on top of the in-context learning abilities of frontier models? Back in my first decade in NLP, we didn't have the luxury of that choice. In the distant days of BERT…
AI systems that "think" in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside…
It is interesting to note how views on this topic have shifted with the rise of outcome-based RL applied to LLMs. A couple of years ago, the consensus in the safety community was that process-based RL should be prioritized over outcome-based RL, since it incentivizes choosing actions for reasons that humans endorse. See, for example, Anthropic's Core Views on AI Safety: Learning Processes Rather than…
Introduction: In this article, starting from the AI agent services that everyone is talking about these days, I want to deep-dive into how to actually build one, covering the points to get right when building an AI agent system together with a practical hands-on. The article is in two parts: Part 1 covers the basic concepts of AI agents and a guide to building agent systems; Part 2 implements workflow routing with Azure AI Agent Service. Part 1 explains the basic concepts and the system-building guide with reference to OpenAI's a-practical-guide-to-building-agents. Part 2 walks through the workflow-routing pattern introduced in the Workflow Routing section of Anthropic's blog post Building Effective Agents…
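The workflow-routing pattern referenced there reduces to classifying the incoming request and dispatching it to a specialized handler. A minimal sketch follows, with classify_route as a hypothetical stand-in for the LLM (or Azure AI Agent Service) call rather than a real API.

```python
# Minimal sketch of the workflow-routing pattern: classify, then dispatch.
from typing import Callable

def classify_route(user_message: str) -> str:
    # In a real system this is an LLM call that returns one routing label.
    if "refund" in user_message.lower():
        return "billing"
    if "error" in user_message.lower():
        return "technical"
    return "general"

handlers: dict[str, Callable[[str], str]] = {
    "billing":   lambda m: f"[billing agent] handling: {m}",
    "technical": lambda m: f"[technical agent] handling: {m}",
    "general":   lambda m: f"[general agent] handling: {m}",
}

def route(user_message: str) -> str:
    label = classify_route(user_message)
    return handlers.get(label, handlers["general"])(user_message)

print(route("I got an error when uploading my file"))
```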
Introduction: In this article I try out LLM inference speed-up with Speculative Decoding in vLLM and share the results of a simple benchmark. About Speculative Decoding: first, a brief explanation. Speculative Decoding is a technique that accelerates inference for a large model by enlisting a smaller, lighter model. The large model whose output you actually want is called the Target Model, and the small model used for acceleration is called the Draft Model. Unlike ordinary decoding, in Speculative Decoding the small Draft Model first generates a fixed number of draft tokens and proposes them as a candidate token sequence; the Target Model then takes those draft tokens and, based on its probability distribution…
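A benchmark along these lines needs only a few lines of vLLM. The sketch below is indicative only: the exact way the Draft Model is configured (the speculative_config argument here) has changed across vLLM releases and should be checked against your installed version, and both model names are placeholders for whatever Target/Draft pair you are testing.

```python
# Rough sketch of a speculative-decoding throughput check with vLLM.
import time
from vllm import LLM, SamplingParams

prompts = ["Explain speculative decoding in one paragraph."] * 8
params = SamplingParams(temperature=0.0, max_tokens=256)

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",          # Target Model (placeholder)
    speculative_config={                                # name varies by vLLM version
        "model": "meta-llama/Llama-3.2-1B-Instruct",   # Draft Model (placeholder)
        "num_speculative_tokens": 5,                   # draft tokens proposed per step
    },
)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```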
Introduction: A question for everyone. If there were "a way to double inference speed alone, without hurting model accuracy and without adding compute," would that be magic, or real technology? The answer is the latter. Speculative Decoding, developed jointly by Google DeepMind and UC Berkeley, is the inference-acceleration technique that made exactly that seeming impossibility possible. To use a car analogy, it is a clever approach that pre-computes several candidate routes while selecting the optimal path during the actual drive, and it is revolutionizing LLM generation speed. So what is "Speculative Decoding"? In Japanese it is usually rendered as 推測的デコーディング ("inferential decoding"), and the more literal 投機的デコーディング ("speculative decoding") is also sometimes used. Put simply, the technique has a small model (the draft model) generate multiple…
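The verification step this excerpt is leading up to can be written out concretely: the Target Model scores the proposed draft tokens in a single pass and accepts each one with probability min(1, p_target/p_draft), resampling from the residual distribution on the first rejection, which keeps the output distribution identical to decoding with the Target Model alone. A toy NumPy sketch of that accept/reject rule, with made-up distributions and no framework, is below.

```python
# Toy sketch of the speculative-decoding accept/reject rule over a tiny vocabulary.
# p_target / p_draft are made-up distributions standing in for the two models;
# real systems compute them with a single batched forward pass of the Target Model.
import numpy as np

rng = np.random.default_rng(0)
vocab = 8

def accept_or_resample(p_target: np.ndarray, p_draft: np.ndarray, token: int) -> int:
    """Accept the drafted token with prob min(1, p_target/p_draft); else resample."""
    if rng.random() < min(1.0, p_target[token] / p_draft[token]):
        return token
    # On rejection, resample from the residual distribution max(p_target - p_draft, 0).
    residual = np.maximum(p_target - p_draft, 0.0)
    residual /= residual.sum()
    return int(rng.choice(vocab, p=residual))

p_draft = rng.dirichlet(np.ones(vocab))      # Draft Model's next-token distribution
p_target = rng.dirichlet(np.ones(vocab))     # Target Model's next-token distribution
drafted = int(rng.choice(vocab, p=p_draft))  # token proposed by the Draft Model

final = accept_or_resample(p_target, p_draft, drafted)
print(f"drafted token {drafted} -> final token {final}")
```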