Search results for "evaluation": 1 - 40 of 49

Because the tag search returned only a few matches, title search results are shown.

There are 49 entries about evaluation. Related tags include LLM and 組織マネジメント. Popular entries include 『エンジニア組織30人の壁を超えるための 評価システムとマネジメントのスケール / Scaling evaluation system and management』.
  • エンジニア組織30人の壁を超えるための 評価システムとマネジメントのスケール / Scaling evaluation system and management
    2024 Summer Jinjineer Meetup! "Let's learn together: evaluation systems and their operation in development organizations" https://jinjineer.connpass.com/event/323746/
  • 新米マネージャーの初めての目標設定と評価 / New manager's first goal setting and evaluation
    2024/03/01: EM Yuru Meetup vol. 6 (lightning talks) https://em-yuru-meetup.connpass.com/event/308552/ "A new manager's first goal setting and evaluation," 倉澤 直弘, EM
  • 定量データと定性評価を用いた技術戦略の組織的実践 / Systematic implementation of technology strategies using quantitative data and qualitative evaluation
    CNDS2024 https://event.cloudnativedays.jp/cnds2024/
  • How to turn an LLM that is easy to merely build into something "better": key points for setting up "Pretraining," "Fine-Tuning," and "Evaluation & Analysis" | ログミーBusiness
    What it takes to build a better LLM. Takuya Akiba: So we have managed to do Fine-Tuning as well. This may come as a surprise: you cannot quite get away with zero code, but you can in fact build an LLM while writing almost none. You might think "surely that alone only yields a garbage model," but as long as you avoid doing anything unnecessary, even this much will probably give you a reasonably plausible LLM. So, at the risk of slightly contradicting Professor Suzuki's (Jun Suzuki's) earlier talk, my stance is that merely building an LLM is easier than people think. That was the first half. That said, does doing this get you GPT-4? Of course not; there is a gap. "What is that gap?" is what we consider next. The list is nearly endless, but…
  • Best Practices for LLM Evaluation of RAG Applications
  • The basics of Off-Policy Evaluation, plus an introduction to ZOZOTOWN's large-scale public real-world dataset and package - ZOZO TECH BLOG
    Note: formulas do not render correctly in the AMP view; please use the standard view to check them. On November 7, 2020, the section "How to use Open Bandit Pipeline" was revised to bring the example code up to date with a new package version; see the corresponding release note for details. Future updates to the dataset, package, and papers will be announced on a Google Group, so feel free to follow it. A new section, "Reactions at international conference workshops," has also been added. I am Yuta Saito (齋藤優太) of Tokyo Institute of Technology, working on joint research with ZOZO Research; my research connects the theory and applications of counterfactual machine learning (see my survey article for an overview). This article explains how to evaluate the performance of machine-learning-based decision making offline… (A minimal inverse-propensity-scoring sketch of this idea appears after this list.)
  • GitHub - yahoojapan/JGLUE: JGLUE: Japanese General Language Understanding Evaluation
  • GitHub - Stability-AI/lm-evaluation-harness: A framework for few-shot evaluation of autoregressive language models.
  • GitHub - Arize-ai/phoenix: AI Observability & Evaluation
    Phoenix is an open-source AI observability platform designed for experimentation, evaluation, and troubleshooting. It provides: Tracing - Trace your LLM application's runtime using OpenTelemetry-based instrumentation. Evaluation - Leverage LLMs to benchmark your application's performance using response and retrieval evals. Datasets - Create versioned datasets of examples for experimentation, evaluation… (A minimal launch sketch appears after this list.)
  • GitHub - confident-ai/deepeval: The LLM Evaluation Framework
    DeepEval is a simple-to-use, open-source LLM evaluation framework, for evaluating and testing large-language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs based on metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., which uses LLMs and various other NLP models that runs locally… (A minimal test sketch appears after this list.)
  • COVID-19 vaccine efficacy summary | Institute for Health Metrics and Evaluation
  • PMスキル・評価制度を導入し、アウトカムを生み出すプロダクトマネジメント集団へ進化する道のりの共有 / How we introduced the PM skills and evaluation system and evolved into a product management group that produces outcomes
    Slides from a pmconf2021 talk on how Retty grew from an organization devoted solely to project management into one where product management is firmly rooted and outcome-driven development is possible. As one concrete example of these efforts, in order to develop capable product managers we introduced a PM skill…
  • Estimation of total and excess mortality due to COVID-19 | Institute for Health Metrics and Evaluation
    Estimation of total and excess mortality due to COVID-19. Published October 15, 2021. This page was updated on October 15, 2021 to reflect changes in our modeling strategy. View our previous methods published May 13, 2021 here. In our October 15 release, we introduced three major changes. First, we have very substantially updated the data and methods used to estimate excess mortality related to the…
  • Evaluation of science advice during the COVID-19 pandemic in Sweden - Humanities and Social Sciences Communications
    Sweden was well equipped to prevent the pandemic of COVID-19 from becoming serious. Over 280 years of collaboration between political bodies, authorities, and the scientific community had yielded many successes in preventive medicine. Sweden’s population is literate and has a high level of trust in authorities and those in power. During 2020, however, Sweden had ten times higher COVID-19 death rate…
  • GitHub - st-tech/zr-obp: Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation
  • GitHub - pfnet-research/japanese-lm-fin-harness: Japanese Language Model Financial Evaluation Harness
  • Top Evaluation Metrics for RAG Failures
    Figure 1: Root Cause Workflows for LLM RAG Applications (flowchart created by author). If you have been experimenting with large language models (LLMs) for search and retrieval tasks, you have likely come across retrieval augmented generation (RAG) as a technique to add relevant contextual information to LLM generated responses. By connecting an LLM to private data, RAG can enable a better response…
  • KDD 2024 LLM Evaluation Tutorial
    Grounding and Evaluation for Large Language Models (Tutorial). With the ongoing rapid adoption of Artificial Intelligence (AI) based systems in high-stakes domains such as financial services, healthcare and life sciences, hiring and human resources, education, societal infrastructure, and national security, it is crucial to develop and deploy the underlying AI models and systems in a responsible manner…
  • Paper introduction: Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems
    Slides from an in-house paper reading group. Mehrotra, Rishabh, et al. "Towards a fair marketplace: Counterfactual evaluation of the trade-off between relevance, fairness & satisfac…
  • Evaluation method of UX “The User Experience Honeycomb” | blog / bookslope
    Evaluating or reviewing a website calls for many different perspectives, and one view, given where the market is heading, is that a "UX" perspective is required. For research firms that have long included the user's point of view among their evaluation methods this is only natural, and in such cases UX evaluation usually means running user tests and having actual participants use the site. That kind of thinking mostly goes into writing user-test scenarios, but when treating UX as an evaluation method, "The User Experience Honeycomb" seems to be the natural foundation. The article "User Experience Design" at Semantic Studios introduces "The User Experience Honeycomb" (rendered here as the "UX honeycomb"), and the elements that make up UX include Useful, Usable…
  • GitHub - FreedomIntelligence/LLMZoo: ⚡LLM Zoo is a project that provides data, models, and evaluation benchmark for large language models.⚡
  • Tricks for getting even more mileage out of Windows 10/11 Enterprise Evaluation, ideal for long-term evaluation
    Tricks for getting even more mileage out of Windows 10/11 Enterprise Evaluation, ideal for long-term evaluation (山市良のうぃんどうず日記, no. 277). Windows 10/11 Enterprise comes in an "Evaluation" edition that can be evaluated free of charge for 90 days. Because it lets you test and evaluate the enterprise editions of Windows 10/11 without buying a license, I use it frequently. This article introduces a few tricks for keeping the Evaluation edition usable for as long as possible.
  • Misplaced trust: When trust in science fosters belief in pseudoscience and the benefits of critical evaluation
    At a time when pseudoscience threatens the survival of communities, understanding this vulnerability, and how to reduce it, is paramount. Four preregistered experiments (N = 532, N = 472, N = 605, N = 382) with online U.S. samples introduced false claims concerning a (fictional) virus created as a bioweapon, mirroring conspiracy theories about COVID-19, and carcinogenic effects of GMOs (Genetically…
  • OpenTofu 1.8.0 is out with Early Evaluation, Provider Mocking, and a Coder-Friendly Future | OpenTofu
    July 29, 2024. OpenTofu 1.8.0 is out with Early Evaluation, Provider Mocking, and a Coder-Friendly Future. Since the 1.7 release, the OpenTofu community and core team have been hard at work on much-requested features, making .tf code easier to write, reducing unnecessary boilerplate, improving performance, and more. We are happy to announce the immediate availability of OpenTofu 1.8 with the following…
  • [ML Tech RPT.] Part 11: Learning evaluation metrics for machine learning models (2) - Sansan Tech Blog
    I am Yoshimura, a researcher at DSOC. Our company has internal club-like groups called "Yoiko," and I belong to the tennis club, which meets roughly once a month; lately newly hired members have joined and brought a fresh breeze. Now, continuing from last time, this article again focuses on evaluation metrics for machine learning models (as before, "model" here means a machine learning model). Last time we reviewed the viewpoints and caveats involved in evaluating a model. From this installment onward we look at which evaluation metrics exist for each problem setting and what they mean, starting with binary classification. At the end of the previous article I wrote that multi-class classification and regression would also be covered here, but the volume grew too large, so…
  • The Generative AI Evaluation Company - Galileo
    Evaluate, observe, and protect your GenAI applications. Go beyond ‘vibe checks’ and asking GPT with the first end-to-end GenAI Stack, powered by Evaluation Foundation Models.
  • GitHub - huggingface/evaluation-guidebook: Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
  • Windows Server 2022 | Microsoft Evaluation Center
    In addition to your trial experience of Windows Server 2022, you can more easily add and manage languages and Features on Demand with the new Languages and Optional Features ISO. Download this ISO. This ISO is only available on Windows Server 2022 and combines the previously separate Features on Demand and Language Packs ISOs, and can be used as a FOD and Language pack repository. To learn about F…
  • "Microsoft Evaluation Center" outage leaves evaluation software unavailable for download; download links are being provided on a community site
  • Evaluation of Retrieval-Augmented Generation: A Survey
    Retrieval-Augmented Generation (RAG) has recently gained traction in natural language processing. Numerous studies and real-world applications are leveraging its ability to enhance generative models through external information retrieval. Evaluating these RAG systems, however, poses unique challenges due to their hybrid structure and reliance on dynamic knowledge sources. To better understand these…
  • Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
    Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using data generated by a different policy. Because of its huge potential impact in practice, there has been growing research interest in this field. There is, however, no real-world public dataset that enables the evaluation of OPE, making its experimental studies unrealistic and irreproducible. With the goal of…
  • Mandoline: Model Evaluation under Distribution Shift
    Machine learning models are often deployed in different settings than they were trained and validated on, posing a challenge to practitioners who wish to predict how well the deployed model will perform on a target distribution. If an unlabeled sample from the target distribution is available, along with a labeled sample from a possibly different source distribution, standard approaches such as im…
  • Terms of Evaluation
    Terms of Evaluation for HashiCorp Software. Before you download and/or use our enterprise software for evaluation purposes, you will need to agree to a special set of terms (“Agreement”), which will be applicable for your use of the HashiCorp, Inc.’s (“HashiCorp”, “we”, or “us”) enterprise software. PLEASE READ THIS AGREEMENT CAREFULLY BEFORE INSTALLING OR USING THE SOFTWARE. THESE TERMS AND CONDITIONS…
  • Paper introduction: Can ChatGPT solve information extraction tasks? Is information extraction solved by ChatGPT? An analysis of performance, evaluation criteria, robustness and errors
  • GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models.
  • U.S. AI Safety Institute Signs Agreements Regarding AI Safety Research, Testing and Evaluation With Anthropic and OpenAI
    GAITHERSBURG, Md. — Today, the U.S. Artificial Intelligence Safety Institute at the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) announced agreements that enable formal collaboration on AI safety research, testing and evaluation with both Anthropic and OpenAI. Each company’s Memorandum of Understanding establishes the framework for the U.S. AI Safety Institute…
  • CAE (Continuous Access Evaluation: 継続的アクセス評価)
    Hello, this is Kanamori (金森) from the Azure Identity team. Are you familiar with the feature called CAE (Continuous Access Evaluation)? As of November 2021 the following announcements have gone out, so many readers have probably seen them: information published in the Microsoft 365 admin portal Message Center as MC255540 (Continuous access evaluation on by default), and an email from Microsoft Azure [email protected] with TRACKING ID: 5T93-LTG and the subject "Continuous access evaluation will be enabled in premium Azure…"
  • Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related
    • Students do not learn more from professors with higher student evaluation of teaching (SET) ratings. • Previous meta-analyses of SET/learning correlations in multisection studies are not interpretable. • Re-analyses of previous meta-analyses of multisection studies indicate that SET ratings explain at most 1% of variability in measures of student learning. • New meta-analyses of multisection studies…
  • Amazon Bedrockで行うモデル評価入門 / Introduction to Model Evaluation in Amazon Bedrock
    Lightning-talk slides from Bedrock Claude Night 2 (a JAWS-UG AI/ML branch × Tokyo branch collaboration). https://jawsug-ai.connpass.com/event/319748/
  • Evaluation of Suicides Among US Adolescents During the COVID-19 Pandemic
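For the off-policy evaluation entries above (the ZOZO TECH BLOG post and the Open Bandit Dataset paper), here is a minimal sketch of the core idea: an inverse-propensity-scoring (IPS) estimate of a new policy's value computed from logs collected by a different policy. It uses plain NumPy on toy data; it is not code from the bookmarked articles or from the Open Bandit Pipeline package, and every name and number below is illustrative.

    import numpy as np

    def ips_estimate(rewards, behavior_probs, eval_probs):
        # IPS value estimate: average of (pi_e(a|x) / pi_b(a|x)) * reward over the log.
        weights = eval_probs / behavior_probs
        return float(np.mean(weights * rewards))

    # Toy log from a uniform-random behavior policy over two actions; action 1 pays off
    # with probability 0.6 and action 0 with probability 0.4.
    rng = np.random.default_rng(0)
    n = 100_000
    actions = rng.integers(0, 2, size=n)
    rewards = rng.binomial(1, np.where(actions == 1, 0.6, 0.4))
    behavior_probs = np.full(n, 0.5)
    # Hypothetical evaluation policy that would choose action 1 with probability 0.9.
    eval_probs = np.where(actions == 1, 0.9, 0.1)

    print(ips_estimate(rewards, behavior_probs, eval_probs))  # close to 0.9*0.6 + 0.1*0.4 = 0.58

As those entries describe, the Open Bandit Pipeline packages estimators of this kind (plus more robust ones) together with the public ZOZOTOWN logs, so the same calculation can be run on real data.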
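For the Arize Phoenix entry, the smallest way to try the tool described there is to start its local UI. A sketch, assuming the open-source arize-phoenix package is installed; instrumenting an actual LLM application is a separate step and is omitted here.

    # Start the local Phoenix server and UI; the call prints a local URL to open.
    # Traces sent via OpenTelemetry-based instrumentation then appear there
    # for inspection and evaluation.
    import phoenix as px

    px.launch_app()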
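For the DeepEval entry, a minimal sketch of the Pytest-style usage its description alludes to. It assumes the deepeval package is installed and an LLM judge is configured (by default an OpenAI API key); the file name, strings, and threshold are illustrative rather than taken from the project's documentation.

    # test_llm_outputs.py: run with `pytest` or `deepeval test run test_llm_outputs.py`.
    from deepeval import assert_test
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    def test_answer_relevancy():
        # LLM-as-judge metric: scores how relevant the actual output is to the input.
        metric = AnswerRelevancyMetric(threshold=0.7)
        test_case = LLMTestCase(
            input="What is DeepEval used for?",
            actual_output="It unit-tests LLM outputs with metrics such as answer relevancy.",
        )
        # Fails the test, like a normal assert, if the metric score falls below the threshold.
        assert_test(test_case, [metric])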
