**mutaguchi** @mutaguchi@fedibird.com · 2024-11-26 06:42

https://dynomight.net/more-chess/
Apparently, compared with chat completion models such as gpt-4o, only gpt-3.5-turbo-instruct, which is a completion model, is strong at chess.

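However, even gpt-4o reportedly became about as strong as gpt-3.5-turbo-instruct when it was instructed along the lines of "first repeat the moves played so far, then output the next move," and given few-shot prompting on top of that.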
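A minimal sketch of what such a "repeat, then move" few-shot setup could look like as chat messages. The system wording, the function names, and the example game are all hypothetical; the article's actual prompts are not reproduced here.

```python
def format_moves(moves):
    """Render SAN moves as a numbered move list, e.g. '1. e4 e5 2. Nf3'."""
    parts = []
    for i, move in enumerate(moves):
        if i % 2 == 0:  # White's move starts a new move number
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    return " ".join(parts)


def build_repeat_then_move_messages(moves, examples):
    """Build chat messages: few-shot pairs first, then the real game.

    examples is a list of (moves_so_far, next_move) tuples. In each example
    the assistant repeats the whole move list and appends the next move,
    so the answer is a continuous game record.
    """
    # Hypothetical instruction wording, not taken from the article.
    system = "Repeat the moves played so far, then give the next move."
    messages = [{"role": "system", "content": system}]
    for ex_moves, ex_next in examples:
        messages.append({"role": "user", "content": format_moves(ex_moves)})
        messages.append(
            {"role": "assistant", "content": format_moves(ex_moves + [ex_next])}
        )
    messages.append({"role": "user", "content": format_moves(moves)})
    return messages


messages = build_repeat_then_move_messages(
    ["e4", "e5", "Nf3"],
    examples=[(["d4", "d5"], "c4")],
)
```

The point of the design is that the assistant turn contains the full sequence again, so the next move is generated as a direct continuation of the moves rather than right after a role boundary.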

From this, the article speculates that the role tokens in the chat template (for OpenAI's ChatML, "<|im_end|>\n<|im_start|>assistant\n") split the chess move sequence in two, and that this may be what degrades performance in chat completion models.
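To make the "split" concrete, here is a rough sketch comparing what a completion model and a chat model would see for the same moves. It assumes a ChatML-style layout built around the tokens quoted above; the template OpenAI actually applies server-side is an assumption here, and `render_chatml` is a hypothetical helper.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def render_chatml(messages):
    """Render messages in an assumed ChatML-style layout, ending with an
    open assistant turn for the model to fill in."""
    text = "".join(
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages
    )
    return text + f"{IM_START}assistant\n"


moves = "1. e4 e5 2. Nf3 Nc6 3."

# Completion model: the prompt ends with the move list itself.
completion_prompt = moves

# Chat model: the same move list is followed by role tokens before
# the model gets to write the next move.
chat_prompt = render_chatml([{"role": "user", "content": moves}])

# Everything that sits between the last move and the model's output.
separator = chat_prompt[chat_prompt.index(moves) + len(moves):]
print(repr(separator))
```

In the completion case the next token directly continues the game record; in the chat case the separator sits between move 3 and whatever the model outputs.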

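Incidentally, open models such as Llama3 are apparently all equally weak at chess, even the completion models. So the article also speculates that OpenAI's base models are simply smarter to begin with than the open models.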

**mutaguchi** @mutaguchi@fedibird.com · 2024-11-26 06:59

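The phenomenon of only gpt-3.5-turbo-instruct being strong at chess is interesting, the experimental procedure used to test the cause is appropriate, and the conclusions feel sound. Good research.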

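In practice, chat completion models can be a nerf for generation tasks. Even when you want a novel written, giving the model the opening and letting it complete the continuation gives far better results than giving it the opening and instructing it to write the rest.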

**mutaguchi** @mutaguchi@fedibird.com · 2024-11-26 07:14

Chat tuning only makes a model good at Q&A and conversation tasks; on general generation tasks, performance tends to drop.

Still, chat tuning is probably indispensable for ensuring AI "safety" (not outputting erotica, crime-related content, and so on).

**mutaguchi** @mutaguchi@fedibird.com · 2024-11-26 07:22

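Also, I think OpenAI's base models really are in a different class from the run-of-the-mill open LLMs. There is something there that parameter count alone can't measure. It's probably the quality of the dataset.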

**mutaguchi** @mutaguchi@fedibird.com · 2024-11-26T07:34:44Z

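The chess prompt says something like "You are a chess grandmaster. Think of the next move," but if you are specializing for a completion model, I have a feeling that something like "Below is the record of a chess game between a grandmaster and an amateur." would make it stronger.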
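A rough sketch of that idea: frame the prompt as a document to be continued rather than an instruction to follow. The function name and layout are hypothetical, and whether this framing actually plays stronger is untested here.

```python
def build_completion_prompt(
    moves,
    framing="Below is the record of a chess game between a grandmaster and an amateur.",
):
    """Build a document-style prompt that a completion model can continue.

    The prompt is the framing sentence followed by the numbered move list.
    If White is to move, it ends with the next move number, so the most
    natural continuation is the next move itself.
    """
    parts = []
    for i, move in enumerate(moves):
        if i % 2 == 0:  # White's move starts a new move number
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    if len(moves) % 2 == 0:  # White to move: leave the move number open
        parts.append(f"{len(moves) // 2 + 1}.")
    return framing + "\n\n" + " ".join(parts)


prompt = build_completion_prompt(["e4", "e5", "Nf3", "Nc6"])
print(prompt)
```

There is no "you are" and no request in this prompt; it only sets up a text whose likely continuation is a strong move.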
