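Streaming decoding
The current streaming output re-decodes the entire generated sequence on every step and redraws the screen. This is inefficient, and it is also poorly suited to streaming data over a network. Ideally the decoding itself should be streaming, so that each step decodes only the newly generated part. To that end I added a streaming decoder for SentencePiece, together with an accompanying demo.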
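Code notes
The decoder is based on the C++ implementation of SentencePieceProcessor. While porting it to Python I made some changes for streaming decoding, mainly in how multi-byte encodings and the leading space at the start of a sentence (bos_ws) are handled. This avoids the replacement character (�) that the existing streaming output can produce when a multi-byte character has not been fully emitted yet. It also enables flicker-free, typewriter-style console output and makes WebSocket-based transport more convenient.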
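To illustrate the multi-byte issue mentioned above, here is a minimal sketch that uses only the Python standard library. It is not the decoder from this PR: `naive_stream`, `incremental_stream`, and the sample `chunks` are hypothetical names chosen for the example, and the byte chunks merely stand in for what byte-level tokens might produce one step at a time.

```python
import codecs


def naive_stream(chunks):
    """Decode every chunk on its own; a split character turns into U+FFFD."""
    return [chunk.decode("utf-8", errors="replace") for chunk in chunks]


def incremental_stream(chunks):
    """Decode with a stateful decoder that holds back incomplete byte sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # Flush whatever is still pending once the stream is finished.
    pieces.append(decoder.decode(b"", final=True))
    return pieces


# "你" is three bytes in UTF-8 (E4 BD A0). Assume, for illustration only,
# that the bytes arrive one per generation step.
chunks = [b"\xe4", b"\xbd", b"\xa0", b"!"]

print(naive_stream(chunks))        # contains replacement characters
print(incremental_stream(chunks))  # empty strings until the character is complete
```

The stateful variant emits nothing while a character is incomplete, so a consumer can append each piece to the console or send it over a socket without ever having to retract text.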
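Streaming decoder methods
There are three methods: put(), end(), and get(). put() takes a token index and decodes it, end() finalizes the streaming decoder, and get() returns the contents of the decode buffer and clears it. See the demo for detailed usage.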
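The put() / end() / get() flow can be sketched roughly as follows. This is a toy stand-in, not the PR's actual code: the class name `ToyStreamDecoder`, the `id_to_piece` dictionary, and the tiny vocabulary are all hypothetical. The sketch assumes pieces use "▁" (U+2581) as the space marker and `<0xNN>` for byte-fallback tokens, and that a single leading space at the start of the output should be dropped.

```python
import codecs
import re

# Byte-fallback pieces are assumed to look like "<0xE4>".
_BYTE_PIECE = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")


class ToyStreamDecoder:
    """Hypothetical illustration of a put/end/get streaming decoder."""

    def __init__(self, id_to_piece):
        self.id_to_piece = id_to_piece  # toy vocabulary: token index -> piece
        self._utf8 = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self._buffer = []               # decoded text waiting for get()
        self._at_start = True           # True until the first text is emitted

    def _emit(self, text):
        if not text:
            return
        if self._at_start:
            # Assumption: drop the one space added at the start of the sentence.
            if text.startswith(" "):
                text = text[1:]
            self._at_start = False
        if text:
            self._buffer.append(text)

    def put(self, index):
        """Feed one token index and decode only that increment."""
        piece = self.id_to_piece[index]
        match = _BYTE_PIECE.match(piece)
        if match:
            # Byte token: may complete a character or stay pending.
            text = self._utf8.decode(bytes([int(match.group(1), 16)]))
        else:
            # Normal piece: flush any dangling bytes first, then add the text.
            text = self._utf8.decode(b"", final=True)
            self._utf8.reset()
            text += piece.replace("\u2581", " ")
        self._emit(text)

    def end(self):
        """Finish the stream and flush bytes that never formed a character."""
        self._emit(self._utf8.decode(b"", final=True))
        self._utf8.reset()

    def get(self):
        """Return the buffered text and clear the buffer."""
        text = "".join(self._buffer)
        self._buffer.clear()
        return text


# Toy vocabulary, made up for this example.
vocab = {0: "\u2581Hello", 1: "\u2581", 2: "<0xE4>", 3: "<0xBD>", 4: "<0xA0>", 5: "!"}
decoder = ToyStreamDecoder(vocab)
for token in [0, 1, 2, 3, 4, 5]:
    decoder.put(token)
    print(decoder.get(), end="", flush=True)  # typewriter-style output
decoder.end()
print(decoder.get())
```

Because get() returns only text that has not been read yet, the caller can print or transmit each increment directly instead of redrawing the whole output.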
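Some hopes
I hope this decoder can eventually make it into the model code repository (the one on HuggingFace) and be integrated into the stream_chat method.
If you have any questions or suggestions, feel free to discuss them with me.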