03a-token-counter
Conversational Memory
Extra Ingredient: Token Counter
In this notebook we count the number of tokens a conversation consumes under each of the different conversational memory types.
We start by installing the required libraries.
In [ ]:
# each `!` command runs in its own subshell, so `!source` cannot activate a
# venv for later cells; install straight into the notebook kernel's environment
!pip install -U langchain openai transformers
Import the required libraries and objects.
In [ ]:
from getpass import getpass
import openai
from langchain import OpenAI
from langchain.chains import LLMChain, ConversationChain
from langchain.chains.conversation.memory import (
ConversationBufferMemory,
ConversationSummaryMemory,
ConversationBufferWindowMemory,
ConversationSummaryBufferMemory
)
from langchain.callbacks import get_openai_callback
from tqdm.auto import tqdm
We'll use OpenAI's gpt-3.5-turbo model to run this notebook.
In [ ]:
# never hard-code API keys; prompt for the key instead (getpass hides the input)
OPENAI_API_KEY = getpass("OpenAI API key: ")
In [ ]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(
    temperature=0,
    openai_api_key=OPENAI_API_KEY,
    model_name='gpt-3.5-turbo'  # for completion models like 'text-davinci-003', use the OpenAI class instead
)
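As a quick sanity check that the key and model are working, an illustrative one-off call (it consumes a few tokens):
In [ ]:
# illustrative: confirm the model responds before running the full experiment
llm.predict("Say hello in one word.")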
To count the number of tokens used during each call, we define a count_tokens function.
In [ ]:
def count_tokens(chain, query):
    # get_openai_callback records token usage for every OpenAI call
    # made inside the `with` block
    with get_openai_callback() as cb:
        result = chain.run(query)
    return {
        'result': result,
        'token_count': cb.total_tokens
    }
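An illustrative check of count_tokens on a throwaway chain (assumes the llm defined above; this call also consumes tokens):
In [ ]:
# illustrative: run count_tokens against a minimal chain
check = count_tokens(
    ConversationChain(llm=llm, memory=ConversationBufferMemory()),
    "Hi there!"
)
print(check['token_count'])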
Now we define the conversation itself: the list of user queries, and a talk function that runs them through a chain.
In [ ]:
queries = [
    "Good morning AI?",
    """My interest here is to explore the potential of integrating Large
    Language Models with external knowledge""",
    "I just want to analyze the different possibilities. What can you think of?",
    "What about the use of retrieval augmentation, can that be used as well?",
    """That's very interesting, can you tell me more about this? Like what
    systems would I use to store the information and retrieve relevant info?""",
    """Okay that's cool, I've been hearing about 'vector databases', are they
    relevant in this context?""",
    """Okay that's useful, but how do I go from my external knowledge to
    creating these 'vectors'? I have no idea how text can become a vector?""",
    """Well I don't think I'd be using word embeddings right? If I wanted to
    store my documents in this vector database, I suppose I would need to
    transform the documents into vectors? Maybe I can use the 'sentence
    embeddings' for this, what do you think?""",
    """Can sentence embeddings only represent sentences of text? That seems
    kind of small to capture any meaning from a document? Is there any approach
    that can encode at least a paragraph of text?""",
    """Huh, interesting. I do remember reading something about 'mpnet' or
    'minilm' sentence 'transformer' models that could encode small to
    medium sized paragraphs. Am I wrong about this?""",
    """Ah that's great to hear, do you happen to know how much text I can feed
    into these types of models?""",
    """I've never heard of hierarchical embeddings, could you explain those in
    more detail?""",
    """So is it like you have a transformer model or something else that creates
    sentence level embeddings, then you feed all of the sentence level
    embeddings into another separate neural network that knows how to merge
    multiple sentence embeddings into a single embedding?""",
    """Could you explain this process step by step from start to finish? Explain
    like I'm very new to this space, assume I don't have much prior knowledge
    of embeddings, neural nets, etc""",
    """Awesome thanks! Are there any popular 'hierarchical neural network'
    models that I can look up? Or maybe just the second stage that creates the
    hierarchical embeddings?""",
    "It seems like these HAN models are quite old, is there anything more recent?",
    "Can you explain the difference between transformer-XL and longformer?",
    "How much text can be encoded by each of these models?",
    """Okay very interesting, so before returning to earlier in the conversation.
    I understand now that there are a lot of different transformer (and not
    transformer) based models for creating the embeddings from vectors. Is that
    correct?""",
    """Perfect, so I understand text can be encoded into these embeddings. But
    what then? Once I have my embeddings what do I do?""",
    """I'd like to use these embeddings to help a chatbot or a question-answering
    system answer questions with help from this external knowledge base. I
    suppose this would come under information retrieval? Could you explain that
    process in a little more detail?""",
    """Okay great, that sounds like what I'm hoping to do. When you say the
    'chatbot or question-answering system generates an embedding', what do you
    mean exactly?""",
    """Ah okay, I understand, so it isn't the 'chatbot' model specifically
    creating the embedding right? That's how I understood your earlier comment.
    It seems more like there is a separate embedding model? And that encodes
    the query, then we retrieve the set of relevant documents from the
    external knowledge base? How is that information then used by the chatbot
    or question-answering system exactly?""",
    """Okay but how is the information provided to the chatbot or
    question-answering system?""",
    """So the retrieved information is given to the chatbot / QA system as plain
    text? But then how do we pass in the original query? How can the system
    distinguish between a user's query and all of this additional information?""",
    """That doesn't seem correct to me, my question is: if we are giving the
    chatbot / QA system the user's query AND retrieved information from an
    external knowledge base, and it's all fed into the model as plain text,
    how does the model know what part of the plain text is a query vs. retrieved
    information?""",
    """Yes I get that, but in the text passed to the model, how do we identify
    user prompt vs retrieved information?"""
]
def talk(conversation_chain):
    tokens_used = []
    # loop through the conversation above, counting token usage as we go
    for user_query in tqdm(queries):
        try:
            res = count_tokens(conversation_chain, user_query)
            tokens_used.append(res['token_count'])
        except openai.error.InvalidRequestError:
            # we hit the context window limit of the model, so stop here
            break
    return tokens_used
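To get a rough sense of how large these queries are before spending any API tokens, we can count tokens locally with the transformers package installed earlier. This is only a sketch: gpt-3.5-turbo actually uses the cl100k_base tokenizer, so GPT-2 counts are approximate.
In [ ]:
from transformers import GPT2TokenizerFast

# GPT-2's tokenizer only approximates gpt-3.5-turbo's tokenization
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
approx = [len(tokenizer.encode(q)) for q in queries]
print(f"~{sum(approx)} tokens across {len(queries)} queries (prompts and history add more)")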
Next, we create the set of conversation chains we'll compare.
In [ ]:
conversation_chains = {
    'ConversationBufferMemory': ConversationChain(
        llm=llm, memory=ConversationBufferMemory()
    ),
    'ConversationSummaryMemory': ConversationChain(
        llm=llm, memory=ConversationSummaryMemory(llm=llm)
    ),
    'ConversationBufferWindowMemory(k=6)': ConversationChain(
        llm=llm, memory=ConversationBufferWindowMemory(k=6)
    ),
    'ConversationBufferWindowMemory(k=12)': ConversationChain(
        llm=llm, memory=ConversationBufferWindowMemory(k=12)
    ),
    'ConversationSummaryBufferMemory(k=6)': ConversationChain(
        llm=llm, memory=ConversationSummaryBufferMemory(
            llm=llm,
            # ~650 tokens of raw history kept before summarization kicks in,
            # roughly comparable to retaining the last 6 interactions
            max_token_limit=650
        )
    ),
    'ConversationSummaryBufferMemory(k=12)': ConversationChain(
        llm=llm, memory=ConversationSummaryBufferMemory(
            llm=llm,
            max_token_limit=1_300  # ~double the above, comparable to k=12
        )
    )
}
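To see what each memory type actually stores, you can peek at its buffer after an exchange: raw history for the buffer memories, a running summary for the summary memories. An illustrative probe on a throwaway chain, so the chains measured below stay clean:
In [ ]:
# illustrative: inspect what a memory holds after one exchange
probe = ConversationChain(llm=llm, memory=ConversationBufferMemory())
probe.run("Good morning AI?")
print(probe.memory.buffer)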
In [ ]:
counts = {}
# loop through each of our memory types above
for key, chain in conversation_chains.items():
    print(key)
    counts[key] = talk(chain)
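Before plotting, an illustrative summary of total usage per memory type (exact numbers vary between runs):
In [ ]:
# total tokens consumed by each memory type
for key, used in counts.items():
    print(f"{key}: {sum(used)} tokens over {len(used)} calls")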
In [ ]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))
max_tokens = 4096  # context window of gpt-3.5-turbo
colors = ["#1c17ff", "#738FAB", "#f77f00", "#fcbf49", "#38c172", "#4dc0b5"]

for i, (key, count) in enumerate(counts.items()):
    color = colors[i]
    sns.lineplot(
        x=list(range(1, len(count)+1)),
        y=count,
        label=key,
        color=color
    )
    # talk() breaks out early when a chain exceeds the context window, so a
    # shortened series means the chain hit the limit; mark its final point
    if count and len(count) < len(queries):
        plt.plot(
            len(count), count[-1], marker="X", color="red", markersize=10
        )

plt.xlabel("Interaction number")
plt.ylabel("Tokens used")
plt.show()
Source: https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/03a-token-counter.ipynb