03a-token-counter


Conversational Memory

Extra material: the token counter

In this notebook, we count the number of tokens a conversation uses under each of the different conversational memory types.

We start by installing the required libraries.

In [ ]:
# note: `!source venv/bin/activate` has no effect on the notebook's kernel
# (each `!` command runs in its own subshell), so we install directly into
# the active environment. The code below uses the legacy import paths
# (langchain<0.1, openai<1.0), so we pin both rather than upgrading blindly.
!pip install "langchain<0.1" "openai<1.0" transformers

Import the required libraries and objects.

In [ ]:
from getpass import getpass

import openai
from langchain import OpenAI
from langchain.chains import LLMChain, ConversationChain
from langchain.chains.conversation.memory import (
    ConversationBufferMemory,
    ConversationSummaryMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryBufferMemory
)
from langchain.callbacks import get_openai_callback
from tqdm.auto import tqdm

We'll use OpenAI's gpt-3.5-turbo model to run the notebook.

In [ ]:
OPENAI_API_KEY = getpass("OpenAI API key: ")  # never hardcode secrets in a notebook
In [ ]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(
    temperature=0, 
    openai_api_key=OPENAI_API_KEY,
    model_name='gpt-3.5-turbo'  # can be swapped for completion LLMs such as 'text-davinci-003'
)

We define a count_tokens function to count the number of tokens used during each call.

In [ ]:
def count_tokens(chain, query):
    with get_openai_callback() as cb:
        result = chain.run(query)
    return {
        'result': result,
        'token_count': cb.total_tokens
    }
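
As a quick sanity check (a hypothetical one-off call, not part of the source notebook), you can point count_tokens at a throwaway chain:

In [ ]:
# hypothetical smoke test: one exchange against a fresh buffer-memory chain
demo_chain = ConversationChain(llm=llm, memory=ConversationBufferMemory())
print(count_tokens(demo_chain, "Hi there!"))
# -> {'result': '<model reply>', 'token_count': <prompt + completion tokens>}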

Let's define the conversation and a helper function to run it.

In [ ]:
queries = [
    "Good morning AI?",
    """My interest here is to explore the potential of integrating Large
    Language Models with external knowledge""",
    "I just want to analyze the different possibilities. What can you think of?",
    "What about the use of retrieval augmentation, can that be used as well?",
    """That's very interesting, can you tell me more about this? Like what
    systems would I use to store the information and retrieve relevant info?""",
    """Okay that's cool, I've been hearing about 'vector databases', are they
    relevant in this context?""",
    """Okay that's useful, but how do I go from my external knowledge to
    creating these 'vectors'? I have no idea how text can become a vector?""",
    """Well I don't think I'd be using word embeddings right? If I wanted to
    store my documents in this vector database, I suppose I would need to
    transform the documents into vectors? Maybe I can use the 'sentence
    embeddings' for this, what do you think?""",
    """Can sentence embeddings only represent sentences of text? That seems
    kind of small to capture any meaning from a document? Is there any approach
    that can encode at least a paragraph of text?""",
    """Huh, interesting. I do remember reading something about 'mpnet' or
    'minilm' sentence 'transformer' models that could encode small to
    medium sized paragraphs. Am I wrong about this?""",
    """Ah that's great to hear, do you happen to know how much text I can feed
    into these types of models?""",
    """I've never heard of hierarchical embeddings, could you explain those in
    more detail?""",
    """So is it like you have a transformer model or something else that creates
    sentence level embeddings, then you feed all of the sentence level
    embeddings into another separate neural network that knows how to merge
    multiple sentence embeddings into a single embedding?""",
    """Could you explain this process step by step from start to finish? Explain
    like I'm very new to this space, assume I don't have much prior knowledge
    of embeddings, neural nets, etc""",
    """Awesome thanks! Are there any popular 'heirarchical neural network'
    models that I can look up? Or maybe just the second stage that creates the
    hierarchical embeddings?""",
    "It seems like these HAN models are quite old, is there anything more recent?",
    "Can you explain the difference between transformer-XL and longformer?",
    "How much text can be encoded by each of these models?",
    """Okay very interesting, so before returning to earlier in the conversation.
    I understand now that there are a lot of different transformer (and not
    transformer) based models for creating the embeddings from vectors. Is that
    correct?""",
    """Perfect, so I understand text can be encoded into these embeddings. But
    what then? Once I have my embeddings what do I do?""",
    """I'd like to use these embeddings to help a chatbot or a question-answering
    system answer questions with help from this external knowledge base. I
    suppose this would come under information retrieval? Could you explain that
    process in a little more detail?""",
    """Okay great, that sounds like what I'm hoping to do. When you say the
    'chatbot or question-answering system generates an embedding', what do you
    mean exactly?""",
    """Ah okay, I understand, so it isn't the 'chatbot' model specifically
    creating the embedding right? That's how I understood your earlier comment.
    It seems more like there is a separate embedding model? And that encodes
    the query, then we retrieve the set of relevant documents from the
    external knowledge base? How is that information then used by the chatbot
    or question-answering system exactly?""",
    """Okay but how is the information provided to the chatbot or
    question-answering system?""",
    """So the retrieved information is given to the chatbot / QA system as plain
    text? But then how do we pass in the original query? How can the system
    distinguish between a user's query and all of this additional information?""",
    """That doesn't seem correct to me, my question is — if we are giving the
    chatbot / QA system the user's query AND retrieved information from an
    external knowledge base, and it's all fed into the model as plain text,
    how does the model know what part of the plain text is a query vs. retrieved
    information?""",
    """Yes I get that, but in the text passed to the model, how do we identify
    user prompt vs retrieved information?"""

]

def talk(conversation_chain):
    tokens_used = []
    # we loop through the conversation above, counting token usage as we go
    for user_query in tqdm(queries):
        try:
            res = count_tokens(conversation_chain, user_query)
            tokens_used.append(res['token_count'])
        except openai.error.InvalidRequestError:
            # we hit the token limit of the model, so break
            break
    return tokens_used

We create the set of conversation chains we'll use.

In [ ]:
conversation_chains = {
    'ConversationBufferMemory': ConversationChain(
        llm=llm, memory=ConversationBufferMemory()
    ),
    'ConversationSummaryMemory': ConversationChain(
        llm=llm, memory=ConversationSummaryMemory(llm=llm)
    ),
    'ConversationBufferWindowMemory(k=6)': ConversationChain(
        llm=llm, memory=ConversationBufferWindowMemory(k=6)
    ),
    'ConversationBufferWindowMemory(k=12)': ConversationChain(
        llm=llm, memory=ConversationBufferWindowMemory(k=12)
    ),
    'ConversationSummaryBufferMemory(k=6)': ConversationChain(
        llm=llm, memory=ConversationSummaryBufferMemory(
            llm=llm,
            max_token_limit=650  # roughly the token budget of a k=6 window here
        )
    ),
    'ConversationSummaryBufferMemory(k=12)': ConversationChain(
        llm=llm, memory=ConversationSummaryBufferMemory(
            llm=llm,
            max_token_limit=1_300  # roughly the token budget of a k=12 window here
        )
    )
}
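
To see concretely what each memory type keeps around (an optional aside, not part of the source notebook), you can inspect a memory object after a couple of exchanges:

In [ ]:
# optional: peek at what window memory actually stores; with k=1 only the
# most recent exchange survives
demo = ConversationChain(llm=llm, memory=ConversationBufferWindowMemory(k=1))
demo.run("Hello!")
demo.run("What is LangChain?")
print(demo.memory.load_memory_variables({}))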
In [ ]:
counts = {}
# loop through each of our memory types above
for key, chain in conversation_chains.items():
    print(key)
    counts[key] = talk(chain)
In [ ]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))
max_tokens = 4096  # context window of gpt-3.5-turbo at the time of writing

colors = ["#1c17ff", "#738FAB", "#f77f00", "#fcbf49", "#38c172", "#4dc0b5"]

for i, (key, count) in enumerate(counts.items()):
    color = colors[i]
    sns.lineplot(
        x=range(1, len(count)+1),
        y=count,
        label=key,
        color=color
    )
    # talk() breaks out early when a chain exceeds the context window, so a
    # truncated series means the chain hit the limit; mark it with a red X
    if len(count) < len(queries):
        plt.plot(
            len(count), max_tokens, marker="X", color="red", markersize=10
        )

plt.xlabel("Interaction #")
plt.ylabel("Tokens used per call")
plt.show()
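
If you also want the raw numbers behind the plot (a small addition, not in the source notebook), the per-chain totals are easy to print:

In [ ]:
# not in the source notebook: cumulative usage per memory type
for key, count in counts.items():
    print(f"{key}: {len(count)} interactions, {sum(count)} total tokens")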

Source: https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/03a-token-counter.ipynb
