当前位置：首页 > news >正文

旅游社网站建设规划书网站建设开场白

news 2025/10/17 4:54:59

旅游社网站建设规划书,网站建设开场白,建设手机银行,为什么选择做网站编辑作者#xff1a;LUCA WINTERGERST 在本博客中#xff0c;我们将测试一个使用 OpenAI 的 Python 应用程序并分析其性能以及运行该应用程序的成本。使用从应用程序收集的数据#xff0c;我们还将展示如何将 LLMs 成到你的应用程序中。在之前的博客文章中#xff0c;我们构建…作者LUCA WINTERGERST 在本博客中我们将测试一个使用 OpenAI 的 Python 应用程序并分析其性能以及运行该应用程序的成本。使用从应用程序收集的数据我们还将展示如何将 LLMs 成到你的应用程序中。在之前的博客文章中我们构建了一个小型 Python 应用程序该应用程序使用向量搜索和 BM25 的组合来查询 Elasticsearch以帮助在专有数据集中找到最相关的结果。然后最热门的结果会传递给 OpenAI它会为我们解答问题。在本博客中我们将测试使用 OpenAI 的 Python 应用程序并分析其性能以及运行该应用程序的成本。使用从应用程序收集的数据我们还将展示如何将大型语言模型 (LLM) 集成到你的应用程序中。作为额外的奖励我们将尝试回答这个问题为什么 ChatGPT 逐字打印其输出使用 Elastic APM 检测应用程序如果你有机会尝试我们的示例应用程序你可能会注意到从搜索界面加载结果的速度没有你期望的那么快。现在的问题是这是否来自我们首先在 Elasticsearch 中运行查询的两阶段方法或者缓慢的行为是否来自 OpenAI或者是否是两者的组合。使用 Elastic APM我们可以轻松地检测该应用程序以获得更好的外观。我们需要为检测做的所有事情如下我们将在博客文章末尾以及 GitHub 存储库中展示完整的示例 import elasticapm # the APM Agent is initialized apmClient elasticapm.Client(service_nameelasticdocs-gpt-v2-streaming)# the default instrumentation is applied # this will instrument the most common libraries, as well as outgoing http requests elasticapm.instrument() 由于我们的示例应用程序使用 Streamlit因此我们还需要启动至少一项 transaction 并最终再次结束它。此外我们还可以向 APM 提供有关 transaction 结果的信息以便我们可以正确跟踪故障。 # start the APM transaction apmClient.begin_transaction(user-query)(...)elasticapm.set_transaction_outcome(success)# or failure for unsuccessful transactions # elasticapm.set_transaction_outcome(success)# end the APM transaction apmClient.end_transaction(user-query) 就是这样 ---- 这足以为我们的应用程序提供完整的 APM 工具。话虽这么说我们将在这里做一些额外的工作以获得一些更有趣的数据。第一步我们将用户的查询添加到 APM 元数据中。通过这种方式我们可以检查用户尝试搜索的内容并可以分析一些流行的查询或重现错误。 elasticapm.label(queryquery) 在我们与 OpenAI 对话的异步方法中我们还将添加一些更多的检测以便我们可以更好地可视化我们收到的 tokens并收集额外的统计数据。 async with elasticapm.async_capture_span(openaiChatCompletion, span_typeopenai):async for chunk in await openai.ChatCompletion.acreate(engineengine, messages[{role: system, content: You are a helpful assistant.}, {role: user, content: truncated_prompt}],streamTrue,):content chunk[choices][0].get(delta, {}).get(content)# since we have the streamTrue option, we can get the output as it comes in# one iteration is one token# we start a new span here for each token. These spans will be aggregated# into a compressed span automaticallywith elasticapm.capture_span(token, leafTrue, span_typehttp):if content is not None:# concatenate the output to the previous one, so have the full response at the endoutput content# with every token we get, we update the elementelement.markdown(output) 最后在应用程序的最后阶段我们还将向 APM 交易添加 token 数量和大致成本。这将使我们能够稍后可视化这些指标并将它们与应用程序性能相关联。如果你不使用流式传输则 OpenAI 响应将包含一个 total_tokens 字段它是你发送的上下文和返回的响应的总和。如果你使用 streamTrue 选项那么你有责任计算 token 数量或近似数量。一个常见的建议是对英文文本使用 “(len(prompt) len(response)) / 4”但特别是代码片段可能会偏离这种近似值。如果你需要更准确的数字你可以使用 tiktoken 等库来计算 token 数量。 # add the number of tokens as a metadata label elasticapm.label(openai_tokens st.session_state[openai_current_tokens]) # add the approximate cost as a metadata label # currently the cost is $0.002 / 1000 tokens elasticapm.label(openai_cost st.session_state[openai_current_tokens] / 1000 * 0.002) 检查 APM 数据 — Elasticsearch 或 OpenAI 哪个更慢对应用程序进行检测后快速查看 “Dependencies” 可以让我们更好地了解正在发生的情况。看起来我们对 Elasticsearch 的请求平均在 125 毫秒内返回而 OpenAI 需要 8,500 毫秒才能完成请求。此屏幕截图是在不使用流式传输的应用程序版本上拍摄的。如果你使用流式传输则默认检测仅考虑依赖项响应时间中的初始 POST 请求而不考虑流式传输完整响应所需的时间。如果你自己已经使用过 ChatGPT你可能想知道为什么 UI 单独打印每个单词而不是立即返回完整的响应。事实证明如果你使用免费版本这实际上并不是为了诱使你付费这更多的是推理模型的限制。简而言之为了计算下一个 token模型还需要考虑最后一个 token。所以并行化的空间不大。由于每个 token 都是单独处理的因此在运行下一个 token 的计算时该 token 也可以发送到客户端。为了改善用户体验在使用 ChatCompletion 功能时使用流式方法会很有帮助。这样用户可以在生成完整响应的同时开始使用第一个结果。你可以在下面的 GIF 中看到这种行为。即使所有三个响应仍在加载用户也可以向下滚动并检查已有的内容。如前所述我们添加了比最低限度更多的自定义检测。这使我们能够获得有关我们的时间花在哪里的详细信息。让我们看一下完整的跟踪看看这个流的实际情况。我们的应用程序配置为从 Elasticsearch 获取前三名点击然后针对 OpenAI 并行运行一个 ChatCompletion 请求。正如我们在屏幕截图中看到的加载单个结果大约需要 15 秒。我们还可以看到返回较大响应的 OpenAI 请求需要更长的时间才能返回。但这只是一个请求。所有请求都会发生这种行为吗响应时间和支持我们之前主张的 token 数量之间是否存在明显的相关性分析成本和响应时间我们还可以使用自定义仪表板并根据 APM 数据创建可视化效果而不是使用 Elastic APM 来可视化数据。我们可以构建两个有趣的图表显示响应中的 token 数量与请求持续时间之间的关系。我们可以看到返回的 token 越多第一个图表中的 x 轴持续时间就越长第一个图表中的 y 轴。在右图中我们还可以看到无论返回的 token 总数x 轴有多少每返回 100 个 token 的持续时间几乎保持在 4 秒左右。如果你想提高使用 OpenAI 模型的应用程序的响应能力最好告诉模型保持简短的响应。除此之外我们还可以跟踪我们的总支出和每个页面加载的平均成本以及其他统计数据。对于我们的示例应用程序单次搜索的成本约为 1.1 美分。这个数字听起来并不高但它可能不会很快出现在你的公共网站上作为搜索选项。对于公司内部数据和偶尔使用的搜索界面来说这个成本可以忽略不计。在我们的测试中我们在 Azure 中使用 OpenAI API 时也经常遇到错误这最终导致我们向示例应用程序添加了一个具有指数退避的重试循环。我们还可以使用 Elastic APM 捕获这些错误。 while tries 5:try:print(request to openai for task number: str(index) attempt: str(tries))async with elasticapm.async_capture_span(openaiChatCompletion, span_typeopenai):async for chunk in await openai.ChatCompletion.acreate(engineengine, messages[{role: system, content: You are a helpful assistant.}, {role: user, content: truncated_prompt}],streamTrue,):content chunk[choices][0].get(delta, {}).get(content)counter 1with elasticapm.capture_span(token, leafTrue, span_typehttp):if content is not None:output contentelement.markdown(output)breakexcept Exception as e:client elasticapm.get_client()# capture the exception using Elastic APM and send it to the apm serverclient.capture_exception()tries 1time.sleep(tries * tries / 2)if tries 5:element.error(Error: str(e))else:print(retrying...) 然后任何捕获的错误都会在瀑布图中可见作为发生故障的跨度的一部分。此外Elastic APM 还提供所有错误的概述。在下面的屏幕截图中你可以看到我们偶尔遇到的 RateLimitError 和 APIConnectionError。使用我们粗略的指数重试机制我们可以缓解大多数此类问题。延迟和失败的 transaction 关联借助 Elastic APM 代理捕获的所有内置元数据以及我们添加的自定义标签我们可以轻松分析性能与任何元数据如服务版本、用户查询等之间是否存在任何相关性。如下所示查询 “How can I mount and index on a frozen node?” 之间存在很小的相关性。和较慢的响应时间。可以对任何导致错误的事务进行类似的分析。在此示例中“How do I create an ingest pipeline” 这两个查询比其他查询更频繁地失败导致它们在此相关性分析中脱颖而出。 import elasticapm # the APM Agent is initialized apmClient elasticapm.Client(service_nameelasticdocs-gpt-v2-streaming)# the default instrumentation is applied # this will instrument the most common libraries, as well as outgoing http requests elasticapm.instrument()# if a user clicks the Search button in the UI if submit_button:# start the APM transaction apmClient.begin_transaction(user-query) # add custom labels to the transaction, so we can see the users question in the API UI elasticapm.label(queryquery)async with elasticapm.async_capture_span(openaiChatCompletion, span_typeopenai):async for chunk in await openai.ChatCompletion.acreate(engineengine, messages[{role: system, content: You are a helpful assistant.}, {role: user, content: truncated_prompt}],streamTrue,):content chunk[choices][0].get(delta, {}).get(content)# since we have the streamTrue option, we can get the output as it comes in# one iteration is one tokenwith elasticapm.capture_span(token, leafTrue, span_typehttp):if content is not None:# concatenate the output to the previous one, so have the full response at the endoutput content# with every token we get, we update the elementelement.markdown(output) async def achat_gpt(prompt, result, index, element, modelgpt-3.5-turbo, max_tokens1024, max_context_tokens4000, safety_margin1000):output # we create on overall Span here to track the total process of doing the completionasync with elasticapm.async_capture_span(openaiChatCompletion, span_typeopenai):async for chunk in await openai.ChatCompletion.acreate(engineengine, messages[{role: system, content: You are a helpful assistant.}, {role: user, content: truncated_prompt}],streamTrue,):content chunk[choices][0].get(delta, {}).get(content)# since we have the streamTrue option, we can get the output as it comes in# one iteration is one token, so we create one small span for eachwith elasticapm.capture_span(token, leafTrue, span_typehttp):if content is not None:# concatenate the output to the previous one, so have the full response at the endoutput content# with every token we get, we update the elementelement.markdown(output) 在本博客中我们测试了一个用 Python 编写的应用程序以使用 OpenAI 并分析其性能。我们研究了响应延迟和失败的事务并评估了运行应用程序的成本。我们希望本指南对你有用详细了解 Elasticsearch 和 AI 的可能性。在这篇博文中我们可能使用了第三方生成式人工智能工具这些工具由其各自所有者拥有和运营。 Elastic 对第三方工具没有任何控制权我们对其内容、操作或使用不承担任何责任也不对您使用此类工具可能产生的任何损失或损害负责。使用人工智能工具处理个人、敏感或机密信息时请务必谨慎。你提交的任何数据都可能用于人工智能培训或其他目的。无法保证你提供的信息将得到安全或保密。在使用之前你应该熟悉任何生成式人工智能工具的隐私惯例和使用条款。本文提到的成本基于当前 OpenAI API 定价以及我们在加载示例应用程序时调用它的频率。 Elastic、Elasticsearch 和相关标志是 Elasticsearch N.V. 的商标、徽标或注册商标。在美国和其他国家。所有其他公司和产品名称均为其各自所有者的商标、徽标或注册商标。原文ChatGPT and Elasticsearch: APM instrumentation, performance, and cost analysis — Elastic Search Labs

查看全文

http://www.tj-hxxt.cn/news/224606.html