Add multimodal support with image recognition and image generation (esperyong/gpt-cmd#2)


✓ Completed in 8 minutes, 6 months ago using GPT-4


Progress

  Create gptcli/vision.py (8e032a4)
import base64
import json

import aiohttp


class VisionAPIHandler:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.headers = {"Authorization": f"Bearer {api_key}"}

    async def generate_image(self, prompt: str, n_images: int = 1):
        async with aiohttp.ClientSession() as session:
            payload = {"prompt": prompt, "n": n_images}
            async with session.post("https://api.openai.com/v1/images/generations", headers=self.headers, json=payload) as response:
                if response.status == 200:
                    return await response.json()
                else:
                    raise Exception(f"Failed to generate image: {response.status}")

    async def recognize_image(self, image_path: str):
        image_base64 = encode_image_to_base64(image_path)
        async with aiohttp.ClientSession() as session:
            payload = {"image": image_base64}
            async with session.post("https://api.openai.com/v1/vision/recognitions", headers=self.headers, json=payload) as response:
                if response.status == 200:
                    return await response.json()
                else:
                    raise Exception(f"Failed to recognize image: {response.status}")

def encode_image_to_base64(image_path: str) -> str:
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def decode_base64_to_image(base64_str: str, output_path: str):
    with open(output_path, "wb") as output_file:
        output_file.write(base64.b64decode(base64_str))

def decode_response_to_text(response: dict) -> str:
    return response.get("data", {}).get("text", "")
  • This file will contain the classes and methods necessary for interacting with OpenAI's DALL-E 3 and GPT-4 vision APIs.
  • Import necessary libraries for HTTP requests and asynchronous operations, considering the dependencies listed in pyproject.toml, such as aiohttp.
  • Define a class VisionAPIHandler with methods generate_image for interacting with DALL-E 3 and recognize_image for GPT-4 vision capabilities. These methods should accept parameters for API requests and return the API responses.
  • Include utility functions for encoding images to the required format for API requests and decoding API responses back into images or textual descriptions (a hedged usage sketch follows below).
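
A minimal usage sketch for the handler and helpers above. The API key, prompt, image path, and the assumed response shapes are placeholders that mirror the generated code, not a verified OpenAI schema:

```python
# Hedged usage sketch: key, paths, and response shapes are placeholders that
# follow the assumptions baked into gptcli/vision.py above.
import asyncio

from gptcli.vision import VisionAPIHandler, decode_response_to_text


async def main() -> None:
    handler = VisionAPIHandler(api_key="sk-placeholder")

    # Image generation: generate_image returns the parsed JSON body.
    generation = await handler.generate_image(prompt="a red bicycle", n_images=1)
    for image_data in generation.get("data", []):
        print("Generated Image URL:", image_data.get("url"))

    # Image recognition: decode_response_to_text extracts data.text if present.
    recognition = await handler.recognize_image(image_path="example.png")
    print("Recognized text:", decode_response_to_text(recognition))


if __name__ == "__main__":
    asyncio.run(main())
```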
  Run GitHub Actions for gptcli/vision.py
  Create gptcli/cli_vision_commands.py (28bb96a)
import argparse
import sys

from gptcli.vision import VisionAPIHandler


def cli_generate_image():
    parser = argparse.ArgumentParser(description="Generate images from prompts.")
    parser.add_argument("prompt", type=str, help="Prompt for generating images.")
    parser.add_argument("--n_images", type=int, default=1, help="Number of images to generate.")
    args = parser.parse_args()

    try:
        vision_handler = VisionAPIHandler(api_key="Your_API_Key_Here")
        response = vision_handler.generate_image(prompt=args.prompt, n_images=args.n_images)
        for image_data in response["data"]:
            print(f"Generated Image URL: {image_data['url']}")
    except Exception as e:
        print(f"Error generating image: {e}")
        sys.exit(1)

def cli_recognize_image():
    parser = argparse.ArgumentParser(description="Recognize text from an image.")
    parser.add_argument("image_path", type=str, help="Path to the image file.")
    args = parser.parse_args()

    try:
        vision_handler = VisionAPIHandler(api_key="Your_API_Key_Here")
        response = vision_handler.recognize_image(image_path=args.image_path)
        print(f"Recognized Text: {response['data']['text']}")
    except Exception as e:
        print(f"Error recognizing image: {e}")
        sys.exit(1)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="CLI for image recognition and generation.")
    parser.add_argument("--generate", action="store_true", help="Generate images from prompts.")
    parser.add_argument("--recognize", action="store_true", help="Recognize text from an image.")
    args = parser.parse_args()

    if args.generate:
        cli_generate_image()
    elif args.recognize:
        cli_recognize_image()
    else:
        parser.print_help()
  • This file will define the CLI commands for image recognition and generation.
  • Import VisionAPIHandler from gptcli/vision.py and necessary CLI utilities.
  • Implement functions cli_generate_image and cli_recognize_image that parse user input, call the respective methods in VisionAPIHandler (which are coroutines; see the sketch after this list), and display the results to the user.
  • These functions should handle errors gracefully, providing user-friendly messages for common issues like invalid input or API errors.
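
Because generate_image and recognize_image are coroutines, a CLI wrapper needs an event loop to drive them. A hedged sketch; the run_generate name and the placeholder key are illustrative, not part of the generated file:

```python
# Sketch only: drives the async handler method with asyncio.run so a
# synchronous CLI entry point can use it. Key value is a placeholder.
import asyncio

from gptcli.vision import VisionAPIHandler


def run_generate(prompt: str, n_images: int = 1, api_key: str = "Your_API_Key_Here") -> None:
    handler = VisionAPIHandler(api_key=api_key)
    response = asyncio.run(handler.generate_image(prompt=prompt, n_images=n_images))
    for image_data in response["data"]:
        print(f"Generated Image URL: {image_data['url']}")
```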
  Run GitHub Actions for gptcli/cli_vision_commands.py
  Modify gptcli/config.py:15-30

Changed gptcli/config.py in c1dcaad    

         return GptCliConfig(
             **config,
         )
+    dalle_api_key: Optional[str] = os.environ.get("DALLE_API_KEY")
+    gpt4_vision_api_key: Optional[str] = os.environ.get("GPT4_VISION_API_KEY")
  • Add new configuration options for DALL-E 3 and GPT-4 vision API keys, dalle_api_key and gpt4_vision_api_key, respectively.
  • Ensure these new keys are optional to maintain backward compatibility and default to None if not set.
  • Update the GptCliConfig dataclass to include these new fields (a short illustration follows below).
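
A short illustration, assuming the fields land on GptCliConfig as planned; the key values are placeholders:

```python
# Illustration only: the new keys are optional fields on GptCliConfig, so
# existing configs keep working. Values here are placeholders.
from gptcli.config import GptCliConfig

config = GptCliConfig(
    dalle_api_key="sk-dalle-placeholder",
    gpt4_vision_api_key="sk-vision-placeholder",
)
print(config.dalle_api_key, config.gpt4_vision_api_key)

# With no value supplied, each field falls back to its environment variable
# (DALLE_API_KEY / GPT4_VISION_API_KEY) or None.
default_config = GptCliConfig()
print(default_config.dalle_api_key)  # None unless DALLE_API_KEY is set
```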

Modified file with Assistant API

Instructions (1)
You are an expert software developer assigned to write code to complete the user's request.
You are diligent and tireless and always COMPLETELY IMPLEMENT the needed code!
You NEVER leave comments describing code without implementing it!
Always use best practices when coding.
Respect and use existing conventions, libraries, etc that are already present in the code base.
Your job is to make edits to the file to complete the user "# Request".

# Instructions
Modify the snippets above according to the request by calling the search_and_replace function.
* Keep whitespace and comments.
* Make the minimum necessary search_and_replaces to make changes to the snippets. Only write diffs for lines that have been asked to be changed.
* Write multiple small changes instead of a single large change.
User (2)
# Repo & Issue Metadata
Repo: gpt-cmd: No description provided.
Issue Title: Add multimodal support with image recognition and image generation

Issue Description: ### Details
## Add multimodal support with image recognition and image generation

Currently only ChatGPT's [Chat Completions API](https://platform.openai.com/docs/guides/text-generation/chat-completions-api) is used, so vision features are not supported. I would like to add this capability: based on the user's question, decide whether to call dall-e-3 and gpt-4-vision-preview to accept vision content and to generate images.



**Summaries of links found in the content:**

https://platform.openai.com/docs/guides/text-generation/chat-completions-api:

The page metadata indicates that the page is not accessible due to JavaScript being turned off and cookies not being enabled. Therefore, it is not possible to provide a summary of the page content or any code snippets.


User (3)
You have previously changed these files:
<changed_file file_path="gptcli/vision.py">
--- 
+++ 
@@ -0,0 +1,40 @@
+import base64
+import json
+
+import aiohttp
+
+
+class VisionAPIHandler:
+    def __init__(self, api_key: str):
+        self.api_key = api_key
+        self.headers = {"Authorization": f"Bearer {api_key}"}
+
+    async def generate_image(self, prompt: str, n_images: int = 1):
+        async with aiohttp.ClientSession() as session:
+            payload = {"prompt": prompt, "n": n_images}
+            async with session.post("https://api.openai.com/v1/images/generations", headers=self.headers, json=payload) as response:
+                if response.status == 200:
+                    return await response.json()
+                else:
+                    raise Exception(f"Failed to generate image: {response.status}")
+
+    async def recognize_image(self, image_path: str):
+        image_base64 = encode_image_to_base64(image_path)
+        async with aiohttp.ClientSession() as session:
+            payload = {"image": image_base64}
+            async with session.post("https://api.openai.com/v1/vision/recognitions", headers=self.headers, json=payload) as response:
+                if response.status == 200:
+                    return await response.json()
+                else:
+                    raise Exception(f"Failed to recognize image: {response.status}")
+
+def encode_image_to_base64(image_path: str) -> str:
+    with open(image_path, "rb") as image_file:
+        return base64.b64encode(image_file.read()).decode('utf-8')
+
+def decode_base64_to_image(base64_str: str, output_path: str):
+    with open(output_path, "wb") as output_file:
+        output_file.write(base64.b64decode(base64_str))
+
+def decode_response_to_text(response: dict) -> str:
+    return response.get("data", {}).get("text", "")
</changed_file>
<changed_file file_path="gptcli/cli_vision_commands.py">
--- 
+++ 
@@ -0,0 +1,46 @@
+import argparse
+import sys
+
+from gptcli.vision import VisionAPIHandler
+
+
+def cli_generate_image():
+    parser = argparse.ArgumentParser(description="Generate images from prompts.")
+    parser.add_argument("prompt", type=str, help="Prompt for generating images.")
+    parser.add_argument("--n_images", type=int, default=1, help="Number of images to generate.")
+    args = parser.parse_args()
+
+    try:
+        vision_handler = VisionAPIHandler(api_key="Your_API_Key_Here")
+        response = vision_handler.generate_image(prompt=args.prompt, n_images=args.n_images)
+        for image_data in response["data"]:
+            print(f"Generated Image URL: {image_data['url']}")
+    except Exception as e:
+        print(f"Error generating image: {e}")
+        sys.exit(1)
+
+def cli_recognize_image():
+    parser = argparse.ArgumentParser(description="Recognize text from an image.")
+    parser.add_argument("image_path", type=str, help="Path to the image file.")
+    args = parser.parse_args()
+
+    try:
+        vision_handler = VisionAPIHandler(api_key="Your_API_Key_Here")
+        response = vision_handler.recognize_image(image_path=args.image_path)
+        print(f"Recognized Text: {response['data']['text']}")
+    except Exception as e:
+        print(f"Error recognizing image: {e}")
+        sys.exit(1)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="CLI for image recognition and generation.")
+    parser.add_argument("--generate", action="store_true", help="Generate images from prompts.")
+    parser.add_argument("--recognize", action="store_true", help="Recognize text from an image.")
+    args = parser.parse_args()
+
+    if args.generate:
+        cli_generate_image()
+    elif args.recognize:
+        cli_recognize_image()
+    else:
+        parser.print_help()
</changed_file>
User (4)

<section id="A">
import os
from typing import Dict, List, Optional, TypedDict
from attr import dataclass
import yaml

from gptcli.assistant import AssistantConfig
from gptcli.llama import LLaMAModelConfig


CONFIG_FILE_PATHS = [
    os.path.join(os.path.expanduser("~"), ".config", "gpt-cli", "gpt.yml"),
    os.path.join(os.path.expanduser("~"), ".gptrc"),
]


</section id="A">
<section id="B">
@dataclass
class GptCliConfig:
    default_assistant: str = "general"
    markdown: bool = True
    show_price: bool = True
    api_key: Optional[str] = os.environ.get("OPENAI_API_KEY")
    openai_api_key: Optional[str] = os.environ.get("OPENAI_API_KEY")
    anthropic_api_key: Optional[str] = os.environ.get("ANTHROPIC_API_KEY")
    google_api_key: Optional[str] = os.environ.get("GOOGLE_API_KEY")
    log_file: Optional[str] = None
    log_level: str = "INFO"
    conversations_save_directory: str = os.path.join(os.path.expanduser("~"), "Documents", "gpt-cli", "conversations")
    conversations_render_directory: str = os.path.join(os.path.expanduser("~"), "develop", "opencsg", "work", "docs", "opencsg", "gpt-generate-md")
    assistants: Dict[str, AssistantConfig] = {}
    interactive: Optional[bool] = None
    llama_models: Optional[Dict[str, LLaMAModelConfig]] = None


def choose_config_file(paths: List[str]) -> str:
    for path in paths:
        if os.path.isfile(path):
            return path
    return ""


def read_yaml_config(file_path: str) -> GptCliConfig:
    with open(file_path, "r") as file:
        config = yaml.safe_load(file)
        return GptCliConfig(
            **config,
        )
</section id="B">
User (5)
# Request
• Add new configuration options for DALL-E 3 and GPT-4 vision API keys, `dalle_api_key` and `gpt4_vision_api_key`, respectively.
• Ensure these new keys are optional to maintain backward compatibility and default to `None` if not set.
• Update the `GptCliConfig` dataclass to include these new fields.
Function Call (6)
{"analysis_and_identification": "Adding DALL-E 3 and GPT-4 vision API keys to the GptCliConfig class", "replaces_to_make": [{"section_id": "B", "old_code": "    google_api_key: Optional[str] = os.environ.get(\"GOOGLE_API_KEY\")", "new_code": "    google_api_key: Optional[str] = os.environ.get(\"GOOGLE_API_KEY\")\n    dalle_api_key: Optional[str] = os.environ.get(\"DALLE_API_KEY\", None)\n    gpt4_vision_api_key: Optional[str] = os.environ.get(\"GPT4_VISION_API_KEY\", None)"}]}
Function Call Output (7)
SUCCESS
Here are the new code sections:

The following changes have been applied:
```diff
--- 
+++ 
@@ -22,6 +22,8 @@
     openai_api_key: Optional[str] = os.environ.get("OPENAI_API_KEY")
     anthropic_api_key: Optional[str] = os.environ.get("ANTHROPIC_API_KEY")
     google_api_key: Optional[str] = os.environ.get("GOOGLE_API_KEY")
+    dalle_api_key: Optional[str] = os.environ.get("DALLE_API_KEY", None)
+    gpt4_vision_api_key: Optional[str] = os.environ.get("GPT4_VISION_API_KEY", None)
     log_file: Optional[str] = None
     log_level: str = "INFO"
     conversations_save_directory: str = os.path.join(os.path.expanduser("~"), "Documents", "gpt-cli", "conversations")

```
Here are the new code sections:




<section id="A">
import os
from typing import Dict, List, Optional, TypedDict
from attr import dataclass
import yaml

from gptcli.assistant import AssistantConfig
from gptcli.llama import LLaMAModelConfig


CONFIG_FILE_PATHS = [
    os.path.join(os.path.expanduser("~"), ".config", "gpt-cli", "gpt.yml"),
    os.path.join(os.path.expanduser("~"), ".gptrc"),
]


</section id="A">
<section id="B">
@dataclass
class GptCliConfig:
    default_assistant: str = "general"
    markdown: bool = True
    show_price: bool = True
    api_key: Optional[str] = os.environ.get("OPENAI_API_KEY")
    openai_api_key: Optional[str] = os.environ.get("OPENAI_API_KEY")
    anthropic_api_key: Optional[str] = os.environ.get("ANTHROPIC_API_KEY")
    google_api_key: Optional[str] = os.environ.get("GOOGLE_API_KEY")
    dalle_api_key: Optional[str] = os.environ.get("DALLE_API_KEY", None)
    gpt4_vision_api_key: Optional[str] = os.environ.get("GPT4_VISION_API_KEY", None)
    log_file: Optional[str] = None
    log_level: str = "INFO"
    conversations_save_directory: str = os.path.join(os.path.expanduser("~"), "Documents", "gpt-cli", "conversations")
</section id="B">
<section id="C">
    conversations_render_directory: str = os.path.join(os.path.expanduser("~"), "develop", "opencsg", "work", "docs", "opencsg", "gpt-generate-md")
    assistants: Dict[str, AssistantConfig] = {}
    interactive: Optional[bool] = None
    llama_models: Optional[Dict[str, LLaMAModelConfig]] = None


def choose_config_file(paths: List[str]) -> str:
    for path in paths:
        if os.path.isfile(path):
            return path
    return ""


def read_yaml_config(file_path: str) -> GptCliConfig:
    with open(file_path, "r") as file:
        config = yaml.safe_load(file)
        return GptCliConfig(
            **config,
        )
</section id="C">

You can continue to make changes to the code sections and call the `search_and_replace` function again.
  Run GitHub Actions for gptcli/config.py
  Modify gptcli/openai.py:11-64

Changed gptcli/openai.py in 8499c44    

             kwargs["top_p"] = args["top_p"]

         if stream:
+from gptcli.vision import VisionAPIHandler
+
+        self.vision_handler = VisionAPIHandler(api_key=openai.api_key)
+        if input_type == "image":
+            if "image_path" in args:
+                try:
+                    response = await self.vision_handler.recognize_image(image_path=args["image_path"])
+                    yield decode_response_to_text(response)
+                except Exception as e:
+                    yield f"Error recognizing image: {e}"
+            elif "prompt" in args:
+                try:
+                    response = await self.vision_handler.generate_image(prompt=args["prompt"], n_images=args.get("n_images", 1))
+                    for image_data in response["data"]:
+                        yield f"Generated Image URL: {image_data['url']}"
+                except Exception as e:
+                    yield f"Error generating image: {e}"
+        elif input_type == "text" and stream:
             response_iter = self.client.chat.completions.create(
                 messages=cast(List[ChatCompletionMessageParam], messages),
                 stream=True,
...

 def num_tokens_from_completion_openai(completion: Message, model: str) -> int:
     return num_tokens_from_messages_openai([completion], model)
+from gptcli.vision import decode_response_to_text
  • Integrate calls to VisionAPIHandler within the existing OpenAICompletionProvider class for cases where image processing is required.
  • This integration will likely involve checking the type of input (text vs. image) and deciding whether to call the chat completion API or the vision API based on this input.
  • Add logic to handle the responses from the vision API, converting them into a format suitable for the CLI output (a hedged routing sketch follows below).
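
The change above injects await calls into the synchronous complete() generator, so a working version would need to drive the coroutines from synchronous code. A hedged sketch of the routing the plan describes; the input_type, image_path, and n_images names and the chat_complete fallback are assumptions carried over from the diff, not the merged implementation:

```python
# Hedged sketch of text-vs-image routing, kept synchronous by driving the
# async vision calls with asyncio.run. Not the merged code.
import asyncio
from typing import Iterator, List

from gptcli.completion import Message
from gptcli.vision import VisionAPIHandler, decode_response_to_text


def complete_with_vision(
    vision_handler: VisionAPIHandler,
    chat_complete,  # fallback callable for the existing chat-completions path
    messages: List[Message],
    args: dict,
    stream: bool = False,
) -> Iterator[str]:
    input_type = args.get("input_type", "text")
    if input_type == "image" and "image_path" in args:
        response = asyncio.run(vision_handler.recognize_image(image_path=args["image_path"]))
        yield decode_response_to_text(response)
    elif input_type == "image" and "prompt" in args:
        response = asyncio.run(
            vision_handler.generate_image(prompt=args["prompt"], n_images=args.get("n_images", 1))
        )
        for image_data in response.get("data", []):
            yield f"Generated Image URL: {image_data['url']}"
    else:
        yield from chat_complete(messages, args, stream)
```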

Modified file with Assistant API

Instructions (1)
You are an expert software developer assigned to write code to complete the user's request.
You are diligent and tireless and always COMPLETELY IMPLEMENT the needed code!
You NEVER leave comments describing code without implementing it!
Always use best practices when coding.
Respect and use existing conventions, libraries, etc that are already present in the code base.
Your job is to make edits to the file to complete the user "# Request".

# Instructions
Modify the snippets above according to the request by calling the search_and_replace function.
* Keep whitespace and comments.
* Make the minimum necessary search_and_replaces to make changes to the snippets. Only write diffs for lines that have been asked to be changed.
* Write multiple small changes instead of a single large change.
User (2)
# Repo & Issue Metadata
Repo: gpt-cmd: No description provided.
Issue Title: Add multimodal support with image recognition and image generation

Issue Description: ### Details
## Add multimodal support with image recognition and image generation

Currently only ChatGPT's [Chat Completions API](https://platform.openai.com/docs/guides/text-generation/chat-completions-api) is used, so vision features are not supported. I would like to add this capability: based on the user's question, decide whether to call dall-e-3 and gpt-4-vision-preview to accept vision content and to generate images.



**Summaries of links found in the content:**

https://platform.openai.com/docs/guides/text-generation/chat-completions-api:

The page metadata indicates that the page is not accessible due to JavaScript being turned off and cookies not being enabled. Therefore, it is not possible to provide a summary of the page content or any code snippets.


User (3)
You have previously changed these files:
<changed_file file_path="gptcli/vision.py">
--- 
+++ 
@@ -0,0 +1,40 @@
+import base64
+import json
+
+import aiohttp
+
+
+class VisionAPIHandler:
+    def __init__(self, api_key: str):
+        self.api_key = api_key
+        self.headers = {"Authorization": f"Bearer {api_key}"}
+
+    async def generate_image(self, prompt: str, n_images: int = 1):
+        async with aiohttp.ClientSession() as session:
+            payload = {"prompt": prompt, "n": n_images}
+            async with session.post("https://api.openai.com/v1/images/generations", headers=self.headers, json=payload) as response:
+                if response.status == 200:
+                    return await response.json()
+                else:
+                    raise Exception(f"Failed to generate image: {response.status}")
+
+    async def recognize_image(self, image_path: str):
+        image_base64 = encode_image_to_base64(image_path)
+        async with aiohttp.ClientSession() as session:
+            payload = {"image": image_base64}
+            async with session.post("https://api.openai.com/v1/vision/recognitions", headers=self.headers, json=payload) as response:
+                if response.status == 200:
+                    return await response.json()
+                else:
+                    raise Exception(f"Failed to recognize image: {response.status}")
+
+def encode_image_to_base64(image_path: str) -> str:
+    with open(image_path, "rb") as image_file:
+        return base64.b64encode(image_file.read()).decode('utf-8')
+
+def decode_base64_to_image(base64_str: str, output_path: str):
+    with open(output_path, "wb") as output_file:
+        output_file.write(base64.b64decode(base64_str))
+
+def decode_response_to_text(response: dict) -> str:
+    return response.get("data", {}).get("text", "")
</changed_file>
<changed_file file_path="gptcli/cli_vision_commands.py">
--- 
+++ 
@@ -0,0 +1,46 @@
+import argparse
+import sys
+
+from gptcli.vision import VisionAPIHandler
+
+
+def cli_generate_image():
+    parser = argparse.ArgumentParser(description="Generate images from prompts.")
+    parser.add_argument("prompt", type=str, help="Prompt for generating images.")
+    parser.add_argument("--n_images", type=int, default=1, help="Number of images to generate.")
+    args = parser.parse_args()
+
+    try:
+        vision_handler = VisionAPIHandler(api_key="Your_API_Key_Here")
+        response = vision_handler.generate_image(prompt=args.prompt, n_images=args.n_images)
+        for image_data in response["data"]:
+            print(f"Generated Image URL: {image_data['url']}")
+    except Exception as e:
+        print(f"Error generating image: {e}")
+        sys.exit(1)
+
+def cli_recognize_image():
+    parser = argparse.ArgumentParser(description="Recognize text from an image.")
+    parser.add_argument("image_path", type=str, help="Path to the image file.")
+    args = parser.parse_args()
+
+    try:
+        vision_handler = VisionAPIHandler(api_key="Your_API_Key_Here")
+        response = vision_handler.recognize_image(image_path=args.image_path)
+        print(f"Recognized Text: {response['data']['text']}")
+    except Exception as e:
+        print(f"Error recognizing image: {e}")
+        sys.exit(1)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="CLI for image recognition and generation.")
+    parser.add_argument("--generate", action="store_true", help="Generate images from prompts.")
+    parser.add_argument("--recognize", action="store_true", help="Recognize text from an image.")
+    args = parser.parse_args()
+
+    if args.generate:
+        cli_generate_image()
+    elif args.recognize:
+        cli_recognize_image()
+    else:
+        parser.print_help()
</changed_file>
<changed_file file_path="gptcli/config.py">
--- 
+++ 
@@ -43,4 +43,6 @@
         config = yaml.safe_load(file)
         return GptCliConfig(
             **config,
-        )
+        )
+    dalle_api_key: Optional[str] = os.environ.get("DALLE_API_KEY")
+    gpt4_vision_api_key: Optional[str] = os.environ.get("GPT4_VISION_API_KEY")
</changed_file>
User (4)
Relevant files in this PR:

<relevant_file file_path="gptcli/vision.py">
import base64
import json

import aiohttp


class VisionAPIHandler:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.headers = {"Authorization": f"Bearer {api_key}"}

    async def generate_image(self, prompt: str, n_images: int = 1):
        async with aiohttp.ClientSession() as session:
            payload = {"prompt": prompt, "n": n_images}
            async with session.post("https://api.openai.com/v1/images/generations", headers=self.headers, json=payload) as response:
                if response.status == 200:
                    return await response.json()
                else:
                    raise Exception(f"Failed to generate image: {response.status}")

    async def recognize_image(self, image_path: str):
        image_base64 = encode_image_to_base64(image_path)
        async with aiohttp.ClientSession() as session:
            payload = {"image": image_base64}
            async with session.post("https://api.openai.com/v1/vision/recognitions", headers=self.headers, json=payload) as response:
                if response.status == 200:
                    return await response.json()
                else:
                    raise Exception(f"Failed to recognize image: {response.status}")

def encode_image_to_base64(image_path: str) -> str:
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def decode_base64_to_image(base64_str: str, output_path: str):
    with open(output_path, "wb") as output_file:
        output_file.write(base64.b64decode(base64_str))

def decode_response_to_text(response: dict) -> str:
    return response.get("data", {}).get("text", "")

</relevant_file>
User (5)

<section id="A">
from typing import Iterator, List, cast
import openai
from openai import OpenAI
from openai.types.chat import ChatCompletionMessageParam

import tiktoken

from gptcli.completion import CompletionProvider, Message


class OpenAICompletionProvider(CompletionProvider):
</section id="A">
<section id="B">
    def __init__(self):
        self.client = OpenAI(api_key=openai.api_key)

    def complete(
        self, messages: List[Message], args: dict, stream: bool = False
    ) -> Iterator[str]:
        kwargs = {}
        if "temperature" in args:
            kwargs["temperature"] = args["temperature"]
        if "top_p" in args:
            kwargs["top_p"] = args["top_p"]

</section id="B">
<section id="C">
        if stream:
            response_iter = self.client.chat.completions.create(
                messages=cast(List[ChatCompletionMessageParam], messages),
                stream=True,
                model=args["model"],
                **kwargs,
            )

            for response in response_iter:
                next_choice = response.choices[0]
                if next_choice.finish_reason is None and next_choice.delta.content:
                    yield next_choice.delta.content
</section id="C">
<section id="D">
        else:
            response = self.client.chat.completions.create(
                messages=cast(List[ChatCompletionMessageParam], messages),
                model=args["model"],
                stream=False,
                **kwargs,
            )
            next_choice = response.choices[0]
            if next_choice.message.content:
</section id="D">
<section id="E">
                yield next_choice.message.content


def num_tokens_from_messages_openai(messages: List[Message], model: str) -> int:
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        # every message follows <im_start>{role/name}\n{content}<im_end>\n
        num_tokens += 4
        for key, value in message.items():
            assert isinstance(value, str)
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens


def num_tokens_from_completion_openai(completion: Message, model: str) -> int:
    return num_tokens_from_messages_openai([completion], model)
</section id="E">
User (6)
# Request
• Integrate calls to `VisionAPIHandler` within the existing `OpenAICompletionProvider` class for cases where image processing is required.
• This integration will likely involve checking the type of input (text vs. image) and deciding whether to call the chat completion API or the vision API based on this input.
• Add logic to handle the responses from the vision API, converting them into a format suitable for the CLI output.
Function Call (7)
{"analysis_and_identification": "Add the imports and the new method to handle vision API calls within the OpenAICompletionProvider class.", "replaces_to_make": [{"section_id": "A", "old_code": "from openai.types.chat import ChatCompletionMessageParam", "new_code": "from openai.types.chat import ChatCompletionMessageParam\nfrom gptcli.vision import VisionAPIHandler\nimport asyncio"}]}
Function Call Output (8)
SUCCESS
Here are the new code sections:

The following changes have been applied:
```diff
--- 
+++ 
@@ -2,6 +2,8 @@
 import openai
 from openai import OpenAI
 from openai.types.chat import ChatCompletionMessageParam
+from gptcli.vision import VisionAPIHandler
+import asyncio
 
 import tiktoken
 

```
Here are the new code sections:




<section id="A">
from typing import Iterator, List, cast
import openai
from openai import OpenAI
from openai.types.chat import ChatCompletionMessageParam
from gptcli.vision import VisionAPIHandler
import asyncio

import tiktoken

from gptcli.completion import CompletionProvider, Message


</section id="A">
<section id="B">
class OpenAICompletionProvider(CompletionProvider):
    def __init__(self):
        self.client = OpenAI(api_key=openai.api_key)

    def complete(
        self, messages: List[Message], args: dict, stream: bool = False
    ) -> Iterator[str]:
        kwargs = {}
        if "temperature" in args:
            kwargs["temperature"] = args["temperature"]
        if "top_p" in args:
            kwargs["top_p"] = args["top_p"]

</section id="B">
<section id="C">
        if stream:
            response_iter = self.client.chat.completions.create(
                messages=cast(List[ChatCompletionMessageParam], messages),
                stream=True,
                model=args["model"],
                **kwargs,
            )

            for response in response_iter:
                next_choice = response.choices[0]
                if next_choice.finish_reason is None and next_choice.delta.content:
                    yield next_choice.delta.content
</section id="C">
<section id="D">
        else:
            response = self.client.chat.completions.create(
                messages=cast(List[ChatCompletionMessageParam], messages),
                model=args["model"],
                stream=False,
                **kwargs,
            )
            next_choice = response.choices[0]
            if next_choice.message.content:
</section id="D">
<section id="E">
                yield next_choice.message.content


def num_tokens_from_messages_openai(messages: List[Message], model: str) -> int:
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        # every message follows <im_start>{role/name}\n{content}<im_end>\n
        num_tokens += 4
        for key, value in message.items():
            assert isinstance(value, str)
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens


def num_tokens_from_completion_openai(completion: Message, model: str) -> int:
    return num_tokens_from_messages_openai([completion], model)
</section id="E">

You can continue to make changes to the code sections and call the `search_and_replace` function again.
Function Call (9)
{"analysis_and_identification": "Integrate the image API handler in the OpenAICompletionProvider class constructor and add a new method to handle image inputs.", "replaces_to_make": [{"section_id": "B", "old_code": "        self.client = OpenAI(api_key=openai.api_key)", "new_code": "        self.client = OpenAI(api_key=openai.api_key)\n        self.vision_handler = VisionAPIHandler(api_key=openai.api_key)"}]}
Function Call Output (10)
SUCCESS
Here are the new code sections:

The following changes have been applied:
```diff
--- 
+++ 
@@ -13,6 +13,7 @@
 class OpenAICompletionProvider(CompletionProvider):
     def __init__(self):
         self.client = OpenAI(api_key=openai.api_key)
+        self.vision_handler = VisionAPIHandler(api_key=openai.api_key)
 
     def complete(
         self, messages: List[Message], args: dict, stream: bool = False

```
Here are the new code sections:




<section id="A">
from typing import Iterator, List, cast
import openai
from openai import OpenAI
from openai.types.chat import ChatCompletionMessageParam
from gptcli.vision import VisionAPIHandler
import asyncio

import tiktoken

from gptcli.completion import CompletionProvider, Message


</section id="A">
<section id="B">
class OpenAICompletionProvider(CompletionProvider):
    def __init__(self):
        self.client = OpenAI(api_key=openai.api_key)
        self.vision_handler = VisionAPIHandler(api_key=openai.api_key)

    def complete(
        self, messages: List[Message], args: dict, stream: bool = False
    ) -> Iterator[str]:
</section id="B">
<section id="C">
        kwargs = {}
        if "temperature" in args:
            kwargs["temperature"] = args["temperature"]
        if "top_p" in args:
            kwargs["top_p"] = args["top_p"]

        if stream:
            response_iter = self.client.chat.completions.create(
                messages=cast(List[ChatCompletionMessageParam], messages),
                stream=True,
                model=args["model"],
                **kwargs,
            )

            for response in response_iter:
                next_choice = response.choices[0]
                if next_choice.finish_reason is None and next_choice.delta.content:
                    yield next_choice.delta.content
</section id="C">
<section id="D">
        else:
            response = self.client.chat.completions.create(
                messages=cast(List[ChatCompletionMessageParam], messages),
                model=args["model"],
                stream=False,
                **kwargs,
            )
            next_choice = response.choices[0]
            if next_choice.message.content:
</section id="D">
<section id="E">
                yield next_choice.message.content


def num_tokens_from_messages_openai(messages: List[Message], model: str) -> int:
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        # every message follows <im_start>{role/name}\n{content}<im_end>\n
        num_tokens += 4
        for key, value in message.items():
            assert isinstance(value, str)
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens


def num_tokens_from_completion_openai(completion: Message, model: str) -> int:
    return num_tokens_from_messages_openai([completion], model)
</section id="E">

You can continue to make changes to the code sections and call the `search_and_replace` function again.
Function Call (11)
{"analysis_and_identification": "Add a new method to process image inputs using the vision API handler within the OpenAICompletionProvider class.", "replaces_to_make": [{"section_id": "E", "old_code": "def num_tokens_from_completion_openai(completion: Message, model: str) -> int:", "new_code": "def num_tokens_from_completion_openai(completion: Message, model: str) -> int:\n    return num_tokens_from_messages_openai([completion], model)\n\n    async def process_image_input(self, image_path: str) -> str:\n        try:\n            response = await self.vision_handler.recognize_image(image_path=image_path)\n            recognized_text = decode_response_to_text(response)\n            return recognized_text\n        except Exception as e:\n            return f'Error processing image: {e}'"}]}
Function Call Output (12)
SUCCESS
Here are the new code sections:

The following changes have been applied:
```diff
--- 
+++ 
@@ -64,4 +64,13 @@
 
 
 def num_tokens_from_completion_openai(completion: Message, model: str) -> int:
+    return num_tokens_from_messages_openai([completion], model)
+
+    async def process_image_input(self, image_path: str) -> str:
+        try:
+            response = await self.vision_handler.recognize_image(image_path=image_path)
+            recognized_text = decode_response_to_text(response)
+            return recognized_text
+        except Exception as e:
+            return f'Error processing image: {e}'
     return num_tokens_from_messages_openai([completion], model)
```
Here are the new code sections:




<section id="A">
from typing import Iterator, List, cast
import openai
from openai import OpenAI
from openai.types.chat import ChatCompletionMessageParam
from gptcli.vision import VisionAPIHandler
import asyncio

import tiktoken

from gptcli.completion import CompletionProvider, Message


</section id="A">
<section id="B">
class OpenAICompletionProvider(CompletionProvider):
    def __init__(self):
        self.client = OpenAI(api_key=openai.api_key)
        self.vision_handler = VisionAPIHandler(api_key=openai.api_key)

    def complete(
        self, messages: List[Message], args: dict, stream: bool = False
    ) -> Iterator[str]:
</section id="B">
<section id="C">
        kwargs = {}
        if "temperature" in args:
            kwargs["temperature"] = args["temperature"]
        if "top_p" in args:
            kwargs["top_p"] = args["top_p"]

        if stream:
            response_iter = self.client.chat.completions.create(
                messages=cast(List[ChatCompletionMessageParam], messages),
                stream=True,
                model=args["model"],
                **kwargs,
            )

            for response in response_iter:
                next_choice = response.choices[0]
                if next_choice.finish_reason is None and next_choice.delta.content:
                    yield next_choice.delta.content
</section id="C">
<section id="D">
        else:
            response = self.client.chat.completions.create(
                messages=cast(List[ChatCompletionMessageParam], messages),
                model=args["model"],
                stream=False,
                **kwargs,
            )
            next_choice = response.choices[0]
            if next_choice.message.content:
</section id="D">
<section id="E">
                yield next_choice.message.content


def num_tokens_from_messages_openai(messages: List[Message], model: str) -> int:
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        # every message follows <im_start>{role/name}\n{content}<im_end>\n
        num_tokens += 4
        for key, value in message.items():
            assert isinstance(value, str)
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens


def num_tokens_from_completion_openai(completion: Message, model: str) -> int:
    return num_tokens_from_messages_openai([completion], model)

    async def process_image_input(self, image_path: str) -> str:
        try:
            response = await self.vision_handler.recognize_image(image_path=image_path)
            recognized_text = decode_response_to_text(response)
            return recognized_text
        except Exception as e:
            return f'Error processing image: {e}'
    return num_tokens_from_messages_openai([completion], model)
</section id="E">

You can continue to make changes to the code sections and call the `search_and_replace` function again.
  Run GitHub Actions for gptcli/openai.py
  Modify gptcli/anthropic.py:33-67

Changed gptcli/anthropic.py in c7c4216    

 def num_tokens_from_completion_anthropic(message: Message, model: str) -> int:
     client = get_client()
     return client.count_tokens(message["content"])
+from gptcli.vision import VisionAPIHandler, decode_response_to_text
+
+class AnthropicCompletionProvider(CompletionProvider):
+    def __init__(self):
+        self.vision_handler = VisionAPIHandler(api_key=api_key)
+
+    def complete(
+        self, messages: List[Message], args: dict, stream: bool = False
+    ) -> Iterator[str]:
+        input_type = args.get("input_type", "text")
+        if input_type == "image":
+            if "image_path" in args:
+                try:
+                    response = await self.vision_handler.recognize_image(image_path=args["image_path"])
+                    yield decode_response_to_text(response)
+                except Exception as e:
+                    yield f"Error recognizing image: {e}"
+            elif "prompt" in args:
+                try:
+                    response = await self.vision_handler.generate_image(prompt=args["prompt"], n_images=args.get("n_images", 1))
+                    for image_data in response["data"]:
+                        yield f"Generated Image URL: {image_data['url']}"
+                except Exception as e:
+                    yield f"Error generating image: {e}"
+        elif input_type == "text":
+            kwargs = {
+                "prompt": make_prompt(messages),
+                "stop_sequences": [anthropic.HUMAN_PROMPT],
+                "max_tokens_to_sample": 4096,
+                "model": args["model"],
+            }
+            if "temperature" in args:
+                kwargs["temperature"] = args["temperature"]
+            if "top_p" in args:
+                kwargs["top_p"] = args["top_p"]
+
+            client = get_client()
+            if stream:
+                response = client.completions.create(**kwargs, stream=True)
+            else:
+                response = [client.completions.create(**kwargs, stream=False)]
+
+            for data in response:
+                next_completion = data.completion
+                yield next_completion
  • Similar to the modifications in openai.py, integrate calls to VisionAPIHandler for image processing functionalities.
  • Ensure that the AnthropicCompletionProvider can handle both text and image inputs, directing each to the appropriate API (textual or vision) based on the input type (see the sketch after this list).
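
As with the OpenAI provider, the rendered change awaits coroutines inside the synchronous complete() generator, so an event loop is needed to drive the vision calls. A minimal hedged sketch; the function name and parameters are illustrative only:

```python
# Sketch only, not the merged implementation: drives the async recognize_image
# call from synchronous provider code via asyncio.run. The api_key argument is
# assumed to be the module-level ANTHROPIC_API_KEY value used by the diff.
import asyncio

from gptcli.vision import VisionAPIHandler, decode_response_to_text


def recognize_image_sync(api_key: str, image_path: str) -> str:
    handler = VisionAPIHandler(api_key=api_key)
    response = asyncio.run(handler.recognize_image(image_path=image_path))
    return decode_response_to_text(response)
```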

Modified file with Assistant API

Instructions (1)
You are an expert software developer assigned to write code to complete the user's request.
You are diligent and tireless and always COMPLETELY IMPLEMENT the needed code!
You NEVER leave comments describing code without implementing it!
Always use best practices when coding.
Respect and use existing conventions, libraries, etc that are already present in the code base.
Your job is to make edits to the file to complete the user "# Request".

# Instructions
Modify the snippets above according to the request by calling the search_and_replace function.
* Keep whitespace and comments.
* Make the minimum necessary search_and_replaces to make changes to the snippets. Only write diffs for lines that have been asked to be changed.
* Write multiple small changes instead of a single large change.
User (2)
# Repo & Issue Metadata
Repo: gpt-cmd: No description provided.
Issue Title: Add multimodal support with image recognition and image generation

Issue Description: ### Details
## Add multimodal support with image recognition and image generation

Currently only ChatGPT's [Chat Completions API](https://platform.openai.com/docs/guides/text-generation/chat-completions-api) is used, so vision features are not supported. I would like to add this capability: based on the user's question, decide whether to call dall-e-3 and gpt-4-vision-preview to accept vision content and to generate images.



**Summaries of links found in the content:**

https://platform.openai.com/docs/guides/text-generation/chat-completions-api:

The page metadata indicates that the page is not accessible due to JavaScript being turned off and cookies not being enabled. Therefore, it is not possible to provide a summary of the page content or any code snippets.


User (3)
You have previously changed these files:
<changed_file file_path="gptcli/vision.py">
--- 
+++ 
@@ -0,0 +1,40 @@
+import base64
+import json
+
+import aiohttp
+
+
+class VisionAPIHandler:
+    def __init__(self, api_key: str):
+        self.api_key = api_key
+        self.headers = {"Authorization": f"Bearer {api_key}"}
+
+    async def generate_image(self, prompt: str, n_images: int = 1):
+        async with aiohttp.ClientSession() as session:
+            payload = {"prompt": prompt, "n": n_images}
+            async with session.post("https://api.openai.com/v1/images/generations", headers=self.headers, json=payload) as response:
+                if response.status == 200:
+                    return await response.json()
+                else:
+                    raise Exception(f"Failed to generate image: {response.status}")
+
+    async def recognize_image(self, image_path: str):
+        image_base64 = encode_image_to_base64(image_path)
+        async with aiohttp.ClientSession() as session:
+            payload = {"image": image_base64}
+            async with session.post("https://api.openai.com/v1/vision/recognitions", headers=self.headers, json=payload) as response:
+                if response.status == 200:
+                    return await response.json()
+                else:
+                    raise Exception(f"Failed to recognize image: {response.status}")
+
+def encode_image_to_base64(image_path: str) -> str:
+    with open(image_path, "rb") as image_file:
+        return base64.b64encode(image_file.read()).decode('utf-8')
+
+def decode_base64_to_image(base64_str: str, output_path: str):
+    with open(output_path, "wb") as output_file:
+        output_file.write(base64.b64decode(base64_str))
+
+def decode_response_to_text(response: dict) -> str:
+    return response.get("data", {}).get("text", "")
</changed_file>
<changed_file file_path="gptcli/cli_vision_commands.py">
--- 
+++ 
@@ -0,0 +1,46 @@
+import argparse
+import sys
+
+from gptcli.vision import VisionAPIHandler
+
+
+def cli_generate_image():
+    parser = argparse.ArgumentParser(description="Generate images from prompts.")
+    parser.add_argument("prompt", type=str, help="Prompt for generating images.")
+    parser.add_argument("--n_images", type=int, default=1, help="Number of images to generate.")
+    args = parser.parse_args()
+
+    try:
+        vision_handler = VisionAPIHandler(api_key="Your_API_Key_Here")
+        response = vision_handler.generate_image(prompt=args.prompt, n_images=args.n_images)
+        for image_data in response["data"]:
+            print(f"Generated Image URL: {image_data['url']}")
+    except Exception as e:
+        print(f"Error generating image: {e}")
+        sys.exit(1)
+
+def cli_recognize_image():
+    parser = argparse.ArgumentParser(description="Recognize text from an image.")
+    parser.add_argument("image_path", type=str, help="Path to the image file.")
+    args = parser.parse_args()
+
+    try:
+        vision_handler = VisionAPIHandler(api_key="Your_API_Key_Here")
+        response = vision_handler.recognize_image(image_path=args.image_path)
+        print(f"Recognized Text: {response['data']['text']}")
+    except Exception as e:
+        print(f"Error recognizing image: {e}")
+        sys.exit(1)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="CLI for image recognition and generation.")
+    parser.add_argument("--generate", action="store_true", help="Generate images from prompts.")
+    parser.add_argument("--recognize", action="store_true", help="Recognize text from an image.")
+    args = parser.parse_args()
+
+    if args.generate:
+        cli_generate_image()
+    elif args.recognize:
+        cli_recognize_image()
+    else:
+        parser.print_help()
</changed_file>
<changed_file file_path="gptcli/config.py">
--- 
+++ 
@@ -43,4 +43,6 @@
         config = yaml.safe_load(file)
         return GptCliConfig(
             **config,
-        )
+        )
+    dalle_api_key: Optional[str] = os.environ.get("DALLE_API_KEY")
+    gpt4_vision_api_key: Optional[str] = os.environ.get("GPT4_VISION_API_KEY")
</changed_file>
<changed_file file_path="gptcli/openai.py">
--- 
+++ 
@@ -22,6 +22,24 @@
             kwargs["top_p"] = args["top_p"]
 
         if stream:
+from gptcli.vision import VisionAPIHandler
+
+        self.vision_handler = VisionAPIHandler(api_key=openai.api_key)
+        if input_type == "image":
+            if "image_path" in args:
+                try:
+                    response = await self.vision_handler.recognize_image(image_path=args["image_path"])
+                    yield decode_response_to_text(response)
+                except Exception as e:
+                    yield f"Error recognizing image: {e}"
+            elif "prompt" in args:
+                try:
+                    response = await self.vision_handler.generate_image(prompt=args["prompt"], n_images=args.get("n_images", 1))
+                    for image_data in response["data"]:
+                        yield f"Generated Image URL: {image_data['url']}"
+                except Exception as e:
+                    yield f"Error generating image: {e}"
+        elif input_type == "text" and stream:
             response_iter = self.client.chat.completions.create(
                 messages=cast(List[ChatCompletionMessageParam], messages),
                 stream=True,
@@ -61,4 +79,5 @@
 
 
 def num_tokens_from_completion_openai(completion: Message, model: str) -> int:
-    return num_tokens_from_messages_openai([completion], model)
+    return num_tokens_from_messages_openai([completion], model)
+from gptcli.vision import decode_response_to_text
</changed_file>
User (4)
Relevant files in this PR:

<relevant_file file_path="gptcli/vision.py">
import base64
import json

import aiohttp


class VisionAPIHandler:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.headers = {"Authorization": f"Bearer {api_key}"}

    async def generate_image(self, prompt: str, n_images: int = 1):
        async with aiohttp.ClientSession() as session:
            payload = {"prompt": prompt, "n": n_images}
            async with session.post("https://api.openai.com/v1/images/generations", headers=self.headers, json=payload) as response:
                if response.status == 200:
                    return await response.json()
                else:
                    raise Exception(f"Failed to generate image: {response.status}")

    async def recognize_image(self, image_path: str):
        image_base64 = encode_image_to_base64(image_path)
        async with aiohttp.ClientSession() as session:
            payload = {"image": image_base64}
            async with session.post("https://api.openai.com/v1/vision/recognitions", headers=self.headers, json=payload) as response:
                if response.status == 200:
                    return await response.json()
                else:
                    raise Exception(f"Failed to recognize image: {response.status}")

def encode_image_to_base64(image_path: str) -> str:
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def decode_base64_to_image(base64_str: str, output_path: str):
    with open(output_path, "wb") as output_file:
        output_file.write(base64.b64decode(base64_str))

def decode_response_to_text(response: dict) -> str:
    return response.get("data", {}).get("text", "")

</relevant_file>
User (5)

<section id="A">
import os
from typing import Iterator, List
import anthropic

from gptcli.completion import CompletionProvider, Message

api_key = os.environ.get("ANTHROPIC_API_KEY")


def get_client():
    if not api_key:
        raise ValueError("ANTHROPIC_API_KEY environment variable not set")

    return anthropic.Anthropic(api_key=api_key)


def role_to_name(role: str) -> str:
    if role == "system" or role == "user":
        return anthropic.HUMAN_PROMPT
    elif role == "assistant":
        return anthropic.AI_PROMPT
    else:
        raise ValueError(f"Unknown role: {role}")


</section id="A">
<section id="B">
def make_prompt(messages: List[Message]) -> str:
    prompt = "\n".join(
        [f"{role_to_name(message['role'])}{message['content']}" for message in messages]
    )
    prompt += f"{role_to_name('assistant')}"
    return prompt


class AnthropicCompletionProvider(CompletionProvider):
</section id="B">
<section id="C">
    def complete(
        self, messages: List[Message], args: dict, stream: bool = False
    ) -> Iterator[str]:
        kwargs = {
            "prompt": make_prompt(messages),
            "stop_sequences": [anthropic.HUMAN_PROMPT],
            "max_tokens_to_sample": 4096,
            "model": args["model"],
        }
        if "temperature" in args:
            kwargs["temperature"] = args["temperature"]
        if "top_p" in args:
            kwargs["top_p"] = args["top_p"]

        client = get_client()
        if stream:
            response = client.completions.create(**kwargs, stream=True)
        else:
            response = [client.completions.create(**kwargs, stream=False)]

        for data in response:
            next_completion = data.completion
            yield next_completion


def num_tokens_from_messages_anthropic(messages: List[Message], model: str) -> int:
    prompt = make_prompt(messages)
    client = get_client()
    return client.count_tokens(prompt)


def num_tokens_from_completion_anthropic(message: Message, model: str) -> int:
    client = get_client()
    return client.count_tokens(message["content"])
</section id="C">
User (6)
# Request
• Similar to the modifications in `openai.py`, integrate calls to `VisionAPIHandler` for image processing functionalities.
• Ensure that the `AnthropicCompletionProvider` can handle both text and image inputs, directing each to the appropriate API (textual or vision) based on the input type.
Function Call (7)
{"analysis_and_identification": "Integrate VisionAPIHandler usage in AnthropicCompletionProvider for handling both text and image inputs. Ensure that similar functionalities as those implemented in openai.py are provided here.", "replaces_to_make": [{"section_id": "C", "old_code": "        if stream:\n            response = client.completions.create(**kwargs, stream=True)\n        else:\n            response = [client.completions.create(**kwargs, stream=False)]\n\n        for data in response:\n            next_completion = data.completion\n            yield next_completion", "new_code": "        from gptcli.vision import VisionAPIHandler, decode_response_to_text\n\n        vis_handler = VisionAPIHandler(api_key=api_key)\n\n        if args[\"input_type\"] == \"image\":\n            if \"image_path\" in args:\n                try:\n                    response = vis_handler.recognize_image(image_path=args[\"image_path\"])\n                    yield decode_response_to_text(response)\n                except Exception as e:\n                    yield f\"Error recognizing image: {e}\"\n            elif \"prompt\" in args:\n                try:\n                    response = vis_handler.generate_image(prompt=args[\"prompt\"], n_images=args.get(\"n_images\", 1))\n                    for image_data in response[\"data\"]:\n                        yield f\"Generated Image URL: {image_data['url']}\"\n                except Exception as e:\n                    yield f\"Error generating image: {e}\"\n        elif args[\"input_type\"] == \"text\":\n            if stream:\n                response = client.completions.create(**kwargs, stream=True)\n            else:\n                response = [client.completions.create(**kwargs, stream=False)]\n\n            for data in response:\n                next_completion = data.completion\n                yield next_completion"}]}
Function Call Output (8)
SUCCESS
Here are the new code sections:

The following changes have been applied:
```diff
--- 
+++ 
@@ -47,14 +47,33 @@
             kwargs["top_p"] = args["top_p"]
 
         client = get_client()
-        if stream:
-            response = client.completions.create(**kwargs, stream=True)
-        else:
-            response = [client.completions.create(**kwargs, stream=False)]
+        from gptcli.vision import VisionAPIHandler, decode_response_to_text
 
-        for data in response:
-            next_completion = data.completion
-            yield next_completion
+        vis_handler = VisionAPIHandler(api_key=api_key)
+
+        if args["input_type"] == "image":
+            if "image_path" in args:
+                try:
+                    response = vis_handler.recognize_image(image_path=args["image_path"])
+                    yield decode_response_to_text(response)
+                except Exception as e:
+                    yield f"Error recognizing image: {e}"
+            elif "prompt" in args:
+                try:
+                    response = vis_handler.generate_image(prompt=args["prompt"], n_images=args.get("n_images", 1))
+                    for image_data in response["data"]:
+                        yield f"Generated Image URL: {image_data['url']}"
+                except Exception as e:
+                    yield f"Error generating image: {e}"
+        elif args["input_type"] == "text":
+            if stream:
+                response = client.completions.create(**kwargs, stream=True)
+            else:
+                response = [client.completions.create(**kwargs, stream=False)]
+
+            for data in response:
+                next_completion = data.completion
+                yield next_completion
 
 
 def num_tokens_from_messages_anthropic(messages: List[Message], model: str) -> int:

```
Here are the new code sections:




<section id="A">
import os
from typing import Iterator, List
import anthropic

from gptcli.completion import CompletionProvider, Message

api_key = os.environ.get("ANTHROPIC_API_KEY")


def get_client():
    if not api_key:
        raise ValueError("ANTHROPIC_API_KEY environment variable not set")

    return anthropic.Anthropic(api_key=api_key)


def role_to_name(role: str) -> str:
    if role == "system" or role == "user":
        return anthropic.HUMAN_PROMPT
    elif role == "assistant":
        return anthropic.AI_PROMPT
    else:
        raise ValueError(f"Unknown role: {role}")


</section id="A">
<section id="B">
def make_prompt(messages: List[Message]) -> str:
    prompt = "\n".join(
        [f"{role_to_name(message['role'])}{message['content']}" for message in messages]
    )
    prompt += f"{role_to_name('assistant')}"
    return prompt


class AnthropicCompletionProvider(CompletionProvider):
</section id="B">
<section id="C">
    def complete(
        self, messages: List[Message], args: dict, stream: bool = False
    ) -> Iterator[str]:
        kwargs = {
            "prompt": make_prompt(messages),
            "stop_sequences": [anthropic.HUMAN_PROMPT],
            "max_tokens_to_sample": 4096,
            "model": args["model"],
        }
        if "temperature" in args:
            kwargs["temperature"] = args["temperature"]
        if "top_p" in args:
            kwargs["top_p"] = args["top_p"]

        client = get_client()
        from gptcli.vision import VisionAPIHandler, decode_response_to_text

        vis_handler = VisionAPIHandler(api_key=api_key)

</section id="C">
<section id="D">
        if args["input_type"] == "image":
            if "image_path" in args:
                try:
                    response = vis_handler.recognize_image(image_path=args["image_path"])
                    yield decode_response_to_text(response)
                except Exception as e:
                    yield f"Error recognizing image: {e}"
            elif "prompt" in args:
                try:
                    response = vis_handler.generate_image(prompt=args["prompt"], n_images=args.get("n_images", 1))
                    for image_data in response["data"]:
                        yield f"Generated Image URL: {image_data['url']}"
                except Exception as e:
                    yield f"Error generating image: {e}"
</section id="D">
<section id="E">
        elif args["input_type"] == "text":
            if stream:
                response = client.completions.create(**kwargs, stream=True)
            else:
                response = [client.completions.create(**kwargs, stream=False)]

            for data in response:
                next_completion = data.completion
                yield next_completion


def num_tokens_from_messages_anthropic(messages: List[Message], model: str) -> int:
    prompt = make_prompt(messages)
    client = get_client()
    return client.count_tokens(prompt)


def num_tokens_from_completion_anthropic(message: Message, model: str) -> int:
    client = get_client()
    return client.count_tokens(message["content"])
</section id="E">

You can continue to make changes to the code sections and call the `search_and_replace` function again.
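Two caveats about the change applied above are worth noting before merging. `recognize_image` and `generate_image` are declared `async def` in the planned `gptcli/vision.py`, so calling them from the synchronous `complete()` generator yields coroutine objects rather than response dictionaries, and indexing `args["input_type"]` will raise `KeyError` for any existing caller that never sets that key. The vision endpoints are also OpenAI URLs, while `api_key` in this module is the Anthropic key. Below is a minimal sketch of how the method could bridge into an event loop and default to the text path; the `input_type`, `image_path`, and `n_images` keys are assumptions carried over from the generated code, not an existing gpt-cli interface.

```python
# Sketch of a revised AnthropicCompletionProvider.complete() for gptcli/anthropic.py.
# Existing module imports (os, anthropic, CompletionProvider, Message, ...) are unchanged;
# the following would be added at the top of the file:
import asyncio

from gptcli.vision import VisionAPIHandler, decode_response_to_text


class AnthropicCompletionProvider(CompletionProvider):
    def complete(
        self, messages: List[Message], args: dict, stream: bool = False
    ) -> Iterator[str]:
        input_type = args.get("input_type", "text")  # default keeps existing text-only callers working

        if input_type == "image":
            # Assumption: the endpoints in gptcli/vision.py are OpenAI URLs, so the
            # OpenAI key is likely the right credential here, not ANTHROPIC_API_KEY.
            vis_handler = VisionAPIHandler(api_key=os.environ.get("OPENAI_API_KEY", ""))
            if "image_path" in args:
                # recognize_image() is a coroutine; run it to completion before yielding.
                response = asyncio.run(
                    vis_handler.recognize_image(image_path=args["image_path"])
                )
                yield decode_response_to_text(response)
            elif "prompt" in args:
                response = asyncio.run(
                    vis_handler.generate_image(
                        prompt=args["prompt"], n_images=args.get("n_images", 1)
                    )
                )
                for image_data in response["data"]:
                    yield f"Generated Image URL: {image_data['url']}"
            return

        # Unchanged text path.
        kwargs = {
            "prompt": make_prompt(messages),
            "stop_sequences": [anthropic.HUMAN_PROMPT],
            "max_tokens_to_sample": 4096,
            "model": args["model"],
        }
        if "temperature" in args:
            kwargs["temperature"] = args["temperature"]
        if "top_p" in args:
            kwargs["top_p"] = args["top_p"]

        client = get_client()
        if stream:
            completions = client.completions.create(**kwargs, stream=True)
        else:
            completions = [client.completions.create(**kwargs, stream=False)]

        for data in completions:
            yield data.completion
```

Keeping the dispatch ahead of the Anthropic-specific `kwargs` construction also avoids building a text prompt that the image branch never uses.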
  Run GitHub Actions for gptcli/anthropic.py 

Plan

This is based on the results of the Planning step. The plan may expand from failed GitHub Actions runs.

  Create gptcli/vision.py 8e032a4 
1import base64
2import json
3
4import aiohttp
5
6
7class VisionAPIHandler:
8    def __init__(self, api_key: str):
9        self.api_key = api_key
10        self.headers = {"Authorization": f"Bearer {api_key}"}
11
12    async def generate_image(self, prompt: str, n_images: int = 1):
13        async with aiohttp.ClientSession() as session:
14            payload = {"prompt": prompt, "n": n_images}
15            async with session.post("https://api.openai.com/v1/images/generations", headers=self.headers, json=payload) as response:
16                if response.status == 200:
17                    return await response.json()
18                else:
19                    raise Exception(f"Failed to generate image: {response.status}")
20
21    async def recognize_image(self, image_path: str):
22        image_base64 = encode_image_to_base64(image_path)
23        async with aiohttp.ClientSession() as session:
24            payload = {"image": image_base64}
25            async with session.post("https://api.openai.com/v1/vision/recognitions", headers=self.headers, json=payload) as response:
26                if response.status == 200:
27                    return await response.json()
28                else:
29                    raise Exception(f"Failed to recognize image: {response.status}")
30
31def encode_image_to_base64(image_path: str) -> str:
32    with open(image_path, "rb") as image_file:
33        return base64.b64encode(image_file.read()).decode('utf-8')
34
35def decode_base64_to_image(base64_str: str, output_path: str):
36    with open(output_path, "wb") as output_file:
37        output_file.write(base64.b64decode(base64_str))
38
39def decode_response_to_text(response: dict) -> str:
40    return response.get("data", {}).get("text", "")
41
  Run GitHub Actions for gptcli/vision.py 
  Create gptcli/cli_vision_commands.py 28bb96a 
1import argparse
2import sys
3
4from gptcli.vision import VisionAPIHandler
5
6
7def cli_generate_image():
8    parser = argparse.ArgumentParser(description="Generate images from prompts.")
9    parser.add_argument("prompt", type=str, help="Prompt for generating images.")
10    parser.add_argument("--n_images", type=int, default=1, help="Number of images to generate.")
11    args = parser.parse_args()
12
13    try:
14        vision_handler = VisionAPIHandler(api_key="Your_API_Key_Here")
15        response = vision_handler.generate_image(prompt=args.prompt, n_images=args.n_images)
16        for image_data in response["data"]:
17            print(f"Generated Image URL: {image_data['url']}")
18    except Exception as e:
19        print(f"Error generating image: {e}")
20        sys.exit(1)
21
22def cli_recognize_image():
23    parser = argparse.ArgumentParser(description="Recognize text from an image.")
24    parser.add_argument("image_path", type=str, help="Path to the image file.")
25    args = parser.parse_args()
26
27    try:
28        vision_handler = VisionAPIHandler(api_key="Your_API_Key_Here")
29        response = vision_handler.recognize_image(image_path=args.image_path)
30        print(f"Recognized Text: {response['data']['text']}")
31    except Exception as e:
32        print(f"Error recognizing image: {e}")
33        sys.exit(1)
34
35if __name__ == "__main__":
36    parser = argparse.ArgumentParser(description="CLI for image recognition and generation.")
37    parser.add_argument("--generate", action="store_true", help="Generate images from prompts.")
38    parser.add_argument("--recognize", action="store_true", help="Recognize text from an image.")
39    args = parser.parse_args()
40
41    if args.generate:
42        cli_generate_image()
43    elif args.recognize:
44        cli_recognize_image()
45    else:
46        parser.print_help()
47
  Run GitHub Actions for gptcli/cli_vision_commands.py 
  Run GitHub Actions for gptcli/config.py 
  Run GitHub Actions for gptcli/openai.py 
  Run GitHub Actions for gptcli/anthropic.py 
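One note on the planned `gptcli/cli_vision_commands.py`: because `generate_image` and `recognize_image` are coroutines, the CLI entry points need to drive them on an event loop before indexing `response["data"]`, and the hard-coded `"Your_API_Key_Here"` placeholder would normally be read from the environment instead. Below is a minimal sketch of `cli_generate_image` under those assumptions (the `OPENAI_API_KEY` variable name is an assumption, mirroring `gptcli/config.py`). `aiohttp`, which the new module imports, would presumably also need to be added to the `dependencies` list in `pyproject.toml`.

```python
import argparse
import asyncio
import os
import sys

from gptcli.vision import VisionAPIHandler


def cli_generate_image() -> None:
    parser = argparse.ArgumentParser(description="Generate images from prompts.")
    parser.add_argument("prompt", type=str, help="Prompt for generating images.")
    parser.add_argument("--n_images", type=int, default=1, help="Number of images to generate.")
    args = parser.parse_args()

    # Assumption: the key is supplied via the environment rather than hard-coded.
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        print("OPENAI_API_KEY environment variable not set")
        sys.exit(1)

    try:
        vision_handler = VisionAPIHandler(api_key=api_key)
        # generate_image() is async; run it to completion before reading the response.
        response = asyncio.run(
            vision_handler.generate_image(prompt=args.prompt, n_images=args.n_images)
        )
        for image_data in response["data"]:
            print(f"Generated Image URL: {image_data['url']}")
    except Exception as e:
        print(f"Error generating image: {e}")
        sys.exit(1)
```

`cli_recognize_image` would follow the same pattern, wrapping `recognize_image` in `asyncio.run` before printing the recognized text.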

Code Snippets Found

This is based on the results of the Searching step.

gptcli/openai.py:10-64 
10
11class OpenAICompletionProvider(CompletionProvider):
12    def __init__(self):
13        self.client = OpenAI(api_key=openai.api_key)
14
15    def complete(
16        self, messages: List[Message], args: dict, stream: bool = False
17    ) -> Iterator[str]:
18        kwargs = {}
19        if "temperature" in args:
20            kwargs["temperature"] = args["temperature"]
21        if "top_p" in args:
22            kwargs["top_p"] = args["top_p"]
23
24        if stream:
25            response_iter = self.client.chat.completions.create(
26                messages=cast(List[ChatCompletionMessageParam], messages),
27                stream=True,
28                model=args["model"],
29                **kwargs,
30            )
31
32            for response in response_iter:
33                next_choice = response.choices[0]
34                if next_choice.finish_reason is None and next_choice.delta.content:
35                    yield next_choice.delta.content
36        else:
37            response = self.client.chat.completions.create(
38                messages=cast(List[ChatCompletionMessageParam], messages),
39                model=args["model"],
40                stream=False,
41                **kwargs,
42            )
43            next_choice = response.choices[0]
44            if next_choice.message.content:
45                yield next_choice.message.content
46
47
48def num_tokens_from_messages_openai(messages: List[Message], model: str) -> int:
49    encoding = tiktoken.encoding_for_model(model)
50    num_tokens = 0
51    for message in messages:
52        # every message follows <im_start>{role/name}\n{content}<im_end>\n
53        num_tokens += 4
54        for key, value in message.items():
55            assert isinstance(value, str)
56            num_tokens += len(encoding.encode(value))
57            if key == "name":  # if there's a name, the role is omitted
58                num_tokens += -1  # role is always required and always 1 token
59    num_tokens += 2  # every reply is primed with <im_start>assistant
60    return num_tokens
61
62
63def num_tokens_from_completion_openai(completion: Message, model: str) -> int:
64    return num_tokens_from_messages_openai([completion], model)
gptcli/anthropic.py:0-68 
1import os
2from typing import Iterator, List
3import anthropic
4
5from gptcli.completion import CompletionProvider, Message
6
7api_key = os.environ.get("ANTHROPIC_API_KEY")
8
9
10def get_client():
11    if not api_key:
12        raise ValueError("ANTHROPIC_API_KEY environment variable not set")
13
14    return anthropic.Anthropic(api_key=api_key)
15
16
17def role_to_name(role: str) -> str:
18    if role == "system" or role == "user":
19        return anthropic.HUMAN_PROMPT
20    elif role == "assistant":
21        return anthropic.AI_PROMPT
22    else:
23        raise ValueError(f"Unknown role: {role}")
24
25
26def make_prompt(messages: List[Message]) -> str:
27    prompt = "\n".join(
28        [f"{role_to_name(message['role'])}{message['content']}" for message in messages]
29    )
30    prompt += f"{role_to_name('assistant')}"
31    return prompt
32
33
34class AnthropicCompletionProvider(CompletionProvider):
35    def complete(
36        self, messages: List[Message], args: dict, stream: bool = False
37    ) -> Iterator[str]:
38        kwargs = {
39            "prompt": make_prompt(messages),
40            "stop_sequences": [anthropic.HUMAN_PROMPT],
41            "max_tokens_to_sample": 4096,
42            "model": args["model"],
43        }
44        if "temperature" in args:
45            kwargs["temperature"] = args["temperature"]
46        if "top_p" in args:
47            kwargs["top_p"] = args["top_p"]
48
49        client = get_client()
50        if stream:
51            response = client.completions.create(**kwargs, stream=True)
52        else:
53            response = [client.completions.create(**kwargs, stream=False)]
54
55        for data in response:
56            next_completion = data.completion
57            yield next_completion
58
59
60def num_tokens_from_messages_anthropic(messages: List[Message], model: str) -> int:
61    prompt = make_prompt(messages)
62    client = get_client()
63    return client.count_tokens(prompt)
64
65
66def num_tokens_from_completion_anthropic(message: Message, model: str) -> int:
67    client = get_client()
68    return client.count_tokens(message["content"])
gptcli/config.py:0-45 
1import os
2from typing import Dict, List, Optional, TypedDict
3from attr import dataclass
4import yaml
5
6from gptcli.assistant import AssistantConfig
7from gptcli.llama import LLaMAModelConfig
8
9
10CONFIG_FILE_PATHS = [
11    os.path.join(os.path.expanduser("~"), ".config", "gpt-cli", "gpt.yml"),
12    os.path.join(os.path.expanduser("~"), ".gptrc"),
13]
14
15
16@dataclass
17class GptCliConfig:
18    default_assistant: str = "general"
19    markdown: bool = True
20    show_price: bool = True
21    api_key: Optional[str] = os.environ.get("OPENAI_API_KEY")
22    openai_api_key: Optional[str] = os.environ.get("OPENAI_API_KEY")
23    anthropic_api_key: Optional[str] = os.environ.get("ANTHROPIC_API_KEY")
24    google_api_key: Optional[str] = os.environ.get("GOOGLE_API_KEY")
25    log_file: Optional[str] = None
26    log_level: str = "INFO"
27    conversations_save_directory: str = os.path.join(os.path.expanduser("~"), "Documents", "gpt-cli", "conversations")
28    conversations_render_directory: str = os.path.join(os.path.expanduser("~"), "develop", "opencsg", "work", "docs", "opencsg", "gpt-generate-md")
29    assistants: Dict[str, AssistantConfig] = {}
30    interactive: Optional[bool] = None
31    llama_models: Optional[Dict[str, LLaMAModelConfig]] = None
32
33
34def choose_config_file(paths: List[str]) -> str:
35    for path in paths:
36        if os.path.isfile(path):
37            return path
38    return ""
39
40
41def read_yaml_config(file_path: str) -> GptCliConfig:
42    with open(file_path, "r") as file:
43        config = yaml.safe_load(file)
44        return GptCliConfig(
45            **config,
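For context, a hypothetical sketch of how the helpers above could be wired together at startup (the actual call site in gptcli is not shown in these snippets; the `GptCliConfig()` environment-variable defaults are assumed when no config file exists):

```python
from gptcli.config import CONFIG_FILE_PATHS, GptCliConfig, choose_config_file, read_yaml_config

# Use the first config file that exists (~/.config/gpt-cli/gpt.yml, then ~/.gptrc);
# otherwise fall back to the dataclass defaults, which read API keys from the environment.
config_path = choose_config_file(CONFIG_FILE_PATHS)
config = read_yaml_config(config_path) if config_path else GptCliConfig()

print(config.default_assistant, config.anthropic_api_key is not None)
```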
pyproject.toml:0-48 
1[project]
2name = "gpt-command-line"
3version = "0.1.4"
4description = "Command-line interface for ChatGPT, Claude and Bard"
5authors = [{name = "Val Kharitonov", email = "val@kharvd.com"}]
6readme = "README.md"
7license = {file = "LICENSE"}
8requires-python = ">=3.9"
9keywords = ["cli", "command-line", "assistant", "openai", "claude", "bard", "gpt-3", "gpt-4", "llm", "chatgpt", "gpt-cli", "google-bard", "anthropic", "gpt-client", "anthropic-claude", "palm2"]
10classifiers = [
11    "Development Status :: 4 - Beta",
12    "Environment :: Console",
13    "Intended Audience :: Developers",
14    "Intended Audience :: End Users/Desktop",
15    "Intended Audience :: Science/Research",
16    "License :: OSI Approved :: MIT License",
17    "Topic :: Scientific/Engineering :: Artificial Intelligence",
18]
19dependencies = [
20    "anthropic==0.7.7",
21    "attrs==23.1.0",
22    "black==23.1.0",
23    "google-generativeai==0.1.0",
24    "openai==1.3.8",
25    "prompt-toolkit==3.0.41",
26    "pytest==7.3.1",
27    "PyYAML==6.0",
28    "rich==13.7.0",
29    "tiktoken==0.5.2",
30    "tokenizers==0.15.0",
31    "typing_extensions==4.5.0",
32    "pytz==2023.3.post1",
33    "jinja2==3.1.2",
34]
35
36[project.optional-dependencies]
37llama = [
38    "llama-cpp-python==0.1.57",
39]
40
41[project.urls]
42"Homepage" = "https://github.com/kharvd/gpt-cli"
43
44[project.scripts]
45gpt = "gptcli.gpt:main"
46
47[build-system]
48requires = ["pip>=23.0.0", "setuptools>=58.0.0", "wheel"]