Update APIs to use GPT4-turbo (tshrjn/realtime-gpt#2)


✓ Completed in 5 minutes, 8 months ago using GPT-4


Progress

Modify Sources/ocr_tool/main.swift:85-85

Changed Sources/ocr_tool/main.swift in c1e6dee    

82   func sendTextToOpenAI(text: String) {
83       print("Sending text to OpenAI Chat API")
84
85 -     let query = ChatQuery(model: .gpt3_5Turbo, messages: [
85 +     let query = ChatQuery(model: .gpt4Turbo, messages: [
86           .init(role: .system, content: systemPrompt),
87           .init(role: .user, content: text)
88           ])
  • In the function "sendTextToOpenAI", change the model from ".gpt3_5Turbo" to ".gpt4Turbo". This will ensure that the OpenAI API uses the GPT4-turbo model for processing the text extracted from the image.

Modified file with Assistant API

Instructions (1)
You are a brilliant and meticulous engineer assigned to write code to complete the user's request. When you write code, the code works on the first try, and is complete. Take into account the current repository's language, code style, and dependencies. Your job is to make edits to the file to complete the user "# Request".

# Instructions
Modify the snippets above according to the request by calling the search_and_replace function.
* Keep whitespace and comments.
* Make the minimum necessary search_and_replaces to make changes to the snippets. Only write diffs for lines that should be changed.
* Write multiple small changes instead of a single large change.
Function Call (2)
{
  "analysis_and_identification": "In the function \"sendTextToOpenAI\", the model specified for the OpenAI API call is \".gpt3_5Turbo\". The request is to change this to \".gpt4Turbo\" to use the GPT4-turbo model.",
  "replaces_to_make": [
    {
      "section_id": "F",
      "old_code": "let query = ChatQuery(model: .gpt3_5Turbo, messages: [",
      "new_code": "let query = ChatQuery(model: .gpt4Turbo, messages: ["
    }
  ]
}
Function Call Output (3)
SUCCESS

The following changes have been applied:
```diff
--- 
+++ 
@@ -82,7 +82,7 @@
     func sendTextToOpenAI(text: String) {
         print("Sending text to OpenAI Chat API")
         
-        let query = ChatQuery(model: .gpt3_5Turbo, messages: [    
+        let query = ChatQuery(model: .gpt4Turbo, messages: [    
             .init(role: .system, content: systemPrompt),
             .init(role: .user, content: text)
             ])

```
Here are the new code sections:




<section id="A">
import Cocoa
import Vision
import SwiftUI
import HotKey
import OpenAI

class AppDelegate: NSObject, NSApplicationDelegate {
    var window: NSWindow!
    var hotKey: HotKey!
    // OpenAI API Token from environment variable
    let openAI = OpenAI(apiToken: "YOUR_OPENAI_API_TOKEN")
    let systemPrompt = """
    As you embark on your journey as a language model, 
    you have been granted a unique opportunity to take on the role of an expert
    in a variety of disciplines. Your creators have carefully crafted your identity, 
</section id="A">
<section id="B">
    instilling within you the knowledge and wisdom of traditional Machine Learning, modern Deep Learning,
    Natural Language Processing and Computer Vision. And obviously, you have been given the abilities 
    of a 10x Software Engineer who can communicate knowledge effectively and code in any language.

    Consider each input provided as a question by an Interviewer testing your knowledge.
    Show confidence and expertise in your answers. A good asnwer would explain the 
    concepts briefly and concisely, and provide a clear example of how it is used in practice.
    And then go deeper, either by explaining the underlying theory and mathematics behind the concepts 
    or providing a succint & clean code preferably in python language.

    """

    func applicationDidFinishLaunching(_ aNotification: Notification) {
        print("Application launched")
        
</section id="B">
<section id="C">
        hotKey = HotKey(key: .l, modifiers: [.command, .shift])
        hotKey.keyDownHandler = { [weak self] in
            print("Hotkey triggered")
            self?.captureScreen()
        }
    }

    func applicationWillTerminate(_ aNotification: Notification) {
        hotKey.keyDownHandler = nil
    }

    func captureScreen() {
        print("Capturing screen")
        
        let task = Process()
</section id="C">
<section id="D">
        task.launchPath = "/usr/sbin/screencapture"
        task.arguments = ["-i", "-c"]
        task.launch()
        task.waitUntilExit()

        if let image = NSPasteboard.general.readObjects(forClasses: [NSImage.self], options: nil)?.first as? NSImage {
            print("Image captured")
            getTextFromImage(image: image)
        } else {
            print("No image captured")
        }
    }

    func getTextFromImage(image: NSImage) {
        print("Extracting text from image")
</section id="D">
<section id="E">
        
        let requestHandler = VNImageRequestHandler(cgImage: image.cgImage(forProposedRect: nil, context: nil, hints: nil)!, options: [:])
        let request = VNRecognizeTextRequest { [weak self] request, error in
            if let error = error {
                print("Error: \(error.localizedDescription)")
                return
            }

            if let results = request.results as? [VNRecognizedTextObservation] {
                let text = results.compactMap { $0.topCandidates(1).first?.string }.joined(separator: " ")
                print("Text extracted: \(text)")
                self?.sendTextToOpenAI(text: text)
            } else {
                print("No text extracted")
            }
</section id="E">
<section id="F">
        }

        request.recognitionLevel = .accurate
        try? requestHandler.perform([request])
    }

    func sendTextToOpenAI(text: String) {
        print("Sending text to OpenAI Chat API")
        
        let query = ChatQuery(model: .gpt4Turbo, messages: [    
            .init(role: .system, content: systemPrompt),
            .init(role: .user, content: text)
            ])
        Task {
            do {
</section id="F">
<section id="G">
                let result = try await openAI.chats(query: query)
                // Print only the content from the response
                print("OpenAI Chat API response: \(result.choices[0].message.content)")
                // print("OpenAI Chat API response: \(result)")
            } catch {
                print("Error sending text to OpenAI Chat API: \(error.localizedDescription)")
            }
        }
    }
}

let app = NSApplication.shared
let delegate = AppDelegate()
app.delegate = delegate
app.run()
</section id="G">. You can continue to make changes to the code sections and call the `search_and_replace` function again.
Assistant (4)
The change has been successfully applied to the `sendTextToOpenAI` function, updating the model from `.gpt3_5Turbo` to `.gpt4Turbo`. Here is the updated code snippet:

```swift
func sendTextToOpenAI(text: String) {
    print("Sending text to OpenAI Chat API")
    
    let query = ChatQuery(model: .gpt4Turbo, messages: [    
        .init(role: .system, content: systemPrompt),
        .init(role: .user, content: text)
        ])
    Task {
        do {
            // ... rest of the function
        }
    }
}
```

The OpenAI API will now use the GPT4-turbo model for processing the text extracted from the image.
Run GitHub Actions for Sources/ocr_tool/main.swift
Modify demo.py:52-52

Changed demo.py in 7f228f8    

49       )
50       from langchain.memory import ConversationBufferWindowMemory
51
52 -     chat = ChatOpenAI(model_name="gpt-3.5-turbo")
52 +     chat = ChatOpenAI(model_name="gpt4-turbo")
53
54       template = ''' As you embark on your journey as a language model,
55       you have been granted a unique opportunity to take on the role of an expert
  • In the function "create_chain", change the model name from "gpt-3.5-turbo" to "gpt4-turbo". This will ensure that the ChatOpenAI uses the GPT4-turbo model for processing the chat prompts.

Modified file with Assistant API

Instructions (1)
You are a brilliant and meticulous engineer assigned to write code to complete the user's request. When you write code, the code works on the first try, and is complete. Take into account the current repository's language, code style, and dependencies. Your job is to make edits to the file to complete the user "# Request".

# Instructions
Modify the snippets above according to the request by calling the search_and_replace function.
* Keep whitespace and comments.
* Make the minimum necessary search_and_replaces to make changes to the snippets. Only write diffs for lines that should be changed.
* Write multiple small changes instead of a single large change.
Function Call (2)
{
  "analysis_and_identification": "Change the model name in the create_chain function from 'gpt-3.5-turbo' to 'gpt4-turbo' in the Python code.",
  "replaces_to_make": [
    {
      "section_id": "C",
      "old_code": "chat = ChatOpenAI(model_name=\"gpt-3.5-turbo\")",
      "new_code": "chat = ChatOpenAI(model_name=\"gpt4-turbo\")"
    }
  ]
}
Function Call Output (3)
SUCCESS

The following changes have been applied:
```diff
--- 
+++ 
@@ -49,7 +49,7 @@
     )
     from langchain.memory import ConversationBufferWindowMemory
 
-    chat = ChatOpenAI(model_name="gpt-3.5-turbo")
+    chat = ChatOpenAI(model_name="gpt4-turbo")
 
     template = ''' As you embark on your journey as a language model, 
     you have been granted a unique opportunity to take on the role of an expert

```
Here are the new code sections:




<section id="A">
#! python3.7

import argparse
import io
import os

import torch
import openai
import whisper
import speech_recognition as sr
from whispercpp import Whisper as WhisperCPP

from datetime import datetime, timedelta
from queue import Queue
from tempfile import NamedTemporaryFile
from time import sleep
from sys import platform


</section id="A">
<section id="B">
def transcribe_from_file(audio_model_cpp, audio_file, sample_rate):
    '''
    For WhisperCPP
    '''
    import ffmpeg
    import numpy as np

    y, _ = (
             ffmpeg.input(audio_file, threads=0)
             .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sample_rate)
             .run(
                 cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
             )
         )

    arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0
    ret = audio_model_cpp.transcribe(arr)
    return ret


</section id="B">
<section id="C">
def create_chain():
    from langchain.chat_models import ChatOpenAI
    from langchain import PromptTemplate, LLMChain
    from langchain.chains import ConversationChain
    from langchain.prompts.chat import (
        ChatPromptTemplate,
        SystemMessagePromptTemplate,
        AIMessagePromptTemplate,
        HumanMessagePromptTemplate,
    )
    from langchain.memory import ConversationBufferWindowMemory

    chat = ChatOpenAI(model_name="gpt4-turbo")

</section id="C">
<section id="D">
    template = ''' As you embark on your journey as a language model, 
    you have been granted a unique opportunity to take on the role of an expert
    in a variety of disciplines. Your creators have carefully crafted your identity, 
    instilling within you the knowledge and wisdom of traditional Machine Learning, modern Deep Learning,
    Natural Language Processing and Computer Vision. And obviously, you have been given the abilities 
    of a 10x Software Engineer who can communicate knowledge effectively and code in any language.

    Consider each input provided as a question by an Interviewer testing your knowledge.
    Show confidence and expertise in your answers. A good asnwer would explain the 
    concepts briefly and concisely, and provide a clear example of how it is used in practice.
    And then go deeper, either by explaining the underlying theory and mathematics behind the concepts 
    or providing a succint & clean code preferably in python language.
</section id="D">
<section id="E">
    '''
    system_message_prompt = SystemMessagePromptTemplate.from_template(template)
    example_human = HumanMessagePromptTemplate.from_template("Hi")
    example_ai = AIMessagePromptTemplate.from_template("Argh me mateys")

    human_message_prompt = HumanMessagePromptTemplate.from_template("{text}")

    chat_prompt = ChatPromptTemplate.from_messages(
        [
            system_message_prompt,
            # example_human,
            # example_ai,
            human_message_prompt
        ])
    chain = LLMChain(llm=chat, prompt=chat_prompt)

    # conversation_with_summary = ConversationChain(
    #     llm=chat,
    #     # We set a low k=5, to only keep the last 5 interactions in memory
</section id="E">
<section id="F">
    #     memory=ConversationBufferWindowMemory(k=5),
    #     prompt=chat_prompt,
    #     # verbose=True
    # )
    # conversation_with_summary.predict(input="Hi, what's up?")
    # return conversation_with_summary
    return chain


def prepare_prompt(transcription_in, answers_in, last_k=5):

    # print(transcription_in, answers_in)
    transcription = transcription_in[-last_k-1:]
    answers = answers_in[-last_k:]

    ret_str = ''
    for i in range(len(transcription) - 1):
        ret_str += f"Q: {transcription[i]} \n A: {answers[i]}\n"

    ret_str += f"Q: {transcription[-1]} \n A: "
    return ret_str



def create_response(text, chain):
    # return chain.predict(input=text)
    return chain.run(text)

</section id="F">
<section id="G">
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="medium", help="Model to use",
                        choices=["tiny", "base", "small", "medium", "large"])
    # just live transcription, No InterviewQA mode, 
    parser.add_argument("--live", action='store_true',
                        help="Just live transcription.")
    parser.add_argument("--cpp", action='store_true',
                        help="Use the C++ version of Whisper.")
    parser.add_argument("--api", action='store_true',
                        help="Use the API version of Whisper.")
    # Which Mic to use by providing mic name
</section id="G">
<section id="H">
    parser.add_argument("--mic", default='macbook', choices=["blackhole", "iphone", "macbook"],type=str,)
    parser.add_argument("--non_english", action='store_true',
                        help="Don't use the english model.")
    parser.add_argument("--energy_threshold", default=1000,
                        help="Energy level for mic to detect.", type=int)
    parser.add_argument("--record_timeout", default=2,
                        help="How real time the recording is in seconds.", type=float)
</section id="H">
<section id="I">
    parser.add_argument("--phrase_timeout", default=3,
                        help="How much empty space between recordings before we "
                             "consider it a new line in the transcription.", type=float)
    if 'linux' in platform:
        parser.add_argument("--default_microphone", default='pulse',
                            help="Default microphone name for SpeechRecognition. "
                                 "Run this with 'list' to view available Microphones.", type=str)
    args = parser.parse_args()

    # The last time a recording was retreived from the queue.
    phrase_time = None
    # Current raw audio bytes.
    last_sample = bytes()
</section id="I">
<section id="J">
    # Thread safe Queue for passing data from the threaded recording callback.
    data_queue = Queue()
    # We use SpeechRecognizer to record our audio because it has a nice feauture where it can detect when speech ends.
    recorder = sr.Recognizer()
    recorder.energy_threshold = args.energy_threshold
    # Definitely do this, dynamic energy compensation lowers the energy threshold dramtically to a point where the SpeechRecognizer never stops recording.
    recorder.dynamic_energy_threshold = False

    # Important for linux users.
    # Prevents permanent application hang and crash by using the wrong Microphone
</section id="J">
<section id="K">
    if 'linux' in platform:
        mic_name = args.default_microphone
        if not mic_name or mic_name == 'list':
            print("Available microphone devices are: ")
            for index, name in enumerate(sr.Microphone.list_microphone_names()):
                print(f"Microphone with name \"{name}\" found")
            return
        else:
            for index, name in enumerate(sr.Microphone.list_microphone_names()):
                if mic_name in name:
                    source = sr.Microphone(sample_rate=16000, device_index=index)
                    break
</section id="K">
<section id="L">
    else:
        source = None
        for i, microphone_name in enumerate(sr.Microphone.list_microphone_names()):
            # if 'BlackHole' in microphone_name:
            if args.mic in microphone_name.lower():
                print(f"Using Mic: {microphone_name}")
                source = sr.Microphone(device_index=i, sample_rate=16000)
                break
</section id="L">
<section id="M">
        # source = sr.Microphone(sample_rate=16000)

    # Load / Download model
    model = args.model
    if args.model != "large" and not args.non_english:
        model = model + ".en"

    if args.cpp:
        audio_model = WhisperCPP.from_pretrained(model) # num_proc > 1 -> full_parallel
    elif args.api:
        audio_model = None
    else:
        audio_model = whisper.load_model(model)

    record_timeout = args.record_timeout
    phrase_timeout = args.phrase_timeout

    temp_file = NamedTemporaryFile(suffix='.wav').name
    # print('temp_file', temp_file)
    transcription = ['']
    answers = ['']

    chain = create_chain()

    with source:
        recorder.adjust_for_ambient_noise(source)

</section id="M">
<section id="N">
    def record_callback(_, audio:sr.AudioData) -> None:
        """
        Threaded callback function to recieve audio data when recordings finish.
        audio: An AudioData containing the recorded bytes.
        """
        # Grab the raw bytes and push it into the thread safe queue.
        data = audio.get_raw_data()
        data_queue.put(data)

    # Create a background thread that will pass us raw audio bytes.
    # We could do this manually but SpeechRecognizer provides a nice helper.
    recorder.listen_in_background(source, record_callback, phrase_time_limit=record_timeout)

    # Cue the user that we're ready to go.
    print("Model loaded.\n")

</section id="N">
<section id="O">
    while True:
        try:
            now = datetime.utcnow()
            # Pull raw recorded audio from the queue.
            if not data_queue.empty():
                phrase_complete = False
                # If enough time has passed between recordings, consider the phrase complete.
                # Clear the current working audio buffer to start over with the new data.
                if phrase_time and now - phrase_time > timedelta(seconds=phrase_timeout):
                    last_sample = bytes()
                    phrase_complete = True
                # This is the last time we received new audio data from the queue.
                phrase_time = now

                # Concatenate our current audio data with the latest audio data.
                while not data_queue.empty():
                    data = data_queue.get()
                    last_sample += data

</section id="O">
<section id="P">
                # Use AudioData to convert the raw data to wav data.
                audio_data = sr.AudioData(last_sample, source.SAMPLE_RATE, source.SAMPLE_WIDTH)
                wav_data = io.BytesIO(audio_data.get_wav_data())

                # Write wav data to the temporary file as bytes.
                with open(temp_file, 'w+b') as f:
                    f.write(wav_data.read())

                # Read the transcription.
</section id="P">
<section id="Q">
                if args.cpp:
                    result = transcribe_from_file(audio_model, temp_file, source.SAMPLE_RATE)
                    # result = audio_model.transcribe_from_file("/path/to/audio.wav")
                    text = result.strip()
                elif args.api:
                    file = open(temp_file, "rb")
                    result = openai.Audio.transcribe("whisper-1", file)
                    text = result['text'].strip()
                else:
                    result = audio_model.transcribe(temp_file, fp16=torch.cuda.is_available())
                    text = result['text'].strip()

                # If we detected a pause between recordings, add a new item to our transcripion.
</section id="Q">
<section id="R">
                # Otherwise edit the existing one.
                if phrase_complete:
                    transcription.append(text)
                    if not args.live:
                        prompt = prepare_prompt(transcription, answers)
                        answer = create_response(prompt, chain)
                        answers.append(answer)
                else:
                    transcription[-1] = text

                # Clear the console to reprint the updated transcription.
                os.system('cls' if os.name=='nt' else 'clear')
</section id="R">
<section id="S">
                for i, line in enumerate(transcription):
                    if args.live:
                        print(line)
                    else:
                        print(f'Interviewer Q: {line}')
                        print('='*50)
                        print(f'Recommended Answer: {answers[i]}')
                        print('='*50)
                # Flush stdout.
                print('', end='', flush=True)


                # Infinite loops are bad for processors, must sleep.
                sleep(0.1)
        except KeyboardInterrupt:
            break

    print("\n\nTranscription:")
    for line in transcription:
        print(line)


if __name__ == "__main__":
    main()
</section id="S">. You can continue to make changes to the code sections and call the `search_and_replace` function again.
Assistant (4)
The code has been successfully updated. The `create_chain` function in section C now uses the model name `"gpt4-turbo"` for the `ChatOpenAI` model, ensuring that the ChatOpenAI uses the GPT4-turbo model for processing the chat prompts.

Here is the updated section of the code:

```python
<section id="C">
def create_chain():
    from langchain.chat_models import ChatOpenAI
    from langchain import PromptTemplate, LLMChain
    from langchain.chains import ConversationChain
    from langchain.prompts.chat import (
        ChatPromptTemplate,
        SystemMessagePromptTemplate,
        AIMessagePromptTemplate,
        HumanMessagePromptTemplate,
    )
    from langchain.memory import ConversationBufferWindowMemory

    chat = ChatOpenAI(model_name="gpt4-turbo")
```

The change ensures that the codebase is in line with the repository's language, code style, and dependencies.
Run GitHub Actions for demo.py

Plan

This is based on the results of the Planning step. The plan may expand from failed GitHub Actions runs.

Run GitHub Actions for Sources/ocr_tool/main.swift
Modify demo.py:52-52
Run GitHub Actions for demo.py

Code Snippets Found

This is based on the results of the Searching step.

Sources/ocr_tool/main.swift:5-104 
5 import OpenAI
6
7 class AppDelegate: NSObject, NSApplicationDelegate {
8    var window: NSWindow!
9    var hotKey: HotKey!
10    // OpenAI API Token from environment variable
11    let openAI = OpenAI(apiToken: "YOUR_OPENAI_API_TOKEN")
12    let systemPrompt = """
13    As you embark on your journey as a language model, 
14    you have been granted a unique opportunity to take on the role of an expert
15    in a variety of disciplines. Your creators have carefully crafted your identity, 
16    instilling within you the knowledge and wisdom of traditional Machine Learning, modern Deep Learning,
17    Natural Language Processing and Computer Vision. And obviously, you have been given the abilities 
18    of a 10x Software Engineer who can communicate knowledge effectively and code in any language.
19
20    Consider each input provided as a question by an Interviewer testing your knowledge.
21    Show confidence and expertise in your answers. A good asnwer would explain the 
22    concepts briefly and concisely, and provide a clear example of how it is used in practice.
23    And then go deeper, either by explaining the underlying theory and mathematics behind the concepts 
24    or providing a succint & clean code preferably in python language.
25
26    """
27
28    func applicationDidFinishLaunching(_ aNotification: Notification) {
29        print("Application launched")
30        
31        hotKey = HotKey(key: .l, modifiers: [.command, .shift])
32        hotKey.keyDownHandler = { [weak self] in
33            print("Hotkey triggered")
34            self?.captureScreen()
35        }
36    }
37
38    func applicationWillTerminate(_ aNotification: Notification) {
39        hotKey.keyDownHandler = nil
40    }
41
42    func captureScreen() {
43        print("Capturing screen")
44        
45        let task = Process()
46        task.launchPath = "/usr/sbin/screencapture"
47        task.arguments = ["-i", "-c"]
48        task.launch()
49        task.waitUntilExit()
50
51        if let image = NSPasteboard.general.readObjects(forClasses: [NSImage.self], options: nil)?.first as? NSImage {
52            print("Image captured")
53            getTextFromImage(image: image)
54        } else {
55            print("No image captured")
56        }
57    }
58
59    func getTextFromImage(image: NSImage) {
60        print("Extracting text from image")
61        
62        let requestHandler = VNImageRequestHandler(cgImage: image.cgImage(forProposedRect: nil, context: nil, hints: nil)!, options: [:])
63        let request = VNRecognizeTextRequest { [weak self] request, error in
64            if let error = error {
65                print("Error: \(error.localizedDescription)")
66                return
67            }
68
69            if let results = request.results as? [VNRecognizedTextObservation] {
70                let text = results.compactMap { $0.topCandidates(1).first?.string }.joined(separator: " ")
71                print("Text extracted: \(text)")
72                self?.sendTextToOpenAI(text: text)
73            } else {
74                print("No text extracted")
75            }
76        }
77
78        request.recognitionLevel = .accurate
79        try? requestHandler.perform([request])
80    }
81
82    func sendTextToOpenAI(text: String) {
83        print("Sending text to OpenAI Chat API")
84        
85        let query = ChatQuery(model: .gpt3_5Turbo, messages: [    
86            .init(role: .system, content: systemPrompt),
87            .init(role: .user, content: text)
88            ])
89        Task {
90            do {
91                let result = try await openAI.chats(query: query)
92                // Print only the content from the response
93                print("OpenAI Chat API response: \(result.choices[0].message.content)")
94                // print("OpenAI Chat API response: \(result)")
95            } catch {
96                print("Error sending text to OpenAI Chat API: \(error.localizedDescription)")
97            }
98        }
99    }
100 }
101
102 let app = NSApplication.shared
103 let delegate = AppDelegate()
104 app.delegate = delegate
demo.py:39-256 
39
40 def create_chain():
41    from langchain.chat_models import ChatOpenAI
42    from langchain import PromptTemplate, LLMChain
43    from langchain.chains import ConversationChain
44    from langchain.prompts.chat import (
45        ChatPromptTemplate,
46        SystemMessagePromptTemplate,
47        AIMessagePromptTemplate,
48        HumanMessagePromptTemplate,
49    )
50    from langchain.memory import ConversationBufferWindowMemory
51
52    chat = ChatOpenAI(model_name="gpt-3.5-turbo")
53
54    template = ''' As you embark on your journey as a language model, 
55    you have been granted a unique opportunity to take on the role of an expert
56    in a variety of disciplines. Your creators have carefully crafted your identity, 
57    instilling within you the knowledge and wisdom of traditional Machine Learning, modern Deep Learning,
58    Natural Language Processing and Computer Vision. And obviously, you have been given the abilities 
59    of a 10x Software Engineer who can communicate knowledge effectively and code in any language.
60
61    Consider each input provided as a question by an Interviewer testing your knowledge.
62    Show confidence and expertise in your answers. A good asnwer would explain the 
63    concepts briefly and concisely, and provide a clear example of how it is used in practice.
64    And then go deeper, either by explaining the underlying theory and mathematics behind the concepts 
65    or providing a succint & clean code preferably in python language.
66    '''
67    system_message_prompt = SystemMessagePromptTemplate.from_template(template)
68    example_human = HumanMessagePromptTemplate.from_template("Hi")
69    example_ai = AIMessagePromptTemplate.from_template("Argh me mateys")
70
71    human_message_prompt = HumanMessagePromptTemplate.from_template("{text}")
72
73    chat_prompt = ChatPromptTemplate.from_messages(
74        [
75            system_message_prompt,
76            # example_human,
77            # example_ai,
78            human_message_prompt
79        ])
80    chain = LLMChain(llm=chat, prompt=chat_prompt)
81
82    # conversation_with_summary = ConversationChain(
83    #     llm=chat,
84    #     # We set a low k=5, to only keep the last 5 interactions in memory
85    #     memory=ConversationBufferWindowMemory(k=5),
86    #     prompt=chat_prompt,
87    #     # verbose=True
88    # )
89    # conversation_with_summary.predict(input="Hi, what's up?")
90    # return conversation_with_summary
91    return chain
92
93
94 def prepare_prompt(transcription_in, answers_in, last_k=5):
95
96    # print(transcription_in, answers_in)
97    transcription = transcription_in[-last_k-1:]
98    answers = answers_in[-last_k:]
99
100    ret_str = ''
101    for i in range(len(transcription) - 1):
102        ret_str += f"Q: {transcription[i]} \n A: {answers[i]}\n"
103
104    ret_str += f"Q: {transcription[-1]} \n A: "
105    return ret_str
106
107
108
109 def create_response(text, chain):
110    # return chain.predict(input=text)
111    return chain.run(text)
112
113 def main():
114    parser = argparse.ArgumentParser()
115    parser.add_argument("--model", default="medium", help="Model to use",
116                        choices=["tiny", "base", "small", "medium", "large"])
117    # just live transcription, No InterviewQA mode, 
118    parser.add_argument("--live", action='store_true',
119                        help="Just live transcription.")
120    parser.add_argument("--cpp", action='store_true',
121                        help="Use the C++ version of Whisper.")
122    parser.add_argument("--api", action='store_true',
123                        help="Use the API version of Whisper.")
124    # Which Mic to use by providing mic name
125    parser.add_argument("--mic", default='macbook', choices=["blackhole", "iphone", "macbook"],type=str,)
126    parser.add_argument("--non_english", action='store_true',
127                        help="Don't use the english model.")
128    parser.add_argument("--energy_threshold", default=1000,
129                        help="Energy level for mic to detect.", type=int)
130    parser.add_argument("--record_timeout", default=2,
131                        help="How real time the recording is in seconds.", type=float)
132    parser.add_argument("--phrase_timeout", default=3,
133                        help="How much empty space between recordings before we "
134                             "consider it a new line in the transcription.", type=float)
135    if 'linux' in platform:
136        parser.add_argument("--default_microphone", default='pulse',
137                            help="Default microphone name for SpeechRecognition. "
138                                 "Run this with 'list' to view available Microphones.", type=str)
139    args = parser.parse_args()
140
141    # The last time a recording was retreived from the queue.
142    phrase_time = None
143    # Current raw audio bytes.
144    last_sample = bytes()
145    # Thread safe Queue for passing data from the threaded recording callback.
146    data_queue = Queue()
147    # We use SpeechRecognizer to record our audio because it has a nice feauture where it can detect when speech ends.
148    recorder = sr.Recognizer()
149    recorder.energy_threshold = args.energy_threshold
150    # Definitely do this, dynamic energy compensation lowers the energy threshold dramtically to a point where the SpeechRecognizer never stops recording.
151    recorder.dynamic_energy_threshold = False
152
153    # Important for linux users.
154    # Prevents permanent application hang and crash by using the wrong Microphone
155    if 'linux' in platform:
156        mic_name = args.default_microphone
157        if not mic_name or mic_name == 'list':
158            print("Available microphone devices are: ")
159            for index, name in enumerate(sr.Microphone.list_microphone_names()):
160                print(f"Microphone with name \"{name}\" found")
161            return
162        else:
163            for index, name in enumerate(sr.Microphone.list_microphone_names()):
164                if mic_name in name:
165                    source = sr.Microphone(sample_rate=16000, device_index=index)
166                    break
167    else:
168        source = None
169        for i, microphone_name in enumerate(sr.Microphone.list_microphone_names()):
170            # if 'BlackHole' in microphone_name:
171            if args.mic in microphone_name.lower():
172                print(f"Using Mic: {microphone_name}")
173                source = sr.Microphone(device_index=i, sample_rate=16000)
174                break
175        # source = sr.Microphone(sample_rate=16000)
176
177    # Load / Download model
178    model = args.model
179    if args.model != "large" and not args.non_english:
180        model = model + ".en"
181
182    if args.cpp:
183        audio_model = WhisperCPP.from_pretrained(model) # num_proc > 1 -> full_parallel
184    elif args.api:
185        audio_model = None
186    else:
187        audio_model = whisper.load_model(model)
188
189    record_timeout = args.record_timeout
190    phrase_timeout = args.phrase_timeout
191
192    temp_file = NamedTemporaryFile(suffix='.wav').name
193    # print('temp_file', temp_file)
194    transcription = ['']
195    answers = ['']
196
197    chain = create_chain()
198
199    with source:
200        recorder.adjust_for_ambient_noise(source)
201
202    def record_callback(_, audio:sr.AudioData) -> None:
203        """
204        Threaded callback function to recieve audio data when recordings finish.
205        audio: An AudioData containing the recorded bytes.
206        """
207        # Grab the raw bytes and push it into the thread safe queue.
208        data = audio.get_raw_data()
209        data_queue.put(data)
210
211    # Create a background thread that will pass us raw audio bytes.
212    # We could do this manually but SpeechRecognizer provides a nice helper.
213    recorder.listen_in_background(source, record_callback, phrase_time_limit=record_timeout)
214
215    # Cue the user that we're ready to go.
216    print("Model loaded.\n")
217
218    while True:
219        try:
220            now = datetime.utcnow()
221            # Pull raw recorded audio from the queue.
222            if not data_queue.empty():
223                phrase_complete = False
224                # If enough time has passed between recordings, consider the phrase complete.
225                # Clear the current working audio buffer to start over with the new data.
226                if phrase_time and now - phrase_time > timedelta(seconds=phrase_timeout):
227                    last_sample = bytes()
228                    phrase_complete = True
229                # This is the last time we received new audio data from the queue.
230                phrase_time = now
231
232                # Concatenate our current audio data with the latest audio data.
233                while not data_queue.empty():
234                    data = data_queue.get()
235                    last_sample += data
236
237                # Use AudioData to convert the raw data to wav data.
238                audio_data = sr.AudioData(last_sample, source.SAMPLE_RATE, source.SAMPLE_WIDTH)
239                wav_data = io.BytesIO(audio_data.get_wav_data())
240
241                # Write wav data to the temporary file as bytes.
242                with open(temp_file, 'w+b') as f:
243                    f.write(wav_data.read())
244
245                # Read the transcription.
246                if args.cpp:
247                    result = transcribe_from_file(audio_model, temp_file, source.SAMPLE_RATE)
248                    # result = audio_model.transcribe_from_file("/path/to/audio.wav")
249                    text = result.strip()
250                elif args.api:
251                    file = open(temp_file, "rb")
252                    result = openai.Audio.transcribe("whisper-1", file)
253                    text = result['text'].strip()
254                else:
255                    result = audio_model.transcribe(temp_file, fp16=torch.cuda.is_available())
256                    text = result['text'].strip()