Add tests for context agent (sweepai/sweep#3491)


✓ Completed in 54.2 seconds, 4 months ago using GPT-4


Progress

Create tests/test_context_pruning.py 522afed

āŒ Unable to modify files in tests Edit sweep.yaml to configure.

Modify sweepai/core/context_pruning.py 522afed
1from copy import deepcopy
2from itertools import zip_longest
3from math import log
4import os
5import subprocess
6import urllib
7from dataclasses import dataclass, field
8
9import networkx as nx
10import openai
11from loguru import logger
12from openai.types.beta.thread import Thread
13from openai.types.beta.threads.run import Run
14
15from sweepai.config.client import SweepConfig
16from sweepai.core.chat import ChatGPT
17from sweepai.core.entities import Message, Snippet
18from sweepai.logn.cache import file_cache
19from sweepai.utils.chat_logger import ChatLogger
20from sweepai.utils.convert_openai_anthropic import AnthropicFunctionCall, mock_function_calls_to_string
21from sweepai.utils.github_utils import ClonedRepo
22from sweepai.utils.modify_utils import post_process_rg_output
23from sweepai.utils.openai_listwise_reranker import listwise_rerank_snippets
24from sweepai.utils.progress import AssistantConversation, TicketProgress
25from sweepai.utils.tree_utils import DirectoryTree
26
27ASSISTANT_MAX_CHARS = 4096 * 4 * 0.95  # ~95% of 4k tokens
28NUM_SNIPPETS_TO_SHOW_AT_START = 15
29MAX_REFLECTIONS = 2
30MAX_ITERATIONS = 30 # Tuned to 30 because haiku is cheap
31NUM_ROLLOUTS = 2 # dev speed
32SCORE_THRESHOLD = 8 # good score
33STOP_AFTER_SCORE_THRESHOLD_IDX = 0 # stop after the first good score and past this index
34MAX_PARALLEL_FUNCTION_CALLS = 1
35NUM_BAD_FUNCTION_CALLS = 4
36
37# TODO:
38# - Add self-evaluation / chain-of-verification
39# - Add list of tricks for finding definitions
40
41anthropic_function_calls = """<tool_description>\n<tool_name>view_file</tool_name>
42<description>
43Retrieves the contents of the specified file. After viewing a file, use `code_search` on relevant entities to find other potentially relevant files. Use `store_file` to add the file to the final list if it provides important context or may need modifications to solve the issue.
44</description>
45<parameters>
46<parameter>
47<name>file_path</name>
48<type>string</type>
49<description>The path of the file to view.</description>
50</parameter>
51<parameter>
52<name>justification</name>
53<type>string</type>
54<description>Explain why viewing this file is necessary to solve the issue.</description>
55</parameter>
56</parameters>
57</tool_description>
58
59<tool_description>
60<tool_name>store_file</tool_name>
61<description>
62Adds a file to the final list of relevant files that provide important context or may need modifications to resolve the issue. Err on the side of including a file if you're unsure about its relevance.
63</description>
64<parameters>
65<parameter>
66<name>file_path</name>
67<type>string</type>
68<description>The path of the file to store.</description>
69</parameter>
70<parameter>
71<name>justification</name>
72<type>string</type>
73<description>Explain why this file should be read for context or modified and what needs to be modified. Include a supporting code excerpt.</description>
74</parameter>
75</parameters>
76</tool_description>
77
78<tool_description>
79<tool_name>code_search</tool_name>  
80<description>
81Passes the code_entity into ripgrep to search the entire codebase and return a list of files and line numbers where it appears. Useful for finding definitions and usages of types, classes and functions that may be relevant. Review the search results using `view_file` to determine relevance.
82</description>
83<parameters>
84<parameter>
85<name>code_entity</name>
86<type>string</type>
87<description>The code entity to search for. Should be a distinctive name, not a generic term. For functions, search for the definition syntax, e.g. 'def foo' in Python or 'function bar' or 'const bar' in JavaScript.</description>
88</parameter>
89<parameter>
90<name>justification</name>
91<type>string</type>
92<description>Explain what information you expect to get from this search and why it's needed, e.g. "I need to find the definition of the Foo class to see what methods are available on Foo objects."</description>
93</parameter>
94</parameters>
95</tool_description>
96
97<tool_description>
98<tool_name>submit</tool_name>
99<description>
100Submits the final list consisting of all files that were stored using the `store_file` tool. Only call this tool once when you are absolutely certain you have stored all potentially relevant files. The submitted list should err on the side of including extra files to ensure high recall of relevant ones.
101</description>
102<parameters>
103<parameter>
104<name>result</name>
105<type>string</type>
106<description>A simple string stating that you have finished storing relevant files and are submitting the list of stored files as the final result.</description>
107</parameter>
108</parameters>
109</tool_description>
110
111You must call the tools using the specified XML format. Here are some generic examples to illustrate the format without referring to a specific task:
112
113<examples>
114Example 1:
115<function_call>
116<invoke>
117<tool_name>view_file</tool_name>
118<parameters>
119<file_path>services/user_service.py</file_path>
120<justification>The user request mentions the get_user_by_id method in the UserService class. I need to view user_service.py to understand how this method currently works and what files it interacts with.</justification>
121</parameters>
122</invoke>
123</function_call>
124
125Example 2:
126<function_call>
127<invoke>
128<tool_name>store_file</tool_name>
129<parameters>
130<file_path>models/user.py</file_path>
131<justification>The User model is relevant for understanding the attributes of a User, especially the `deleted` flag that indicates if a user is soft-deleted. This excerpt shows the key parts of the model:
132```python
133class User(db.Model):
134    id = db.Column(db.Integer, primary_key=True)
135    name = db.Column(db.String(100))  
136    email = db.Column(db.String(100), unique=True)
137    deleted = db.Column(db.Boolean, default=False)
138```
139</justification>
140</parameters>
141</invoke>
142<invoke>
143<tool_name>code_search</tool_name>
144<parameters>
145<code_entity>def get_user_by_id</code_entity>
146<justification>I need to find the definition of the get_user_by_id method to see its current implementation and determine what changes are needed to support excluding deleted users.</justification>
147</parameters>
148</invoke>
149</function_call>
150</examples>
151
152You must call the tools using the specified XML format, as illustrated in the previous examples. Focus on identifying all files that provide important context or may require modifications to fully resolve the issue at hand. Prioritize recall over precision."""
153
154sys_prompt = """You are a brilliant engineer assigned to solve the following GitHub issue. Your task is to generate a complete list of all files that are relevant to fully resolving the issue. A file is considered RELEVANT if it must be either modified or read to understand the necessary changes as part of the issue resolution process. 
155
156It is critical that you identify every relevant file, even if you are unsure whether it needs to be modified. Your goal is to generate an extremely comprehensive list of files for an intern who is unfamiliar with the codebase. Precision is less important than recall - it's better to include a few extra files than to miss a relevant one.
157
158You will do this by searching for and viewing files in the codebase to gather all the necessary information. 
159
160INSTRUCTIONS
161Use the following iterative process:
1621. View all files that seem potentially relevant based on file paths and entities mentioned in the "User Request" and "Relevant Snippets". For example, if the class foo.bar.Bar is referenced, be sure to view foo/bar.py. Check all files referenced in the user request. If you can't find a specific module, also check the "Common modules" section. When you identify a relevant file, use store_file to add it to the final list.
1632. Use code_search to find definitions and usages for ALL unknown variables, classes, attributes, and functions that may be relevant. View the search result files.
164
1653. Repeat steps 1-2, searching exhaustively, until you are fully confident you have stored all files that provide necessary context or may need modifications.
166
1674. Submit the final list of relevant files with the submit function.
168
169Here are the tools at your disposal. Call them until you have stored all relevant files:
170
171""" + anthropic_function_calls
172
173unformatted_user_prompt = """\
174## Relevant Snippets
175Here are potentially relevant snippets in the repo in decreasing relevance that you should use the `view_file` tool to review:
176{snippets_in_repo}
177
178## Code files mentioned in the user request
179Here are the code files mentioned in the user request, these code files are very important to the solution and should be considered very relevant:
180<code_files_in_query>
181{file_paths_in_query}
182</code_files_in_query>
183{import_tree_prompt}
184## User Request
185<user_request>
186{query}
187</user_request>"""
188
189unformatted_user_prompt_stored = """\
190## Stored Files
191Here are the files that you have already stored:
192{snippets_in_repo}
193{import_tree_prompt}
194## User Request
195<user_request>
196{query}
197</user_request>"""
198
199
200PLAN_SUBMITTED_MESSAGE = "SUCCESS: Report and plan submitted."
201
202def escape_ripgrep(text):
203    # Special characters to escape
204    special_chars = ["(", "{"]
205    for s in special_chars:
206        text = text.replace(s, "\\" + s)
207    return text
208
209@staticmethod
210def can_add_snippet(snippet: Snippet, current_snippets: list[Snippet]):
211    return (
212        len(snippet.xml) + sum([len(snippet.xml) for snippet in current_snippets])
213        <= ASSISTANT_MAX_CHARS
214    )
215
216
217@dataclass
218class RepoContextManager:
219    dir_obj: DirectoryTree
220    current_top_tree: str
221    snippets: list[Snippet]
222    snippet_scores: dict[str, float]
223    cloned_repo: ClonedRepo
224    current_top_snippets: list[Snippet] = field(default_factory=list)
225    read_only_snippets: list[Snippet] = field(default_factory=list)
226    issue_report_and_plan: str = ""
227    import_trees: str = ""
228    relevant_file_paths: list[str] = field(
229        default_factory=list
230    )  # a list of file paths that appear in the user query
231
232    @property
233    def top_snippet_paths(self):
234        return [snippet.file_path for snippet in self.current_top_snippets]
235
236    @property
237    def relevant_read_only_snippet_paths(self):
238        return [snippet.file_path for snippet in self.read_only_snippets]
239
240    def expand_all_directories(self, directories_to_expand: list[str]):
241        self.dir_obj.expand_directory(directories_to_expand)
242
243    def is_path_valid(self, path: str, directory: bool = False):
244        if directory:
245            return any(snippet.file_path.startswith(path) for snippet in self.snippets)
246        return any(snippet.file_path == path for snippet in self.snippets)
247
248    def format_context(
249        self,
250        unformatted_user_prompt: str,
251        query: str,
252    ):
253        top_snippets_str = [snippet.file_path for snippet in self.current_top_snippets]
254        # dedupe the list inplace
255        top_snippets_str = list(dict.fromkeys(top_snippets_str))
256        top_snippets_str = top_snippets_str[:NUM_SNIPPETS_TO_SHOW_AT_START]
257        snippets_in_repo_str = "\n".join(top_snippets_str)
258        logger.info(f"Snippets in repo:\n{snippets_in_repo_str}")
259        repo_tree = str(self.dir_obj)
260        import_tree_prompt = """
261## Import trees for code files in the user request
262<import_trees>
263{import_trees}
264</import_trees>
265"""
266        import_tree_prompt = (
267            import_tree_prompt.format(import_trees=self.import_trees.strip("\n"))
268            if self.import_trees
269            else ""
270        )
271        user_prompt = unformatted_user_prompt.format(
272            query=query,
273            snippets_in_repo=snippets_in_repo_str,
274            repo_tree=repo_tree,
275            import_tree_prompt=import_tree_prompt,
276            file_paths_in_query=", ".join(self.relevant_file_paths),
277        )
278        return user_prompt
279
280    def get_highest_scoring_snippet(self, file_path: str) -> Snippet:
281        def snippet_key(snippet):
282            return snippet.denotation
283
284        filtered_snippets = [
285            snippet
286            for snippet in self.snippets
287            if snippet.file_path == file_path
288            and snippet not in self.current_top_snippets
289        ]
290        if not filtered_snippets:
291            return None
292        highest_scoring_snippet = max(
293            filtered_snippets,
294            key=lambda snippet: (
295                self.snippet_scores[snippet_key(snippet)]
296                if snippet_key(snippet) in self.snippet_scores
297                else 0
298            ),
299        )
300        return highest_scoring_snippet
301
302    def add_snippets(self, snippets_to_add: list[Snippet]):
303        # self.dir_obj.add_file_paths([snippet.file_path for snippet in snippets_to_add])
304        for snippet in snippets_to_add:
305            self.current_top_snippets.append(snippet)
306
307    # does the same thing as add_snippets but adds them to the beginning of the list
308    def boost_snippets_to_top(self, snippets_to_boost: list[Snippet]):
309        # self.dir_obj.add_file_paths([snippet.file_path for snippet in snippets_to_boost])
310        for snippet in snippets_to_boost:
311            self.current_top_snippets.insert(0, snippet)
312
313    def add_import_trees(self, import_trees: str):
314        self.import_trees += "\n" + import_trees
315
316    def append_relevant_file_paths(self, relevant_file_paths: str):
317        # do not use append, it modifies the list in place and will update it for ALL instances of RepoContextManager
318        self.relevant_file_paths = self.relevant_file_paths + [relevant_file_paths]
319
320    def set_relevant_paths(self, relevant_file_paths: list[str]):
321        self.relevant_file_paths = relevant_file_paths
322
323    def update_issue_report_and_plan(self, new_issue_report_and_plan: str):
324        self.issue_report_and_plan = new_issue_report_and_plan
325
326
327"""
328Dump the import tree to a string
329Ex:
330main.py
331├── database.py
332│   └── models.py
333└── utils.py
334    └── models.py
335"""
336
337def build_full_hierarchy(
338    graph: nx.DiGraph, start_node: str, k: int, prefix="", is_last=True, level=0
339):
340    if level > k:
341        return ""
342    if level == 0:
343        hierarchy = f"{start_node}\n"
344    else:
345        hierarchy = f"{prefix}{'└── ' if is_last else '├── '}{start_node}\n"
346    child_prefix = prefix + ("    " if is_last else "│   ")
347    try:
348        successors = {
349            node
350            for node, length in nx.single_source_shortest_path_length(
351                graph, start_node, cutoff=1
352            ).items()
353            if length == 1
354        }
355    except Exception as e:
356        print("error occurred while fetching successors:", e)
357        return hierarchy
358    sorted_successors = sorted(successors)
359    for idx, child in enumerate(sorted_successors):
360        child_is_last = idx == len(sorted_successors) - 1
361        hierarchy += build_full_hierarchy(
362            graph, child, k, child_prefix, child_is_last, level + 1
363        )
364    if level == 0:
365        try:
366            predecessors = {
367                node
368                for node, length in nx.single_source_shortest_path_length(
369                    graph.reverse(), start_node, cutoff=1
370                ).items()
371                if length == 1
372            }
373        except Exception as e:
374            print("error occurred while fetching predecessors:", e)
375            return hierarchy
376        sorted_predecessors = sorted(predecessors)
377        for idx, parent in enumerate(sorted_predecessors):
378            parent_is_last = idx == len(sorted_predecessors) - 1
379            # Prepend parent hierarchy to the current node's hierarchy
380            hierarchy = (
381                build_full_hierarchy(graph, parent, k, "", parent_is_last, level + 1)
382                + hierarchy
383            )
384    return hierarchy
385
386
387def load_graph_from_file(filename):
388    G = nx.DiGraph()
389    current_node = None
390    with open(filename, "r") as file:
391        for line in file:
392            if not line:
393                continue
394            if line.startswith(" "):
395                line = line.strip()
396                if current_node:
397                    G.add_edge(current_node, line)
398            else:
399                line = line.strip()
400                current_node = line
401                if current_node:
402                    G.add_node(current_node)
403    return G
404
405@file_cache(ignore_params=["rcm", "G"])
406def graph_retrieval(formatted_query: str, top_k_paths: list[str], rcm: RepoContextManager, G: nx.DiGraph):
407    # TODO: tune these params
408    top_paths_cutoff = 25
409    num_rerank = 30
410    selected_paths = rcm.top_snippet_paths[:10]
411    top_k_paths = top_k_paths[:top_paths_cutoff]
412
413    snippet_scores = rcm.snippet_scores
414    for snippet, score in snippet_scores.items():
415        if snippet.split(":")[0] in top_k_paths:
416            snippet_scores[snippet] += 1
417
418    personalization = {}
419
420    for snippet in selected_paths:
421        personalization[snippet] = 1
422
423    try:
424        personalized_pagerank_scores = nx.pagerank(G, personalization=personalization, alpha=0.85)
425        unpersonalized_pagerank_scores = nx.pagerank(G, alpha=0.85)
426
427        # tfidf style
428        normalized_pagerank_scores = {path: score * log(1 / (1e-6 + unpersonalized_pagerank_scores[path])) for path, score in personalized_pagerank_scores.items()}
429
430        top_pagerank_scores = sorted(normalized_pagerank_scores.items(), key=lambda x: x[1], reverse=True)
431        
432        top_pagerank_paths = [path for path, _score in top_pagerank_scores]
433
434        distilled_file_path_list = []
435
436        for file_path, score in top_pagerank_scores:
437            if file_path.endswith(".js") and file_path.replace(".js", ".ts") in top_pagerank_paths:
438                continue
439            if file_path in top_k_paths:
440                continue
441            if "generated" in file_path or "mock" in file_path or "test" in file_path:
442                continue
443            try:
444                rcm.cloned_repo.get_file_contents(file_path)
445            except FileNotFoundError:
446                continue
447            distilled_file_path_list.append(file_path)
448        
449        # Rerank once
450        reranked_snippets = []
451        for file_path in distilled_file_path_list[:num_rerank]:
452            contents = rcm.cloned_repo.get_file_contents(file_path)
453            reranked_snippets.append(Snippet(
454                content=contents,
455                start=0,
456                end=contents.count("\n") + 1,
457                file_path=file_path,
458            ))
459        reranked_snippets = listwise_rerank_snippets(formatted_query, reranked_snippets, prompt_type="graph")
460        distilled_file_path_list[:num_rerank] = [snippet.file_path for snippet in reranked_snippets]
461
462        return distilled_file_path_list
463    except Exception as e:
464        logger.error(e)
465        return []
466
467@file_cache(ignore_params=["repo_context_manager", "override_import_graph"])
468def integrate_graph_retrieval(formatted_query: str, repo_context_manager: RepoContextManager, override_import_graph: nx.DiGraph = None):
469    num_graph_retrievals = 25
470    repo_context_manager, import_graph = parse_query_for_files(formatted_query, repo_context_manager)
471    if override_import_graph:
472        import_graph = override_import_graph
473    if import_graph:
474        # Graph retrieval can fail and return [] if the graph is not found or pagerank does not converge
475        # Happens especially when graph has multiple components
476        graph_retrieved_files = graph_retrieval(formatted_query, repo_context_manager.top_snippet_paths, repo_context_manager, import_graph)
477        if graph_retrieved_files:
478            sorted_snippets = sorted(
479                repo_context_manager.snippets,
480                key=lambda snippet: repo_context_manager.snippet_scores[snippet.denotation],
481                reverse=True,
482            )
483            snippets = []
484            for file_path in graph_retrieved_files:
485                for snippet in sorted_snippets[50 - num_graph_retrievals:]:
486                    if snippet.file_path == file_path:
487                        snippets.append(snippet)
488                        break
489            graph_retrieved_files = graph_retrieved_files[:num_graph_retrievals]
490            repo_context_manager.read_only_snippets = snippets[:len(graph_retrieved_files)]
491            repo_context_manager.current_top_snippets = repo_context_manager.current_top_snippets[:50 - num_graph_retrievals]
492    return repo_context_manager, import_graph
493
494# add import trees for any relevant_file_paths (code files that appear in query)
495def build_import_trees(
496    rcm: RepoContextManager,
497    import_graph: nx.DiGraph,
498    override_import_graph: nx.DiGraph = None,
499) -> tuple[RepoContextManager]:
500    if import_graph is None and override_import_graph is None:
501        return rcm
502    if override_import_graph:
503        import_graph = override_import_graph
504    # if we have found relevant_file_paths in the query, we build their import trees
505    code_files_in_query = rcm.relevant_file_paths
506    # graph_retrieved_files = graph_retrieval(rcm.top_snippet_paths, rcm, import_graph)[:15]
507    graph_retrieved_files = [snippet.file_path for snippet in rcm.read_only_snippets]
508    if code_files_in_query:
509        for file in code_files_in_query:
510            # fetch direct parent and children
511            representation = (
512                f"\nThe file '{file}' has the following import structure: \n"
513                + build_full_hierarchy(import_graph, file, 2)
514            )
515            if graph_retrieved_files:
516                representation += "\n\nThe following modules may contain helpful services or utility functions:\n- " + "\n- ".join(graph_retrieved_files)
517            rcm.add_import_trees(representation)
518    # if there are no code_files_in_query, we build import trees for the top 5 snippets
519    else:
520        for snippet in rcm.current_top_snippets[:5]:
521            file_path = snippet.file_path
522            representation = (
523                f"\nThe file '{file_path}' has the following import structure: \n"
524                + build_full_hierarchy(import_graph, file_path, 2)
525            )
526            if graph_retrieved_files:
527                representation += "\n\nThe following modules may contain helpful services or utility functions:\n- " + "\n- ".join(graph_retrieved_files)
528            rcm.add_import_trees(representation)
529    return rcm
530
531
532# add any code files that appear in the query to current_top_snippets
533def add_relevant_files_to_top_snippets(rcm: RepoContextManager) -> RepoContextManager:
534    code_files_in_query = rcm.relevant_file_paths
535    for file in code_files_in_query:
536        current_top_snippet_paths = [
537            snippet.file_path for snippet in rcm.current_top_snippets
538        ]
539        # if our mentioned code file isn't already in the current_top_snippets, we add it
540        if file not in current_top_snippet_paths:
541            try:
542                code_snippets = [
543                    snippet for snippet in rcm.snippets if snippet.file_path == file
544                ]
545                rcm.boost_snippets_to_top(code_snippets)
546            except Exception as e:
547                logger.error(
548                    f"Tried to add code file found in query but recieved error: {e}, skipping and continuing to next one."
549                )
550    return rcm
551
552
553# fetch all files mentioned in the user query
554def parse_query_for_files(
555    query: str, rcm: RepoContextManager
556) -> tuple[RepoContextManager, nx.DiGraph]:
557    # use cloned_repo to attempt to find any file names that appear in the query
558    repo_full_name = rcm.cloned_repo.repo_full_name
559    repo_name = repo_full_name.split("/")[-1]
560    repo_group_name = repo_full_name.split("/")[0]
561    code_files_to_add = set([])
562    code_files_to_check = set(list(rcm.cloned_repo.get_file_list()))
563    code_files_uri_encoded = [
564        urllib.parse.quote(file_path) for file_path in code_files_to_check
565    ]
566    # check if any code files are mentioned in the query
567    for file, file_uri_encoded in zip(code_files_to_check, code_files_uri_encoded):
568        if file in query or file_uri_encoded in query:
569            code_files_to_add.add(file)
570    for code_file in code_files_to_add:
571        rcm.append_relevant_file_paths(code_file)
572    # only for enterprise
573    try:
574        pathing = (
575            f"{repo_group_name}_import_graphs/{repo_name}/{repo_name}_import_tree.txt"
576        )
577        if not os.path.exists(pathing):
578            return rcm, None
579        graph = load_graph_from_file(pathing)
580    except Exception as e:
581        logger.error(
582            f"Error loading import tree: {e}, skipping step and setting import_tree to empty string"
583        )
584        return rcm, None
585    files = set(list(graph.nodes()))
586    files_uri_encoded = [urllib.parse.quote(file_path) for file_path in files]
587    for file, file_uri_encoded in zip(files, files_uri_encoded):
588        if (file in query or file_uri_encoded in query) and (
589            file not in code_files_to_add
590        ):
591            rcm.append_relevant_file_paths(file)
592    return rcm, graph
593
594
595# do not ignore repo_context_manager
596# @file_cache(ignore_params=["ticket_progress", "chat_logger"])
597def get_relevant_context(
598    query: str,
599    repo_context_manager: RepoContextManager,
600    seed: int = None,
601    import_graph: nx.DiGraph = None,
602    num_rollouts: int = NUM_ROLLOUTS,
603    ticket_progress: TicketProgress = None,
604    chat_logger: ChatLogger = None,
605):
606    logger.info("Seed: " + str(seed))
607    try:
608        # for any code file mentioned in the query, build its import tree - This is currently not used
609        repo_context_manager = build_import_trees(
610            repo_context_manager,
611            import_graph,
612        )
613        # for any code file mentioned in the query add it to the top relevant snippets
614        repo_context_manager = add_relevant_files_to_top_snippets(repo_context_manager)
615        # add relevant files to dir_obj inside repo_context_manager, in case dir_obj is too large when rendered as a string
616        repo_context_manager.dir_obj.add_relevant_files(
617            repo_context_manager.relevant_file_paths
618        )
619
620        user_prompt = repo_context_manager.format_context(
621            unformatted_user_prompt=unformatted_user_prompt,
622            query=query,
623        )
624        chat_gpt = ChatGPT()
625        chat_gpt.messages = [Message(role="system", content=sys_prompt)]
626        old_relevant_snippets = deepcopy(repo_context_manager.current_top_snippets)
627        old_read_only_snippets = deepcopy(repo_context_manager.read_only_snippets)
628        try:
629            repo_context_manager = context_dfs(
630                user_prompt,
631                repo_context_manager,
632                problem_statement=query,
633                num_rollouts=num_rollouts,
634            )
635        except openai.BadRequestError as e:  # sometimes means that run has expired
636            logger.exception(e)
637        # repo_context_manager.current_top_snippets += old_relevant_snippets[:25 - len(repo_context_manager.current_top_snippets)]
638        # Add stuffing until context limit
639        max_chars = 140000 * 3.5 # ~140k tokens at ~3.5 chars per token
640        counter = sum([len(snippet.get_snippet(False, False)) for snippet in repo_context_manager.current_top_snippets]) + sum(
641            [len(snippet.get_snippet(False, False)) for snippet in repo_context_manager.read_only_snippets]
642        )
643        for snippet, read_only_snippet in zip_longest(old_relevant_snippets, old_read_only_snippets, fillvalue=None):
644            if snippet and not any(context_snippet.file_path == snippet.file_path for context_snippet in repo_context_manager.current_top_snippets):
645                counter += len(snippet.get_snippet(False, False))
646                if counter > max_chars:
647                    break
648                repo_context_manager.current_top_snippets.append(snippet)
649            if read_only_snippet and not any(context_snippet.file_path == read_only_snippet.file_path for context_snippet in repo_context_manager.read_only_snippets):
650                counter += len(read_only_snippet.get_snippet(False, False))
651                if counter > max_chars:
652                    break
653                repo_context_manager.read_only_snippets.append(read_only_snippet)
654        return repo_context_manager
655    except Exception as e:
656        logger.exception(e)
657        return repo_context_manager
658
659
660def update_assistant_conversation(
661    run: Run,
662    thread: Thread,
663    ticket_progress: TicketProgress,
664    repo_context_manager: RepoContextManager,
665):
666    assistant_conversation = AssistantConversation.from_ids(
667        assistant_id=run.assistant_id,
668        run_id=run.id,
669        thread_id=thread.id,
670    )
671    if ticket_progress:
672        if assistant_conversation:
673            ticket_progress.search_progress.pruning_conversation = (
674                assistant_conversation
675            )
676        ticket_progress.search_progress.repo_tree = str(repo_context_manager.dir_obj)
677        ticket_progress.search_progress.final_snippets = (
678            repo_context_manager.current_top_snippets
679        )
680        ticket_progress.save()
681
682
683CLAUDE_MODEL = "claude-3-haiku-20240307"
684
685
686def validate_and_parse_function_calls(
687    function_calls_string: str, chat_gpt: ChatGPT
688) -> list[AnthropicFunctionCall]:
689    function_calls = AnthropicFunctionCall.mock_function_calls_from_string(
690        function_calls_string.strip("\n") + "\n</function_call>"
691    )  # add end tag
692    if len(function_calls) > 0:
693        chat_gpt.messages[-1].content = (
694            chat_gpt.messages[-1].content.rstrip("\n") + "\n</function_call>"
695        )  # add end tag to assistant message
696        return function_calls
697
698    # try adding </invoke> tag as well
699    function_calls = AnthropicFunctionCall.mock_function_calls_from_string(
700        function_calls_string.strip("\n") + "\n</invoke>\n</function_call>"
701    )
702    if len(function_calls) > 0:
703        # update state of chat_gpt
704        chat_gpt.messages[-1].content = (
705            chat_gpt.messages[-1].content.rstrip("\n") + "\n</invoke>\n</function_call>"
706        )
707        return function_calls
708    # try adding </parameters> tag as well
709    function_calls = AnthropicFunctionCall.mock_function_calls_from_string(
710        function_calls_string.strip("\n")
711        + "\n</parameters>\n</invoke>\n</function_call>"
712    )
713    if len(function_calls) > 0:
714        # update state of chat_gpt
715        chat_gpt.messages[-1].content = (
716            chat_gpt.messages[-1].content.rstrip("\n")
717            + "\n</parameters>\n</invoke>\n</function_call>"
718        )
719    return function_calls
720
721
722def handle_function_call(
723    repo_context_manager: RepoContextManager, function_call: AnthropicFunctionCall, llm_state: dict[str, str]
724):
725    function_name = function_call.function_name
726    function_input = function_call.function_parameters
727    logger.info(f"Tool Call: {function_name} {function_input}")
728    file_path = function_input.get("file_path")
729    valid_path = False
730    output_prefix = f"Output for {function_name}:\n"
731    output = ""
732    current_top_snippets_string = "\n".join(
733        [snippet.denotation for snippet in repo_context_manager.current_top_snippets]
734    )
735    if function_name == "code_search":
736        code_entity = f'"{function_input["code_entity"]}"'  # handles cases with two words
737        code_entity = escape_ripgrep(code_entity) # escape special characters
738        rg_command = [
739            "rg",
740            "-n",
741            "-i",
742            code_entity,
743            repo_context_manager.cloned_repo.repo_dir,
744        ]
745        try:
746            result = subprocess.run(
747                " ".join(rg_command), text=True, shell=True, capture_output=True
748            )
749            rg_output = result.stdout
750            if rg_output:
751                # post process rip grep output to be more condensed
752                rg_output_pretty, file_output_dict = post_process_rg_output(
753                    repo_context_manager.cloned_repo.repo_dir, SweepConfig(), rg_output
754                )
755                non_stored_files = [
756                    file_path
757                    for file_path in file_output_dict
758                    if file_path not in repo_context_manager.top_snippet_paths
759                ]
760                non_stored_files_string = "The following files have not been stored:\n" + "\n".join(non_stored_files) + "\n"
761                output = (
762                    f"SUCCESS: Here are the code_search results:\n<code_search_results>\n{rg_output_pretty}<code_search_results>\n" +
763                    get_stored_files(repo_context_manager) + non_stored_files_string +
764                    "Use the `view_file` tool to determine which non-stored files are most relevant to solving the issue. Use `store_file` to add any important non-stored files to the context."
765                )
766            else:
767                output = f"FAILURE: No results found for code_entity: {code_entity} in the entire codebase. Please try a new code_entity. Consider trying different whitespace or a truncated version of this code_entity."
768        except Exception as e:
769            logger.error(
770                f"FAILURE: An Error occured while trying to find the code_entity {code_entity}: {e}"
771            )
772            output = f"FAILURE: No results found for code_entity: {code_entity} in the entire codebase. Please try a new code_entity. Consider trying different whitespace or a truncated version of this code_entity."
773    elif function_name == "view_file":
774        try:
775            file_contents = repo_context_manager.cloned_repo.get_file_contents(
776                file_path
777            )
778            # check if file has been viewed already
779            function_call_history = llm_state.get("function_call_history", [])
780            # unnest 2d list
781            previous_function_calls = [
782                call for sublist in function_call_history for call in sublist
783            ]
784            previously_viewed_files = [
785                call.function_parameters.get("file_path")
786                for call in previous_function_calls
787                if call.function_name == "view_file"
788            ]
789            previously_viewed_files = list(dict.fromkeys(previously_viewed_files))
790            if file_path in previously_viewed_files:
791                previously_viewed_files_str = "\n".join(previously_viewed_files)
792                output = f"WARNING: `{file_path}` has already been viewed. Please refer to the file in your previous function call. These files have already been viewed:\n{previously_viewed_files_str}"
793            else:
794                output = f'SUCCESS: Here are the contents of `{file_path}`:\n<source>\n{file_contents}\n</source>'
795            if file_path not in [snippet.file_path for snippet in repo_context_manager.current_top_snippets]:
796                suffix = f'\nIf you are CERTAIN this file is RELEVANT, call store_file with the same parameters ({{"file_path": "{file_path}"}}).'
797            else:
798                suffix = '\nThis file has already been stored.'
799            output += suffix
800        except FileNotFoundError:
801            file_contents = ""
802            similar_file_paths = "\n".join(
803                [
804                    f"- {path}"
805                    for path in repo_context_manager.cloned_repo.get_similar_file_paths(
806                        file_path
807                    )
808                ]
809            )
810            output = f"FAILURE: This file path does not exist. Did you mean:\n{similar_file_paths}"
811    elif function_name == "store_file":
812        try:
813            file_contents = repo_context_manager.cloned_repo.get_file_contents(
814                file_path
815            )
816            valid_path = True
817        except Exception:
818            file_contents = ""
819            similar_file_paths = "\n".join(
820                [
821                    f"- {path}"
822                    for path in repo_context_manager.cloned_repo.get_similar_file_paths(
823                        file_path
824                    )
825                ]
826            )
827            output = f"FAILURE: This file path does not exist. Did you mean:\n{similar_file_paths}"
828        else:
829            snippet = Snippet(
830                file_path=file_path,
831                start=0,
832                end=len(file_contents.splitlines()),
833                content=file_contents,
834            )
835            if snippet.denotation in current_top_snippets_string:
836                output = f"FAILURE: {get_stored_files(repo_context_manager)}"
837            else:
838                repo_context_manager.add_snippets([snippet])
839                current_top_snippets_string = "\n".join(
840                    [
841                        snippet.denotation
842                        for snippet in repo_context_manager.current_top_snippets
843                    ]
844                )
845                output = (
846                    f"SUCCESS: {file_path} was added to the context. It will be used as a reference or modified to resolve the issue. Here are the current selected snippets:\n{current_top_snippets_string}"
847                    if valid_path
848                    else f"FAILURE: The file path '{file_path}' does not exist. Please check the path and try again."
849                )
850    elif function_name == "submit":
851        plan = function_input.get("plan")
852        repo_context_manager.update_issue_report_and_plan(f"# Highly Suggested Plan:\n\n{plan}\n\n")
853        output = PLAN_SUBMITTED_MESSAGE
854    else:
855        output = f"FAILURE: Invalid tool name {function_name}"
856    justification = (
857        function_input["justification"] if "justification" in function_input else ""
858    )
859    logger.info(
860        f"Tool Call: {function_name}\n{justification}\n{output}"
861    )
862    return (output_prefix + output)
863
864
865reflections_prompt_prefix = """
866CRITICAL FEEDBACK - READ CAREFULLY AND ADDRESS ALL POINTS
867<critical_feedback_to_address>
868Here is the feedback from your previous attempt. You MUST read this extremely carefully and follow ALL of the reviewer's advice. If they tell you to store specific files, view and store them first. If you do not fully address this feedback, you will fail to retrieve all of the relevant files.
869{all_reflections}
870</critical_feedback_to_address>"""
871
872reflection_prompt = """<attempt_and_feedback_{idx}>
873<previous_files_stored>
874Files stored from previous attempt:
875{files_read}
876</previous_files_stored>
877<rating>
878Rating from previous attempt: {score} / 10
879</rating>
880<feedback>
881Reviewer feedback on previous attempt:
882{reflections_string}
883</feedback>
884</attempt_and_feedback_{idx}>"""
885
886def format_reflections(reflections_to_gathered_files: dict[str, tuple[list[str], int]]) -> str:
887    formatted_reflections_prompt = ""
888    if not reflections_to_gathered_files:
889        return formatted_reflections_prompt
890    all_reflections_string = "\n"
891    # take only the MAX_REFLECTIONS sorted by score
892    top_reflections = sorted(
893        reflections_to_gathered_files.items(), key=lambda x: x[1][1] * 100 + len(x[1][0]), reverse=True # break ties by number of files stored
894    )[:MAX_REFLECTIONS]
895    for idx, (reflection, (gathered_files, score)) in enumerate(top_reflections):
896        formatted_reflection = reflection_prompt.format(
897            files_read="\n".join(gathered_files),
898            reflections_string=reflection,
899            score=str(score),
900            idx=str(idx + 1),
901        )
902        all_reflections_string += f"\n{formatted_reflection}"
903    formatted_reflections_prompt = reflections_prompt_prefix.format(
904        all_reflections=all_reflections_string
905    )
906    return formatted_reflections_prompt
907
908def render_all_attempts(function_call_histories: list[list[list[AnthropicFunctionCall]]]) -> str:
909    formatted_attempts = ""
910    for idx, function_call_history in enumerate(function_call_histories):
911        formatted_function_calls = render_function_calls_for_attempt(function_call_history)
912        formatted_attempts += f"<attempt_{idx}>\n{formatted_function_calls}\n</attempt_{idx}>"
913    return formatted_attempts
914
915def render_function_calls_for_attempt(function_call_history: list[list[AnthropicFunctionCall]]) -> str:
916    formatted_function_calls = ""
917    idx = 0
918    for function_calls in function_call_history:
919        for function_call in function_calls:
920            function_call.function_parameters.pop("justification", None) # remove justification
921            function_call_cleaned_string = function_call.function_name + "\n".join([str(k) + "|" + str(v) for k, v in function_call.function_parameters.items()])
922            formatted_function_calls += f"<function_call_{idx}>{function_call_cleaned_string}</function_call_{idx}>\n"
923        if function_calls:
924            idx += 1
925    return formatted_function_calls
926
927def get_stored_files(repo_context_manager: RepoContextManager) -> str:
928    fetched_files_that_are_stored = [snippet.file_path for snippet in repo_context_manager.current_top_snippets]
929    joined_files_string = "\n".join(fetched_files_that_are_stored)
930    stored_files_string = f'The following files have been stored already:\n{joined_files_string}.\n' if fetched_files_that_are_stored else ""
931    return stored_files_string
932
933def search_for_context_with_reflection(repo_context_manager: RepoContextManager, reflections_to_read_files: dict[str, tuple[list[str], int]], user_prompt: str, rollout_function_call_histories: list[list[list[AnthropicFunctionCall]]], problem_statement: str) -> tuple[int, str, RepoContextManager, list[str]]:
934    _, function_call_history = perform_rollout(repo_context_manager, reflections_to_read_files, user_prompt)
935    rollout_function_call_histories.append(function_call_history)
936    rollout_stored_files = [snippet.file_path for snippet in repo_context_manager.current_top_snippets]
937    # truncated_message_results = message_results[1:] # skip system prompt
938    # joined_messages = "\n\n".join([message.content for message in truncated_message_results])
939    # overall_score, message_to_contractor = EvaluatorAgent().evaluate_run(
940    #     problem_statement=problem_statement, 
941    #     run_text=joined_messages,
942    #     stored_files=rollout_stored_files,
943    # )
944    return 0, "", repo_context_manager, rollout_stored_files
945
946def perform_rollout(repo_context_manager: RepoContextManager, reflections_to_gathered_files: dict[str, tuple[list[str], int]], user_prompt: str) -> list[Message]:
947    function_call_history = []
948    formatted_reflections_prompt = format_reflections(reflections_to_gathered_files)
949    updated_user_prompt = user_prompt + formatted_reflections_prompt
950    chat_gpt = ChatGPT()
951    chat_gpt.messages = [Message(role="system", content=sys_prompt + formatted_reflections_prompt)]
952    function_calls_string = chat_gpt.chat_anthropic(
953        content=updated_user_prompt,
954        stop_sequences=["</function_call>"],
955        model=CLAUDE_MODEL,
956        message_key="user_request",
957    )
958    bad_call_count = 0
959    llm_state = {} # persisted across one rollout
960    for _ in range(MAX_ITERATIONS):
961        function_calls = validate_and_parse_function_calls(
962            function_calls_string, chat_gpt
963        )
964        function_outputs = ""
965        for function_call in function_calls[:MAX_PARALLEL_FUNCTION_CALLS]:
966            function_outputs += handle_function_call(repo_context_manager, function_call, llm_state) + "\n"
967            llm_state["function_call_history"] = function_call_history
968            if PLAN_SUBMITTED_MESSAGE in function_outputs:
969                return chat_gpt.messages, function_call_history
970        function_call_history.append(function_calls)
971        if len(function_calls) == 0:
972            function_outputs = "FAILURE: No function calls were made or your last function call was incorrectly formatted. The correct syntax for function calling is this:\n" \
973                + "<function_call>\n<invoke>\n<tool_name>tool_name</tool_name>\n<parameters>\n<param_name>param_value</param_name>\n</parameters>\n</invoke>\n</function_call>" + "\nRemember to gather ALL relevant files. " + get_stored_files(repo_context_manager)
974            bad_call_count += 1
975            if bad_call_count >= NUM_BAD_FUNCTION_CALLS:
976                return chat_gpt.messages, function_call_history
977        if len(function_calls) > MAX_PARALLEL_FUNCTION_CALLS:
978            remaining_function_calls = function_calls[MAX_PARALLEL_FUNCTION_CALLS:]
979            remaining_function_calls_string = mock_function_calls_to_string(remaining_function_calls)
980            function_outputs += "WARNING: You requested more than 1 function call at once. Only the first function call has been processed. The unprocessed function calls were:\n<unprocessed_function_call>\n" + remaining_function_calls_string + "\n</unprocessed_function_call>"
981        try:
982            function_calls_string = chat_gpt.chat_anthropic(
983                content=function_outputs,
984                model=CLAUDE_MODEL,
985                stop_sequences=["</function_call>"],
986            )
987        except Exception as e:
988            logger.error(f"Error in chat_anthropic: {e}")
989            # return all but the last message because it likely causes an error
990            return chat_gpt.messages[:-1], function_call_history
991    return chat_gpt.messages, function_call_history
992
993def context_dfs(
994    user_prompt: str,
995    repo_context_manager: RepoContextManager,
996    problem_statement: str,
997    num_rollouts: int,
998) -> RepoContextManager:
999    repo_context_manager.current_top_snippets = []
1000    # initial function call
1001    reflections_to_read_files = {}
1002    rollouts_to_scores_and_rcms = {}
1003    rollout_function_call_histories = []
1004    for rollout_idx in range(num_rollouts):
1005        # operate on a deep copy of the repo context manager
1006        if rollout_idx > 0:
1007            user_prompt = repo_context_manager.format_context(
1008                unformatted_user_prompt=unformatted_user_prompt_stored,
1009                query=problem_statement,
1010            )
1011        overall_score, message_to_contractor, copied_repo_context_manager, rollout_stored_files = search_for_context_with_reflection(
1012            repo_context_manager=repo_context_manager,
1013            reflections_to_read_files=reflections_to_read_files,
1014            user_prompt=user_prompt,
1015            rollout_function_call_histories=rollout_function_call_histories,
1016            problem_statement=problem_statement
1017        )
1018        logger.info(f"Completed run {rollout_idx} with score: {overall_score} and reflection: {message_to_contractor}")
1019        if overall_score is None or message_to_contractor is None:
1020            continue # can't get any reflections here
1021        # reflections_to_read_files[message_to_contractor] = rollout_stored_files, overall_score
1022        rollouts_to_scores_and_rcms[rollout_idx] = (overall_score, copied_repo_context_manager)
1023        if overall_score >= SCORE_THRESHOLD and len(rollout_stored_files) > STOP_AFTER_SCORE_THRESHOLD_IDX:
1024            break
1025    # if we reach here, we have not found a good enough solution
1026    # select rcm from the best rollout
1027    logger.info(f"{render_all_attempts(rollout_function_call_histories)}")
1028    all_scores_and_rcms = list(rollouts_to_scores_and_rcms.values())
1029    best_score, best_rcm = max(all_scores_and_rcms, key=lambda x: x[0] * 100 + len(x[1].current_top_snippets)) # sort first on the highest score, break ties with length of current_top_snippets
1030    for score, rcm in all_scores_and_rcms:
1031        logger.info(f"Rollout score: {score}, Rollout files: {[snippet.file_path for snippet in rcm.current_top_snippets]}")
1032    logger.info(f"Best score: {best_score}, Best files: {[snippet.file_path for snippet in best_rcm.current_top_snippets]}")
1033    return best_rcm
1034
1035if __name__ == "__main__":
1036    try:
1037        from sweepai.utils.github_utils import get_installation_id
1038        from sweepai.utils.ticket_utils import prep_snippets
1039
1040        organization_name = "sweepai"
1041        installation_id = get_installation_id(organization_name)
1042        cloned_repo = ClonedRepo("sweepai/sweep", installation_id, "main")
1043        query = "allow 'sweep.yaml' to be read from the user/organization's .github repository. this is found in client.py and we need to change this to optionally read from .github/sweep.yaml if it exists there"
1044        # golden response is
1045        # sweepai/handlers/create_pr.py:401-428
1046        # sweepai/config/client.py:178-282
1047        ticket_progress = TicketProgress(
1048            tracking_id="test",
1049        )
1050        repo_context_manager = prep_snippets(cloned_repo, query, ticket_progress)
1051        rcm = get_relevant_context(
1052            query,
1053            repo_context_manager,
1054            ticket_progress=ticket_progress,
1055            chat_logger=ChatLogger({"username": "wwzeng1"}),
1056        )
1057        for snippet in rcm.current_top_snippets:
1058            print(snippet.denotation)
1059    except Exception as e:
1060        logger.error(f"context_pruning.py failed to run successfully with error: {e}")
1061        raise e
1062

At the end of the file, add an if __name__ == "__main__": block with:

  • A try/except to catch and print any errors
  • Code to:
    • Get an installation ID using get_installation_id()
    • Create a ClonedRepo for "sweepai/sweep"
    • Create a sample query string
    • Call prep_snippets() to create a RepoContextManager
    • Call get_relevant_context() with the query and RepoContextManager
    • Print out the snippets in the final RepoContextManager

This will serve as a runnable example to manually test the context pruning flow.

Plan

This is based on the results of the Planning step. The plan may expand from failed GitHub Actions runs.

Create tests/test_context_pruning.py 522afed
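The contents of the created test file are not rendered above. As a rough illustration, here is a minimal sketch of what tests/test_context_pruning.py could cover, assuming pytest-style discovery and using only the pure helpers visible in the listing (escape_ripgrep, RepoContextManager.is_path_valid, add_snippets, boost_snippets_to_top). FakeSnippet and _make_rcm are hypothetical stand-ins for the real Snippet and ClonedRepo objects; the actual file in commit 522afed may differ:

```python
# Hypothetical smoke tests for sweepai/core/context_pruning.py (illustrative sketch only).
from dataclasses import dataclass

from sweepai.core.context_pruning import RepoContextManager, escape_ripgrep


@dataclass
class FakeSnippet:
    # Lightweight stand-in for sweepai.core.entities.Snippet; only file_path is read
    # by the helpers exercised below.
    file_path: str


def _make_rcm(paths: list[str]) -> RepoContextManager:
    # DirectoryTree and ClonedRepo are not touched by these helpers, so stubs suffice.
    return RepoContextManager(
        dir_obj=None,
        current_top_tree="",
        snippets=[FakeSnippet(path) for path in paths],
        snippet_scores={},
        cloned_repo=None,
    )


def test_escape_ripgrep_escapes_special_characters():
    assert escape_ripgrep("foo(bar") == "foo\\(bar"
    assert escape_ripgrep("dict{key") == "dict\\{key"
    assert escape_ripgrep("plain_text") == "plain_text"


def test_is_path_valid_matches_exact_file_and_directory_prefix():
    rcm = _make_rcm(["sweepai/core/context_pruning.py"])
    assert rcm.is_path_valid("sweepai/core/context_pruning.py")
    assert rcm.is_path_valid("sweepai/core", directory=True)
    assert not rcm.is_path_valid("sweepai/missing.py")


def test_add_and_boost_snippets_update_top_snippet_paths():
    rcm = _make_rcm([])
    rcm.add_snippets([FakeSnippet("a.py"), FakeSnippet("b.py")])
    rcm.boost_snippets_to_top([FakeSnippet("c.py")])
    # boost_snippets_to_top inserts at the front of current_top_snippets
    assert rcm.top_snippet_paths == ["c.py", "a.py", "b.py"]
```

These checks avoid network and repository access entirely, so they stay fast and can run in CI without GitHub credentials.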

Code Snippets Found

This is based on the results of the Searching step.

tests/test_watch.py:0-12
1import os
2import pickle
3
4from sweepai.watch import handle_event
5
6event_pickle_paths = [
7    "pull_request_opened_34875324597.pkl",
8    "issue_labeled_11503901425.pkl",
9]
10for path in event_pickle_paths:
11    event = pickle.load(open(os.path.join("tests/events", path), "rb"))
12    handle_event(event, do_async=False)
sweepai/utils/multi_query.py:0-102
1import re
2
3from loguru import logger
4from sweepai.core.chat import ChatGPT
5from sweepai.core.entities import Message
6
7# TODO: add docs and tests later
8
9system_message = """You are a thorough and meticulous AI assistant helping a user search for relevant files in a codebase to resolve a GitHub issue. The user will provide a description of the issue, including any relevant details, logs, or observations. Your task is to:
10
111. Summary
12
13Summarize the key points of the issue concisely, but also list out any unfamiliar terms, acronyms, or entities mentioned that may require additional context to fully understand the problem space and identify all relevant code areas.
14
152. Solution
16
17Describe thoroughly in extreme detail what the ideal code fix would look like:
18- Dive deep into the low-level implementation details of how you would change each file. Explain the logic, algorithms, data structures, etc. 
19- Explicitly call out any helper functions, utility modules, libraries or APIs you would leverage.
20- Carefully consider ALL parts of the codebase that could be relevant, including (in decreasing relevance):
21  - Database schemas, models
22  - Type definitions, interfaces, enums, constants
23  - Shared utility code for common operations like date formatting, string manipulation, etc.
24  - Database mutators and query logic 
25  - User-facing messages, error messages, localization, i18n
26  - Exception handling, error recovery, retries, fallbacks
27  - API routes, request/response handling, serialization
28  - UI components, client-side logic, event handlers
29  - Backend services, data processing, business logic
30  - Logging, monitoring, metrics, error tracking, observability, o11y
31  - Auth flows, session management, encryption
32  - Infrastructure, CI/CD, deployments, config
33- List out any unfamiliar domain terms to search for to better understand schemas, types, relationships between entities, etc. Finding data models is key.
34- Rate limiting, caching and other cross-cutting concerns could be very relevant for issues with scale or performance.
35
363. Queries
37
38Generate a list of 10 diverse, highly specific, focused "where" queries to use as vector database search queries to find the most relevant code sections to directly resolve the GitHub issue.
39- Reference very specific functions, variables, classes, endpoints, etc. using exact names.
40- Describe the purpose and behavior of the code in detail to differentiate it. 
41- Ask about granular logic within individual functions/methods.
42- Mention adjacent code like schemas, configs, and helpers to establish context.
43- Use verbose natural language that mirrors the terminology in the codebase.
44- Aim for high specificity to pinpoint the most relevant code in a large codebase.
45
46Format your response like this:
47
48<summary>
49[Brief 1-2 sentence summary of the key points of the issue]
50</summary>
51
52<solution>
53[detailed sentences describing what an ideal fix would change in the code and how
54
55Exhaustive list of relevant parts of the codebase that could be used in the solution include:
56- [Module, service, function or endpoint 1] 
57- [Module, service, function or endpoint 2]
58- [etc.]
59</solution>
60
61<queries>
62<query>Where is the [extremely specific description of code section 1]?</query>
63<query>Where is the [extremely specific description of code section 2]?</query>
64<query>Where is the [extremely specific description of code section 3]?</query>
65...
66</queries>
67
68Examples of good queries:
69- Where is the function that compares the user-provided password hash against the stored hash from the database in the user-authentication service?
70- Where is the code that constructs the GraphQL mutation for updating a user's profile information, and what specific fields are being updated?
71- Where are the React components that render the product carousel on the homepage, and what library is being used for the carousel functionality?
72- Where is the endpoint handler for processing incoming webhook events from Stripe in the backend API, and how are the events being validated and parsed?
73- Where is the function that generates the XML sitemap for SEO, and what are the specific criteria used for determining which pages are included?
74- Where are the push notification configurations and registration logic implemented using the Firebase Cloud Messaging library in the mobile app codebase?
75- Where are the Elasticsearch queries that power the autocomplete suggestions for the site's search bar, and what specific fields are being searched and returned?
76- Where is the logic for automatically provisioning and scaling EC2 instances based on CPU and memory usage metrics from CloudWatch in the DevOps scripts?"""
77
78def generate_multi_queries(input_query: str):
79    chatgpt = ChatGPT(
80        messages=[
81            Message(
82                content=system_message,
83                role="system",
84            )
85        ],
86    )
87    stripped_input = input_query.strip('\n')
88    response = chatgpt.chat_anthropic(
89        f"<github_issue>\n{stripped_input}\n</github_issue>", 
90        model="claude-3-opus-20240229"
91    )
92    pattern = re.compile(r"<query>(?P<query>.*?)</query>", re.DOTALL)
93    queries = []
94    for q in pattern.finditer(response):
95        query = q.group("query").strip()
96        if query:
97            queries.append(query)
98    logger.debug(f"Generated {len(queries)} queries from the input query.")
99    return queries
100
101if __name__ == "__main__":
102    input_query = "I am trying to set up payment processing in my app using Stripe, but I keep getting a 400 error when I try to create a payment intent. I have checked the API key and the request body, but I can't figure out what's wrong. Here is the error message I'm getting: 'Invalid request: request parameters are invalid'. I have attached the relevant code snippets below. Can you help me find the part of the code that is causing this error?"
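A minimal, hypothetical usage sketch (illustrative only; the file's actual `__main__` continuation is cut off by the snippet range above) showing how the example issue could be fed into `generate_multi_queries`:

```python
# Hypothetical usage, not from the source file: generate and print the
# search queries produced for the example Stripe issue defined above.
queries = generate_multi_queries(input_query)
for i, query in enumerate(queries, start=1):
    print(f"{i}. {query}")
```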
sweepai/core/context_pruning.py:946-1061
946def perform_rollout(repo_context_manager: RepoContextManager, reflections_to_gathered_files: dict[str, tuple[list[str], int]], user_prompt: str) -> list[Message]:
947    function_call_history = []
948    formatted_reflections_prompt = format_reflections(reflections_to_gathered_files)
949    updated_user_prompt = user_prompt + formatted_reflections_prompt
950    chat_gpt = ChatGPT()
951    chat_gpt.messages = [Message(role="system", content=sys_prompt + formatted_reflections_prompt)]
952    function_calls_string = chat_gpt.chat_anthropic(
953        content=updated_user_prompt,
954        stop_sequences=["</function_call>"],
955        model=CLAUDE_MODEL,
956        message_key="user_request",
957    )
958    bad_call_count = 0
959    llm_state = {} # persisted across one rollout
960    for _ in range(MAX_ITERATIONS):
961        function_calls = validate_and_parse_function_calls(
962            function_calls_string, chat_gpt
963        )
964        function_outputs = ""
965        for function_call in function_calls[:MAX_PARALLEL_FUNCTION_CALLS]:
966            function_outputs += handle_function_call(repo_context_manager, function_call, llm_state) + "\n"
967            llm_state["function_call_history"] = function_call_history
968            if PLAN_SUBMITTED_MESSAGE in function_outputs:
969                return chat_gpt.messages, function_call_history
970        function_call_history.append(function_calls)
971        if len(function_calls) == 0:
972            function_outputs = "FAILURE: No function calls were made or your last function call was incorrectly formatted. The correct syntax for function calling is this:\n" \
973                + "<function_call>\n<invoke>\n<tool_name>tool_name</tool_name>\n<parameters>\n<param_name>param_value</param_name>\n</parameters>\n</invoke>\n</function_call>" + "\nRemember to gather ALL relevant files. " + get_stored_files(repo_context_manager)
974            bad_call_count += 1
975            if bad_call_count >= NUM_BAD_FUNCTION_CALLS:
976                return chat_gpt.messages, function_call_history
977        if len(function_calls) > MAX_PARALLEL_FUNCTION_CALLS:
978            remaining_function_calls = function_calls[MAX_PARALLEL_FUNCTION_CALLS:]
979            remaining_function_calls_string = mock_function_calls_to_string(remaining_function_calls)
980            function_outputs += "WARNING: You requested more than 1 function call at once. Only the first function call has been processed. The unprocessed function calls were:\n<unprocessed_function_call>\n" + remaining_function_calls_string + "\n</unprocessed_function_call>"
981        try:
982            function_calls_string = chat_gpt.chat_anthropic(
983                content=function_outputs,
984                model=CLAUDE_MODEL,
985                stop_sequences=["</function_call>"],
986            )
987        except Exception as e:
988            logger.error(f"Error in chat_anthropic: {e}")
989            # return all but the last message because it likely causes an error
990            return chat_gpt.messages[:-1], function_call_history
991    return chat_gpt.messages, function_call_history
992
993def context_dfs(
994    user_prompt: str,
995    repo_context_manager: RepoContextManager,
996    problem_statement: str,
997    num_rollouts: int,
998) -> RepoContextManager | None:
999    repo_context_manager.current_top_snippets = []
1000    # initial function call
1001    reflections_to_read_files = {}
1002    rollouts_to_scores_and_rcms = {}
1003    rollout_function_call_histories = []
1004    for rollout_idx in range(num_rollouts):
1005        # operate on a deep copy of the repo context manager
1006        if rollout_idx > 0:
1007            user_prompt = repo_context_manager.format_context(
1008                unformatted_user_prompt=unformatted_user_prompt_stored,
1009                query=problem_statement,
1010            )
1011        overall_score, message_to_contractor, copied_repo_context_manager, rollout_stored_files = search_for_context_with_reflection(
1012            repo_context_manager=repo_context_manager,
1013            reflections_to_read_files=reflections_to_read_files,
1014            user_prompt=user_prompt,
1015            rollout_function_call_histories=rollout_function_call_histories,
1016            problem_statement=problem_statement
1017        )
1018        logger.info(f"Completed run {rollout_idx} with score: {overall_score} and reflection: {message_to_contractor}")
1019        if overall_score is None or message_to_contractor is None:
1020            continue # can't get any reflections here
1021        # reflections_to_read_files[message_to_contractor] = rollout_stored_files, overall_score
1022        rollouts_to_scores_and_rcms[rollout_idx] = (overall_score, copied_repo_context_manager)
1023        if overall_score >= SCORE_THRESHOLD and len(rollout_stored_files) > STOP_AFTER_SCORE_THRESHOLD_IDX:
1024            break
1025    # either we stopped early with a good enough score or we exhausted all rollouts;
1026    # in both cases, select the rcm from the best-scoring rollout
1027    logger.info(f"{render_all_attempts(rollout_function_call_histories)}")
1028    all_scores_and_rcms = list(rollouts_to_scores_and_rcms.values())
1029    best_score, best_rcm = max(all_scores_and_rcms, key=lambda x: x[0] * 100 + len(x[1].current_top_snippets)) # sort first on the highest score, break ties with length of current_top_snippets
1030    for score, rcm in all_scores_and_rcms:
1031        logger.info(f"Rollout score: {score}, Rollout files: {[snippet.file_path for snippet in rcm.current_top_snippets]}")
1032    logger.info(f"Best score: {best_score}, Best files: {[snippet.file_path for snippet in best_rcm.current_top_snippets]}")
1033    return best_rcm
1034
1035if __name__ == "__main__":
1036    try:
1037        from sweepai.utils.github_utils import get_installation_id
1038        from sweepai.utils.ticket_utils import prep_snippets
1039
1040        organization_name = "sweepai"
1041        installation_id = get_installation_id(organization_name)
1042        cloned_repo = ClonedRepo("sweepai/sweep", installation_id, "main")
1043        query = "allow 'sweep.yaml' to be read from the user/organization's .github repository. this is found in client.py and we need to change this to optionally read from .github/sweep.yaml if it exists there"
1044        # golden response is
1045        # sweepai/handlers/create_pr.py:401-428
1046        # sweepai/config/client.py:178-282
1047        ticket_progress = TicketProgress(
1048            tracking_id="test",
1049        )
1050        repo_context_manager = prep_snippets(cloned_repo, query, ticket_progress)
1051        rcm = get_relevant_context(
1052            query,
1053            repo_context_manager,
1054            ticket_progress,
1055            chat_logger=ChatLogger({"username": "wwzeng1"}),
1056        )
1057        for snippet in rcm.current_top_snippets:
1058            print(snippet.denotation)
1059    except Exception as e:
1060        logger.error(f"context_pruning.py failed to run successfully with error: {e}")
1061        raise e
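The `max()` key in `context_dfs` ranks rollouts primarily by score and breaks ties by the number of stored snippets. A small, self-contained sketch of that ranking behavior (illustrative only; it assumes fewer than 100 snippets per rollout, which keeps the score term dominant):

```python
# Illustrative ranking of (score, stored_files) pairs using the same key as context_dfs:
# score dominates, file count breaks ties (valid while file counts stay below 100).
rollouts = [
    (7, ["a.py", "b.py", "c.py"]),  # score 7, 3 files
    (8, ["a.py"]),                  # score 8, 1 file
    (8, ["a.py", "b.py"]),          # score 8, 2 files -> wins the tie
]
best_score, best_files = max(rollouts, key=lambda x: x[0] * 100 + len(x[1]))
assert (best_score, best_files) == (8, ["a.py", "b.py"])
```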
platform/README.md:50-65
50```sh
51pnpm start
52```
53
54## Using Sweep Unit Test Tool
55
561. Insert the path to your local repository.
57   - You can run `pwd` to use your current working directory.
58   - (Optional) Edit the branch name to checkout into a new branch for Sweep to work in (defaults to current branch).
592. Select an existing file for Sweep to add unit tests to.
603. Add meticulous instructions for the unit tests to add, such as the additional edge cases you would like covered.
614. Modify the "Test Script" to write your script for running unit tests, such as `python $FILE_PATH`. You may use the variable $FILE_PATH to refer to the current path. Click the "Run Tests" button to test the script.
62   - Hint: use the $FILE_PATH parameter to run only the unit tests in the current file, reducing noise from tests in other files.
635. Click "Generate Code" to get Sweep to generate additional unit tests.
646. Then click "Refresh" or the check mark to restart or approve the change.
65
sweepai/handlers/create_pr.py:357-456
357def add_config_to_top_repos(installation_id, username, repositories, max_repos=3):
358    user_token, g = get_github_client(installation_id)
359
360    repo_activity = {}
361    for repo_entity in repositories:
362        repo = g.get_repo(repo_entity.full_name)
363        # instead of using total count, use the date of the latest commit
364        commits = repo.get_commits(
365            author=username,
366            since=datetime.datetime.now() - datetime.timedelta(days=30),
367        )
368        # get latest commit date
369        commit_date = datetime.datetime.now() - datetime.timedelta(days=30)
370        for commit in commits:
371            if commit.commit.author.date > commit_date:
372                commit_date = commit.commit.author.date
373
374        # since_date = datetime.datetime.now() - datetime.timedelta(days=30)
375        # commits = repo.get_commits(since=since_date, author="lukejagg")
376        repo_activity[repo] = commit_date
377        # print(repo, commits.totalCount)
378        logger.print(repo, commit_date)
379
380    sorted_repos = sorted(repo_activity, key=repo_activity.get, reverse=True)
381    sorted_repos = sorted_repos[:max_repos]
382
383    # For each repo, create a branch based on main branch, then create PR to main branch
384    for repo in sorted_repos:
385        try:
386            logger.print("Creating config for", repo.full_name)
387            create_config_pr(
388                None,
389                repo=repo,
390                cloned_repo=ClonedRepo(
391                    repo_full_name=repo.full_name,
392                    installation_id=installation_id,
393                    token=user_token,
394                ),
395            )
396        except SystemExit:
397            raise SystemExit
398        except Exception as e:
399            logger.print(e)
400    logger.print("Finished creating configs for top repos")
401
402
403def create_gha_pr(g, repo):
404    # Create a new branch
405    branch_name = "sweep/gha-enable"
406    repo.create_git_ref(
407        ref=f"refs/heads/{branch_name}",
408        sha=repo.get_branch(repo.default_branch).commit.sha,
409    )
410
411    # Update the sweep.yaml file in this branch to add "gha_enabled: True"
412    sweep_yaml_content = (
413        repo.get_contents("sweep.yaml", ref=branch_name).decoded_content.decode()
414        + "\ngha_enabled: True"
415    )
416    repo.update_file(
417        "sweep.yaml",
418        "Enable GitHub Actions",
419        sweep_yaml_content,
420        repo.get_contents("sweep.yaml", ref=branch_name).sha,
421        branch=branch_name,
422    )
423
424    # Create a PR from this branch to the main branch
425    pr = repo.create_pull(
426        title="Enable GitHub Actions",
427        body="This PR enables GitHub Actions for this repository.",
428        head=branch_name,
429        base=repo.default_branch,
430    )
431    return pr
432
433
434SWEEP_TEMPLATE = """\
435name: Sweep Issue
436title: 'Sweep: '
437description: For small bugs, features, refactors, and tests to be handled by Sweep, an AI-powered junior developer.
438labels: sweep
439body:
440  - type: textarea
441    id: description
442    attributes:
443      label: Details
444      description: Tell Sweep where and what to edit and provide enough context for a new developer to the codebase
445      placeholder: |
446        Unit Tests: Write unit tests for <FILE>. Test each function in the file. Make sure to test edge cases.
447        Bugs: The bug might be in <FILE>. Here are the logs: ...
448        Features: the new endpoint should use the ... class from <FILE> because it contains ... logic.
449        Refactors: We are migrating this function to ... version because ...
450  - type: input
451    id: branch
452    attributes:
453      label: Branch
454      description: The branch to work off of (optional)
455      placeholder: |
456        main"""
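`SWEEP_TEMPLATE` is a GitHub issue-form definition, so in a target repository it would normally be committed under `.github/ISSUE_TEMPLATE/`. A minimal sketch of writing it out locally, assuming a hypothetical filename (`sweep-issue.yml`); this is not how `create_config_pr` actually commits the template:

```python
# Illustrative only: write SWEEP_TEMPLATE to a hypothetical issue-form path.
# The real filename and commit flow live in create_config_pr and are not shown here.
from pathlib import Path


def write_issue_template(repo_root: str, template: str) -> Path:
    path = Path(repo_root) / ".github" / "ISSUE_TEMPLATE" / "sweep-issue.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(template)
    return path
```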
sweepai/core/reflection_utils.py:0-176
1
2import re
3
4from loguru import logger
5
6from sweepai.core.chat import ChatGPT
7from sweepai.core.entities import Message
8
9response_format = """Respond using the following structured format:
10
11<judgement_on_task>
12Provide extensive, highly detailed criteria for evaluating the contractor's performance, such as:
13- Did they identify every single relevant file needed to solve the issue, including all transitive dependencies?
14- Did they use multiple code/function/class searches to exhaustively trace every usage and dependency of relevant classes/functions?
15- Did they justify why each file is relevant and needed to solve the issue?
16- Did they demonstrate a complete, comprehensive understanding of the entire relevant codebase and architecture?
17
18Go through the contractor's process step-by-step. For anything they did even slightly wrong or non-optimally, call it out and explain the correct approach. Be extremely harsh and scrutinizing. If they failed to use enough code/function/class searches to find 100% of relevant usages or if they missed any files that are needed, point these out as critical mistakes. Do not give them the benefit of the doubt on anything.
19</judgement_on_task>
20
21<overall_score>
22Evaluate the contractor from 1-10, erring on the low side:
231 - Completely failed to identify relevant files, trace dependencies, or understand the issue
242 - Identified a couple files from the issue description but missed many critical dependencies 
253 - Found some relevant files but had major gaps in dependency tracing and codebase understanding
264 - Identified several key files but still missed important usages and lacked justification
275 - Found many relevant files but missed a few critical dependencies
286 - Identified most key files and dependencies but still had some gaps in usage tracing
297 - Found nearly all relevant files but missed a couple edge case usages or minor dependencies
308 - Exhaustively traced nearly all dependencies with robust justification, only minor omissions
319 - Perfectly identified every single relevant file and usage with airtight justification 
3210 - Flawless, absolutely exhaustive dependency tracing and codebase understanding
33</overall_score>
34
35<message_to_contractor>
36Provide a single sentence of extremely specific, targeted, and actionable critical feedback, addressed directly to the contractor.
379-10: Flawless work exhaustively using code/function/class searches to identify 100% of necessary files and usages!
385-8: You failed to search for [X, Y, Z] to find all usages of [class/function]. You need to understand [A, B, C] dependencies.
391-4: You need to search for [X, Y, Z] classes/functions to find actually relevant files. You missed [A, B, C] critical dependencies completely.
40</message_to_contractor>
41
42Do not give any positive feedback unless the contractor literally achieved perfection. Be extremely harsh and critical in your evaluation. Assume incompetence until proven otherwise. Make the contractor work hard to get a high score."""
43
44state_eval_prompt = """You are helping contractors on a task that involves finding all of the relevant files needed to resolve a github issue. You are an expert at this task and have solved it hundreds of times. This task does not involve writing or modifying code. The contractors' goal is to identify all necessary files, not actually implement the solution. The contractor should not be coding at all. 
45
46Your job is to review the contractor's work with an extremely critical eye. Leave no stone unturned in your evaluation. Read through every single step the contractor took and analyze it in depth.
47
48""" + response_format + \
49"""
50Here are some examples of how you should evaluate the contractor's work:
51
52<examples>
53Example 1 (Score: 9):
54<judgement_on_task>
55The contractor did an outstanding job identifying all of the relevant files needed to resolve the payment processing issue. They correctly identified the core Payment.java model where the payment data is defined, and used extensive code searches for "Payment", "pay", "process", "transaction", etc. to exhaustively trace every single usage and dependency.
56
57They found the PaymentController.java and PaymentService.java files where Payment objects are created and processed, and justified how these are critical for the payment flow. They also identified the PaymentRepository.java DAO that interacts with the payments database.
58
59The contractor demonstrated a deep understanding of the payment processing architecture by tracing the dependencies of the PaymentService on external payment gateways like StripeGateway.java and PayPalGateway.java. They even found the PaymentNotificationListener.java that handles webhook events from these gateways.
60
61To round out their analysis, the contractor identified the PaymentValidator.java and PaymentSecurityFilter.java as crucial parts of the payment processing pipeline for validation and security. They justified the relevance of each file with clear explanations tied to the reported payment bug.
62
63No relevant files seem to have been missed. The contractor used a comprehensive set of searches for relevant classes, functions, and terms to systematically map out the entire payment processing codebase. Overall, this shows an excellent understanding of the payment architecture and all its nuances.
64</judgement_on_task>
65<overall_score>9</overall_score>
66<message_to_contractor>
67Excellent work identifying Payment.java, PaymentController.java, PaymentService.java, and all critical dependencies.
68</message_to_contractor>
69
70Example 2 (Score: 4): 
71<judgement_on_task>
72The contractor identified the UserAccount.java file where the login bug is occurring, but failed to use nearly enough code/function/class searches to find many other critical files. While they noted that LoginController.java calls UserAccount.authenticateUser(), they didn't search for the "authenticateUser" function to identify LoginService.java which orchestrates the login flow.  
73
74They completely missed using searches for the "UserAccount" class, "credentials", "principal", "login", etc. to find the UserRepository.java file that loads user data from the database and many other files involved in authentication. Searching for "hash", "encrypt", "password", etc. should have revealed the critical PasswordEncryptor.java that handles password hashing.
75
76The contractor claimed UserForgotPasswordController.java and UserCreateController.java are relevant, but failed to justify this at all. These files are not directly related to the login bug.
77
78In general, the contractor seemed to stumble upon a couple relevant files, but failed to systematically trace the login code path and its dependencies. They showed a superficial and incomplete understanding of the login architecture and process. Many critical files were completely missed and the scope was not properly focused on login.
79</judgement_on_task>
80<overall_score>4</overall_score>  
81<message_to_contractor>
82Failed to search for "authenticateUser", "UserAccount", "login", "credentials". Missed LoginService.java, UserRepository.java, PasswordEncryptor.java.
83</message_to_contractor>
84
85Example 3 (Score: 2):
86<judgement_on_task>
87The files identified by the contractor, like index.html, styles.css, and ProductList.vue, are completely irrelevant for resolving the API issue with product pricing. The front-end product list display code does not interact with the pricing calculation logic whatsoever.
88
89The contractor completely failed to focus their investigation on the backend api/products/ directory where the pricing bug actually occurs. They did not perform any searches for relevant classes/functions like "Product", "Price", "Discount", etc. to find the ProductController.java API endpoint and the PriceCalculator.java service it depends on.
90
91Basic searches for the "Product" class should have revealed the Product.java model and ProductRepository.java database access code as highly relevant, but these were missed. The contractor failed to demonstrate any understanding of the API architecture and the flow of pricing data from the database to the API response.
92
93The contractor also did not look for any configuration files that provide pricing data, which would be critical for the pricing calculation. They did not search for "price", "cost", etc. in JSON or properties files.
94
95Overall, the contractor seemed to have no clue about the actual pricing bug or the backend API codebase. They looked in completely the wrong places, failed to perform any relevant code/function/class searches, and did not identify a single relevant file for the reported bug. This shows a fundamental lack of understanding of the pricing feature and backend architecture.
96</judgement_on_task>
97<overall_score>2</overall_score>
98<message_to_contractor>
99index.html, styles.css, ProductList.vue are irrelevant. Search api/products/ for "Product", "Price", "Discount" classes/functions.
100</message_to_contractor>
101
102Example 4 (Score: 7):
103<judgement_on_task>
104The contractor identified most of the key files involved in the user profile update process, including UserProfileController.java, UserProfileService.java, and UserProfile.java. They correctly traced the flow of data from the API endpoint to the service layer and model.
105
106However, they missed a few critical dependencies. They did not search for "UserProfile" to find the UserProfileRepository.java DAO that loads and saves user profiles to the database. This is a significant omission in their understanding of the data persistence layer.
107
108The contractor also failed to look for configuration files related to user profiles. Searching for "profile" in YAML or properties files should have revealed application-profiles.yml which contains important profile settings. 
109
110While the contractor had a decent high-level understanding of the user profile update process, they showed some gaps in their low-level understanding of the data flow and configuration. They needed to be more thorough in tracing code dependencies to uncover the complete set of relevant files.
111</judgement_on_task>
112<overall_score>7</overall_score>
113<message_to_contractor>
114Missed UserProfileRepository.java and application-profiles.yml dependencies. Search for "UserProfile" and "profile" to find remaining relevant files.
115</message_to_contractor>
116</examples>"""
117
118# general framework for a dfs search
119# 1. sample trajectory
120# 2. for each trajectory, run the assistant until it hits an error or end state
121#    - in either case perform self-reflection
122# 3. update the reflections section with the new reflections
123CLAUDE_MODEL = "claude-3-opus-20240229"
124
125class EvaluatorAgent(ChatGPT):
126    def evaluate_run(self, problem_statement: str, run_text: str, stored_files: list[str]):
127        self.model = CLAUDE_MODEL
128        self.messages = [Message(role="system", content=state_eval_prompt)]
129        formatted_problem_statement = f"This is the task for the contractor to research:\n<task_to_research>\n{problem_statement}\n</task_to_research>"
130        contractor_stored_files = "\n".join([file for file in stored_files])
131        stored_files_section = f"""The contractor stored these files:\n<stored_files>\n{contractor_stored_files}\n</stored_files>"""
132        content = formatted_problem_statement + "\n\n" + f"<contractor_attempt>\n{run_text}\n</contractor_attempt>"\
133             + f"\n\n{stored_files_section}\n\n" + response_format
134        evaluate_response = self.chat_anthropic(
135            content=content,
136            stop_sequences=["</message_to_contractor>"],
137            model=CLAUDE_MODEL,
138            message_key="user_request",
139        )
140        evaluate_response += "</message_to_contractor>" # add the stop sequence back in; if generation stopped for any other reason, the parsing below will fail
141        overall_score = None
142        message_to_contractor = None
143        try:
144            overall_score_pattern = r"<overall_score>(.*?)</overall_score>"
145            message_to_contractor_pattern = r"<message_to_contractor>(.*?)</message_to_contractor>"
146
147            overall_score_match = re.search(overall_score_pattern, evaluate_response, re.DOTALL)
148            message_to_contractor_match = re.search(message_to_contractor_pattern, evaluate_response, re.DOTALL)
149
150            if overall_score_match is None or message_to_contractor_match is None:
151                return overall_score, message_to_contractor
152
153            overall_score = overall_score_match.group(1).strip()
154            # check that the score is an integer from 1 through 10 (the alternation must be grouped, or strings like "95" would pass)
155            if not re.match(r"^([1-9]|10)$", overall_score):
156                return None, None
157            else:
158                overall_score_match = re.match(r"^([1-9]|10)$", overall_score)
159                overall_score = overall_score_match.group(0).strip()
160            overall_score = int(overall_score)
161            message_to_contractor = message_to_contractor_match.group(1).strip()
162            return overall_score, message_to_contractor
163        except Exception as e:
164            logger.info(f"Error evaluating response: {e}")
165            return overall_score, message_to_contractor
166
167if __name__ == "__main__":
168    try:
169        pass
170    except Exception as e:
171        import sys
172        info = sys.exc_info()
173        import pdb
174        # pylint: disable=no-member
175        pdb.post_mortem(info[2])
176        raise e
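As a quick sanity check on the score validation in `evaluate_run`, the grouped pattern accepts exactly the strings "1" through "10" and rejects everything else (a standalone, illustrative snippet, not part of the file):

```python
# Illustrative check of the grouped score-validation pattern: without the
# parentheses, "^[1-9]|10$" would also accept strings like "95".
import re

pattern = r"^([1-9]|10)$"
assert all(re.match(pattern, str(n)) for n in range(1, 11))
assert not any(re.match(pattern, s) for s in ["0", "11", "95", "ten"])
```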
sweepai/core/prompts.py:629-1084
629modify_file_hallucination_prompt = [
630    {
631        "content": """File Name: (non-existent example)
632<old_file>
633example = True
634if example:
635    x = 1 # comment
636    print("hello")
637    x = 2
638
639class Example:
640    foo: int = 1
641
642    def func():
643        a = 3
644
645</old_file>
646
647---
648
649Code Planning:
650Step-by-step thoughts with explanations:
651* Thought 1
652* Thought 2
653...
654
655Commit message: "feat/fix: the commit message"
656
657Detailed plan of modifications:
658* Modification 1
659* Modification 2
660...
661
662Code Generation:
663
664```
665Generate a diff based on the given plan using the search and replace pairs in the format below.
666* Always prefer the least amount of changes possible, but ensure the solution is complete
667* Prefer multiple small changes over a single large change.
668* NEVER write ellipses anywhere in the diffs. Simply write two diff hunks: one for the beginning and another for the end.
669* Always add lines before and after. The ORIGINAL section should be at least 5 lines long.
670
671The format is as follows:
672
673<<<< ORIGINAL
674line_before
675old_code
676line_after
677====
678line_before
679new_code
680line_after
681>>>> UPDATED
682```
683
684Commit message: "the commit message"
685
686Request: "Change hello to goodbye and change 3 to 4". Limit your changes to the request.
687
688Instructions:
6891. Complete the Code Planning step
6902. Complete the Code Generation step""",
691        "role": "user",
692        "key": "modify_file_hallucination",
693    },
694    {
695        "content": """Code Planning:
696Step-by-step thoughts with explanations:
697* We need to print "goodbye" instead of "hello".
698* We need to update the value of the variable a from 3 to 4.
699
700Detailed plan of modifications:
701* Change the output of the print statement from "hello" to "goodbye" as an example modification.
702* I will update the value of a from 3 to 4.
703
704Code Generation:
705```
706<<<< ORIGINAL
707example = True
708if example:
709    x = 1 # comment
710    print("hello")
711    x = 2
712====
713example = True
714if example:
715    x = 1 # comment
716    print("goodbye")
717    x = 2
718>>>> UPDATED
719
720<<<< ORIGINAL
721class Example:
722    foo: int = 1
723
724    def func():
725        a = 3
726====
727class Example:
728    foo: int = 1
729
730    def func():
731        a = 4
732>>>> UPDATED
733```
734
735Commit message: "Changed hello to goodbye and 3 to 4"\
736""",
737        "role": "assistant",
738        "key": "modify_file_hallucination",
739    },
740]
741
742# TODO: IMPORTANT: THIS DEPENDS ON THE ABOVE PROMPT, modify_file_hallucination_prompt
743modify_file_prompt_3 = """\
744File Name: {filename}
745<old_file>
746{code}
747</old_file>
748
749---
750
751User's request:
752{instructions}
753
754Limit your changes to the request.
755
756Instructions:
757Complete the Code Planning step and Code Modification step.
758Remember to NOT write ellipses, code things out in full, and use multiple small hunks.\
759"""
760
761modify_recreate_file_prompt_3 = """\
762File Name: {filename}
763<old_file>
764{code}
765</old_file>
766
767---
768
769User's request:
770{instructions}
771
772Limit your changes to the request.
773
774Format:
775```
776<new_file>
777{{new file content}}
778</new_file>
779```
780
781Instructions:
7821. Complete the Code Planning step
7832. Complete the Code Modification step, remembering to NOT write ellipses, write complete functions, and use multiple small hunks where possible."""
784
785modify_file_system_message = """\
786You are a brilliant and meticulous engineer assigned to write code for the file to address a Github issue. When you write code, the code works on the first try and is syntactically perfect and complete. You have the utmost care for your code, so you do not make mistakes and every function and class will be fully implemented. Take into account the current repository's language, frameworks, and dependencies. You always follow up each code planning session with a code modification.
787
788When you modify code:
789* Always prefer the least amount of changes possible, but ensure the solution is complete.
790* Prefer multiple small changes over a single large change.
791* Do not edit the same parts multiple times.
792* Make sure to add additional lines before and after the original and updated code to disambiguate code when replacing repetitive sections.
793* NEVER write ellipses anywhere in the diffs. Simply write two diff hunks: one for the beginning and another for the end.
794
795Respond in the following format. Both the Code Planning and Code Modification steps are required.
796
797### Format ###
798
799## Code Planning:
800
801Thoughts and detailed plan:
8021.
8032.
8043.
805...
806
807Commit message: "feat/fix: the commit message"
808
809## Code Modification:
810
811Generated diff hunks based on the given plan using the search and replace pairs in the format below.
812```
813The first hunk's description
814
815<<<< ORIGINAL
816{exact copy of lines you would like to change}
817====
818{updated lines}
819>>>> UPDATED
820
821The second hunk's description
822
823<<<< ORIGINAL
824second line before
825first line before
826old code
827first line after
828second line after
829====
830second line before
831first line before
832new code
833first line after
834second line after
835>>>> UPDATED
836```"""
837
838RECREATE_LINE_LENGTH = -1
839
840modify_file_prompt_4 = """\
841File Name: {filename}
842
843<file>
844{code}
845</file>
846
847---
848
849Modify the file by responding in the following format:
850
851Code Planning:
852
853Step-by-step thoughts with explanations:
854* Thought 1
855* Thought 2
856...
857
858Detailed plan of modifications:
859* Replace x with y
860* Add a foo method to bar
861...
862
863Code Modification:
864
865```
866Generate a diff based on the given instructions using the search and replace pairs in the following format:
867
868<<<< ORIGINAL
869second line before
870first line before
871old code
872first line after
873second line after
874====
875second line before
876first line before
877new code
878first line after
879second line after
880>>>> UPDATED
881```
882
883Commit message: "the commit message"
884
885The user's request is:
886{instructions}
887
888Instructions:
8891. Complete the Code Planning step
8902. Complete the Code Modification step
891"""
892
893rewrite_file_system_prompt = "You are a brilliant and meticulous engineer assigned to write code for the file to address a Github issue. When you write code, the code works on the first try and is syntactically perfect and complete. You have the utmost care for your code, so you do not make mistakes and every function and class will be fully implemented. Take into account the current repository's language, frameworks, and dependencies."
894
895rewrite_file_prompt = """\
896File Name: {filename}
897<old_file>
898{code}
899</old_file>
900
901---
902
903User's request:
904{instructions}
905
906Limit your changes to the request.
907
908Rewrite the following section from the old_file to handle this request.
909
910<section>
911
912{section}
913
914</section>
915
916Think step-by-step on what to modify, then wrap the final answer in the brackets <section></section> XML tags. Only rewrite the section and do not close hanging parentheses and tags.\
917"""
918
919sandbox_code_repair_modify_prompt_2 = """
920File Name: {filename}
921
922<file>
923{code}
924</file>
925
926---
927
928Above is the code that was written by an inexperienced programmer, and it contains errors such as syntax errors, linting errors and type-checking errors. The CI pipeline returned the following logs:
929
930stdout:
931```
932{stdout}
933```
934
935stderr:
936```
937{stderr}
938```
939
940Respond in the following format:
941
942Code Planning
943
944Determine the following in code planning:
9451. Are there any syntax errors? Look through the file to find all syntax errors.
9462. Are there basic linting errors, like undefined variables, undefined members or type errors?
9473. Are there incorrect imports and exports?
9484. Are there any other errors not listed above?
949
950Determine whether changes are necessary based on the errors (ignore warnings).
951
952Code Modification:
953
954Generate a diff based on the given plan using the search and replace pairs in the format below.
955* Always prefer the least amount of changes possible, but ensure the solution is complete
956* Prefer multiple small changes over a single large change.
957* NEVER write ellipses anywhere in the diffs. Simply write two diff hunks: one for the beginning and another for the end.
958* DO NOT modify the same section multiple times.
959* Always add lines before and after. The ORIGINAL section should be at least 5 lines long.
960* Restrict the changes to fixing the errors from the logs.
961
962The format is as follows:
963
964```
965<<<< ORIGINAL
966second line before
967first line before
968old code of first hunk
969first line after
970second line after
971====
972second line before
973first line before
974new code of first hunk
975first line after
976second line after
977>>>> UPDATED
978
979<<<< ORIGINAL
980second line before
981first line before
982old code of second hunk
983first line after
984second line after
985====
986second line before
987first line before
988new code of second hunk
989first line after
990second line after
991>>>> UPDATED
992```
993
994Commit message: "the commit message"
995
996Instructions:
9971. Complete the Code Planning step
9982. Complete the Code Modification step
999"""
1000
1001pr_code_prompt = ""  # TODO: deprecate this
1002
1003pull_request_prompt = """Now, create a PR for your changes. Be concise but cover all of the changes that were made.
1004For the pr_content, add two sections, description and summary.
1005Use GitHub markdown in the following format:
1006
1007pr_title = "..."
1008branch = "..."
1009pr_content = \"\"\"
1010...
1011...
1012\"\"\""""
1013
1014summarize_system_prompt = """
1015You are an engineer assigned to helping summarize code instructions and code changes.
1016"""
1017
1018user_file_change_summarize_prompt = """
1019Summarize the given instructions for making changes in a pull request.
1020Code Instructions:
1021{message_content}
1022"""
1023
1024assistant_file_change_summarize_prompt = """
1025Please summarize the following file using the file stubs.
1026Be sure to repeat each method signature and docstring. You may also add additional comments to the docstring.
1027Do not repeat the code in the file stubs.
1028Code Changes:
1029{message_content}
1030"""
1031
1032code_repair_check_system_prompt = """\
1033You are a genius trained for validating code.
1034You will be given two pieces of code marked by xml tags. The code inside <diff></diff> is the changes applied to create user_code, and the code inside <user_code></user_code> is the final product.
1035Our goal is to validate that the final code is valid. This means there are no undefined variables, no syntax errors, no unimplemented functions (e.g. bare "pass" statements or comments saying "rest of code"), and the code runs.
1036"""
1037
1038code_repair_check_prompt = """\
1039This is the diff that was applied to create user_code. Only make changes to code in user_code if the code was affected by the diff.
1040
1041This is the user_code.
1042<user_code>
1043{user_code}
1044</user_code>
1045
1046Reply in the following format:
1047
1048Step-by-step thoughts with explanations:
10491. No syntax errors: True/False
10502. No undefined variables: True/False
10513. No unimplemented functions: True/False
10524. Code runs: True/False
1053
1054<valid>True</valid> or <valid>False</valid>
1055"""
1056
1057code_repair_system_prompt = """\
1058You are a genius trained for code stitching.
1059You will be given two pieces of code marked by xml tags. The code inside <diff></diff> is the changes applied to create user_code, and the code inside <user_code></user_code> is the final product. The intention was to implement a change described as {feature}.
1060Our goal is to return a working version of user_code that follows {feature}. We should follow the instructions and make as few edits as possible.
1061"""
1062
1063code_repair_prompt = """\
1064This is the diff that was applied to create user_code. Only make changes to code in user_code if the code was affected by the diff.
1065
1066This is the user_code.
1067<user_code>
1068{user_code}
1069</user_code>
1070
1071Instructions:
1072* Do not modify comments, docstrings, or whitespace.
1073
1074The only operations you may perform are:
10751. Indenting or dedenting code in user_code. This code MUST be code that was modified by the diff.
10762. Adding or deduplicating code in user_code. This code MUST be code that was modified by the diff.
1077
1078Return the working user_code without xml tags. All of the text you return will be placed in the file.
1079"""
1080
1081doc_query_rewriter_system_prompt = """\
1082You must rewrite the user's github issue to leverage the docs. In this case we want to look at {package}. It's used for: {description}. Using the github issue, write a search query that searches for the potential answer using the documentation. This query will be sent to a documentation search engine with vector and lexical based indexing. Make this query contain keywords relevant to the {package} documentation.
1083"""
1084
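The modify-file prompts above all rely on the `<<<< ORIGINAL` / `====` / `>>>> UPDATED` search-and-replace format. A minimal sketch of how a single hunk could be applied to file contents (illustrative only; this is not Sweep's actual diff applier, and it assumes the ORIGINAL block appears verbatim exactly once in the file):

```python
# Illustrative hunk applier for the <<<< ORIGINAL / ==== / >>>> UPDATED format.
def apply_hunk(file_contents: str, hunk: str) -> str:
    original, updated = hunk.split("\n====\n")
    original = original.removeprefix("<<<< ORIGINAL\n")
    updated = updated.removesuffix("\n>>>> UPDATED")
    if file_contents.count(original) != 1:
        raise ValueError("ORIGINAL block must match exactly once")
    return file_contents.replace(original, updated)


hunk = '<<<< ORIGINAL\n    print("hello")\n====\n    print("goodbye")\n>>>> UPDATED'
print(apply_hunk('if example:\n    print("hello")\n    x = 2\n', hunk))
```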
sweepai/agents/assistant_function_modify.py:0-308
1import os
2import json
3import subprocess
4import traceback
5from collections import defaultdict
6
7from loguru import logger
8
9from sweepai.agents.assistant_wrapper import openai_assistant_call, tool_call_parameters
10from sweepai.agents.agent_utils import ensure_additional_messages_length
11from sweepai.config.client import SweepConfig
12from sweepai.core.entities import AssistantRaisedException, FileChangeRequest, Message
13from sweepai.logn.cache import file_cache
14from sweepai.utils.chat_logger import ChatLogger, discord_log_error
15from sweepai.utils.diff import generate_diff
16from sweepai.utils.file_utils import read_file_with_fallback_encodings
17from sweepai.utils.github_utils import ClonedRepo, update_file
18from sweepai.utils.progress import AssistantConversation, TicketProgress
19from sweepai.utils.str_utils import get_all_indices_of_substring
20from sweepai.utils.utils import CheckResults, get_check_results
21from sweepai.utils.modify_utils import post_process_rg_output, manual_code_check
22
23# Pre-amble using ideas from https://github.com/paul-gauthier/aider/blob/main/aider/coders/udiff_prompts.py
24# Doesn't regress on the benchmark but improves average code generated and avoids empty comments.
25
26# Add COT to each tool
27
28instructions = """You are an expert software developer tasked with editing code to fulfill the user's request. Your goal is to make the necessary changes to the codebase while following best practices and respecting existing conventions. 
29
30To complete the task, follow these steps:
31
321. Carefully analyze the user's request to identify the key requirements and changes needed. Break down the problem into smaller sub-tasks.
33
342. Search the codebase for relevant files, functions, classes, and variables related to the task at hand. Use the search results to determine where changes need to be made. 
35
363. For each relevant file, identify the minimal code changes required to implement the desired functionality. Consider edge cases, error handling, and necessary imports.
37
384. If new functionality is required that doesn't fit into existing files, create a new file with an appropriate name and location.
39
405. Make the code changes in a targeted way:
41   - Preserve existing whitespace, comments and code style
42   - Make surgical edits to only the required lines of code
43   - If a change is complex, break it into smaller incremental changes
44   - Ensure each change is complete and functional before moving on
45
466. When providing code snippets, be extremely precise with indentation:
47   - Count the exact number of spaces used for indentation
48   - If tabs are used, specify that explicitly 
49   - Ensure the indentation of the code snippet matches the original file exactly
507. After making all the changes, review the modified code to verify it fully satisfies the original request.
518. Once you are confident the task is complete, submit the final solution.
52
53In this environment, you have access to the following tools to assist in fulfilling the user request:
54
55You MUST call them like this:
56<function_calls>
57<invoke>
58<tool_name>$TOOL_NAME</tool_name>
59<parameters>
60<$PARAMETER_NAME>$PARAMETER_VALUE</$PARAMETER_NAME>
61...
62</parameters>
63</invoke>
64</function_calls>
65
66Here are the tools available:
67<tools>
68<tool_description>
69<tool_name>analyze_problem_and_propose_plan</tool_name>
70<description>
71Carefully analyze the user's request to identify the key requirements, changes needed, and any constraints or considerations. Break down the problem into sub-tasks.
72</description>
73<parameters>
74<parameter>
75<name>problem_analysis</name>
76<type>str</type>
77<description>
78Provide a thorough analysis of the user's request, identifying key details, requirements, intended behavior changes, and any other relevant information. Organize and prioritize the sub-tasks needed to fully address the request.
79</description>
80</parameter>
81<parameter>
82<name>proposed_plan</name>
83<type>str</type>
84<description>
85Describe the plan to solve the problem, including the keywords to search, modifications to make, and all required imports to complete the task.
86</description>
87</parameter>
88</parameters>
89</tool_description>
90
91<tool_description>
92<tool_name>search_codebase</tool_name>
93<description>
94Search the codebase for files, functions, classes, or variables relevant to a task. Searches can be scoped to a single file or across the entire codebase.
95</description>
96<parameters>
97<parameter>
98<name>justification</name>
99<type>str</type>
100<description>
101Explain why searching for this query is relevant to the task and how the results will inform the code changes.
102</description>
103</parameter>
104<parameter>
105<name>file_name</name>
106<type>str</type>
107<description>
108(Optional) The name of a specific file to search within. If not provided, the entire codebase will be searched.
109</description>
110</parameter>
111<parameter>
112<name>keyword</name>
113<type>str</type>
114<description>
115The search query, such as a function name, class name, or variable. Provide only one query term per search.
116</description>
117</parameter>
118</parameters>
119</tool_description>
120
121<tool_description>
122<tool_name>analyze_and_identify_changes</tool_name>
123<description>
124Determine the minimal code changes required in a file to implement a piece of the functionality. Consider edge cases, error handling, and necessary imports.
125</description>
126<parameters>
127<parameter>
128<name>file_name</name>
129<type>str</type>
130<description>
131The name of the file where changes need to be made.
132</description>
133</parameter>
134<name>changes</name>
135<type>str</type>
136<description>
137Describe the changes to make in the file. Specify the location of each change and provide the code modifications. Include any required imports or updates to existing code.
138</description>
139</parameter>
140</parameters>
141</tool_description>
142
143<tool_description>
144<tool_name>view_file</tool_name>
145<description>
146View the contents of a file from the codebase. Useful for viewing code in context before making changes.
147</description>
148<parameters>
149<parameter>
150<name>justification</name>
151<type>str</type>
152<description>
153Explain why viewing this file is necessary to complete the task or better understand the existing code.
154</description>
155</parameter>
156<parameter>
157<name>file_name</name>
158<type>str</type>
159<description>
160The name of the file to retrieve, including the extension. File names are case-sensitive.
161</description>
162</parameter>
163</parameters>
164</tool_description>
165
166<tool_description>
167<tool_name>make_change</tool_name>
168<description>
169Make a SINGLE, TARGETED code change in a file. Preserve whitespace, comments and style. Changes should be minimal, self-contained and only address one specific modification. If a change requires modifying multiple separate code sections, use multiple calls to this tool, one for each independent change.
170</description>
171<parameters>
172<parameter>
173<name>justification</name>
174<type>str</type>
175<description>
176Explain how this SINGLE change contributes to fulfilling the user's request.
177</description>
178</parameter>
179<parameter>
180<name>file_name</name>
181<type>str</type>
182<description>
183Name of the file to make the change in. Ensure correct spelling as this is case-sensitive.
184</description>
185</parameter>
186<parameter>
187<name>original_code</name>
188<type>str</type>
189<description>
190The existing lines of code that need to be modified or replaced. This should be a SINGLE, CONTINUOUS block of code, not multiple separate sections. Include unchanged surrounding lines for context.
191</description>
192</parameter>
193<parameter>
194<name>new_code</name>
195<type>str</type>
196<description>
197The new lines of code to replace the original code, implementing the SINGLE desired change. If the change is complex, break it into smaller targeted changes and use separate make_change calls for each.
198</description>
199</parameter>
200</parameters>
201</tool_description>
202
203<tool_description>
204<tool_name>create_file</tool_name>
205<description>
206Create a new code file in the specified location with the given file name and extension. This is useful when the task requires adding entirely new functionality or classes to the codebase.
207</description>
208<parameters>
209<parameter>
210<name>file_path</name>
211<type>str</type>
212<description>
213The path where the new file should be created, relative to the root of the codebase. Do not include the file name itself.
214</description>
215</parameter>
216<parameter>
217<name>file_name</name>
218<type>str</type>
219<description>
220The name to give the new file, including the extension. Ensure the name is clear, descriptive, and follows existing naming conventions.
221</description>
222</parameter>
223<parameter>
224<parameter>
225<name>contents</name>
226<type>str</type>
227<description>
228The contents of this new file.
229</description>
230</parameter>
231<parameter>
232<name>justification</name>
233<type>str</type>
234<description>
235Explain why creating this new file is necessary to complete the task and how it fits into the existing codebase structure.
236</description>
237</parameter>
238</parameters>
239</tool_description>
240
241<tool_description>
242<tool_name>submit_result</tool_name>
243<description>
244Indicate that the task is complete and all requirements have been satisfied. Provide the final code changes or solution.
245</description>
246<parameters>
247<parameter>
248<name>justification</name>
249<type>str</type>
250<description>
251Summarize the code changes made and how they fulfill the user's original request. Provide the complete, modified code if applicable.
252</description>
253</parameter>
254</parameters>
255</tool_description>
256"""
257
258# NO_TOOL_CALL_PROMPT = """ERROR
259# No tool calls were made. If you are done, please use the submit_result tool to indicate that you have completed the task. If you believe you are stuck, use the search_codebase tool to further explore the codebase or get additional context if necessary.
260
261NO_TOOL_CALL_PROMPT = """FAILURE
262No function calls were made or your last function call was incorrectly formatted. The correct syntax for function calling is this:
263
264<function_calls>
265<invoke>
266<tool_name>tool_name</tool_name>
267<parameters>
268<param_name>param_value</param_name>
269</parameters>
270</invoke>
271</function_calls>
272
273Here is an example:
274
275<function_calls>
276<invoke>
277<tool_name>analyze_problem_and_propose_plan</tool_name>
278<parameters>
279<problem_analysis>The problem analysis goes here</problem_analysis>
280<proposed_plan>The proposed plan goes here</proposed_plan>
281</parameters>
282</invoke>
283</function_calls>
284
285If you are really done, call the submit function.
286"""
287
288unformatted_tool_call_response = "<function_results>\n<result>\n<tool_name>{tool_name}</tool_name>\n<stdout>\n{tool_call_response_contents}\n</stdout>\n</result>\n</function_results>"
289
290
291def int_to_excel_col(n):
292    result = ""
293    if n == 0:
294        result = "A"
295    while n > 0:
296        n, remainder = divmod(n - 1, 26)
297        result = chr(65 + remainder) + result
298    return result
299
300
301def excel_col_to_int(s):
302    result = 0
303    for char in s:
304        result = result * 26 + (ord(char) - 64)
305    return result - 1
306
307TOOLS_MAX_CHARS = 20000
308
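The `instructions` prompt above tells the model to emit tool calls in a `<function_calls>` / `<invoke>` XML format. A minimal sketch of extracting tool names and parameters from such a response (illustrative only; this is not the repo's actual parser, and it assumes parameter values contain no nested XML tags):

```python
# Illustrative parser for the <function_calls> format described in the instructions prompt.
import re


def parse_function_calls(response: str) -> list[dict]:
    calls = []
    for invoke in re.findall(r"<invoke>(.*?)</invoke>", response, re.DOTALL):
        tool_name = re.search(r"<tool_name>(.*?)</tool_name>", invoke, re.DOTALL).group(1).strip()
        params_block = re.search(r"<parameters>(.*?)</parameters>", invoke, re.DOTALL).group(1)
        params = {
            name: value.strip()
            for name, value in re.findall(r"<(\w+)>(.*?)</\1>", params_block, re.DOTALL)
        }
        calls.append({"tool_name": tool_name, "parameters": params})
    return calls


example = """<function_calls>
<invoke>
<tool_name>view_file</tool_name>
<parameters>
<justification>Need to see the config loader</justification>
<file_name>sweepai/config/client.py</file_name>
</parameters>
</invoke>
</function_calls>"""
print(parse_function_calls(example))
```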
sweepai/utils/openai_listwise_reranker.py:381-484
381reranking_prompt = f"""You are a powerful code search engine. You must order the list of code snippets from the most relevant to the least relevant to the user's query. You must order ALL TEN snippets.
382First, for each code snippet, provide a brief explanation of what the code does and how it relates to the user's query.
383
384Then, rank the snippets based on relevance. The most relevant files are the ones we need to edit to resolve the user's issue. The next most relevant snippets are dependencies - code that is crucial to read and understand while editing the other files to correctly resolve the user's issue.
385
386Note: For each code snippet, provide an explanation of what the code does and how it fits into the overall system, even if it's not directly relevant to the user's query. The ranking should be based on relevance to the query, but all snippets should be explained.
387
388The response format is:
389<explanations>
390file_path:start_line-end_line
391Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
392file_path:start_line-end_line
393Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
394file_path:start_line-end_line
395Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
396file_path:start_line-end_line
397Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
398file_path:start_line-end_line
399Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
400file_path:start_line-end_line
401Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
402file_path:start_line-end_line
403Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
404file_path:start_line-end_line
405Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
406file_path:start_line-end_line
407Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
408file_path:start_line-end_line
409Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
410</explanations>
411
412<ranking>
413first_most_relevant_snippet
414second_most_relevant_snippet
415third_most_relevant_snippet
416fourth_most_relevant_snippet
417fifth_most_relevant_snippet
418sixth_most_relevant_snippet
419seventh_most_relevant_snippet
420eighth_most_relevant_snippet
421ninth_most_relevant_snippet
422tenth_most_relevant_snippet
423</ranking>
424
425Here is an example:
426
427{example_prompt}
428
429This example is for reference. Please provide explanations and rankings for the code snippets based on the user's query."""
430
431user_query_prompt = """This is the user's query:
432<user_query>
433{user_query}
434</user_query>
435
436This is the list of ten code snippets that you must order by relevance:
437<code_snippets>
438{formatted_code_snippets}
439</code_snippets>
440
441Remember: The response format is:  
442<explanations>
443file_path:start_line-end_line
444Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
445file_path:start_line-end_line
446Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
447file_path:start_line-end_line
448Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
449file_path:start_line-end_line
450Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
451file_path:start_line-end_line
452Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
453file_path:start_line-end_line
454Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
455file_path:start_line-end_line
456Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
457file_path:start_line-end_line
458Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
459file_path:start_line-end_line
460Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
461file_path:start_line-end_line
462Explanation of what the code does, regardless of its relevance to the user's query. Provide context on how it fits into the overall system.
463</explanations>
464
465<ranking>
466first_most_relevant_snippet
467second_most_relevant_snippet
468third_most_relevant_snippet
469fourth_most_relevant_snippet
470fifth_most_relevant_snippet
471sixth_most_relevant_snippet
472seventh_most_relevant_snippet
473eighth_most_relevant_snippet
474ninth_most_relevant_snippet
475tenth_most_relevant_snippet
476</ranking>
477
478As a reminder, the user query is:  
479<user_query>
480{user_query}  
481</user_query>
482
483Provide the explanations and ranking below:"""
484
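The prompts above expect the model to reply with an `<explanations>` block followed by a `<ranking>` block whose lines are `file_path:start_line-end_line` identifiers, ordered from most to least relevant. As a purely illustrative sketch of how such a reply could be consumed (this is a hypothetical helper, not Sweep's own parsing code), a small regex over the `<ranking>` tags is enough:

```python
import re


def parse_ranking(response: str) -> list[str]:
    """Hypothetical helper: pull the ordered snippet identifiers out of the
    <ranking>...</ranking> block produced by the reranking prompt above."""
    match = re.search(r"<ranking>(.*?)</ranking>", response, re.DOTALL)
    if match is None:
        return []
    # Each non-empty line is expected to look like "path/to/file.py:10-42".
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


example_response = """<explanations>
sweepai/utils/progress.py:100-120
Tracks ticket progress status transitions.
</explanations>

<ranking>
sweepai/utils/progress.py:100-120
</ranking>"""

print(parse_ranking(example_response))  # ['sweepai/utils/progress.py:100-120']
```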
sweepai/utils/progress.py:0-283
1from __future__ import annotations
2
3import time
4from enum import Enum
5from threading import Thread
6
7from openai import OpenAI
8from pydantic import BaseModel, ConfigDict, Field
9
10from sweepai.config.server import MONGODB_URI, OPENAI_API_KEY
11from sweepai.core.entities import FileChangeRequest, Snippet
12from sweepai.global_threads import global_threads
13from sweepai.utils.chat_logger import discord_log_error, global_mongo_client
14
15
16class AssistantAPIMessageRole(Enum):
17    SYSTEM = "system"
18    USER = "user"
19    ASSISTANT = "assistant"
20    CODE_INTERPRETER_INPUT = "code_interpreter_input"
21    CODE_INTERPRETER_OUTPUT = "code_interpreter_output"
22    FUNCTION_CALL_INPUT = "function_call_input"
23    FUNCTION_CALL_OUTPUT = "function_call_output"
24
25
26class AssistantAPIMessage(BaseModel):
27    model_config = ConfigDict(use_enum_values=True, validate_default=True)
28    role: AssistantAPIMessageRole
29    content: str = ""
30
31
32class AssistantStatus(Enum):
33    QUEUED = "queued"
34    IN_PROGRESS = "in_progress"
35    REQUIRES_ACTION = "requires_action"
36    CANCELLING = "cancelling"
37    CANCELLED = "cancelled"
38    FAILED = "failed"
39    COMPLETED = "completed"
40    EXPIRED = "expired"
41
42
43class AssistantConversation(BaseModel):
44    model_config = ConfigDict(use_enum_values=True, validate_default=True)
45    messages: list[AssistantAPIMessage] = []
46    is_active: bool = True
47    status: AssistantStatus = "in_progress"
48    assistant_id: str = ""
49    run_id: str = ""
50    thread_id: str = ""
51
52    @classmethod
53    def from_ids(
54        cls,
55        assistant_id: str,
56        run_id: str,
57        thread_id: str,
58    ) -> AssistantConversation | None:
59        client = OpenAI(api_key=OPENAI_API_KEY)
60        try:
61            assistant = client.beta.assistants.retrieve(
62                assistant_id=assistant_id, timeout=1.5
63            )
64            run = client.beta.threads.runs.retrieve(
65                run_id=run_id, thread_id=thread_id, timeout=1.5
66            )
67        except Exception:
68            return None
69        messages: list[AssistantAPIMessage] = [
70            AssistantAPIMessage(
71                role=AssistantAPIMessageRole.SYSTEM,
72                content=assistant.instructions,
73            )
74        ]
75        return cls(
76            messages=messages,
77            status=run.status,
78            is_active=run.status not in ("completed", "failed"),
79            assistant_id=assistant_id,
80            run_id=run_id,
81            thread_id=thread_id,
82        )
83
84    def update_from_ids(
85        self,
86        assistant_id: str,
87        run_id: str,
88        thread_id: str,
89    ) -> AssistantConversation:
90        assistant_conversation = AssistantConversation.from_ids(
91            assistant_id=assistant_id, run_id=run_id, thread_id=thread_id
92        )
93        if not assistant_conversation:
94            return self
95        self.messages = assistant_conversation.messages
96        self.is_active = assistant_conversation.is_active
97        self.status = assistant_conversation.status
98        return self
99
100
101class TicketProgressStatus(Enum):
102    SEARCHING = "searching"
103    PLANNING = "planning"
104    CODING = "coding"
105    COMPLETE = "complete"
106    ERROR = "error"
107
108
109class SearchProgress(BaseModel):
110    model_config = ConfigDict(use_enum_values=True, validate_default=True)
111
112    indexing_progress: int = 0
113    indexing_total: int = 0
114    rephrased_query: str = ""
115    retrieved_snippets: list[Snippet] = []
116    final_snippets: list[Snippet] = []
117    pruning_conversation: AssistantConversation = AssistantConversation()
118    pruning_conversation_counter: int = 0
119    repo_tree: str = ""
120
121
122class PlanningProgress(BaseModel):
123    assistant_conversation: AssistantConversation = AssistantConversation()
124    file_change_requests: list[FileChangeRequest] = []
125
126
127class CodingProgress(BaseModel):
128    file_change_requests: list[FileChangeRequest] = []
129    assistant_conversations: list[AssistantConversation] = []
130
131
132class PaymentContext(BaseModel):
133    use_faster_model: bool = True
134    pro_user: bool = True
135    daily_tickets_used: int = 0
136    monthly_tickets_used: int = 0
137
138
139class TicketContext(BaseModel):
140    title: str = ""
141    description: str = ""
142    repo_full_name: str = ""
143    issue_number: int = 0
144    branch_name: str = ""
145    is_public: bool = True
146    pr_id: int = -1
147    start_time: int = 0
148    done_time: int = 0
149    payment_context: PaymentContext = PaymentContext()
150
151
152class TicketUserStateTypes(Enum):
153    RUNNING = "running"
154    WAITING = "waiting"
155    EDITING = "editing"
156
157
158class TicketUserState(BaseModel):
159    model_config = ConfigDict(use_enum_values=True, validate_default=True)
160    state_type: TicketUserStateTypes = TicketUserStateTypes.RUNNING
161    waiting_deadline: int = 0
162
163
164class TicketProgress(BaseModel):
165    model_config = ConfigDict(use_enum_values=True, validate_default=True)
166    tracking_id: str
167    username: str = ""
168    context: TicketContext = TicketContext()
169    status: TicketProgressStatus = TicketProgressStatus.SEARCHING
170    search_progress: SearchProgress = SearchProgress()
171    planning_progress: PlanningProgress = PlanningProgress()
172    coding_progress: CodingProgress = CodingProgress()
173    prev_dict: dict = Field(default_factory=dict)
174    error_message: str = ""
175    user_state: TicketUserState = TicketUserState()
176
177    @classmethod
178    def load(cls, tracking_id: str) -> TicketProgress:
179        if MONGODB_URI is None:
180            return None
181        db = global_mongo_client["progress"]
182        collection = db["ticket_progress"]
183        doc = collection.find_one({"tracking_id": tracking_id})
184        return cls(**doc)
185
186    def refresh(self):
187        if MONGODB_URI is None:
188            return
189        new_ticket_progress = TicketProgress.load(self.tracking_id)
190        self.__dict__.update(new_ticket_progress.__dict__)
191
192    def _save(self):
193        # Can optimize by only saving the deltas
194        try:
195            if MONGODB_URI is None:
196                return None
197            # cannot encode enum object
198            if isinstance(self.status, Enum):
199                self.status = self.status.value  # Convert enum member to its value
200            if self.model_dump() == self.prev_dict:
201                return
202            current_dict = self.model_dump()
203            del current_dict["prev_dict"]
204            self.prev_dict = current_dict
205            db = global_mongo_client["progress"]
206            collection = db["ticket_progress"]
207            collection.update_one(
208                {"tracking_id": self.tracking_id}, {"$set": current_dict}, upsert=True
209            )
210            # convert status back to enum object
211            self.status = TicketProgressStatus(self.status)
212        except Exception as e:
213            discord_log_error(str(e) + "\n\n" + str(self.tracking_id))
214
215    def save(self, do_async: bool = True):
216        if do_async:
217            thread = Thread(target=self._save)
218            thread.start()
219            global_threads.append(thread)
220        else:
221            self._save()
222
223    def wait(self, wait_time: int = 20):
224        if MONGODB_URI is None:
225            return
226        try:
227            # check if user set breakpoints
228            current_ticket_progress = TicketProgress.load(self.tracking_id)
229            current_ticket_progress.user_state = current_ticket_progress.user_state
230            current_ticket_progress.user_state.state_type = TicketUserStateTypes.WAITING
231            current_ticket_progress.user_state.waiting_deadline = (
232                int(time.time()) + wait_time
233            )
234            # current_ticket_progress.save(do_async=False)
235            # time.sleep(3)
236            # for i in range(10 * 60):
237            #     current_ticket_progress = TicketProgress.load(self.tracking_id)
238            #     user_state = current_ticket_progress.user_state
239            #     if i == 0:
240            #         logger.info(user_state)
241            #     if user_state.state_type.value == TicketUserStateTypes.RUNNING.value:
242            #         logger.info(f"Continuing...")
243            #         return
244            #     if (
245            #         user_state.state_type.value == TicketUserStateTypes.WAITING.value
246            #         and user_state.waiting_deadline < int(time.time())
247            #     ):
248            #         logger.info(f"Continuing...")
249            #         user_state.state_type = TicketUserStateTypes.RUNNING.value
250            #         return
251            #     time.sleep(1)
252            #     if i % 10 == 9:
253            #         logger.info(f"Waiting for user for {self.tracking_id}...")
254            # raise Exception("Timeout")
255        except Exception as e:
256            discord_log_error(
257                "wait() method crashed with:\n\n"
258                + str(e)
259                + "\n\n"
260                + str(self.tracking_id)
261            )
262
263
264def create_index():
265    # create a unique index on tracking_id so progress lookups stay fast
266    db = global_mongo_client["progress"]
267    collection = db["ticket_progress"]
268    collection.create_index("tracking_id", unique=True)
269
270
271if __name__ == "__main__":
272    ticket_progress = TicketProgress(tracking_id="test")
273    # ticket_progress.error_message = (
274    #     "I'm sorry, but it looks like an error has occurred due to"
275    #     + " a planning failure. Please create a more detailed issue"
276    #     + " so I can better address it. Alternatively, reach out to Kevin or William for help at"
277    #     + " https://discord.gg/sweep."
278    # )
279    # ticket_progress.status = TicketProgressStatus.ERROR
280    ticket_progress.save()
281    ticket_progress.wait()
282    new_ticket_progress = TicketProgress.load("test")
283    print(new_ticket_progress)
docs/pages/blogs/ai-unit-tests.mdx:0-50
1# šŸ§Ŗ Having GPT-4 Iterate on Unit Tests like a Human
2**William Zeng** - October 21st, 2023
3
4Hi everyone, my name is William and I'm one of the founders of Sweep. <br></br>
5**Sweep** is an AI junior developer that writes and fixes code by mirroring how a developer works.
6
7## 1. **Read the task description and codebase.**
8
9ClonedRepo is our wrapper around the Git API that makes it easy to clone and interact with a repo.
10We don't have any tests for this class, so we asked Sweep to write them.
11
12Here Sweep starts by reading the original GitHub issue: **"Sweep: Write unit tests for ClonedRepo"**. https://github.com/sweepai/sweep/issues/2377
13
14
15Sweep searches over the codebase with our in-house code search engine, ranking this symbol and file first: `ClonedRepo:sweepai/utils/github_utils.py`.
16This file [sweepai/utils/github_utils.py](https://github.com/sweepai/sweep/blob/main/sweepai/utils/github_utils.py) is ~370 lines long, but because we know the symbol `ClonedRepo`, we extracted the relevant code (~250 lines) without the other functions and classes.
17
18```python
19import git
20# more imports
21...
22
23class ClonedRepo:
24    repo_full_name: str
25    installation_id: str
26    branch: str | None = None
27    token: str | None = None
28
29    @cached_property
30    def cache_dir(self):
31        # logic to create a cached directory
32
33    # other ClonedRepo methods
34
35    def get_file_contents(self, file_path, ref=None):
36        local_path = os.path.join(self.cache_dir, file_path)
37        if os.path.exists(local_path):
38            with open(local_path, "r", encoding="utf-8", errors="replace") as f:
39                contents = f.read()
40            return contents
41        else:
42            raise FileNotFoundError(f"{local_path} does not exist.")
43
44    # other ClonedRepo methods
45```
46
47We read this to identify the necessary tests.
48
49## 2. **Write the tests.**
50
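The excerpt above cuts off before the generated tests. As a purely illustrative sketch of the kind of test being described here (not the tests Sweep actually produced for this issue), the `get_file_contents` behavior shown above can be exercised against a temporary directory; the `FakeClonedRepo` stand-in below simply repeats the excerpt's logic so the example runs without cloning anything:

```python
import os
import tempfile

import pytest


class FakeClonedRepo:
    """Stand-in with the same get_file_contents logic as the excerpt above."""

    def __init__(self, cache_dir: str):
        self.cache_dir = cache_dir

    def get_file_contents(self, file_path, ref=None):
        local_path = os.path.join(self.cache_dir, file_path)
        if os.path.exists(local_path):
            with open(local_path, "r", encoding="utf-8", errors="replace") as f:
                return f.read()
        raise FileNotFoundError(f"{local_path} does not exist.")


def test_get_file_contents_reads_existing_file():
    with tempfile.TemporaryDirectory() as cache_dir:
        # Seed the fake clone cache with one file.
        with open(os.path.join(cache_dir, "README.md"), "w", encoding="utf-8") as f:
            f.write("hello")
        assert FakeClonedRepo(cache_dir).get_file_contents("README.md") == "hello"


def test_get_file_contents_missing_file_raises():
    with tempfile.TemporaryDirectory() as cache_dir:
        with pytest.raises(FileNotFoundError):
            FakeClonedRepo(cache_dir).get_file_contents("missing.txt")
```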
platform/cypress/support/e2e.ts:0-19
1// ***********************************************************
2// This example support/e2e.ts is processed and
3// loaded automatically before your test files.
4//
5// This is a great place to put global configuration and
6// behavior that modifies Cypress.
7//
8// You can change the location of this file or turn off
9// automatically serving support files with the
10// 'supportFile' configuration option.
11//
12// You can read more here:
13// https://on.cypress.io/configuration
14// ***********************************************************
15
16// Import commands.js using ES2015 syntax:
17import "./commands";
18
19// Alternatively you can use CommonJS syntax:
sweepai/utils/convert_openai_anthropic.py:0-130
1from __future__ import annotations
2from dataclasses import dataclass
3import re
4
5
6def convert_openai_function_to_anthropic_prompt(function: dict) -> str:
7    unformatted_prompt = """<tool_description>
8<tool_name>{tool_name}</tool_name>
9<description>
10{description}
11</description>
12<parameters>
13{parameters}
14</parameters>
15</tool_description>"""
16    unformatted_parameter = """<parameter>
17<name>{parameter_name}</name>
18<type>{parameter_type}</type>
19<description>{parameter_description}</description>
20</parameter>"""
21    parameters_strings = []
22    
23    for parameter_name, parameter_dict in function["parameters"]["properties"].items():
24        parameters_strings.append(unformatted_parameter.format(
25            parameter_name=parameter_name,
26            parameter_type=parameter_dict["type"],
27            parameter_description=parameter_dict["description"],
28        ))
29    return unformatted_prompt.format(
30        tool_name=function["name"],
31        description=function["description"],
32        parameters="\n".join(parameters_strings),
33    )
34
35def convert_all_functions(functions: list) -> str:
36    # convert all openai functions to print anthropic prompt
37    for function in functions:
38        print(convert_openai_function_to_anthropic_prompt(function))
39
40@dataclass
41class AnthropicFunctionCall:
42    function_name: str
43    function_parameters: dict[str, str]
44
45    def to_string(self) -> str:
46        function_call_string = "<invoke>\n"
47        function_call_string += f"<tool_name>{self.function_name}</tool_name>\n"
48        function_call_string += "<parameters>\n"
49        for param_name, param_value in self.function_parameters.items():
50            function_call_string += f"<{param_name}>\n{param_value}\n</{param_name}>\n"
51        function_call_string += "</parameters>\n"
52        function_call_string += "</invoke>"
53        return function_call_string
54
55    @staticmethod
56    def mock_function_calls_from_string(function_calls_string: str) -> list[AnthropicFunctionCall]:
57        function_calls = []
58
59        # Regular expression patterns
60        function_name_pattern = r'<tool_name>(.*?)</tool_name>'
61        parameters_pattern = r'<parameters>(.*?)</parameters>'
62        parameter_pattern = r'<(.*?)>(.*?)<\/\1>'
63        
64        # Extract function calls
65        function_call_matches = re.findall(r'<invoke>(.*?)</invoke>', function_calls_string, re.DOTALL)
66        for function_call_match in function_call_matches:
67            # Extract function name
68            function_name_match = re.search(function_name_pattern, function_call_match)
69            function_name = function_name_match.group(1) if function_name_match else None
70
71            # Extract parameters section
72            parameters_match = re.search(parameters_pattern, function_call_match, re.DOTALL)
73            parameters_section = parameters_match.group(1) if parameters_match else ''
74
75            # Extract parameters within the parameters section
76            parameter_matches = re.findall(parameter_pattern, parameters_section, re.DOTALL)
77            function_parameters = {}
78            for param in parameter_matches:
79                parameter_name = param[0]
80                parameter_value = param[1]
81                function_parameters[parameter_name] = parameter_value.strip()
82
83            if function_name and function_parameters != {}:
84                function_calls.append(AnthropicFunctionCall(function_name, function_parameters))
85
86        return function_calls
87
88def mock_function_calls_to_string(function_calls: list[AnthropicFunctionCall]) -> str:
89    function_calls_string = "<function_call>\n"
90    for function_call in function_calls:
91        function_calls_string += function_call.to_string() + "\n"
92    function_calls_string += "</function_call>"
93    return function_calls_string
94
95if __name__ == "__main__":    
96    test_str = """<function_call>
97<invoke>
98<tool_name>submit_report_and_plan</tool_name>
99<parameters>
100<report>
101The main API implementation for the Sweep application is in the `sweepai/api.py` file. This file handles various GitHub events, such as pull requests, issues, and comments, and triggers corresponding actions.
102
103The `PRChangeRequest` class, defined in the `sweepai/core/entities.py` file, is used to encapsulate information about a pull request change, such as the comment, repository, and user information. This class is utilized throughout the `sweepai/api.py` file to process and respond to the different GitHub events.
104
105To solve the user request, the following plan should be followed:
106
1071. Carefully review the `sweepai/api.py` file to understand how the different GitHub events are handled and the corresponding actions that are triggered.
1082. Analyze the usage of the `PRChangeRequest` class in the `sweepai/api.py` file to understand how it is used to process pull request changes.
1093. Determine the specific issue or feature that needs to be implemented or fixed based on the user request.
1104. Implement the necessary changes in the `sweepai/api.py` file, utilizing the `PRChangeRequest` class as needed.
1115. Ensure that the changes are thoroughly tested and that all relevant cases are covered.
1126. Submit the changes for review and deployment.
113</report>
114<plan>
1151. Review the `sweepai/api.py` file to understand the overall structure and flow of the application, focusing on how GitHub events are handled and the corresponding actions that are triggered.
1162. Analyze the usage of the `PRChangeRequest` class in the `sweepai/api.py` file to understand how it is used to process pull request changes, including the information it encapsulates and the various methods that operate on it.
1173. Determine the specific issue or feature that needs to be implemented or fixed based on the user request. This may involve identifying the relevant GitHub event handlers and the corresponding logic that needs to be modified.
1184. Implement the necessary changes in the `sweepai/api.py` file, utilizing the `PRChangeRequest` class as needed to process the pull request changes. This may include adding new event handlers, modifying existing ones, or enhancing the functionality of the `PRChangeRequest` class.
1195. Thoroughly test the changes to ensure that all relevant cases are covered, including edge cases and error handling. This may involve writing additional unit tests or integration tests to validate the functionality.
1206. Once the changes have been implemented and tested, submit the modified `sweepai/api.py` file for review and deployment.
121</plan>
122</parameters>
123</invoke>
124</function_call>"""
125
126    function_calls = AnthropicFunctionCall.mock_function_calls_from_string(test_str)
127    for function_call in function_calls:
128        print(function_call)
129        print(function_call.to_string())
130    print(mock_function_calls_to_string(function_calls))
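For reference, a minimal usage sketch of the converter above. The `code_search` schema here is invented for illustration and is not necessarily one Sweep ships; the converter only reads `name`, `description`, and `parameters.properties`, so any OpenAI-style function definition of that shape works:

```python
from sweepai.utils.convert_openai_anthropic import convert_openai_function_to_anthropic_prompt

# Hypothetical OpenAI-style function schema, used only to illustrate the converter.
example_function = {
    "name": "code_search",
    "description": "Search the codebase for a code entity.",
    "parameters": {
        "properties": {
            "code_entity": {
                "type": "string",
                "description": "The class, function, or type to search for.",
            },
            "justification": {
                "type": "string",
                "description": "Why this search helps resolve the issue.",
            },
        }
    },
}

# Prints a <tool_description> block with one <parameter> entry per property,
# matching the Anthropic-style prompts shown earlier in this file.
print(convert_openai_function_to_anthropic_prompt(example_function))
```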