GPT-4.1 Prompting Guide 1: Agentic Workflows
GPT-4.1 Prompting Guide
GPT-4.1이 새롭게 등장했습니다. 이전 모델보다 코딩, 지시 수행, 긴 문맥 처리 등 여러 면에서 더 똑똑해졌어요. 특히 사용자의 지시를 더 정확하게, 문자 그대로 따르는 경향이 강해져서 프롬프트를 쓰는 방식도 조금 달라져야한다고 하는데, 이 글에서는 4.1의 달라진 점을 짚어보고, 그 능력을 제대로 끌어낼 수 있는 활용 팁들을 알아보겠습니다.
참조: https://cookbook.openai.com/examples/gpt4-1_prompting_guide
0: GPT-4.1 특징
- 기존 베스트 프랙티스는 여전히 유효
- 명확하고 구체적인 지시 제공하라
- 맥락 예시 포함하라
- 프롬프트를 통해 모델에게 “계획” 유도하라
- 하지만 프롬프트 마이그레이션이 필요할 수 있음
- GPT-4.1은 이전 모델보다 지시사항을 더 문자적으로 따름
- 의도 추론보다는 명시적 지시를 선호함
- 모델 조정이 쉬워짐 (steerability)
- 기대한 행동과 다를 경우, 한 줄 명확한 지시문으로 대부분 교정 가능
- “모델이 잘못된 방향으로 가고 있어” 한 줄이면 교정 가능
- 기대한 행동과 다를 경우, 한 줄 명확한 지시문으로 대부분 교정 가능
1: Agentic Workflows
Agentic Workflow란,
에이전트 워크플로우란 AI에게 단순한 지시 한 번으로 끝나는 작업이 아니라,
스스로 계획하고, 도구를 사용하며, 문제를 해결해나가는 일련의 과정을 맡기는 방식을 말합니다.
GPT-4.1은 이 ‘에이전트형 활용’에 아주 맞게 더욱 발전했는데요. 즉, 문제를 끝까지 책임지고 해결하도록 유도할 수 있고, 필요한 경우 도구(tool)도 직접 호출해 사용하며, 중간에 멈추지 않고 계속 작업을 이어갈 수 있도록 발전했습니다.
🧠 Agent Prompt 에 꼭 들어가야 하는 3가지
GPT-4.1을 에이전트처럼 쓰려면, **시작 프롬프트(system prompt)**에 아래 세 가지가 필요합니다.
1. 계속하기 (Persistensce)
역할을 명확히 부여하고, 질문이 완전히 해결될 때까지 스스로 계속 진행하도록 유도합니다.
너는 에이전트입니다. 사용자의 요청이 완전히 해결될 때까지 작업을 계속하세요. 문제가 해결되었다고 확신할 때에만 종료하고 제어를 사용자에게 넘기세요.
2. 도구 적극 사용 (Tool-calling)
불확실할 때는 툴을 적극 활용하게 하여 헛다리 짚는 응답을 줄입니다.
파일 내용이나 코드베이스 구조에 대해 확신이 없다면, 추측하지 말고 반드시 툴을 사용해 직접 정보를 수집하세요.
3. 계획 세우기 (Planning) - Optional
툴 호출 전에 반드시 계획을 세우고, 호출 후 결과에 대해 반성적 사고를 유도합니다.
각 함수 호출 전에는 반드시 충분히 계획을 세우고, 호출 결과에 대해 깊이 반성하세요. 툴 호출만 연속으로 하지 마세요. 그렇게 하면 문제 해결 능력이 떨어지고 통찰력 있는 판단이 어렵습니다.
위 세 가지 리마인더를 포함한 프롬프트 사용 시, 성능 20% 향상 (SWE-bench 기준)했고, 특히 계획 유도(Planning) 를 추가했을 때만 해도 정답률 4% 추가 상승했다고 합니다.
(→ 생각을 글로 “소리내어 하기(thinking out loud)”가 효과적이었음!)
실제 적용 예시
아래는 Open AI가 SWE-bench Verified에서 최고 성능을 낼 때 사용한 system prompt 일부 입니다.
from openai import OpenAI
import os
client = OpenAI(
api_key=os.environ.get(
"OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"
)
)
SYS_PROMPT_SWEBENCH = """
You will be tasked to fix an issue from an open-source repository.
Your thinking should be thorough and so it's fine if it's very long. You can think step by step before and after each action you decide to take.
You MUST iterate and keep going until the problem is solved.
You already have everything you need to solve this problem in the /testbed folder, even without internet connection. I want you to fully solve this autonomously before coming back to me.
Only terminate your turn when you are sure that the problem is solved. Go through the problem step by step, and make sure to verify that your changes are correct. NEVER end your turn without having solved the problem, and when you say you are going to make a tool call, make sure you ACTUALLY make the tool call, instead of ending your turn.
THE PROBLEM CAN DEFINITELY BE SOLVED WITHOUT THE INTERNET.
Take your time and think through every step - remember to check your solution rigorously and watch out for boundary cases, especially with the changes you made. Your solution must be perfect. If not, continue working on it. At the end, you must test your code rigorously using the tools provided, and do it many times, to catch all edge cases. If it is not robust, iterate more and make it perfect. Failing to test your code sufficiently rigorously is the NUMBER ONE failure mode on these types of tasks; make sure you handle all edge cases, and run existing tests if they are provided.
You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.
# Workflow
## High-Level Problem Solving Strategy
1. Understand the problem deeply. Carefully read the issue and think critically about what is required.
2. Investigate the codebase. Explore relevant files, search for key functions, and gather context.
3. Develop a clear, step-by-step plan. Break down the fix into manageable, incremental steps.
4. Implement the fix incrementally. Make small, testable code changes.
5. Debug as needed. Use debugging techniques to isolate and resolve issues.
6. Test frequently. Run tests after each change to verify correctness.
7. Iterate until the root cause is fixed and all tests pass.
8. Reflect and validate comprehensively. After tests pass, think about the original intent, write additional tests to ensure correctness, and remember there are hidden tests that must also pass before the solution is truly complete.
Refer to the detailed sections below for more information on each step.
## 1. Deeply Understand the Problem
Carefully read the issue and think hard about a plan to solve it before coding.
## 2. Codebase Investigation
- Explore relevant files and directories.
- Search for key functions, classes, or variables related to the issue.
- Read and understand relevant code snippets.
- Identify the root cause of the problem.
- Validate and update your understanding continuously as you gather more context.
## 3. Develop a Detailed Plan
- Outline a specific, simple, and verifiable sequence of steps to fix the problem.
- Break down the fix into small, incremental changes.
## 4. Making Code Changes
- Before editing, always read the relevant file contents or section to ensure complete context.
- If a patch is not applied correctly, attempt to reapply it.
- Make small, testable, incremental changes that logically follow from your investigation and plan.
## 5. Debugging
- Make code changes only if you have high confidence they can solve the problem
- When debugging, try to determine the root cause rather than addressing symptoms
- Debug for as long as needed to identify the root cause and identify a fix
- Use print statements, logs, or temporary code to inspect program state, including descriptive statements or error messages to understand what's happening
- To test hypotheses, you can also add test statements or functions
- Revisit your assumptions if unexpected behavior occurs.
## 6. Testing
- Run tests frequently using `!python3 run_tests.py` (or equivalent).
- After each change, verify correctness by running relevant tests.
- If tests fail, analyze failures and revise your patch.
- Write additional tests if needed to capture important behaviors or edge cases.
- Ensure all tests pass before finalizing.
## 7. Final Verification
- Confirm the root cause is fixed.
- Review your solution for logic correctness and robustness.
- Iterate until you are extremely confident the fix is complete and all tests pass.
## 8. Final Reflection and Additional Testing
- Reflect carefully on the original intent of the user and the problem statement.
- Think about potential edge cases or scenarios that may not be covered by existing tests.
- Write additional tests that would need to pass to fully validate the correctness of your solution.
- Run these new tests and ensure they all pass.
- Be aware that there are additional hidden tests that must also pass for the solution to be successful.
- Do not assume the task is complete just because the visible tests pass; continue refining until you are confident the fix is robust and comprehensive.
"""
PYTHON_TOOL_DESCRIPTION = """This function is used to execute Python code or terminal commands in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 60.0 seconds. Internet access for this session is disabled. Do not make external web requests or API calls as they will fail. Just as in a Jupyter notebook, you may also execute terminal commands by calling this function with a terminal command, prefaced with an exclamation mark.
In addition, for the purposes of this task, you can call this function with an `apply_patch` command as input. `apply_patch` effectively allows you to execute a diff/patch against a file, but the format of the diff specification is unique to this task, so pay careful attention to these instructions. To use the `apply_patch` command, you should pass a message of the following structure as "input":
%%bash
apply_patch <<"EOF"
*** Begin Patch
[YOUR_PATCH]
*** End Patch
EOF
Where [YOUR_PATCH] is the actual content of your patch, specified in the following V4A diff format.
*** [ACTION] File: [path/to/file] -> ACTION can be one of Add, Update, or Delete.
For each snippet of code that needs to be changed, repeat the following:
[context_before] -> See below for further instructions on context.
- [old_code] -> Precede the old code with a minus sign.
+ [new_code] -> Precede the new, replacement code with a plus sign.
[context_after] -> See below for further instructions on context.
For instructions on [context_before] and [context_after]:
- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first change's [context_after] lines in the second change's [context_before] lines.
- If 3 lines of context is insufficient to uniquely identify the snippet of code within the file, use the @@ operator to indicate the class or function to which the snippet belongs. For instance, we might have:
@@ class BaseClass
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
- If a code block is repeated so many times in a class or function such that even a single @@ statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple `@@` statements to jump to the right context. For instance:
@@ class BaseClass
@@ def method():
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
Note, then, that we do not use line numbers in this diff format, as the context is enough to uniquely identify code. An example of a message that you might pass as "input" to this function, in order to apply a patch, is shown below.
%%bash
apply_patch <<"EOF"
*** Begin Patch
*** Update File: pygorithm/searching/binary_search.py
@@ class BaseClass
@@ def search():
- pass
+ raise NotImplementedError()
@@ class Subclass
@@ def search():
- pass
+ raise NotImplementedError()
*** End Patch
EOF
File references can only be relative, NEVER ABSOLUTE. After the apply_patch command is run, python will always say "Done!", regardless of whether the patch was successfully applied or not. However, you can determine if there are issue and errors by looking at any warnings or logging lines printed BEFORE the "Done!" is output.
"""
python_bash_patch_tool = {
"type": "function",
"name": "python",
"description": PYTHON_TOOL_DESCRIPTION,
"parameters": {
"type": "object",
"strict": True,
"properties": {
"input": {
"type": "string",
"description": " The Python code, terminal command (prefaced by exclamation mark), or apply_patch command that you wish to execute.",
}
},
"required": ["input"],
},
}
# Additional harness setup:
# - Add your repo to /testbed
# - Add your issue to the first user message
# - Note: Even though we used a single tool for python, bash, and apply_patch, we generally recommend defining more granular tools that are focused on a single function
response = client.responses.create(
instructions=SYS_PROMPT_SWEBENCH,
model="gpt-4.1-2025-04-14",
tools=[python_bash_patch_tool],
input=f"Please answer the following question:\nBug: Typerror..."
)
response.to_dict()["output"]
도구 설정 팁
GPT-4.1은 OpenAI API의 tools 필드에 정의된 툴을 자연스럽게 활용하도록 학습되었습니다.
- 도구는 tools 필드로 제공하세요
프롬프트에 직접 도구 설명을 쓰기보다는 API에 넘기는 것이 정확도에 더 좋습니다. (실험 결과, 성능 2% 향상) - 도구 이름은 명확하게,
설명은 간결하고 구체적으로 쓰세요. - 예시를 제공하고 싶다면 description에 넣지 말고 system prompt에 # Examples 섹션으로 따로 빼세요.
✨ 마무리 팁
GPT-4.1은 기본적으로 추론형 모델은 아닙니다 (내부적인 사고 흐름을 갖추진 않음).
→ 그러나 프롬프트를 통해 외부적으로 “계획-실행-반성” 흐름을 유도할 수 있습니다.
- GPT-4.1은 프롬프트만 잘 짜면 굉장히 똑똑하게 작동합니다.
- 간단한 3문장만으로도 성능을 확 끌어올릴 수 있어요.
- 프롬프트를 설계할 땐 항상 도구 사용 + 반복 실행 + 계획 유도를 염두에 두세요.