This is the fourth article in the series. It discusses how the concepts in some of our previous articles, including some of our primer articles, are connected. Many security vulnerabilities lie at the boundary between two systems: for example, where one system generates code or commands for another system to execute and, in particular, where the generating system uses external input to generate that code. The external input does not have to be direct user input; it could come from a configuration file, or from data (e.g., cookie values, HTTP headers, etc.) that an attacker can manipulate.
Who is this article for?
Understanding application layer security threats is important for a wide range of professions, including:
Security Architects and Solution Architects, as they need to understand the potential risks and mitigations to take appropriate design decisions, such as access control.
Software Engineers / Developers, because they need to understand how to build their applications in a secure manner and avoid critical mistakes.
Security Consultants of various types as they need to understand the risks of what they may be reviewing or consulting on.
Penetration Testers who need to understand how to attack these systems!
As reference for this article, it is worth reading the following articles that we have already published:
Firstly, to understand generated code attacks, we need to understand what generated code is. In simple terms, generated code is code created by one application for another to execute.
Generated code arises when an application (be it a mobile app, desktop application, web application, or an API) is built from multiple technologies that interact, and specifically where one component (c1) talks to another component (c2) by passing it some form of programming code that c2 interprets and then executes.
In this scenario c1 will have generated code, in one way or another. This is extremely common; there are several typical scenarios:
And there are many, many other options, especially these days with increasing use of JSON and Infrastructure-as-Code (IaC).
In all of these scenarios, the receiving component, c2, trusts that the data supplied by c1 is non-malicious and generally - due to the generic nature of the components - it would be impractical for c2 to make a distinction between malicious and non-malicious data.
The following diagram highlights the basic scenario:
Generated Code Attacks are where an attacker exploits the trust between two or more systems (i.e. between c1 and c2 as above). The receiving component (c2), often a database for example, receives a wide range of commands from the sender (c1). It is practically difficult, or impossible, for c2 to distinguish a malicious DROP or DELETE command from a legitimate one.
So, all those aforementioned generated code scenarios (and many more) can be mapped to well known (generated code) attacks/vulnerabilities, as follows:
Of course a buffer overrun in general also follows a similar pattern, but at a lower-level in the technology stack. A component is sending data to another component for interpretation and processing.
The receiving component trusts the data it is sent - which is the crux of the problem. Practically, in most scenarios, because the receiving component is often a generalised system and an attacker would necessarily send recognised commands, it is difficult for the receiving component to know the difference between a legitimate request and a malicious one.
The following diagram expands upon the previous, highlighting how there is an additional condition needed for an attack to succeed - externally supplied input (i1) that is used in the generation of the generated code:
The key point now is that i1 is, in most cases, benign. However, in the case of an attack, this is malicious input.
The following presents two examples, based on the above sample scenarios, one discussing SQL Injection and the other discussing XSS.
In this case, c1 could be a web application, c2 would be a database (e.g. MySQL, Oracle, SQL Server etc). Suppose c1 is generating a SQL statement along the lines of:
SELECT product_name, product_description FROM TProduct WHERE product_name = '<user_supplied_input>'
This statement returns product details in some kind of retail / eCommerce web application, where <user_supplied_input> is the input i1.
If i1 is malicious, a number of attacks are possible. The attacker can inject a UNION SELECT to exfiltrate data from the database. In reality, an attacker who can inject arbitrary SQL in a scenario such as this could exfiltrate all information from the database. Additionally, they could potentially launch destructive attacks using DROP and DELETE commands. If the database server supports something like xp_cmdshell (a SQL Server feature that allows the execution of shell commands), it would be possible to interact with the underlying OS that the database server is running on, and almost certainly to gain a remote shell (i.e., remote access to the database server).
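The UNION SELECT case can be reproduced in a few lines. The following is a minimal sketch in Python, not the code of any real application: an in-memory SQLite database stands in for c2, the `TUser` table and all row values are hypothetical, and `build_query` is an illustrative name for the string concatenation that c1 performs.

```python
import sqlite3


def build_query(user_supplied_input: str) -> str:
    # c1 generates SQL by concatenating the external input (i1) into the statement.
    return (
        "SELECT product_name, product_description FROM TProduct "
        "WHERE product_name = '" + user_supplied_input + "'"
    )


def make_database() -> sqlite3.Connection:
    # c2: an in-memory SQLite database standing in for the real database server.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TProduct (product_name TEXT, product_description TEXT)")
    # Hypothetical second table holding data the attacker should never see.
    conn.execute("CREATE TABLE TUser (username TEXT, password_hash TEXT)")
    conn.execute("INSERT INTO TProduct VALUES ('kettle', 'A 1.7 litre kettle')")
    conn.execute("INSERT INTO TUser VALUES ('admin', 'hash-of-admin-password')")
    conn.commit()
    return conn


conn = make_database()

# Benign i1: the generated statement does what the developer intended.
benign_rows = conn.execute(build_query("kettle")).fetchall()

# Malicious i1: closes the string literal, appends a UNION SELECT,
# and comments out the trailing quote that c1 adds.
malicious_input = "x' UNION SELECT username, password_hash FROM TUser --"
leaked_rows = conn.execute(build_query(malicious_input)).fetchall()

print(benign_rows)
print(leaked_rows)
```

From the point of view of c2 both statements are valid SQL, which is why the database cannot tell them apart.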
In this case, c1 could be a web application and c2 would be the user's browser (e.g. Chrome, Firefox, Edge etc). Suppose c1 is a C#.NET web application (ASP.NET) generating HTML content using the code below:
Response.Write("<p> Your search for " + Request.QueryString["q"] + " returned the following results: </p>");
Suppose an attacker specifies i1 (injecting into the q query string parameter) along the lines of:
<script>alert(1);</script>
This would cause the C#.NET code to generate the HTML content of:
<p> Your search for <script>alert(1);</script> returned the following results: </p>
This would be received by the browser and executed. Of course, this particular example is fairly benign (if slightly annoying, by introducing a pop-up alert box).
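To make the generation step concrete, here is a small Python sketch that mirrors the string concatenation in the ASP.NET line above and then feeds the result to the standard library's HTML parser, which stands in (very loosely) for the browser. The function and class names are illustrative assumptions, and a real browser does far more than this parser.

```python
from html.parser import HTMLParser


def render_search_heading(q: str) -> str:
    # Mirrors the Response.Write call: i1 is concatenated straight into the HTML.
    return "<p> Your search for " + q + " returned the following results: </p>"


class TagCollector(HTMLParser):
    """Records every start tag the parser recognises in the generated HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_seen_by_parser(html_text: str) -> list:
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tags


benign_tags = tags_seen_by_parser(render_search_heading("kettle"))
malicious_tags = tags_seen_by_parser(
    render_search_heading("<script>alert(1);</script>")
)

print(benign_tags)     # only the paragraph tag written by the developer
print(malicious_tags)  # the attacker's script tag is now part of the markup
```

The point of the sketch is that the injected text is no longer treated as a search term: the receiving side sees it as markup, exactly as it sees the developer's own `<p>` tag.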
At Firesand, we are often asked for advice with questions such as: "How to stop SQL Injection", "How do I stop XSS", "How do I prevent XXE" and so on. Our advice is to not focus on these specific attacks, as they are all instances of a wider class of problem: Generated Code Attack. Thus, if you defend really well against one instance, do you know if you have defended against any other instances? Whereas, if you resolve the Generated Code Attack problem, you fix not only the one you are concerned about, but also any others that you have not yet considered.
As most, if not all, generated code attacks rely on being able to supply input into a system that is not expected, the primary defence against Generated Code Attacks is, therefore, input validation. Always ensure that input is in the expected form before you accept it and process it. For example, in the aforementioned SQL Injection and XSS scenarios, the supplied input would be expected to be in the form of lower and upper case alphanumeric characters (possibly with white space) - the exact valid input set is case-specific, of course! If the input is anything other than that expected input, it must be rejected. In doing so, most generated code attacks will fail.
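An allowlist check of this kind might look as follows in Python. This is a sketch under stated assumptions: the permitted character set (letters, digits and the space character) follows the example above, while the 64-character limit and the function name `validate_search_term` are assumptions made for illustration. The right rules depend on the application.

```python
import re

# Allowlist: upper and lower case letters, digits and spaces only.
# The maximum length of 64 is an assumed, application-specific limit.
ALLOWED_SEARCH_TERM = re.compile(r"[A-Za-z0-9 ]{1,64}")


def validate_search_term(value: str) -> str:
    # fullmatch requires the whole input to fit the pattern, so a valid
    # prefix followed by an injected payload is still rejected.
    if ALLOWED_SEARCH_TERM.fullmatch(value) is None:
        raise ValueError("input rejected: not in the expected form")
    return value


def is_accepted(value: str) -> bool:
    try:
        validate_search_term(value)
        return True
    except ValueError:
        return False


print(is_accepted("Blue Kettle 2"))
print(is_accepted("x' UNION SELECT username, password_hash FROM TUser --"))
print(is_accepted("<script>alert(1);</script>"))
```

Note that the check rejects anything outside the expected form rather than trying to strip or repair it, and it runs before the value reaches any code generation.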
Another line of defence is to sanitise (via encoding) output data when dealing with data being sent to external systems (e.g., use URL Encoding and/or HTML Entity Encoding when a web application sends data back to a browser for rendering).
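As an illustration of output encoding, the sketch below uses Python's standard library: `html.escape` for HTML Entity Encoding and `urllib.parse.quote` for URL Encoding. The function `render_search_heading` is a hypothetical stand-in for the page generation shown earlier, not code from a real application.

```python
import html
from urllib.parse import quote


def render_search_heading(q: str) -> str:
    # HTML Entity Encoding: characters such as < and > become &lt; and &gt;,
    # so the receiving browser renders them as text instead of markup.
    return (
        "<p> Your search for "
        + html.escape(q)
        + " returned the following results: </p>"
    )


payload = "<script>alert(1);</script>"

encoded_html = render_search_heading(payload)

# URL Encoding: used when the value is placed inside a URL, for example
# in a query string parameter. safe="" also encodes the "/" character.
encoded_url_value = quote(payload, safe="")

print(encoded_html)
print(encoded_url_value)
```

Which encoding applies depends on where the value ends up in the output (HTML body, attribute, URL and so on), so the choice has to be made at the point where the output is generated.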
This series has underscored the critical intersections where security vulnerabilities emerge in the architecture of modern software systems, particularly through the lens of generated code. As we have explored, the vulnerabilities primarily stem from the trust placed in automated processes that generate and execute code across disparate systems. This trust, while practically necessary, opens up avenues for attack through SQL injections, XSS attacks, and more, as detailed through the examples provided.
The fundamental challenge is to ensure that all code generated by one system and consumed by another is thoroughly scrutinised and sanitised. Security architects, developers, and testers must incorporate robust validation mechanisms at every stage of code generation and execution to safeguard against malicious inputs that can lead to catastrophic breaches. As technology continues to evolve at a rapid pace, the complexity of these interactions will only increase. The emergence of new paradigms such as IaC and the proliferation of APIs across micro-services architectures amplify the need for stringent security protocols.
In essence, many well-known security issues arise from this core concept: a system automatically generates code that another system trusts and executes. If the generating system fails to properly validate its input, this trust can be exploited, allowing attacks to propagate to the final receiving system.
Therefore, to ensure that an application or system does not fall foul of a wide range of security vulnerabilities, it must validate all inputs into any form of code generation.