Prerequisites:
  1. Scoped Objects
  2. Program Points

Symbolic Expressions

Symbolic expressions are LiSA’s internal language for defining atomic operations that a program performs during its execution. These are the subjects of the evaluations performed by Semantic Domains to infer program properties, and allow domain definitions that are independent of the source programming language. The rationale behind this design choice is to allow LiSA to support multiple programming languages and paradigms by not attributing any specific meaning to syntactic constructs. For instance, an assignment in Python can automatically wrap or unwrap tuple values, while in Java it can perform boxing or unboxing of primitive types, neither of which happens in C. Thus, LiSA allows Statements (i.e., syntactic constructs appearing in the source program) to implement their own semantics by decomposing themselves into a series of symbolic expressions that capture the intended operations in a language-agnostic way. This allows abstract domains to focus on the semantics of small and well-defined operations, rather than having to deal with the peculiarities of each programming language. The decomposition is semantic: the state of the analysis can be queried for any information (e.g., types of variables, values of constants, aliasing of memory locations) that may be needed to narrow down the possible behaviors of the expression. The same statement can thus produce a different decomposition each time it is analyzed, depending on the current state of the analysis. Symbolic expressions are also extensible, meaning that new ones can be added to support new language constructs or paradigms.

In this page, we present the main interfaces and classes that define symbolic expressions in LiSA, focusing on ones that have an important role in the overall design of the framework.

 Note:
This page contains class diagrams. Interfaces are represented with yellow rectangles, abstract classes with blue rectangles, and concrete classes with green rectangles. After type names, type parameters are reported, but their bounds are omitted for clarity. Only public members are listed in each type: the + symbol marks instance members, the * symbol marks static members, and a ! in front of the name denotes a member with a default implementation. Method-specific type parameters are written before the method name, wrapped in <>. When a class or interface has already been introduced in an earlier diagram, its inner members are omitted.

The Symbolic Expression Class

The SymbolicExpression class is the root of the symbolic expression hierarchy. It extends the ScopedObject<SymbolicExpression> interface, meaning that all symbolic expressions can be scoped when entering or exiting a new context during an analysis. This is particularly useful for expressions that contain Identifiers, as it allows to isolate variable references when analyzing function calls. In the class diagrams in the remainder of this page, methods have been removed from concrete expression instances for the sake of clarity, and they will be described in the relevant sections.

The SymbolicExpression class hierarchy

Each symbolic expression is identified by a static Type, that is, a supertype of all possible runtime types the expression can take, and a Code Location, corresponding to the syntactic construct that generated the expression. SymbolicExpression defines three abstract methods that must be implemented by all subclasses:

Other than the above methods, SymbolicExpression instances must define accept, which plays a central role in LiSA as it allows recursive traversal of symbolic expressions through the visitor pattern. When a recursive taversal is needed, one can simply call accept on the root expression, passing an ExpressionVisitor to it together with any additional parameters needed. The ExpressionVisitor interface is parametric on the type T of the value returned by each visit operation. The interface defines a visit method for each concrete subclass of SymbolicExpression, allowing to define custom behaviors for each expression type. It also defines visits for arbitrary HeapExpressions and ValueExpressions, that can be used to cover user-defined expressions not bundled within LiSA. The recursive traversal happens automatically: each visit method receives as parameter the expression being visited, the result of recursive visits on its sub-expressions, and any additional parameters passed to the initial accept call. SymbolicExpression define accept by simply calling the appropriate visit method on the visitor, passing itself and the results of recursive accept calls on its sub-expressions.

The SymbolicExpression class hiearrchy is split among HeapExpressions, that represent operations that manipulate the dynamic memory (e.g., object creation, field access), and ValueExpressions, that represent operations that produce values (e.g., arithmetic operations, constants). This distinction is important as it allows analyses to focus on either heap manipulations or value computations, depending on their purpose. Both kinds of expressions can be combined to form complex expressions that manipulate both values and memory locations.

Heap Expressions

Heap expressions are symbolic expressions that represent operations that manipulate the dynamic memory of the program. These include:

When adopting the Simple Abstract Domain framework, heap expressions are always rewritten by the HeapDomain before being evaluated by ValueDomain and TypeDomain.

Value Expressions

Value expressions are symbolic expressions that represent operations that produce values during program execution. ValueExpressions offer an additional method: removeNegations, that simplifies boolean expressions by removing negations and inverting comparison operators, where possible. Value expressions include:

When adopting the Simple Abstract Domain framework, value expressions might be rewritten by the HeapDomain before being evaluated by ValueDomain and TypeDomain. This is because sub-expressions of a value expression may be heap expressions that need to be rewritten first.

Identifiers

Identifiers play a key role in LiSA: other than modeling program variables, they alre also used by a number of analysis components to represent specal values (e.g., the return value of a function, the exception being thrown).

The Identifier class hierarchy

An Identifier is a ValueExpression that is uniquely identified by its name. An identifier can be weak (as returned by the isWeak method), meaning that it can represent multiple program locations at once (e.g., when modeling aliasing or arrays) or strong, meaning that it represents a single program variable. As discussed in the Semantic Domain page, special care must be taken when updating weak identifiers to avoid unsound behaviors. Identifiers can also be scoped, meaning that they can be hidden behind some program construct (e.g., function calls) to avoid clashing with local definitions. For instance, when an Identifier named x is scoped by the instruction scope, it undergoes a sort of renaming, becoming [scope]x. This allows to distinguish between different instances of x defined in different scopes. Method canBeScoped returns true if the identifier can be scoped, while isScopedByCall returns true if the identifier has been scoped by a call. An identifier can be combined with another one through lub, that raises an exception whenever the two identifiers are not compatible (i.e., they have different names). If they have the same name, the resulting identifier is weak if at least one of the two is weak. Two more predicates are provided by the Identifier class: canBeAssigned, returning wether assignments having that identifier as left-hand side are allowed (useful for preventing assignments to special memory locations), and isInstrumentedReceiver, returning true if the identifier represents an entity (e.g., object or array) that is being initialized by the current expression.

Identifier has five direct subclasses:

Thus, any identifier that can be scoped becomes an OutOfScopeIdentifier named scope:name, where scope is the ScopeToken (see the definition of the Scoped Object Interface) used to scope it, and name is the original identifier’s name.

Finally, the Variable class has three subclasses representing special synthetic variables that LiSA uses to model specific program constructs: