Prerequisites:
  1. Minimal Program Components
  2. Statements, Expressions, and Edges
  3. Annotations

Control Flow Graphs

A control flow graph (CFG) is LiSA’s representation of the body of a function, method, or procedure. The nodes of the graph are Statement instances, and the directed edges between them are instances of Edge and define the possible flows of execution. Each CFG has a descriptor that carries its metadata — name, parameters, return type, annotations, and more — independent of the code body itself. CFGs live inside Units, which in turn are part of a Program; those are described in their own documentation pages.

This page describes the building blocks of control flow graphs in LiSA: starting from the descriptor and the data structures it contains, through the generic graph infrastructure the CFG builds on, and finishing with the CFG class itself and its native counterpart.

 Note:
This page contains class diagrams. Interfaces are represented with yellow rectangles, abstract classes with blue rectangles, and concrete classes with green rectangles. After type names, type parameters are reported, but their bounds are omitted for clarity. Only public members are listed in each type: the + symbol marks instance members, the * symbol marks static members, and a ! in front of the name denotes a member with a default implementation. Method-specific type parameters are written before the method name, wrapped in <>. When a class or interface has already been introduced in an earlier diagram, its inner members are omitted.

Descriptors

The CodeMemberDescriptor class carries all the metadata associated with a code member — a function, method, or procedure — without containing the body code itself. It is shared by both CFG and NativeCFG instances, and serves as the single point of truth for information about a code member’s signature and context.

CFG Descriptor

The main information tracked by a CodeMemberDescriptor includes:

CodeMemberDescriptor is created by frontends at parsing time and refined during validation, when annotations are propagated and the override chain is established. Modifications to a descriptor after validation has completed may lead to incorrect results.

Formal Parameters

Parameter represents a formal parameter of a code member. Each parameter carries its name and static type (via getName() and getStaticType()), its source location (getLocation()), and any annotations attached to it (getAnnotations()). Optionally, a parameter may have a default value expression, returned by getDefaultValue(), which is used at call sites that omit the corresponding argument. The order of parameters in the array returned by getFormals() matches the order of formal parameters in the source signature.

Local Variables

A VariableTableEntry represents a local variable defined in the body of a code member. Each entry tracks the variable’s index in the local variable table (getIndex()), its name and static type (getName(), getStaticType()), and its source location (getLocation() — the location of its declaration). Each entry also records the scope of the variable as a pair of statements: getScopeStart() returns the first statement where the variable is visible, and getScopeEnd() the last, both of which may be null when the variable is visible throughout the entire code member. Given a CFG, the createReference(CFG) method builds the VariableRef expression that syntactically accesses the variable. Annotations on local variables are accessible via getAnnotations(), and are looked up by the descriptor’s getAnnotationsOf method to resolve annotation queries by variable name within the correct scope.

 Tip:
Information on local variables is optional: its main purpose is to track annotations and to provide scoping logic. If not provided, all variables are assumed to have no annotations and to be visible throughout the entire code member they are defined in.

Error Handling Blocks

Error handling blocks encode the exception-handling structure of a code member, such as Java’s try/catch/finally construct. LiSA represents these through three classes.

Error Handling Blocks

A ProtectionBlock is the top-level container for a single exception-handling construct. It groups together the different parts of the construct by providing access to them:

Instead, getFullBody(CFG) returns the full collection of statements covered by this protection block across all its parts.

A ProtectedBlock represents a contiguous region of code within an exception-handling construct: the try body, the else clause, the finally clause, or the body of a catch block. It tracks the first and last Statement delimiting the region through getStart() and getEnd(), and the getBody() method returns all the statements contained within it. Two boolean methods, canBeContinued() and alwaysContinues(), describe whether normal execution can exit the block and whether it always does so, respectively; these are used when analyzing the reachability of statements after the construct.

A CatchBlock is a specialization of the error-handling construct for a single catch clause. In addition to the body (a ProtectedBlock accessible via getBody()), it records the exception types it handles, returned by getExceptions(), and the identifier that names the caught exception within the block, returned by getIdentifier(). The connection between protection blocks and the ErrorEdge type (which transfers control to the appropriate catch block when an error is raised) is described in the Statements, Expressions, and Edges page.

Control Flow Structures

Control flow structures encode the high-level syntactic constructs — loops and conditionals — that organize the statements inside a CFG. They are stored in the descriptor’s getControlFlowStructures() collection and can be populated either by the frontend at parse time or inferred heuristically from the CFG’s edge structure.

Control Flow Structures

The abstract ControlFlowStructure class is the common base for all such structures. Each instance is anchored to a condition statement (the guarding expression of the construct, returned by getCondition()) and a first follower (the first statement after the structure exits, returned by getFirstFollower()). The allStatements() method returns all statements covered by the structure, including the condition and first follower, while getTargetedStatements() returns only those that are directly part of the structure’s body and that are the target of a conditional edge. The contains(Statement) method tests whether a given statement belongs to the structure.

Two concrete subclasses are provided:

When frontends cannot provide control flow structures directly (e.g., because the source language uses low-level jump instructions), the ControlFlowExtractor class can be used (or extended) to infer them heuristically from the CFG’s edges. Its extract(CFG) method uses a dominator-based algorithm to find loops (via back edges) and a graph-visiting heuristic to identify IfThenElse structures. The result is a collection of ControlFlowStructure instances that can be passed to the descriptor for storage.

 Tip:
Information on control flow structures is optional: its main purpose is to drive optimized fixpoints and provide additional information to checks. If not provided, optimized fixpoints will not be available.
 Warning:
Heuristic extraction is a best-effort process. CFGs with arbitrary jumps (e.g., goto, break, or continue that exit multiple levels) may cause the extractor to miss or misidentify some structures. It is always preferable to have frontends supply control flow structures directly from source information.

Code Members

A code member is any program construct that has a body of code and a signature. In LiSA, all code members implement the CodeMember interface, which requires the implementation of a single method: getDescriptor(), returning the CodeMemberDescriptor that carries the code member’s metadata. The validate() default method can be overridden to enforce well-formedness checks on the code member’s structure.

Code Members

Three classes implement CodeMember:

Graphs Containing Code

LiSA’s CFG builds on a generic, reusable graph infrastructure defined in the it.unive.lisa.util.datastructures.graph.code package. This infrastructure provides the data structures and interfaces needed to represent directed graphs whose nodes and edges are ordered and comparable, without making any assumption about the specific types of nodes and edges used.

Code Graph Classes

The CodeNode<G, N, E> and CodeEdge<G, N, E> interfaces are the marker interfaces for nodes and edges of a CodeGraph. Both extend their respective base interfaces from the generic graph package (Node and Edge) and add a Comparable constraint, so that nodes and edges can be ordered within the graph. CodeEdge further defines three methods shared by all edge implementations: isUnconditional() (whether the edge is always traversed), isErrorHandling() (whether the edge transfers control to an error handler), and newInstance(N, N) (which produces a copy of the edge connecting different nodes, used during graph simplification).

The CodeGraph<G, N, E> abstract class is the central data structure. It is parametric to the concrete graph type G, the node type N (constrained to CodeNode<G, N, E>), and the edge type E (constrained to CodeEdge<G, N, E>). The graph manages its nodes and edges through an internal NodeList<G, N, E>, accessible via getNodeList(). The main operations available on a CodeGraph are:

The NodeList<G, N, E> class implements the underlying adjacency structure of the graph and handles all the low-level bookkeeping: insertion and removal of nodes and edges, and efficient lookup of neighbors. It also provides a simplify() method that removes nodes connected by unconditional edges (i.e., nodes whose only incoming and outgoing edges are isUnconditional()) and rewires their neighbors directly, reducing the number of nodes in the graph without changing its semantics.

Control Flow Graphs

The CFG class is LiSA’s main representation of the body of a function, method, or procedure. It instantiates the CodeGraph<G, N, E> abstraction as CodeGraph<CFG, Statement, Edge>, using Statement instances as nodes and Edge instances as edges. It also implements CodeMember, connecting it to the descriptor-based metadata system.

CFG

Beyond the graph operations inherited from CodeGraph, the CFG class provides the following:

Native Code

Not all code members that LiSA must reason about have a body that can be represented as a control flow graph. Standard library functions, runtime primitives, and other external constructs may be provided as native code members whose semantics is expressed directly as a pluggable expression rather than as a graph.

NativeCFG

A NativeCFG is a CodeMember that has only a CodeMemberDescriptor and no body. Its semantics is captured by a NaryExpression class that must also implement the PluggableStatement interface, passed to the NativeCFG constructor as a Class object. When a call to a native CFG is encountered during analysis, the rewrite(Statement, Expression...) method is invoked: it uses reflection to call the build(CFG, CodeLocation, Expression[]) static factory method of the construct class, producing a NaryExpression that replaces the original call and whose forward semantics implements the intended behavior. The PluggableStatement interface’s setOriginatingStatement(Statement) method is also called on the resulting expression, linking it back to the call that triggered the rewrite.

 Tip:
To implement a native construct, create a class that extends a suitable NaryExpression subclass and also implements PluggableStatement. Provide a public static build(CFG, CodeLocation, Expression[]) factory method, define the semantics in forwardSemanticsAux, and pass the class to the NativeCFG constructor. LiSA will take care of the rest.

NativeCFGs avoid the submission of library code to the analysis, which in turn reduces the complexity of the analysis. The semantics of a library function is expressed as if it were a single instruction, that is effectively inlined at the call site, without the need to analyze its internal code line by line. This also means that NativeCFGs will not have a corresponding AnalyzedCFG available in the analysis results.