Call Graphs

The call graph in LiSA has a dual responsibility: it implements the logic for resolving a call to its targets, and it keeps track of the callees and callers of each control flow graph. The latter is the cause for the CallGraph inheriting from the BaseGraph class, a simple graph implementation provided by LiSA based on an adjacency list. The latter class won’t be detailed here, as it mostly provides standard constructs for graph manipulation (e.g., adding and removing nodes and edges, iterating over the graph, etc.).

 Note:
This page contains class diagrams. Interfaces are represented with yellow rectangles, abstract classes with blue rectangles, and concrete classes with green rectangles. After type names, type parameters are reported, but their bounds are omitted for clarity. Only public members are listed in each type: the + symbol marks instance members, the * symbol marks static members, and a ! in front of the name denotes a member with a default implementation. Method-specific type parameters are written before the method name, wrapped in <>. When a class or interface has already been introduced in an earlier diagram, its inner members are omitted.

Calls

Understanding how calls are represented in LiSA is crucial to reason on the call graph. The Call class is the root of the call hierarchy, and contains the general structure that is shared among all call types.

The Call class hierarchy

The superclass of Call, the NaryExpression class, is detailed in the expressions page: for the contents of this page, it suffices to know that it is a generalization of an expression that can have zero or more sub-expressions. The Call abstract class defines the components of a call’s signature: the name of the target program member that is being invoked (getTargetName), the qualifier of the call, if any (getQualifier — an optional string that is used to refer to the unit in which the target member is declared, e.g., a class name for a Java static method call, or a module name for a Python function call), and the arguments passed to the call (getParameters).

In addition, each call has a type (getCallType), that is defined through an enumeration that can assume values STATIC, INSTANCE, or UNKNOWN. The call type is meant as a way to instruct the call graph on how to resolve the call. A STATIC call is a call that does not have a receiver: during resolution, targets will be searched in the whole program, optionally restricting the search to types that match the qualifier, if any. An INSTANCE call instead is a call that has a receiver: the type of the first parameter determines the leaf of the type hierarchy where the call targets are to be searched, that is traversed upwards as is typical of object-oriented languages. An UNKNOWN call is a call for which it is not possible to determine whether it has a receiver or not at parse time, for which both resolutions are attempted.

The most common call instance that is produced by frontends is the UnresolvedCall: a call that has not been resolved yet, and that must be handled by the call graph to determine its targets. Since the resolution of an UnresolvedCall leads to the creation of a second call, the Call class offers means to record the UnresolvedCall that led to its creation: using setSource, one can record that a call originated from a given UnresolvedCall, that can be retrieved using getSource. This is useful to access the control flow graph containing the call: since resolved call do not appear syntactically in the program, they are not directly part of a control flow graph but must resort to their source UnresolvedCall to access the graph they belong to.

The resolution process yields to the creation of Call instances that also implement the ResolvedCall interface, that provides means to access the targets of the call. getTargets returns a collection of CodeMembers, an interface that describes the general structure of a program member that contains code (e.g., a control flow graph, or a summary of a function from the standard library — more information available in the control flow graphs page). The interface is implemented by several classes. The three core implementations are:

Since a call might resolve to a mix of NativeCFGs and regular CFGs, the MultiCall class is provided as a wrapper to represent a call that embeds multiple underlying calls. Finally, the TruncatedParamsCall is a special call generated when calls whose type is UNKNOWN are resolved to static targets: the first parameter passed to the call is actually the qualifier, and is thus removed in semantic computations.

Symbol Aliasing

Often, programming languages allow some means of defining aliases for program members, similar to Python’s import as or C#’s using. These might affect the resolution of calls, as the name of a function or class might have been locally renamed by one of these constructs. Such aliases can be stored in an instance of the SymbolAliasing class. This is a subtype of FunctionalLattice that maps instances of Symbol (i.e., a name with an optional qualifier) to a set of Symbols that represent the possible aliases of the symbol. Instances of SymbolAliasing are automatically extracted from the ProgramState and passed to the call graph’s resolve method.

The CallGraph class

The CallGraph class is the root class of the call graph hierarchy. It inherits from BaseGraph (hidden in the class diagram), and provides standard graph methods to access, add, and remove nodes and edges.

The CallGraph class hierarchy

The CallGraph class adds several methods, most of which have default implementations:

Three methods require implementation by subtypes of CallGraph. The getCallSites method yields all calls that invoke a given CodeMember. This method is used by the overload accepting a collection of CodeMembers to return the union of their call sites. New edges in the call graph can be added either by resolving a call, or by manually informing the graph that a call is being executed. In both methods, NativeCalls are never registered in the call graph as they do not point to a CFG. The registerCall method can be used, given a CFGCall (i.e., an already-resolved call that is linked to the CFG it might invoke), to add an edge from a caller to one or more callees. Instead, the resolve method receives an UnresolvedCall, the set of Types for each argument, and SymbolAliasing information, produces a resolved Call instance.

For a list of call graphs already implemented in LiSA, see the Configuration page.

The BaseCallGraph class

In most cases, the resolution logic follows the same workflow, parametric to all language-specific algorithms. Such workflow is implemented in the BaseCallGraph class, that provides implementations for the abstract methods of CallGraph:

All methods involved in the resolution of both instance and non-instance calls are implemented, but are left open for modifications by subclasses. The resolution process implemented in resolve proceeds as follows:

Both instance and non-instance resolution exploit few auxiliary methods, each with a default implementation that can be overridden by subclasses to implement possible language-specific optimizations:

resolveNonInstance proceeds by first selecting the types where the targets are to be searched are determined by looking at the call qualifier (if no qualifier is present, all types in the program are considered), also considering possible aliases. Then, each type is searched independently: if the type is not part of a hierarchy (e.g., it is not part of a language that has classes), the search filters all code members with isATarget. Then, for all selected targets, the distance from a perfect target is computed (i.e., the number of type conversions necessary for using the actual parameters of the call as parameters of the target) using the program’s parameter matching strategy, and the ones with the lowest distance are added to the possible targets. Note that if the call is ambiguous (i.e., there are multiple targets with the same distance in the same type), an exception is raised. Instead, if tye type is part of a hierarchy, the hierarchy is traversed upwards using the program’s hierarchy traversal strategy. For each traversed type, the same process described above is applied, but the search is stopped as soon as at least a target is found, since the traversal strategy ensures that the first targets found are the most specific ones.

resolveInstance proceeds similarly, but the search is restricted to the type of the receiver (i.e., the first parameter of the call) and its supertypes. The resolution starts by selecting the possible types of the receiver through getPossibleTypesOfReceiver, the only abstract method of the class, that can implement several techniques for call graph building (e.g., class hierarchy analysis, rapid type analysis, etc.). For each such type, the corresponding type definition is retriieved through getReceiverCompilationUnit. The hieararchy of the definition is then traversed using the program’s hierarchy traversal strategy, and for each traversed type the instance code members are filtered using the same process described above: isATarget is used to check the suitability of the code member, the distance from a perfect target is computed, and the best targets are selected. The search is stopped as soon as at least a target is found, since the traversal strategy ensures that the first targets found are the most specific ones. In case of ambiguity, an exception is raised.

Implementations of BaseCallGraph thus only need to provide an implementation of getPossibleTypesOfReceiver to implement a call graph building algorithm, and can rely on the default implementations of the other methods, that are designed to be as generic as possible while respecting the possible language-specific features that might affect the resolution process.