Legacy Program Understanding Through Program Slicing and Plan Recognition


David Eichmann, Ph.D., Assistant Professor, UHCL; Charles Pittman, Ph.D., JSC; Post-Doctoral Fellow selection pending

One of the greatest challenges in the maintenance and adaptation of long-lived software systems is the comprehension of system structure after months or years of modifications, few of which are ever reflected in changes to the corresponding portions of the system design documents. The maintainers are left with a daunting task discerning the design and functionality of the system from the only remaining trustable artifact, the source code. For small programs (hundreds or even thousands of source lines) this can prove reasonably tractable, but most currently fielded software systems are measured in tens of thousands, hundreds of thousands, and even tens of millions of source lines. NASA has some of the largest software systems in existence. The software supporting the analysis and design of Space Shuttle flights alone is comprised of over 2.5 million FORTRAN statements. Clearly, automated support for program comprehension is required in these cases.

A diverse variety of tools supporting an equally diverse variety of approaches to comprehending programs has been developed, but these have focused on the structure of the program itself, presenting diagrammatic representations of the block structure of the program, its calling structure, the flow of control, and related components. The problem with these representations is that they relate not to the design and rationale for application semantics, but only to the expression of those semantics in a given programming language. There are a great many constraints, including a limited choice of programming constructs, that obscure the full design. Adding an extended history of quick fixes and after-the-fact enhancements exacerbates the situation.

Chikofsky and Cross[1] characterize forward engineering as the traditional process of moving from high-level abstractions and logical implementation-independent designs to the physical implementation of a system; reverse engineering as the process of analyzing a subject system to identify the system's components and their interrelationships, and creating representations of the system in another form or at a higher level of abstraction; and re-engineering as the examination and alteration of a subject system to reconstitute it in a new form and the subsequent implementation of the new form. Hence, forward engineering drives the creation of implementation from design representations, while reverse engineering drives the recreation of the design representations from the implementation, re-engineering drives the recreation of a new implementation from the old implementation. Intermediate representations can play implicit or explicit roles in these transformations.

Weiser introduced the concept of program slicing[2],3 originally as an aid to program debugging. He was interested in the abstraction mechanisms a software engineer employs in analyzing an existing program while trying to comprehend the program in order to make debugging modifications. Before Weiser's work, all abstraction mechanisms to date had decomposed programs by grouping sequential program elements. After a series of experiments in which he studied the behavior of programmers who were attempting to comprehend and debug FORTRAN programs,[2] Weiser concluded that while the abstraction and grouping of sequential sets of statements did, indeed, serve as an aid to program comprehension, programmers who were attempting to debug a program used a different mental abstraction mechanism for grouping program statements. Specifically, they used an abstraction mechanism which grouped generally nonsequential sets of statements. Weiser concluded that the statements grouped in this way were those which applied to what he called "units of data components" which range in size from simple variables to the data storage portions of large ADTs. Rather than looking at sequences of program statements, he found that he could understand the mental abstraction of the debugging process by examining data flow diagrams of a program. He called his theory of data behavior "abstraction slicing," and the units of abstraction, "slices." The slices abstract a program based on the behavior of the program with respect to a specified set of data components variables rather than sequential statement listing.

A number of researchers[4,5,6] have worked to improve the techniques of program slicing, to improve the process of generating correct, minimal slices efficiently, for a variety of languages. Gallagher and Lyle[7] developed the idea of totally decomposing a program by slicing and arranging the decomposition slices into a poset based on the partial ordering of statement set inclusion. They use this decomposition poset as the basis for guaranteeing whether a change made to a program will or will not have an effect outside a specific set of program statements. The arrangement of the program into a total decomposition slice poset provides a redocumentation of the program which makes clear to the maintainer the relationships of statements which are dependent upon one another, and those which are independent.

The notion of organizing a program in ways other than the traditional hierarchy of units of increasing abstraction is not unique to slicing.[8],9 Block structured languages can smear concepts across multiple procedures in multiple scopes, requiring a slicing-like mechanism to extract it in a form that a pattern-matcher can recognize.

Soloway and Ehrlich[10] developed the theory that programming knowledge consists in part of programming plans. A programming plan is an abstract structure which a programmer uses as a template or link between a goal and a specific program fragment instance. A programmer might use, for example, a data guard plan to help accomplish the goal of preventing division by zero. In the program, the data guard plan is manifested in the test predicate and control structure necessary to prevent division by zero, while allowing division by appropriate values. While a plan may be a single abstract entity, it is manifested in a program by statements which are, in general, nonsequential. Furthermore, in many cases, an appropriate choice of slicing criterion applied to a program segment is sufficient to recover an intact plan as a slice. Rich and Wills[11] have developed a prototype module of the Programmer's Apprentice called the "Recognizer" which automatically recognizes clichés, which essentially are the manifestation of plans in programs.

Our work on interface slicing centers around the use of program slicing techniques to tune complex assets for client requirements.[l2] The issue here is the computational complexity and micro (i.e., statement level) focus of more traditional slicing techniques. Interface slicing operates at the macro (i.e., module specification) level, substantially improving the complexity of dependency analyses. Our goal with this work is to be able to make efficient use of highly flexible generic artifacts, where only the required functionality is propagated into the new system.

Our work with design patterns[13] concerns the expression of design fragments (using a metaphor borrowed from the architectural community) and focuses on the manner in which those fragments are composed in generating the structure of a software system. Patterns are hence the aspect of software design that corresponds to programming plans composed into the implementation of the software system. Integrating the use of patterns into the selection and definition of plans is a key aspect in our work, since it allows for the presentation of a more readable version of the recovered design than the plans alone would allow.

The remaining work falls into the following categories. The first should be completed early in 1997. Construction of the various tools will comprise the duration of the project, with the experimentation occurring incrementally as new functionality is added to respective tools.

Definition of a shared intermediate program representation: Both program slicing and plan recognition require manipulation of the source code for a program. In order to facilitate interaction between the two techniques, we will define and implement a shared, intermediate representation of the program source that both tools will employ.

Construction of the parser front-end to generate the intermediate representation: The parser front-end will accept program source and generate the intermediate representation as an attributed parse tree and an associated program dependence graph. Both of these structures will be stored persistently, so that subsequent manipulation by the other tools will not require reparsing the original source.

Construction of the slicer: The slicer will accept the intermediate representation of a program and generate a partially ordered set of program slices (similar to the approach of Gallagher and Lyle). These slices will be constructed as a hyperlinked graph into the original attributed parse tree, allowing for generation and storage of numerous slices which can be shared among multiple users and plan recognizer instances.

Construction of the plan recognizer: The plan recognizer will support pattern-based recognition of plans across both full attributed parse trees constructed by the parser and slices constructed by the program slicer. Support will also be provided for the definition and maintenance of plans. A key contribution here will be the use of the program slicer in the identification of candidate code fragments to be generalized into candidate plans.

Experimentation
We are particularly interested in assessing the performance characteristics of a slicer and a plan recognizer for practical pro grams and determining how slicing can be employed to enhance the effectiveness of a plan recognizer operating on large programs. Both generally available source and samples available through our NASA collaborators will be used to test our prototypes on practical examples.

References
1E. Chikofsky and J. Cross. "Reverse Engineering and Design Recovery: a Taxonomy," IEEE Software 7.1 (1990): 13-17.
2M. Weiser. "Programmers Use Slices When Debugging," Communications of the ACM, 25.7 (July 1982): 446-52.
3M. Weiser. "Program Slicing," IEEE Transactions on Software Engineering SE 10.4 (1984): 352-57.
4J. Ferrante, K. Ollenslein, and J. Watren. "The Program Dependence Graph and its Use in Optimization," ACM Transactions on Programming Languages and Systems 9.3 (1987): 319-49.
5S. Horwitz, T. Reps, and D. Binkley. "Interprocedural Slicing Using Dependence Graphs," ACM Transactions on Programming Languages and Systems, 12.1 (1990): 26-60.
6H. Leung and H. Reghbati. "Comments," IEEE Transactions on Software Engineering SE-13.12, (Dec. 1987): 1370-71.
7K. Gallagher and J. Lyle. "Using Program Slicing in Software Maintenance," IEEE Transactions on Software Engineering 17.8 (1991): 751-61.
8A. Engberls, W. Kozaczynski, and J. Ning. "Concept Recognition-Based Program Transformation," Proceedings of the Conference on Software Maintenance, Sorrento, Italy, 15-17 Oct., 1991. 73-82.
9M. Platoff, M. Wagner and J. Camaratta "An Integrated Program Representation and Toolkit for the Maintenance of C Programs," Proceedings of the Conference on Software Maintenance, Sorrento, Italy, Oct. 15-17, 1991. 129-37.
10E. Soloway and K. Ehrlich. "Empirical Studies of Programming Knowledge," IEEE Transactions on Software Engineering SE-10.5, (1984): 595-609.
11C. Rich and L. Wills. "Recognizing a Program's Design: A Graph-Parsing Approach," IEEE Software 7.1 (1990): 82-89.
12J. Beck and D. Eichmann. "Program and Interface Slicing for Reverse Engineering," Proceedings of the Working Conference on Reverse Engineering, Baltimore, MD, May 21-23, 1993. 54-63; International Conference on Software Engineering, Baltimore, MD, May 17-21, 1993. 509-18.
13C. Irving and D. Eichmann. "Patterns and Design Adaptability," 3rd Annual Conference on the Pattern Languages of Programs, Allenon Park, Illinois, Sept. 46, 1996.