Computer-Assisted Visual Communication

Barry David Wessler

Department of Computer Science, University of Utah

July 1973

UTEC-CSc-73-127

This research was supported in part by the University of Utah Computer Science Division and the Advanced Research Projects Agency of the Department of Defense, monitored by the Rome Air Development Center, Griffiss Air Force Base, New York 13440, under contract F30602-70-C-0300.


ACKNOWLEDGEMENTS

I would like to thank Dave Evans and Ivan Sutherland for their help, support and enthusiasm for this project. I would also like to thank Glen Fleck for aiding in the development of the Treatment of the film.

Thanks also to all my co-workers, especially Ed Catmull, Henri Gouraud and Martin Newell. Thanks also to the people who aided me in the film production: ARPA, for providing the equipment and computer time; KUED, for providing editing and sound equipment; BYU, for providing the optical printer service; Rex Cambell, for his great voice; Lino Ferretti, for his great music; and Fred Parke, for the titles. I would also like to thank Kathy Gerber for typing; the PDP-10 for formatting and the Xerox Graphics Printer for printing this document. Also to Mike Milochik for all the hours he spent in the photo lab processing my films and stills.

ABSTRACT

The purpose of this research was to build an environment in which an author can create a visual communications vehicle (a film) which will convey his ideas and thoughts. The primary motivation for this effort was the powerful ability of film to combine instruction with entertainment. The entertainment component can increase the attention span of the observer so that difficult concepts can be transferred. This capability can be very valuable for short attention span observers (e.g. young children, corporation executives, etc.). Entertainment also permits repetition, until the transfer is made, without serious objection by the observer (e.g. advertising commercials and Sesame Street).

The development of the Visible Surface algorithm, by Watkins at the University of Utah, was the key to providing a mechanism for the computer to produce images of sufficient visual interest for stimulating films. The Watkins Algorithm produces half tone images of three-dimensional objects described by manipulating a collection of polygons stored in the computer. These objects may be realistic, completely abstract, or representative in appearance. The objects are akin to physical models or puppets in that they operate in a three-dimensional perspective space and are shaded by a controllable light source. They are also like two-dimensional animated drawings in that they are not constrained by physical laws or rigid body motions.

The basic tasks accomplished in creating the environment for computer assisted Visual Communications were: System (breaking up the overall task into its major components and defining the communication paths between these components); Structure (defining the internal and external representation of objects to be manipulated and displayed); and Motion (finding ways of describing the dynamics of activities in effective, compact ways). An environment was constructed and used to produce several films.

Special attention was given to human-like motion. No attempt was made to be anatomically correct but rather to characterize activities such as walking and talking to convey mood and emotion to the observer.

A film has been completed describing the goals of this research and showing the capabilities of the Computer-assisted Visual Communications environment created.

CHAPTER 1: INTRODUCTION

From Webster's [1]:

VISUAL - adj. - 1: Of, relating to, or used in vision . . . . . 7: of, relating to, or constituting a means of instruction (as a map or film) by means of sight.

COMMUNICATION - n. - 1: An act or instance of transmitting . . . . . 5: a process by which meanings are exchanged between individuals through a common system of symbols. 6: a technique for expressing ideas effectively.

The University of Utah has been a pioneer in the development of Visible Surface image technology. The goal of this research was to create an environment to use this image technology for visual communications, an environment in which meanings are exchanged between individuals through a common system of (visual) symbols.

The form of the communication may be by: story telling (where the meaning is conveyed by conjecture and symbolism); lecture; demonstration or example; or through interaction by two or more individuals.

The medium of the communication may be: paper; film; video; live (e.g. a play or a presentation); or computer display.

The form used in the communications is generally a matter of preference of the author (Author, generally applied to the written word, is used in the broader context here to include painter, film maker, etc.), is context dependent, and is often a combination of forms. (William Shakespeare told stories, Sigmund Freud gave lectures; but many of their meanings were the same).

There are several intrinsic differences in the media. Some media are inherently passive (paper, film, etc.) while others can be (but are not always) responsive to the observer. Yet another distinction is between static (photograph, painting, etc.) and dynamic (film, TV, etc.) media. The chart below presents some of the media in each category.

                Static                       Dynamic

   Passive      Painting, Photographs,       Broadcast TV, Films,
                Sculpture                    Mobiles

   Responsive   Visual Illusions (Escher)    Live Presentation,
                                             Computer Display

The environment to be created by this research is to generate passive, dynamic visual communications on a recorded medium (film, presumably). This does not mean that there is no interaction between the computer and the author; rather, there is no interaction between the medium and the observer. In fact, there is no real interaction between the computer and author using visible surface pictures; rather the interaction uses typewriter and traditional line drawing displays. This decision was principally economic. The University of Utah does not have the hardware necessary to generate dynamic visible surface pictures in real time (channel controller, clipper, matrix multiplier, y-sorter, and storage would be required in addition to existing equipment). This eliminates the use of the play back technique [2] between the author and the computer. The loss of the responsive medium for the observer also eliminates several important applications such as training simulators and computer-assisted instruction.

The development of the Visible Surface algorithm, by Watkins [3], was the key in providing a mechanism for the computer to produce images of sufficient visual interest for stimulating films. The reader unfamiliar with Visible Surface images should look at the photographs in Appendix A. The Watkins Algorithm produces half tone images of three-dimensional objects described by a collection of polygons stored in the computer. These objects may be realistic, completely abstract, or representative in appearance. The objects are akin to physical models or puppets in that they operate in a three-dimensional perspective space and are shaded by a controllable light source. They are also like two-dimensional animated drawings in that they are not constrained by physical laws or rigid body motions.

The purpose of this research was to build an environment in which an author can create a visual communications vehicle (a film) which will convey his ideas and thoughts. The primary motivation for this effort was the powerful ability of film to combine instruction with entertainment. The entertainment component can increase the attention span of the observer so that difficult concepts presented can be transferred. This capability can be very valuable for short attention span observers (e.g. young children, corporation executives, etc.). Entertainment also permits repetition, until the transfer is made, without serious objection by the observer (e.g. advertising commercials and Sesame Street).

Before proceeding, a caveat is in order: Dynamic Visible Surface images represent a new medium for generating visual communications, but they are far from the only one. It is not expected that this medium will replace any existing techniques. Visible Surface images have their own set of advantages and limitations which make them valuable for expressing some ideas and not others. This thesis was an attempt to explore both the advantages and limitations of this medium.

1.1 Previous Work

Past efforts in computer assisted movie making can be classed in three categories: line drawing systems, two-dimensional half tone systems, and three-dimensional half tone systems.

There has been considerable effort in line drawing animation. Two paths have been taken: abstract and representative. In the abstract work the computer was used to generate a sequence of patterns of lines and dots under program control [4]. Some interesting films have resulted. This general area has been summed up by the following: "We started some years ago with jittery sequences of geometric nonsense and bit by bit literally we have worked up to smoothly flowing, shimmering, full color sequences of geometric nonsense." Many representative type systems have been built with varying degrees of success [5]. The best known is the work of Baecker [6] at M.I.T. Baecker used the interactive computer to sketch objects, trajectories of motion and other time-based control parameters. The action segment could then be reviewed in real time. The author could, after seeing the segment, modify the objects or motion to get the desired effect. Baecker's work and the basis of his success stem from an observation by Gombrich: "to doodle and watch what happens - has indeed become one of the acknowledged means of extending the language of art" [7]. The system can produce movies very rapidly, and has produced several interesting segments.

The work in two-dimensional half tones was pioneered (and monopolized) by Knowlton [8]. His work centered not on interactive design but on languages for movie generation. The result of this effort has been two languages, BEFLIX and L6, and several movies.

The three-dimensional half tone work has been done at the University of Illinois, the General Electric Company, and the University of Utah. Bouknight at the University of Illinois [9] adapted and extended one of the early Visible Surface algorithms to include shadows. Several film segments were produced to show the algorithm's capability. The General Electric Company [10] developed, in the early 1960s, a visible surface algorithm for the image generator in a NASA flight simulator. The image generator produces real-time color pictures in response to the flight controls. The original algorithm was very restrictive but adequate for the flight simulator application. Two excellent films have been made, one showing the use of the image generator in the flight simulator business and the other, done under contract to NASA, showing the construction of the Skylab. The system used for specifying the motion in these films has not been described in the literature. At the University of Utah two efforts have been reported [11][12]: Catmull's Hand and Parke's Face. These, along with several smaller efforts stimulated by a seminar in computer animation, cross-fertilized with this thesis.

1.2 Animation

From Webster's:

ANIMATION - n. - 2a: a motion picture made by photographing successive positions or poses of puppets and other inanimate objects so that projection of the film produces a picture in which the puppets or other objects seem to move in a lifelike and realistic manner. b. Animated cartoon.

CARTOON - n. - 2a: a drawing that is often symbolic and usually intended as humor, caricature, or satire and comment on public and usually political matters. b. Comic

According to the definition of animation, this thesis is concerned directly with animation. Yet the word has been avoided in this document because of the close association with CARTOON, which implies humorous, flighty activities. The principal intent of this thesis was to create a vehicle for communicating information.

The Animation industry is, however, a very fertile source of information, techniques and terminology pertaining to the task at hand. There are a large number of books concerned with the history of the industry but only one was found that dealt directly with the concepts and methods of animated film production [13]. The following information was gleaned from that text and from personal conversations with Glen Fleck.

The industry is a structured hierarchy of people resembling the apprenticeship industries of previous centuries. At the top level is a chief animator who plays a key role in character design and story board development. At the next level are the Animators who design the action and draw several frames per minute of film, called Key frames. They may also specify how the transition is to be made between the key frames. At the third level are the junior animators who produce every 10th to 20th frame using the directions provided by the animators. These frames are generally at extremes of the action. The junior animator also makes a spacing guide to show the fourth-level in-betweener how to complete the action. The in-betweener then makes each drawing required for the film. Not included in this list are the clean-up men, checkers, painters, background men, etc. that are necessary in the traditional animation studio.

The important notion gleaned from this structure is the importance of the key frames and transition specification. In the animation industry this information is presented as drawings, notes and some curves. Much of the work in this thesis capitalizes on the key frame concepts.

Due to the enormous expense of animation, considerable thought and effort must go into the planning of a film. There are six distinct phases in planning instructional animation: task analysis, synopsis, research, treatment, script, and story board. Below are definitions of each phase. Chapter Four presents the planning for the film produced in conjunction with this thesis.

Task Analysis:
What do you want the audience to know or do as a consequence of seeing the film?
Synopsis:
A one page outline of the essential concepts of the projected film. It is composed of the following elements: target audience, purpose, subject, core idea, plot, identification and cinematic form.
Research:
Accurate information concerning the subject matter of the film. The kind of information to search for is visual information . . . . The eye remembers, the ear forgets; at least 90 percent of the film's content should be visual.
Treatment:
Should indicate where the film will begin, what will constitute the main body and how it will end. The treatment is an expansion of the plot of the synopsis.
Script:
Final written presentation. Its format should be split page; visual information on the left, audio elements on the right.
Story board:
Establishes the film's style, continuity and visual approach. Each major change of scene, sequence, or concept in a film should be indicated with a story board sketch.

In the animation industry, because of difficulties in painting the final artwork, objects are generally painted flat, i.e. the whole object is a single color. Exaggeration of motion is used to compensate for the loss of three-dimensional form. "This distortion of 'stretch' in anticipation and 'squash' in reaction is an accepted technique of film animation ... Distortion endows the flat image with vitality and dynamism." [13]. The use of this exaggeration is not as necessary in Visible Surface animation, however, because objects retain their three-dimensionality in presentation. Exaggeration, on the other hand, is a very valuable technique for conveying some important information.

1.3 Uses of Dynamic Visual Communication

This section describes some of the (perhaps commercially feasible) applications of visible surface animated movies. There are six major areas of interest: Vehicle Simulators, TV Advertising, Non-Advertising Television applications, Education-Training films, Motion Picture Special effects, and Design.

Vehicle Simulators:
Most applications of visible surface image technology to vehicle simulators involve real-time control of the vehicle operation. Some important applications, however, involve off-line picture production: for example, films meant to explain the vehicle rather than train the pilot (e.g. the NASA-GE Skylab film), and evaluations of whether the image quality of a real-time vehicle trainer is adequate before one is purchased.
TV Advertising:
Animated TV commercials currently cost $35 million in production costs and are the largest market for animated film. The primary virtue of Visible Surface animation over traditional forms is the low production cost of the film. One way to take advantage of this low cost is to generate many action sequences with the same objects and the same basic theme. The sequence would then be varied within the advertisement's broadcast schedule. This variation on the same theme would increase the consumer's interest in the commercial and presumably enhance the product's appeal.
Non-Advertising Television applications:
Because of the ability to create animated sequences rapidly they could be applied to more timely subjects. For example, in the News area it may be possible to animate the national weather scene, or to animate simple political cartoons. Promotional and leading spots for regular programming and special events, e.g. athletic events, elections, space shots, etc., could be produced. Finally, film segments for National Educational Television shows, such as Sesame Street and Electric Company, would be a natural extension of the Advertising work.
Education-Training Films:
The primary advantages of Visible Surface films in this area are: (1) the three-dimensional model descriptions of the objects of interest produce accurate images independent of the view from which they are observed; and (2) digital computer control of the movement of the objects and the camera in space can produce an accurate portrayal of the dynamics of an activity if that is the essence of the learning experience.
Motion Picture Special effects:
Some special effects, particularly in science fiction productions, can be produced. The special effects will be especially effective where chroma-key can be used to superimpose live foreground action on some simulated background.
Design:
There are two applications of Visible Surface image technology: (1) Structural Analysis and (2) Aesthetics. A common method used in structural analysis is the finite element method. This technique uses a geometric description and a small area model of the material to find some structural property (stress, strain, temperature, etc.) at each grid point in the geometry. The result of the analysis and the geometric description provide natural input for the visible surface processor. The images created have proved valuable in both understanding the analysis results and checking the input geometry for correctness [14]. For designs created or modified in computer model form, visualization of the model may be necessary to ensure the aesthetic appeal of the final product.

1.4 Structure of the Thesis

Chapter 2 describes the basic tasks to be accomplished in creating the environment for dynamic visual communications. They are: System (breaking up the overall task into its major components and defining the communication paths between these components); Structure (defining the internal and external representation of objects in the system); and Motion (finding ways of describing the dynamic activities in effective, compact ways).

The third chapter is devoted to understanding the specialized motion used in human-like walking. Descriptions are given of several systems built to allow easy specification of walks. Emphasis was placed on giving adequate flexibility to allow for walks that conveyed the mood or emotion of the walker to the observer. No attempt was made to have the walk be anatomically precise.

Finally, the fourth chapter describes the film that was produced for the thesis. The film's purpose is to show that animated films can be made to convey new ideas to a wide audience.

CHAPTER 2: SYSTEM STRUCTURE AND MOTION

2.1 System

The overall goal of the Visual Communications environment was to take object descriptions and motion specifications and produce a sequence of Visible Surface pictures that corresponds to the specification. As the system was being designed there seemed to be too many reasonable alternative paths to follow, some toward more generalization, some toward greater efficiency. The decision was therefore made to break the problem up into subproblems with well defined communication paths between them. A user can then replace a particular implementation of a subproblem without disturbing the rest of the environment. Carefully controlled communication paths have also permitted subprocesses to be run on different machines with the information transmitted through some data communication mechanism (files, a subroutine interface, the ARPANET, etc.).

The block diagram in Figure 2.1 shows the major elements of the system and the communication paths.

[Block diagram: Object Input feeds the Object Generator, which produces the Object Description; Movie Input feeds the Key Frame, Walk Generator, and Merge programs (the Merge program combining motion specifications), which produce the Frame by Frame Description; the Object Description and Frame by Frame Description feed the Frame Generator, whose Transformed Polygon Data drives the Image Generator, CRT, and camera.]
Figure 2.1 - System Block Diagram

2.1.1 The Modules

The image generator takes coordinate data that has been transformed into the homogeneous coordinate space required by the clipping algorithm. It clips the data, then produces either a line drawing or a visible surface image. The image is output on a Cathode Ray Tube for photographing.

The Frame Generator has two input channels and one output channel to the image generator. The object description input channel takes rigid object descriptions and basis vectors (which place, orient and scale the objects into a predefined space) and forms an internal structure within the frame generator for convenient transmission to the image generator. The frame by frame description specifies a transformation to be applied to each object to fix it in the object space coordinate system. Motion is then obtained by changing the transformation matrix from frame to frame to give the illusion of motion. The frame generator is described in detail in Appendix B.

The Movie Generator is not a single well defined program, but is rather a collection of programs for taking graphical or textual descriptions of the motion activity of each object and putting this into the frame by frame description required by the Frame Generator. One general purpose program, KEY FRAME, and several special purpose programs were built.

The Object Generator is also a collection of programs designed to generate object descriptions. To date most objects have been defined by hand calculating the coordinates of each vertex and then typing them in. Other techniques used include: two-dimensional digitizing of a pair of photographs [15]; three-dimensional digitizing of models [16]; and a program for defining and combining standard shapes (cubes, cylindrical solids, etc.) [17].

2.1.2 The Communication Paths

The transformed polygon data path is best understood and best defined; it consists, in its most convenient form, of subroutine calls with parameters. The path describes the polygons in the image to be displayed and controls the image generator and automatic photographing equipment. There is no additional structure given to the polygons, such as that they belong to objects, because the image generator as it currently exists cannot use this information. The path is by no means static. It has undergone several changes to permit capabilities like homogeneous clipping, the introduction of fog, priority, etc. Its primary virtue is not constancy but rather that it is an organizational building block. The use of this path has made it much easier to keep up with the constantly changing image generator technology.

In addition to the standard subroutines for image generation, there are transmission subroutines. These subroutines have identical entries and parameters, but prepare the parameter data for transmission through some medium. The three transmission subroutines that have been developed thus far are: (1) transmission from the TENEX system to the Single User PDP-10 through shared core memory; (2) transmission from the Single User to a private disk pack for later use by the image generator; and (3) transmission from an arbitrary TENEX system in the ARPANET [18] back to Utah for image generation.

The Object description is a path using textual files to describe elementary objects. The general technique is to describe the object as polygons that form the surface to be seen. The polygons are described as sets of ordered points. The points are given by reference in the polygon description. The actual coordinate values of the points are also given, in some coordinate space convenient for measuring (or computing) the object. A Basis (perhaps more than one) may also be provided for the object to transform the coordinates into a coordinate space that makes the object easy to use. The sketch below illustrates the idea.
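
As an illustration, the following sketch (present-day Python, not the thesis's actual file format; all names are hypothetical) shows the shape of such a description: coordinate values stored once, polygons given as ordered references to those points, and an optional basis for re-mapping the measuring coordinate system.

   # A minimal sketch of the object description path; names are hypothetical.
   from dataclasses import dataclass, field

   @dataclass
   class ObjectDescription:
       points: list     # [(x, y, z), ...] in the measuring coordinate space
       polygons: list   # each polygon is an ordered list of point indices
       basis: list = field(default_factory=lambda: [[1, 0, 0],
                                                    [0, 1, 0],
                                                    [0, 0, 1]])

   # One square face of a unit cube, described by reference to shared points:
   face = ObjectDescription(
       points=[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
       polygons=[[0, 1, 2, 3]])   # ordered so the visible surface faces out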

The frame by frame description allows the user to specify the objects to be used in the scene, give the objects a hierarchical structure, and give commands which modify the objects in shape, size, position or orientation. The dynamic activity is then created by modifying the objects in a coherent sequential way that is time synchronized to the playback film speed (or the speed of whatever medium the scene is being recorded on). This description also permits control over the image environment, such as resolution, fog level, type of film, etc., and control of the automatic movie camera.

The objects are placed in a three-dimensional space commonly referred to as the object space. Before transmission to the image generator the objects must be put into view space coordinates. This space has the observer looking down the z axis for convenient clipping and depth computation [19]. In order to change from object to view space, a specification is given of where the imaginary camera is to be in the object space and where that camera is to look. This provides an easy way for the film maker to photograph the simulated environment. The movie input and object input paths are not well defined and vary from program to program.
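
A minimal sketch of the camera specification just described (present-day Python with numpy; the function name and row-vector convention are assumptions, not the thesis's code):

   import numpy as np

   def view_matrix(camera, lookat, up=(0.0, 1.0, 0.0)):
       # Build an object-space to view-space transform that leaves the
       # observer at the origin looking down the z axis (row-vector
       # convention: [x y z w] * M).
       camera = np.asarray(camera, dtype=float)
       lookat = np.asarray(lookat, dtype=float)
       z = lookat - camera
       z = z / np.linalg.norm(z)                  # viewing direction
       x = np.cross(np.asarray(up, dtype=float), z)
       x = x / np.linalg.norm(x)
       y = np.cross(z, x)
       m = np.eye(4)
       m[:3, 0], m[:3, 1], m[:3, 2] = x, y, z     # rotation part
       m[3, :3] = [-camera @ x, -camera @ y, -camera @ z]   # translation
       return m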

2.2 Structure

Within the frame generator there is the classic graphics structure problem solved by Sutherland in Sketchpad [20]: there are objects and instances of those objects. An object is a definition of the representation of the entity. An instance is an invocation of an object (presumably with some parametric changes).

In the Frame generator an object is represented as a list and an instance of an object is a pointer to that list. A list can have many pointers to it, each one representing another incarnation of that object.

The principal parameter of an instance affecting the called object is the Transformation Matrix. The Transformation Matrix is a 4 × 4 matrix which is able to map straight lines in one three-dimensional space into straight lines in any desired three-dimensional space [21]. In particular, it can translate, rotate and scale objects. It can also perform affine and perspective transformations (since they map straight lines into straight lines).

The hierarchical relationship between instances is preserved by concatenating (using matrix multiplication) the instance transformation matrices together; i.e., the son's transformation is relative to his father's. As an example, let's look at the human body. If you want to specify the position of the arm relative to the torso, the arm would be the son of the torso. One possible organization of the hierarchical relationship of the human body is shown in Figure 2.2.

MAN
  TORSO
    HEAD
    LEFT ARM
      LEFT FOREARM
        LEFT HAND
    RIGHT ARM
      RIGHT FOREARM
        RIGHT HAND
    LEFT LEG
      LEFT THIGH
        LEFT FOOT
    RIGHT LEG
      RIGHT THIGH
        RIGHT FOOT
Figure 2.2 - Structure of Man
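
The concatenation can be sketched as follows (present-day Python with numpy; a simplified stand-in for the frame generator's list structure, not its actual code):

   import numpy as np

   def world_transforms(node, father=np.eye(4), out=None):
       # The son's matrix is relative to his father's, so the full
       # transformation is the running matrix product down the tree
       # (row-vector convention: the local matrix is applied first).
       out = {} if out is None else out
       m = node["matrix"] @ father
       out[node["name"]] = m
       for son in node.get("sons", []):
           world_transforms(son, m, out)
       return out

   # Moving the TORSO matrix carries the arm, forearm, and hand with it:
   man = {"name": "TORSO", "matrix": np.eye(4), "sons": [
            {"name": "LEFT ARM", "matrix": np.eye(4), "sons": [
               {"name": "LEFT FOREARM", "matrix": np.eye(4)}]}]}
   transforms = world_transforms(man)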

2.2.1 Non-Rigid Objects

The transformation matrix seems to be ideal for animating rigid objects in space such as airports, airplanes, cars, boats, etc. For non-rigid objects, such as the human body, other techniques were found for describing motion. In order to use the convenient transformation matrix for as many objects as possible, two classes of non-rigid objects were formed, called Semi-Rigid and Fluid.

Semi-Rigid objects are objects which are rigid but are connected to some other object with a stretchable membrane. The limbs of the body are excellent examples of semi-rigid objects. The arm and forearm can be modelled as rigid objects connected by a stretchable membrane called the elbow. The basic technique used in semi-rigid objects is to specify some polygons whose points exist in another object. The points are transformed by the matrix associated with the object that owns each point. The polygons that bridge the gap between objects, therefore, appear to stretch.

Fluid objects are not constrained by a transformation matrix, though transformation matrices are used to place the fluid object in space. The technique used to specify movement is to provide several descriptions of the same object in some of its extreme positions and interpolate between these extremes. The objects must have identical descriptions (topologies) except for the coordinate values of the points. A specification of a fluid object is, therefore, given as two objects and a percentage change:

  NEWOBJECT←OBJECT1*PERCENT+OBJECT2*(1-PERCENT) 

An example of a fluid object is the human lips. Modeling the lips as a semi-rigid object would be difficult because of the flexibility of the lips. Alternatively, the lips can be described as a fluid object by defining all the interesting end positions of the lips, e.g. regular, smile, frown, pursed, etc., and then interpolating between these end positions to get smooth transitions.
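
The interpolation itself is only a weighted average of corresponding coordinates. A direct transcription of the rule above (a Python sketch; the extreme positions shown are invented for illustration):

   def blend(object1, object2, percent):
       # NEWOBJECT <- OBJECT1*PERCENT + OBJECT2*(1-PERCENT); the two
       # descriptions must have identical topologies.
       assert len(object1) == len(object2)
       return [tuple(a * percent + b * (1 - percent) for a, b in zip(p1, p2))
               for p1, p2 in zip(object1, object2)]

   smile = [(0.0, 0.0, 0.0), (1.0, 0.4, 0.0)]    # hypothetical lip extremes
   frown = [(0.0, 0.0, 0.0), (1.0, -0.4, 0.0)]
   halfway = blend(smile, frown, 0.5)            # a smooth transition frame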

2.2.2 Intensity

For Visible Surface pictures an important element in specifying a polygon is the specification of how it is to be shaded. With Gouraud shading [22], an intensity is required for each vertex of the polygon. The Gouraud algorithm then calculates the values at interior points. The intensity at each vertex is found by estimating the normal of the object at that point. This is done by finding the planar equation of each polygon connected to a vertex and averaging their normals to obtain an estimated normal at the vertex. The normal must then be transformed from object space into view space coordinates. The z component of the transformed normal is then used as the intensity value for that point.

The transformation is done by multiplying the planar equation by the transposed inverse of the matrix used to transform the coordinates.


[a b c d]*[M↑-1]↑T=[a' b' c' d']             [1]

where

[x y z w]*[M]=[x' y' z' w']                  [2]

The proof is as follows:

We know that the dot product of a point on the plane and the plane equation is zero.


[x y z w]*[a b c d]↑T = 0                    [3]

This is also true in the transformed space


[x' y' z' w']*[a' b' c' d']↑T = 0            [4]

from [2] and [4] we know

[x y z w]*[M]*[R]*[a b c d]↑T = 0            [5]

where [R] is the matrix we are looking for. For [5] to agree with equation [3], [M]*[R] must be the identity matrix. Therefore R=M↑-1, and equation [1] follows by transposing both sides of [4].

This computation provides lighting from behind the eye. To get a picture with the light source in an arbitrary place, just find the matrix as if the eye were at the light source and invert.

This light source model produces rays which are all parallel. The reflected intensity is not a function of the distance from the light source or the observer. This light source model was chosen for two reasons: (1) it models the sun as a light source very well and, (2) it is computationally the easiest, since the transformation for intensity is identical to the one for coordinates (just the value of the matrix is changed). Alternative light source models include nonparallel rays emanating from the eye and distance dependent (1/R or 1/R↑2) sources.
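
Putting the pieces together, a sketch of the intensity computation (present-day Python with numpy; a stand-in for the actual implementation, with hypothetical names):

   import numpy as np

   def vertex_intensity(vertex, plane_normals, M):
       # Estimate the vertex normal by averaging the normals of all
       # polygons meeting at the vertex ...
       n = np.mean(np.asarray(plane_normals, dtype=float), axis=0)
       n = n / np.linalg.norm(n)
       # ... form the plane equation [a b c d] through the vertex ...
       abcd = np.append(n, -n @ np.asarray(vertex, dtype=float))
       # ... transform it by the transposed inverse of M (equation [1]) ...
       abcd = abcd @ np.linalg.inv(M).T
       # ... and use the z component of the transformed normal as intensity.
       return abcd[2]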

2.3 Motion

In order to make a movie, the frame generator must be given a transformation specification for each active object in every frame. For a scene with 10 active objects, one minute of film requires 14,400 specifications (10 objects × 60 seconds × 24 frames per second). Manual generation of this data would be a laborious undertaking even for the Animation Industry.

There are two basic techniques for reducing the amount of data that the designer needs to input: (1) a general program which takes a partial specification of the activity and fills in the missing data based on some general criteria and, (2) specialized programs which have knowledge of the objects being modeled and some constraints that they may have. The first type of program is described in this section.

A third alternative, which has not been tried, is to build a general constraint solving program which is fed constraint information about the objects and their relationship to other objects and produces motion specifications automatically. This would be a kind of goal directed program of the type which has shown some limited success in directing mechanical manipulators in the Artificial Intelligence environment. Predicate Calculus systems like QA3 [23] and programming languages like PLANNER [24] have the generality needed, but are harder to use for non-trivial applications than direct algorithmic specification. These systems are developing more and more power and will some day be useful in the Visual Communications environment.

The general partial specification program is called KEYFRAME because it simulates one of the primary tools available to the animation industry. The industry structures itself hierarchically into chief animators, animators, junior animators and in-betweeners. The chief animator may do only the character design and the story board. The animators may make only one in a hundred frames, with instructions on how to fill in the rest. A junior animator may do every tenth frame, with the in-betweeners doing the rest.

The KEYFRAME program tries to replace the junior animator's and in-betweener's functions by taking descriptions of the state of the objects at key points in the action and interpolating the action in between. The state of an object is described by nine parameters: 3 position, 3 scaling and 3 rotation values. An interpolation mechanism must also be specified, along with the frame number at which the state is to occur.

The program has a number of interpolation mechanisms available to it. All those implemented to date are of the fitting type rather than the approximating type. The fitting type was chosen because it guarantees that the curve (and hence the object) will be at a precise point at a precise time. Approximating routines have nicer characteristics in terms of mathematical stability and the amount of extraneous curvature introduced [25]. It is now felt that both types of functions are needed, giving the user his choice.

The fitting functions available (see Figure 2.3) are:

1. Constant
Stay at old value until a new key frame changes it.
2. Linear
Linearly change from old value to the new value.
3. Cosine
Fit a half cycle of the cosine function between each point.
4. Spline
Fit a third degree piece-wise polynomial through all of the specified points.
Figure 2.3 - Interpolating Functions: (a) Constant; (b) Linear; (c) Cosine; (d) Natural Cubic Spline

The cosine and spline [26] functions are used most often. The cosine fit is perfect for things that stop at the specified points (most angular and scale changes do). The spline function is continuous in the zeroth, first and second derivatives and, therefore, models an object with mass very well (an object with mass has a continuous first derivative in motion). The problem with the fitting spline is that the individual spans may make wild gyrations in order to meet the boundary conditions at the knots. Approximating splines (such as B-splines) are much better behaved but do not pass through the knots. If a double knot exists in B-splines, the curve will pass through the specified point but the first derivative will no longer be guaranteed continuous. Other functions which may be useful in the KEYFRAME program include: (1) B-splines, (2) Nth order least mean square fits, and (3) second order fitting splines.
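
The first three fitting functions are simple to state over a single span between key values; a present-day Python sketch (the spline case, which fits a cubic through all knots at once, is omitted):

   import math

   def constant(v0, v1, t):          # t runs from 0 to 1 across the span
       return v0 if t < 1.0 else v1  # hold the old value until the new key

   def linear(v0, v1, t):
       return v0 + (v1 - v0) * t

   def cosine(v0, v1, t):
       # Half cycle of the cosine: zero slope at both ends, so the motion
       # stops at each specified point (good for angular and scale changes).
       return v0 + (v1 - v0) * (1.0 - math.cos(math.pi * t)) / 2.0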

The nine variables of the state vector locate, orient and scale the object in space. In fact, the key frames for one variable may be different from those for another variable within the same object. Each variable is interpolated independently of the others. The nine values are used to form a transformation matrix to be associated with a particular instance of the object. The order in which the nine values are used in forming the transformation matrix is critical in locating the object. A particular order was chosen with the expectation that an order vector capability would have to be added later. The need never arose in the six or seven films made using this program. The order chosen was to scale first, rotate about the z axis second, the y axis third, the x axis fourth, and finally translate the object. Knowledge of this order was important to the object designer. It did not affect data collection, however, because the basis was able to adjust the coordinate system to whatever system was desired. The existence of the Basis and its use by the KEYFRAME program greatly simplified the specification of state.
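
A sketch of that fixed composition order (present-day Python with numpy, row-vector convention; an illustration, not the KEYFRAME source):

   import numpy as np

   def compose_state(sx, sy, sz, rz, ry, rx, tx, ty, tz):
       # Scale first, rotate about z, then y, then x, then translate
       # (row-vector convention: [x y z w] * M).
       def rot(axis, a):
           c, s = np.cos(a), np.sin(a)
           m = np.eye(4)
           i, j = {"z": (0, 1), "y": (2, 0), "x": (1, 2)}[axis]
           m[i, i] = m[j, j] = c
           m[i, j], m[j, i] = s, -s
           return m
       S = np.diag([sx, sy, sz, 1.0])
       T = np.eye(4)
       T[3, :3] = [tx, ty, tz]
       return S @ rot("z", rz) @ rot("y", ry) @ rot("x", rx) @ T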

The premise used in the KEYFRAME program is that constraints are satisfied in the mind of the designer. The designer will sit down, enter an initial guess of the key frame specification desired, look at the curves generated by the program, perhaps modify the input data or change the fit type, then finally have the program produce a frame by frame specification file to input to the frame generator. The next step is to look at individual frames of the film on the line drawing scope; then, if everything looks all right, get sample still frames in Visible Surface form. Then a test movie might be made in black-and-white at reduced resolution to check the continuity of motion. Finally, if the author is satisfied by the above steps, a final color print is made.

Several films have been made using the KEYFRAME program. One film was made to show the potential of Visual Communications in producing political cartoons. In 1971 models of the U.S. Capitol building (shown in Figure A.4) and the Boeing supersonic transport were built. The film represented the encounter of the SST and Congress, portraying the possible outcomes. In the first segment the SST knocks the dome of the Capitol building off the roof. In the second segment the dome expands, gobbles up the incoming SST, then returns to its original size, absorbing the airplane. In the third segment the plane passes through the dome, neither object affected by the encounter.

A second film was made in a more serious vein showing a pilot's view of an airport landing. Figure A.2 shows the airport used in this film. The film was designed to show the use of Visible Surface pictures in the airplane pilot training environment.

CHAPTER 3: WALKING

3.1 Background

In the KEYFRAME system the author must keep track of the interaction between objects; e.g. if one object hits another, the author must handle the consequences of the collision himself. The advantage of the KEYFRAME system is that simple films can be generated very quickly and easily.

For more complex activities special programs have been written to generate the required frame by frame description. One interesting activity is the movement of the human body. There were two possible approaches to this problem: (1) anthropomorphic and (2) characteristic. The first approach is to try to understand precise human body motion either by measurement or by modeling the joints, muscles, etc. Many anthropomorphic modeling studies [27] have been made, and at the University equipment has been constructed to measure the body in motion [28]. The second alternative is to characterize the body movements so that they are believable and are rich enough to convey the mood or emotional content that the author is trying to represent. These two approaches can be compared to a photograph versus an artist's caricature of an individual. Both have their particular uses, neither being a priori better.

The interest in human body movement relates to the motivation of creating a communication vehicle for transmission of information and ideas. The human figure was felt to be critical as an identification mechanism for the observers. The figure might serve as the Master of Ceremonies (lecturer, etc.) or as a participant in a demonstration. Learning to control the body was felt to be a key ingredient in a useful Dynamic Visual Communications system.

Control of the figure has been learned through a succession of increasingly more complicated systems. Four systems have reached a state worthy of reporting; several others have been tried but were discarded. The four systems are called: (1) Football, (2) MAN1, (3) MAN2, and (4) MAN3.

3.1.1 Football System

The Football system was designed to generate sequences of football plays. A language was designed to describe the activities of the players and to control the camera. The activity specification contained the activity type, a speed control parameter (in units per frame), the frame number to start the activity and the frame number to end the activity. The activity types included: bending, falling, running, turning, and raising one or both arms. The parameter controlled both the speed and direction of the activity, e.g. ACT = BEND, PAR = 3, S = 25, F = 50 means that the player will start to bend down at frame number 25, bending 3 degrees more each frame until frame 50. Parallel activities can be specified, e.g. a player can simultaneously run and raise his arms. The program maintains a state vector for each player which is updated at each frame time by the activities specified for that frame. Simple constraints are applied to the players by the program, e.g. arms and legs cannot go in a full circle, the player cannot bend backwards or through his legs, no part of the body may penetrate the ground, etc. One constraint that was desired but never implemented was the prevention of interpenetration: a player, if told to, will run through another player. Though the effect was humorous, it was not considered desirable in most situations. The author was responsible for avoiding such collisions.

The football player (shown in Figure 3.1a) consisted of two arms jointed at the shoulder, no elbows or hands, two legs jointed at the hip with no feet or knees, and the torso jointed at the hips with a head fixed to the shoulders without a neck. The player was not only very easy to manipulate because of the limited degrees of freedom, but also symbolically represented a football player very well. The same player would certainly not be adequate for basketball, tennis, etc., nor would the simple activities chosen for the player be adequate.

Figure 3.1a - Football Player

The Running activity was the most challenging and interesting and sparked further interest in animating a walking figure. The parameter of the running activity specified the distance to be moved by the player each frame. The program then computed the angles of the two legs based on the pinned foot constraint. The angle of the drive foot was computed and the angle of the other foot was forced to be the negative of that angle. The drive foot had two cases: (1) the new angle was less than the maximum angle allowed for the foot and, (2) the angle was greater and the drive foot had to be changed. Algorithm 1 shows the computation of these two cases.

Case 1 (drive foot unchanged):
   NA = ASIN(X/LD),  where X = LD*SIN(OA) - V
Case 2 (maximum angle exceeded; drive foot changes):
   NA = ASIN(X/LD),  where X = 2*LD*SIN(AM) - V - LD*SIN(-OA)
   and AM is the maximum angle allowed
(V is the distance moved per frame, LD the leg length, OA the old angle, NA the new angle.)
Algorithm 1: Football Walk - Football Player
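
A literal transcription of Algorithm 1 into present-day Python (a sketch; the sign conventions and the case test are assumptions read off the figure, not the original program):

   import math

   def clamped_asin(r):
       return math.asin(max(-1.0, min(1.0, r)))

   def football_step(OA, V, LD, AM):
       # OA: drive foot's old angle; V: distance moved this frame;
       # LD: leg length; AM: maximum angle allowed for the foot.
       X = LD * math.sin(OA) - V                            # Case 1
       NA = clamped_asin(X / LD)
       if abs(NA) <= AM:
           return NA, False                                 # same drive foot
       X = 2 * LD * math.sin(AM) - V - LD * math.sin(-OA)   # Case 2
       return clamped_asin(X / LD), True                    # drive foot changed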

3.1.2 Early Walk Programs

The MAN1 and MAN2 systems were designed to allow the author to describe walk cycles. These systems permitted the design of one-half of the cycle which then could be mirrored and reproduced ad infinitum. Knees were introduced in MAN1 and ankle joints in MAN2 adding to the complexity of the pinned foot constraint and the realism of the walk. The half cycle was arbitrarily broken into 50 CELs.

Figure 3.1b - MAN1

The author can specify the state of the entire figure or any of the joints at any (or all) CELs. The system will fill in the unspecified CELs by interpolation. The author can then watch the walk cycle on a line drawing display performed by a stick figure. He may then change or further specify the cycle. Finally, the author can ask for a frame by frame specification file to produce a Visible Surface film.

Figure 3.1c - MAN2

The state is specified as an angle of the joint. In MAN2 there were 11 joints: (1) Torso, (2) R. Arm, (3) R. Forearm, (4) L. Arm, (5) L. Forearm, (6) R. Leg, (7) R. Thigh, (8) R. Foot, (9) L. Leg, (10) L. Thigh, (11) L. Foot.

The structure of the elements of the body is identical to that shown in Figure 2.2. Since a complete walk cycle is built by mirroring a half cycle, the first and the 50th CEL should be right-to-left mirror images.

The only constraint imposed by these programs was the pinned foot constraint. The right leg was assumed to be the power leg; based on the angles specified by the author, the program would calculate the position of the body in space so that the right foot remained in place. The change from right to left foot as the power foot occurred at the 50th CEL. The author had to make sure that both legs were on the ground at the 50th CEL in order to provide continuity in the reflection.

The two systems produced some interesting films of walk cycles. The pinned foot constraint produced a believable walk. The next step was to vary the angle parameters so as to produce walks which conveyed information such as the mood or emotion of the figure walking.

Very little information has been found in the open literature concerning how people walk under various emotional conditions. The best source of information found thus far has been practicing animators. Many animators, however, are only able to describe the walk in two-dimensional terms (as their art form demands); some, whose perceptions come from the real world rather than practice in animation, are able to describe very succinctly many qualities of walks.

One example related by Glen Fleck is a way to characterize the difference between a heavy walk and a light walk. The heavy walker struggles to lift his mass higher and higher during the walk cycle. Finally, he drops down quickly and starts his effort over again. The light walker, on the other hand, expends little effort raising his body up, then floats back down until he is ready for his next step. Translating this perception into angles for the leg is the job of the animator. The system should help the animator in constructing and viewing the walk rather than limiting him to one particular Light or Heavy walk. There are infinite variations on the theme described above which may add second order information to the character's mood.

[Figure: stick-figure leg positions through the cycle for a Light Walk (CELs 1, 15, 50) and a Heavy Walk (CELs 1, 40, 50).]
Figure 3.2 - Example of Walks


The character's overall effect on the observer is sensitive to the surroundings and from where he is viewed. As an example, the bulky boxy figure of MAN1 viewed from below looks like an enormous menacing robot, while viewed from high above he looks more like a toy. The context will thus have a great effect on the observer's visualization of the character.

The MAN2 figure is much closer to reality, yet it is still very simple in terms of the number of polygons used. The greater realism was felt to be important in helping the observer identify with the mood or emotion being shown. Care was taken with facial features, particularly the eyes and mouth. Though they are two-dimensional, they can be controlled like fluid objects. The pinned foot constraint for MAN2 is described in Algorithm 2.

Problem: given x', y', AF, AT, AL, calculate x, y
(LTH and LC are the lengths of the thigh and calf; LF is the length of the foot).

Case 1 (applies iff AL+AT+AF < 0):
   x = x' + LTH*SIN(-AL) + LC*SIN(-AL-AT)
   y = y' + LTH*COS(-AL) + LC*COS(-AL-AT)
Case 2 (otherwise):
   x = x' - LF - LF*SIN(90+AL+AT+AF) + LTH*SIN(-AL) + LC*SIN(-AL-AT)
   y = y' - LF*COS(90+AL+AT+AF) + LTH*COS(-AL) + LC*COS(-AL-AT)
Algorithm 2: MAN2 Pinned Foot
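
In Python form (a sketch; the variable names follow the algorithm, and LF is taken, as an assumption, to be the length of the foot):

   import math

   def pinned_foot(xp, yp, AF, AT, AL, LTH, LC, LF):
       # (xp, yp) is the pinned foot reference point; AL, AT, AF are the
       # leg, thigh and foot angles in degrees; LTH, LC are the thigh and
       # calf lengths. Returns the body position (x, y) that keeps the
       # foot in place.
       sin = lambda a: math.sin(math.radians(a))
       cos = lambda a: math.cos(math.radians(a))
       if AL + AT + AF < 0:                                   # Case 1
           x = xp + LTH * sin(-AL) + LC * sin(-AL - AT)
           y = yp + LTH * cos(-AL) + LC * cos(-AL - AT)
       else:                                                  # Case 2
           x = (xp - LF - LF * sin(90 + AL + AT + AF)
                + LTH * sin(-AL) + LC * sin(-AL - AT))
           y = (yp - LF * cos(90 + AL + AT + AF)
                + LTH * cos(-AL) + LC * cos(-AL - AT))
       return x, y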

3.2 MAN3 System

Although the MAN1 and MAN2 programs generated reasonable walks, they were deficient in three major areas: (1) the user interface; (2) the number of objects and their degrees of freedom; and (3) the constraints available for the designer to apply. In order to solve these problems the program for specifying the walks was completely redone and the definition of the figure was extended (see Figure 3.5). This section describes the new program in terms of the user interface, degrees of freedom, and constraints available; a sample session is then presented; finally several walks are described which can be seen in the film that accompanies this document. Chapter 4 describes the accompanying movie in detail.

Figure 3.1d - MAN3

3.2.1 User Interface

The basic approach taken in the MAN1 and MAN2 programs was to prepare the walk cycle data before hand, run the data through the program, watch a stick figure perform the walk, then go back and correct the original data. The feedback loop was long, cumbersome, and error prone. Increasing the degrees of freedom and the constraint options available could only make the situation worse. The problem was therefore ripe for an interactive graphics solution.

A program was written to use a calligraphic display and a coordinate input device (mouse, tablet, etc.) to design the walk cycles. Once again, the walk cycle was arbitrarily broken into 50 CELs. The designer uses the mouse and scope to design the first and last CELs and any or all CELs in between.

The designer constructs a CEL by modifying an existing CEL. He first selects a CEL of interest and is presented with a front and side view of the figure. If he is not satisfied with the CEL he may select a joint to be modified by pointing at it with the mouse, then choose a new position for the limb, again with the mouse. Figure 3.3 shows this process; Algorithm 3 shows the mathematics of the modification. Modification of angles in the direction of motion is done on the side view; modification of angles to the left and right is done on the front view; and twist angles are indicated on a scale at the bottom of the picture.

[Figure: three views of the display screen, each showing the command menu (TTYIN, STOPS, CTRMASS, UNBIND, TYPEO, BFONGRND, HHIP, HCHK, ARMFR, ARMB) and the READY light: (a) selecting the joint to be changed; (b) finding the new location for the object; (c) the object in its new location.]
Figure 3.3: Selecting Object to be Changed
[Diagram: the selected joint with the limb's old position P1 and new position P2; the new angle is A1 = A0 + A, where A = B - C, and L1, L2 (L3, L4) are the offsets of the new (old) position from the joint used in the derivation below.]
Algorithm 3: Finding the New Angle
   B=ARC TAN(L1/L2)                                               (1)
   C=ARC TAN(L3/L4)                                               (2)
Therefore:
   A=ARC TAN(L1/L2)-ARC TAN(L3/L4)                                (3)
which can be reduced to
   A=ARC TAN((L1/L2-L3/L4)/(1+L1*L3/L2*L4))                       (4)
   A=ARC TAN((L1*L4-L3*L2)/(L2*L4+L1*L3))                         (5)
Where (4) is derived from:
   TAN(A+B)=SIN(A+B)/COS(A+B)
    =(SIN(A)*COS(B)+COS(A)*SIN(B))/(COS(A)*COS(B)-SIN(A)*SIN(B))  (6)
dividing (6) by COS(A)*COS(B)
   TAN(A+B)=(TAN(A)+TAN(B))/(1-TAN(A)*TAN(B))
Setting X=TAN(A) and Y=TAN(B)
   TAN(ARC TAN(X)+ARC TAN(Y))=(X+Y)/(1-XY)
Therefore:
   ARC TAN(X)+ARC TAN(Y)=ARC TAN((X+Y)/(1-XY))
QED
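
In a present-day implementation the quadrant and divide-by-zero problems of the ARC TAN form in (5) are usually avoided with the two-argument arc tangent; a Python sketch (function name hypothetical):

   import math

   def new_angle(A0, joint, old_pos, new_pos):
       # A1 = A0 + A, where A = B - C; B and C are the angles of the new
       # and old limb positions about the selected joint (cf. L1..L4).
       B = math.atan2(new_pos[1] - joint[1], new_pos[0] - joint[0])
       C = math.atan2(old_pos[1] - joint[1], old_pos[0] - joint[0])
       return A0 + (B - C)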

When the designer is happy with the CEL he goes back to the program's top level to continue work. The variables modified are bound to the changed values during the interpolation process but may be modified by the application of constraints.

In addition to the modification process described, the designer may also type in specific values for variables or may use some preset positions. The use of preset positions was found to be important for situations that were hard to visualize in the stick figure; e.g. Hands on hips was necessary because the stick figure has no hips. The preset positions that were preprogrammed were: (1) Arms crossed in front; (2) Arms in back; (3) Hands on cheeks, and (4) Hands on hips. The preset positions were found by adjusting the half-tone man manually until a good position was obtained, then inputting these values into the program.

At the top level, the designer selects functions to be executed by moving the mouse cursor to the function name shown on the screen and pressing the mouse button. The program then either asks for input from the teletype or presents the designer with further options to be selected. The READY light shown in Figure 3.4 indicates the mouse is armed for use.

The functions available to the designer are shown in Figure 3.4a. The semantics of these operators are described below:

SAVE:
Remembers the state of the current design session for later recall.
RESTORE:
Retrieves a previous session for further design.
FRESH:
Resets the program's data base to a virgin state.
CYCLE:
Presents a complete walk cycle to the designer for review.
GET CEL:
Retrieves a particular CEL from the data base for modification.
CONSTRAINTS:
Presents a choice of constraints to be applied to the data base.
MOVIE:
Generates a frame by frame description file for generating a Visible Surface movie.
PLOT:
Presents a graph of the value of a variable plotted versus the CEL number.
TYPE OUT:
Outputs the values of all of the variables for the first, last, and any bound CEL in text form.
UNBIND ALL:
Leaves the first and last CEL as is but unbinds all other values.

The CYCLE and PLOT commands were found very important in the feedback process of designing a walk cycle. The CYCLE command was very similar to the playback command in Baecker's work. Unlike his system, which permitted modification of variables by editing the graphs, this system works more directly with the object and uses the graphs for review. The PLOT command asks the designer which object and variable he would like to see, then presents a graph of that variable versus the CEL number (see Figure 3.4c and d).

[Figure: (a) Top level functions: SAVE, RESTORE, FRESH, CYCLE, GET, CONSTRAINTS, MOVIE, PLOT, TYPEOUT, UNBINDALL, with READY light. (b) Constraint options: REFLECT, COPY, INTERP ALL, INTERP OBJ, PINTWIST, FIX FT, HEAD STDY, FLY. (c) Graph of the computed X coordinate versus CEL number (file PARANO.WAL, object MAN, parameter 1). (d) Graph of the head twist angle versus CEL number (file PARANO.WAL, object HEAD, parameter 3).]
Figure 3.4 - Walk Program Display

When a walk cycle is completed, the designer may ask, by selecting MOVIE, to output a file that is the frame by frame description of that walk. The designer tells the system how many frames per cycle and how many cycles should be generated. The system automatically takes care of the mirroring required and positional information as the figure walks. This computation will be described further in the CONSTRAINTS section.

3.2.2 Degrees of Freedom

In the MAN1 and MAN2 systems only angles in the direction of motion were permitted to change.

In the new system, the shoulder, hip, and ankle joints were permitted 3 axis movements. A simplifying, but anatomically incorrect, assumption of a single point for the axis of all three rotations was made.

The HEAD was also given three axis rotation and height control above the shoulders. The height control was added to permit the implementation of the Head Steady constraint. A TOE joint was added to each foot and the HANDs were permitted to rotate about the x axis (see Figure 3.5 and Figure 3.6).

[Figure: object hierarchy. Legend: instances containing other objects versus instances containing polygon data only; the eyes, lips and hands have extreme positions.]

ALL
  SCREEN
  ROOM
  MAN
    TORSO
      HEAD (NOSE AND EARS, LIPS, REYE, LEYE, RBROW, LBROW)
      RARM - RFA - RHAND
      LARM - LFA - LHAND
      RLEG - RCLF - RFOOT - RTOE
      LLEG - LCLF - LFOOT - LTOE
Figure 3.5 - Structure of the Man Figure
Figure 3.6 - Coordinate System for Man Figure

In all there are 19 controllable objects and a total of 35 different variables in the MAN3 program. They are listed in Table 3.1. All the variables except HEAD height have a direct anthropomorphic equivalent and do not require explanation. An interesting question is: have any useful variables been left out? The most important variable not included in this system was the twist component in the forearm. It was originally felt that the shoulder twist would be adequate, but for putting the hands on another object, such as the hips, cheeks, or the knees when in the sitting position, the forearm twist was sorely needed. In addition, for some exaggerated walks a rotation of the torso about the body's x axis might have been useful.

OBJECT     VARIABLE   CONSTRAINTS                                      DESCRIPTION
MAN        X          Pinned Foot (CELs 2-50)                          Forward positional information
MAN        Y          Pinned Foot (CELs 2-50)                          Sideways positional information
MAN        Z          Pinned Foot (all CELs)                           Height
MAN        ROTZ                                                        Rotation about body's longitudinal axis (for turning corners, etc.)
TORSO      ROTY       Stops                                            Rotation for bending over
HEAD       ROTX       Stops                                            For cocking head to the side
HEAD       ROTY       Stops                                            For nodding
HEAD       ROTZ       Stops                                            For twisting head
HEAD       Z          Head Steady                                      Height of the head above the shoulders (controlled by Head Steady)
RARM       ROTX       Stops, Preset                                    Forearm to the side
RARM       ROTY       Stops, Preset                                    Forearm swinging inward
RARM       ROTZ       Stops, Preset                                    Twist
RFOREARM   ROTY       Stops, Preset                                    For elbow bending
RHAND      ROTX       Stops                                            To move hand from side to side
LARM       ROTX                                                        Same as right
LARM       ROTY                                                        Same as right
LARM       ROTZ                                                        Same as right
LFOREARM   ROTY                                                        Same as right
LHAND      ROTX                                                        Same as right
RLEG       ROTX       Stops                                            For sideways movement in walk
RLEG       ROTY       Stops                                            For forward movement in walk
RLEG       ROTZ       Stops                                            For twisting
RCALF      ROTY       Stops                                            Knee movement
RFOOT      ROTX       Pin Twist, Stops                                 Compensates for RLEG X rotation, keeping foot flat on ground
RFOOT      ROTY       Stops                                            For lift in foot
RFOOT      ROTZ       Pin Twist, Stops                                 For pigeon-toed walks
RTOE       ROTY       Dependent Toe                                    When the weight is on the ball of the foot, the toe is always flat on the floor
LLEG       ROTX       Stops                                            Same as right
LLEG       ROTY       Stops, Ground Penetration, Both Feet on Ground   Same as right
LLEG       ROTZ       Stops                                            Same as right
LCALF      ROTY       Stops, Ground Penetration, BFONGRND              Same as right
LFOOT      ROTX       Stops                                            Same as right
LFOOT      ROTY       Stops                                            Same as right
LFOOT      ROTZ       Stops                                            Same as right
LTOE       ROTY       Dependent Toe                                    Same as right
Table 3.1 - MAN3 Objects

3.2.3 Constraints

As described above, the designer uses the system to describe one half of a walk cycle. The system then is able, under the designer's control, to fix up the walk cycle in a number of ways. The process of fixing the walk cycle is the application of constraints to make the walk anthropomorphically more accurate or to accomplish some feat difficult to describe in the system provided. The following provides a detailed example of how constraints are used:

The half cycle starts with the transfer of weight from the left to the right foot and ends with the weight transferred back to the left foot. To produce the second half of the cycle, the state information for the right foot is used instead for the left foot and vice versa. Unless some special effects are required, a proper transition between half cycles requires: (1) that the first and last CEL be mirror images of one another (i.e. the angles of the right foot in CEL 1 be the same as the angles of the left foot in CEL 50); and (2) that both the left and right foot be touching the ground in the first CEL (and therefore in the 50th CEL also). Two special constraint operators were provided to accomplish these two requirements. One operator, called BFONGRND, adjusts the angles of the unweighted foot to raise or lower it to ground level. The weighted foot is put on the ground by another constraint to be described later. This operation is in the CEL design program (see Figure 3.3). The procedure for adjusting the foot is described in Algorithm 4. The second operator, called REFLECT, is in the constraints section (see Figure 3.4). This operator deposits the mirror image of the current state of CEL 1 into CEL 50. If the designer applied the BFONGRND operator while designing CEL 1, then applied the REFLECT constraint before interpolating, the program will provide a smooth transition between cycles. Both these constraints were made optional rather than automatic because they were felt to be overly restrictive otherwise.
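A minimal sketch of the REFLECT bookkeeping follows, in Python rather than the SAIL of the actual system; the per-CEL dictionary representation, the variable names, and the choice of lateral terms are all hypothetical:

    # REFLECT: deposit the mirror image of CEL 1 into CEL 50. Right- and
    # left-side variables exchange places; purely lateral quantities
    # change sign under the mirroring.
    LATERAL = {'MAN Y', 'MAN ROTZ', 'HEAD ROTX', 'HEAD ROTZ'}   # assumed set

    def reflect(cel1):
        cel50 = {}
        for name, value in cel1.items():
            obj, _, var = name.partition(' ')
            if obj.startswith('R'):
                target = 'L' + obj[1:] + ' ' + var   # right side goes to the left
            elif obj.startswith('L'):
                target = 'R' + obj[1:] + ' ' + var   # and left to the right
            else:
                target = name                        # MAN, TORSO, HEAD are unpaired
            cel50[target] = -value if target in LATERAL else value
        return cel50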

(Diagram: leg geometry for Algorithm 4, labeling the thigh LTH, calf LCF, and foot LF, the constructed distances T and R between the pin points P1 and P2, and the angles A and B.)
Algorithm 4: Unweighted Foot

Problem: Locate unweighted foot on the ground

Plan: Adjust the thigh and knee angle so that the foot is moved vertically up (or down) to the ground.

Solution: The lengths shown are easily computable from the given information. The problem is to find angles A and B.

    A=ANGLE(T ⨂ R)+ANGLE(R ⨂ LTH)
    B=(180-ANGLE(LTH ⨂ L2))+ANGLE(L2 ⨂ LCF)
Therefore:
    A=ARC COS(T/R)+ARC COS((R↑2+LTH↑2-L2↑2)/(2*R*LTH))
    B=(180-ARC COS((L2↑2+LTH↑2-R↑2)/(2*L2*LTH)))+ARC COS((L2↑2+LCF↑2-LF↑2)/(2*L2*LCF))
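The two equations transcribe directly into code. The sketch below (Python) is such a transcription, taking the lengths as given in the diagram; it is not the thesis routine itself:

    import math

    def unweighted_foot_angles(T, R, L2, LTH, LCF, LF):
        # Thigh angle A and knee angle B (degrees) that bring the unweighted
        # foot vertically to the ground. LTH, LCF, and LF are the thigh,
        # calf, and foot lengths; T, R, and L2 are the constructed distances
        # labeled in the diagram.
        acos = lambda x: math.degrees(math.acos(x))
        A = acos(T / R) + acos((R*R + LTH*LTH - L2*L2) / (2*R*LTH))
        B = (180 - acos((L2*L2 + LTH*LTH - R*R) / (2*L2*LTH))) \
            + acos((L2*L2 + LCF*LCF - LF*LF) / (2*L2*LCF))
        return A, B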

INTERPOLATION

The overall philosophy of the system was to allow the designer to specify as little as possible about the walk yet produce the desired effect. A very early decision was therefore made to interpolate the state of variables in CELs in which they are not specified. The method used for interpolation was to force the user to specify the first and last CEL completely. The user could then specify any variable in a particular CEL, which would then be considered by the system to be bound to that value. The system, upon request (by picking the INTERPOLATE operator), would interpolate all unbound values using a natural cubic spline. The values of the variable in the first, fiftieth, and all bound CELs were used as knots in the spline. Cubic polynomial coefficients were found for each span between bound values. The intermediate values were found by evaluating the cubic at each CEL number in the span.

The value (V) and the CEL number (CN) of each knot were passed to a spline routine which produced the cubic coefficients; the coefficients listed beside knot i below describe the span ending at that knot. For n bound values:

V[1],CN[1]=1
......
V[i],CN[i]       a[i-1], b[i-1], c[i-1], d[i-1]
......
V[n],CN[n]=50    a[n-1], b[n-1], c[n-1], d[n-1]

The coefficients were then given to the span evaluator which would find all values for all 50 CELs:

V[j]=a[i]*j↑3+b[i]*j↑2+c[i]*j+d[i]
  for all j such that 1 < j < 50 where CN[i] < j < CN[i+1]

One exception was made in the interpolation routine: if only the first and fiftieth CELs were specified, the variable was COSINE interpolated rather than linearly interpolated. This provided better transitions between cycles of the walk and less wooden motion by the figure, since the joints started and stopped smoothly.
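A modern sketch of this interpolation scheme is given below (Python); SciPy's natural cubic spline stands in for the thesis' own coefficient routine, and the two-knot case falls back to the cosine blend:

    import numpy as np
    from scipy.interpolate import CubicSpline

    def interpolate_variable(bound):
        # bound maps CEL number -> bound value; CELs 1 and 50 must be present.
        cels = np.arange(1, 51)
        knots = sorted(bound)
        if knots == [1, 50]:
            # Only the end knots: cosine interpolation, so the joint starts
            # and stops smoothly between half cycles.
            blend = (1 - np.cos(np.pi * (cels - 1) / 49.0)) / 2
            return bound[1] + (bound[50] - bound[1]) * blend
        # Otherwise a natural cubic spline through all the bound knots.
        spline = CubicSpline(knots, [bound[k] for k in knots], bc_type='natural')
        return spline(cels)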

A COPY operator was added to help the designer control the spline. It was found that forcing a variable to have the same value on two successive frames provided smooth transitions between complex motions. The Paranoid Walk described below is an excellent example of this behavior.

PINNED FOOT

The Pinned Foot constraint was similar to the MAN2 computation. The difference was the addition of the twist and spread angles in the joints of the leg. Since the twist angle in the leg did not affect the direction of motion, the only concern in twist was the angle of the foot when the weight was transferred from heel to toe or vice versa. The spread angle created sway in the figure (see the Drunk Walk for example). These variables add sufficient complexity to the constraint that a new computation mechanism was created; rather than inverting the computation from a known foot position, the foot position was computed from a known body position. The computed foot position was then compared with the expected position, and the difference was used to translate the body. This was particularly convenient because the basic computation was already required to produce the figure on the calligraphic display.
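A sketch of this difference-translation mechanism, assuming a forward routine of the kind already needed to draw the figure (both callables here are hypothetical):

    def pin_foot(cels, foot_position, expected_position):
        # Run the ordinary body-to-foot computation for each CEL, compare
        # the computed weighted-foot position with where the pinned foot
        # belongs, and translate the whole body by the difference.
        for cel in cels:
            cx, cy, cz = foot_position(cel)
            ex, ey, ez = expected_position(cel)
            cel['MAN X'] += ex - cx
            cel['MAN Y'] += ey - cy
            cel['MAN Z'] += ez - cz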

PINNED TWIST:

The body, hip, and ankle have twist angles. The body twist is used to turn corners, etc.; the hip twist is used to get the knees in or out for a knock-kneed or bow-legged effect; and the ankle twist is used for a pigeon-toed effect. The ankle twist can also be used to keep the foot from twisting with respect to the ground while the other twist angles are free to change. By applying this constraint, the twist angle of the ankle will be modified in CELs 2 through 50 to keep the foot in the same orientation with respect to the ground as in CEL 1.
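If the foot's orientation with respect to the ground is taken to be the sum of the twist angles above it, the constraint reduces to one correction per CEL; a sketch with hypothetical variable names:

    def pin_twist(cels, leg='R'):
        # Keep the foot in CELs 2-50 at the same ground orientation as in
        # CEL 1 by absorbing any change into the ankle twist.
        def ground_twist(cel):
            return cel['MAN ROTZ'] + cel[leg + 'LEG ROTZ'] + cel[leg + 'FOOT ROTZ']
        target = ground_twist(cels[0])            # orientation in CEL 1
        for cel in cels[1:]:
            cel[leg + 'FOOT ROTZ'] += target - ground_twist(cel)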

DEPENDENT TOE

The weight of the body can be supported on either the heel or the ball of the foot. When the weight is on the ball, the toe part of the foot must be adjusted so that the toe does not penetrate the ground. For the weighted foot this computation is very simple: if the ball of the foot is lower than the heel, then the angle is the sum of the angles of the hip, knee, and ankle. The computation for the unweighted foot is slightly more difficult, because the ball of the foot may be near but not on the ground, yet need some adjustment to prevent penetration. In this case, the normal toe correction is modified by ARC SIN(H/LT), where H is the height of the ball above the ground and LT is the length of the toe. Proof is left as an exercise for the reader.
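Both cases fit in a few lines. The sketch below transcribes the rule as stated; the clamp at zero is an added assumption about when no correction is needed:

    import math

    def toe_correction(hip, knee, ankle, weighted, H=0.0, LT=1.0):
        # Weighted foot: toe angle is the sum of the hip, knee, and ankle
        # angles. Unweighted foot: the correction is reduced by ARC SIN(H/LT),
        # H being the height of the ball above the ground, LT the toe length.
        angle = hip + knee + ankle
        if not weighted:
            angle -= math.degrees(math.asin(min(1.0, H / LT)))
        return max(angle, 0.0)   # assumed: no correction if the heel is lower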

GROUND PENETRATION

The unweighted foot may accidentally penetrate the ground in CELs that are interpolated. Although this may be desirable for some humorous effect, in general the designer would like the figure to remain above the ground at all times. Several different algorithms were tried to bring the unweighted foot up to the ground. The main problem was to find an algorithm which would provide smooth corrections as the foot moved through the walk cycle. The best solution found was the procedure shown in Algorithm 4. The forward motion of the foot is maintained while the foot is brought straight up to the ground. Although this procedure does produce a first-derivative discontinuity in the angle of both the knee and the hip, the effect is not noticeable during the walk. It looks like the unweighted foot is sliding along the floor.

HEAD STEADY:

After viewing early films it was felt that the head of the figure bobbed up and down more than is natural. A constraint was therefore added to adjust the height of the head above the shoulders (by stretching the neck) in order to keep the head at a constant height above the floor. The use of this constraint on the original walks, however, caused exaggerated neck stretching. The real problem was in the original specification of the walk. By adjusting the key CELs, particularly by controlling the ankle angle, the amount of bobbing was significantly reduced, making the walks seem more natural. This constraint was therefore less important than originally envisioned but was used in the Military Walk (see description below).

FLYING:

The pinned foot constraint forces the weighted foot to be on the ground. For activities like running, jumping, and skipping, both feet must leave the ground. A facility was added to modify the height component of the figure by specifying the CELs at which the foot leaves and returns to the ground and the maximum height above the ground attained. A parabolic approximation is then added to the height for physically accurate jumps.
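One parabola meeting these conditions is h(u) = 4*h_max*u*(1-u), zero when the foot leaves and lands and h_max at the midpoint; a sketch:

    def flight_height(cel, leave, land, h_max):
        # Height added to the figure between the CEL at which the foot
        # leaves the ground and the CEL at which it returns.
        if not leave <= cel <= land:
            return 0.0
        u = (cel - leave) / float(land - leave)   # 0..1 through the flight
        return 4.0 * h_max * u * (1.0 - u)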

STOPS:

An approximation of the natural limits of each of the joints was provided to the programs; e.g. the elbow and knee bend in only one direction. The designer again can choose to apply the STOPS constraint or not at his own discretion. The limit angles were simple, uncorrelated minimum and maximum constraints on each degree of freedom. The model is not anthropomorphically correct but was adequate for producing the realistic walks.
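Uncorrelated stops amount to clamping each variable independently; a sketch, with invented limit values for illustration:

    # Hypothetical limit table: simple minimum and maximum values per
    # degree of freedom (the elbow and knee bend in only one direction).
    STOPS = {'RCALF ROTY': (0.0, 150.0),
             'RFOREARM ROTY': (0.0, 160.0)}

    def apply_stops(cel):
        for name, (lo, hi) in STOPS.items():
            if name in cel:
                cel[name] = min(max(cel[name], lo), hi)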

3.2.4 Sample Session

This section presents a sample session of designing a walk film. The session is not for a specific walk but tries to show the general process of designing walk cycles.

  1. Start the program: The program requires a calligraphic display and mouse with the terminal. The data base is initialized so that all variables have a value of zero for all CELs. The program displays all the top level choices (Figure 3.4).
  2. Design CEL 1: Select the GET CEL option and type the CEL number desired (e.g. number 1). Use the mouse to design the desired body position as described above; apply the STOPS constraint if desired; apply a preset position for the arms if desired; then force both feet to be on the ground. The TYPEIN option can then be used to position the figure in the Object Space by typing in coordinate values and orientation angle. The CEL data can then be saved in the data base and control returned to the top level by pressing mouse button 3.
  3. Interpolate the basic cycle: At the top level select the CONSTRAINTS option, then select the REFLECT operator. If necessary, go back and modify CEL 50 to correct the reflection (e.g. if the arms are folded in front, CEL 50 would be modified so that the arms were a copy of CEL 1 rather than the reflection). Then select the INTERP ALL option.
  4. Review the cycle: The cycle can now be viewed by selecting CYCLE. The values of variables can also be reviewed by using the PLOT command (see Figure 3.4).
  5. Correcting the cycle: After reviewing the cycle, the basic cycle can be changed by going back to Step 2. If the basic cycle is adequate, then individual CELs can be corrected by getting a particular CEL and correcting it. The unweighted foot invariably must be modified to bend the knee significantly in the middle of the cycle. The weighted foot usually must be corrected because of the interpolation routine chosen: if there are no intermediate bound variables, the angles of the weighted foot are interpolated by the cosine function, which causes the figure to start and stop each half cycle. In general, walks tend to maintain constant momentum in the forward direction. If the weighted hip and knee are bound to reasonable intermediate values, reinterpolation will result in a nearly straight line fit.
  6. Reapply the Constraints: After the modifications have been finished, the data base must be adjusted to reflect those changes throughout. The data base must be reinterpolated and constraints must be applied such as PIN TWIST, FIX FT, HEAD STDY, etc.
  7. Iterate until satisfied: Return to Step 4 until the walk cycle is satisfactory.
  8. Output frame by frame description: The data base can now be used to create a Visible Surface movie by selecting the MOVIE command. The system requests the number of frames of film desired for each half cycle, the number of half cycles desired, and whether to start the walk on the left or right foot. A file is created which can be merged with other motion specifications, such as camera control, and used as input to the Frame Generator described in Chapter II.
  9. Save the data base: The data base can be saved for later modification after reviewing the Visible Surface movie generated.

3.2.5 Examples of Walks

The system described above was used to generate several walk cycles in order to: (1) Show the capabilities of the system and (2) Verify the design philosophy used. Twelve walks were designed and are included in the accompanying movie. The walks are briefly described below:

  1. Regular Walk: A walk was designed as a basis for comparison with the other walks. No special emphasis was given to this walk. The head looks straight ahead. The arms swing normally, left arm out with right leg. A small twist is included in the arms so that the hand twists in towards center when in front of the body. Both legs are twisted out slightly, adding a small sway to the walk. The foot stays nearly parallel to the ground in order to avoid excessive lift of the body.
  2. Light Walk: The light walk is achieved by creating a long stride that goes up very high on the ball of the figure's foot. The high point of the stride was designed to occur in CEL 15 to give the appearance of the light stride described by Glen Fleck.
  3. Heavy Walk: The torso is bent over and the weighted knee is bent throughout the walk. The figure is never able to achieve full height. The figure seems to be constantly struggling to get up.
  4. Tired Walk: The torso is bent over, the head is tilted to one side, and bent down. The arms are left dangling straight down from the torso moving very little. The steps are very short with little rise and fall of the body. The unweighted foot is dragged back along the ground. The eyes are closed throughout the walk.
  5. Pensive Walk: The same basic foot motion was used as in the Tired Walk. The torso is bent over but not as much as in the Tired Walk. The arms are folded behind the figure's back.
  6. Sad Walk: Again the basic walk is identical to the Tired Walk. The arms are folded in front of the body and the mouth of the figure is given an unhappy look.
  7. Angry Walk: The emotion in this walk was conveyed by having the right elbow up and in front and having the right hand move back and forth. The hand has only one finger up, in the pointing position. The lips are frowning and the head is shaking slightly. To obtain a menacing effect, the figure is observed from a position near the floor. The camera was placed there to create the illusion of a small child's parent angrily approaching him.
  8. Drunk Walk: The principal effect of this walk was to introduce considerable weaving back and forth by the figure. The arms flail rather than move purposefully, and the head nods up and down.
  9. Happy Walk: The illusion of a happy walk is created primarily by accentuating the motion of the head. The head rocks from the left to the right side and bobs up and down at the same time. The lips are in a grin position and the eyebrows are raised. The arms swing very high, as in the Light Walk. The weighted foot bounces up and down twice during each half cycle, further accentuating the bobbing effect.
  10. Paranoid Walk: The figure takes a quick step then pauses while looking from side to side. The forward motion of the figure can be seen graphed in Figure 3.4c. The weighted leg position was designed in CEL 1 then copied into CEL 2 to give the smooth beginning. CEL 1 was then reflected into CEL 50 which was then copied into CELs 20, 21, and 49 in order to obtain the spline shown in Figure 3.4c. The figure's lips are pursed together and both hands are out in front and fisted throughout the walk.
  11. Military Walk: The arms are swung very high with the elbows stiff. The weighted leg motion is similar to the light walk with a very long stride. The unweighted leg is brought through very high in an abbreviated Goose Step. This walk is observed from above looking down in an attempt to make the figure look like a toy soldier.
  12. Groucho Walk: This walk attempted to mimic the walk of Groucho Marx. The torso is bent over considerably but the head is up. The arms are folded behind the figure's back. The eyebrows move up and down and the eyes move right and left.
Figure 3.1d - MAN3

CHAPTER 4: THE FILM "NOT JUST REALITY"

This chapter is presented in the style of a planning document for an educational film. The sections consist of the planning tasks of an animated film described in Chapter I. They are: (1) Task Analysis; (2) Synopsis; (3) Research; (4) Treatment; (5) Script; and (6) Story Board.

4.1 Task Analysis

The purpose of the film is to introduce people from a wide background to the technology of dynamic visible surface pictures. They should gain an understanding of the potential of this new medium. They should be left with the feeling that a wide variety of actions can be specified for both the camera and the objects in the scene. The film should have a low key humorous vein to maintain interest and to re-emphasize the value of the combination of entertainment and education. Finally, my thesis committee should be shown the various walk cycles produced in the course of this thesis effort.

4.2 Synopsis

TARGET AUDIENCE:
Primary target: Thesis Committee. Other audiences: the Graphics group at the University of Utah, to spur further research in dynamic visual communications; the ARPANET computer research community, to incite other themes for films; research groups and corporations in the other application areas for dynamic visible surface images enumerated in Chapter I; and the general public, to show a humane and beneficial application of computer technology.
PURPOSE:
An educational vehicle to show the capabilities of dynamic visible surface images; to show the ability of film to convey technical ideas to a wide audience; and to demonstrate the ability of the Visual Communications environment to control the actions of a human-like figure in both talking and walking.
SUBJECT:
The use of dynamic visible surface images to communicate technical ideas.
CORE IDEA:
The medium is (truly) the message. The goal of the film is to inform people about this new technology; this is to be accomplished in two ways: (1) By describing the technology and (2) By using the technology itself as the medium of presentation of the ideas and concepts.
PLOT:
The film will use a single character to describe the Visual Communication environment created at the University of Utah. The style will be narrative form. The film will begin with an introduction of the work. The narrator will then describe the process of making a film in the environment while live action slides are presented. (This is the dual of the normal use of animated film, e.g. the narration is usually live action with the process visualized using animation.) After the process is described, the narrator will show some of his body movements and walking ability. The fourth sequence in the film will show the walk cycles produced for the thesis. The final sequence will describe the goal of the work.
IDENTIFICATION:
The identification will be with the narrator for people who have had research ideas and have had to explain them over and over again to interested parties. These people will hopefully recognize the value of the film medium and the unique advantages of animated films:
  1. Complete control over the character and environment.
  2. Identification with a character who can represent a group rather than an individual.
  3. Visual appeal and entertainment.
  4. A character personality can be developed and used in further films with quick group identification.
CINEMATIC FORM:
Because the goal of the film is to show the use of dynamic visible surface images, the form of the film will naturally be dynamic visible surface images. The second sequence in the film will be stills of the University of Utah's facility to show how the rest of the film was made.

4.3 Research

The main part of the research necessary to generate this film was the development of the environment (system, structure, and motion) for visual communication described in Chapter 2. Also necessary were the walking studies reported in Chapter 3. In addition, several new developments were necessary for the production of the film. They were: the specification and construction of the optical bench for the cathode ray tube and animation camera; the specification, interfacing, debugging, and calibration of the CRT; the calibration and color balancing of the CRT and filters for color work; the design of the character and scene used in the film; and the understanding of lip synchronization between voice and film and the development of the tools to input the lip synchronization data. The above tasks, with the exception of the final one, were straightforward engineering and design problems. The lip synchronization problem is discussed in Appendix C.

4.4 Treatment

The film begins with the narrator asking the first person camera to be seated. The camera motion through the room shows one of the most effective uses of dynamic visible surface images: moving through space while keeping the objects in proper perspective. The narrator sits as the camera approaches. After the camera sits down, the camera leaves first person. The camera positions must now be careful not to show the empty chair. A monologue is given introducing the subject matter, the character, and the locale (University of Utah).

The narrator rises and a movie screen automatically lowers in the background. As the narrator continues to describe the process of making movies the camera moves into the movie screen.

Eleven slides are then shown for approximately three seconds per slide with dissolves between them. The slides are roughly synchronized to the voice. The slides show: (1) A drawn sketch of the narrator, (2) Part of the text that produces the narrator's image in the system; (3) A line drawing of the narrator on the CRT; (4) A picture of the time-sharing PDP-10 console; (5) A picture of the single user PDP-10 console; (6) The Gould scope and amplifier; (7) The animation camera and controls; (8) the optical bench with CRT and camera in place; (9) A picture of me working on the Univac scope; (10) A close up of the image from the WALK program; and (11) A graph of one of the MAN parameters.

The final photograph fades to black and the image of the narrator standing fades in. The narrator describes his ability to move his body. He moves his arm, bends his elbow and hand, moves his head, torso, and leg, and bends his knee and foot. He then says that coordinating this motion into a walk is the hard thing to do. Then he walks forward toward the camera. The camera switches to a side view for the second part of the walk.

The narrator then explains that the interesting problem is not in moving from here to there but in conveying some mood or emotion to the observer through the walk (This problem was identified by the Chairman of the Board of the Disney Corporation in a private conversation). The camera goes back to the movie screen for some black and white sequences of walks portraying the variability of the walks permitted.

Twelve walks are shown each one held for two seconds while a number is shown. Each walk is eight steps, each step is 26 frames long. The twelve walks are called: Regular, Light, Heavy, Tired, Pensive, Sad, Angry, Drunk, Happy, Paranoid, Military and Groucho (Marx). The final walk fades to black.

The image then fades in (back in color) to the narrator sitting on the chair. He describes the goal of this research in Visual Communication. He identifies the lack of real world physical constraint as a virtue of animated films and simultaneously floats in the air and turns his head 360 degrees around. He finishes the narration sitting in his seat; the camera pulls back with the titles burned into the images.

The film's title is Not Just Reality. It is a parody of the IBM advertising campaign which claims that IBM is interested not just in data but in reality. This film presents an application of computer technology which goes beyond reality.

4.5 and 4.6 Script and Story Board

(1) First person camera moves from doorway towards seat. Narrator gestures to sit, sits himself.
(1) Hello. Please sit down.
(2) First person camera sits down - Narrator's head follows.
(2) I'd like to tell you about some work being done at the University of Utah in computer animation.
(3) Cut to far shot of Narrator.
(3) We built a system to take descriptions of real and imaginary objects and produce continuous tone drawings.
(4) Cut to close up of Narrator.
(4) In case you haven't guessed, I am one such object, but we'll talk about my favorite subject, me, later.
(5) Truck out to far shot as Narrator stands up and gestures toward screen. Camera moves into screen.
(5) Let me show you briefly how the animation is done.
(6) Slide of sketch of Narrator.
(6) A description of an object is entered into the computer
(7) Slide of text.
(7) along with a description of the
(8) Slide of line drawing.
(8) frame by frame action desired.
(9) Slide of T.S. console.
(9) These descriptions are input to a
(10) Slide of S.U. console.
(10) special computer program which automatically generates the movie.
(11) Slide of CRT.
(11) Each frame is painted on a Cathode Ray Tube
(12) Slide of camera.
(12) and photographed by an animation camera.
(13) Slide of bench.
(13) The camera is advanced automatically when the frame is done.
(14) Slide of Univac scope.
(14) The frame by frame action description can either be obtained by direct animation input
(15) Slide of WALK program.
(15) or by a program which receives more simple input,
(16) Slide of graph.
(16) like key frame information, then automatically fills in the in-between frames.
(17) Fade into Narrator standing between the chairs. Full body shot. Narrator moves body parts.
(17) But enough of that. I would like to show you some of the things that I can do. Let me identify some of my movements. My arm goes up and out, my elbow bends, and my hand bends.
(18) Closeup of head movements.
(18) My head moves in all directions including up and down.
(19) Full body shot for leg movements.
(19) My torso twists and bends. I can lift my leg, bend my knee and move my foot. See, I told you I like talking about myself.
(20) Far shot for walk.
(20) Moving things are really pretty easy. The hard thing is to coordinate all this motion in some activity like:
(21) Side shot for walk.
(21) walking. (pause)
(22) Closeup of Narrator's face. Move toward screen. Fade to black.
(22) Once you've got that problem licked, the next problem is to try to convey some mood or emotion to the observer through the walk using both motion cues and especially hand and face cues. For example, let me show you movies of some walks. Try writing down how you think I feel during each walk.
(23) Do twelve walks.
(23) This is walk number 1.
This is walk number 2.
- - - - -
This is walk number 12.
(24) Fade in with Narrator sitting.
(24) I'm afraid you'll have to read Barry Wessler's thesis to find out if you guessed correctly. That's his way of getting you to read his thesis. I wanted to tell you directly but Barry wouldn't let me. Before I'm done I'd like to tell you why we are doing this work. The goal of this research is to create a communication medium for conveying ideas and information. Animation has been an extremely effective tool for communication as evidenced by both the Disney and Sesame Street successes.
(25) Narrator turns head 360 degrees and floats off the chair.
(25) Because of the ability to defy the laws of nature in an animated film and its ability to characterize the essence of an idea, feeling or emotion, it is both entertaining and informationally rich. But animation has always been very expensive. At the University of Utah we are trying to create the tools that will enable people to economically create a communication device (the film) which will present their ideas
(26) Truck camera back for closing titles.
(26) in this exciting and unique way.

APPENDIX A: IMAGE QUALITY

This thesis was made possible by a succession of developments in the Visible Surface algorithm and later efforts to improve the image quality obtained. This Appendix is a compendium of the results of that effort. The goal is to show the reader the effect of these developments rather than to describe them in detail.

A.1 Lines Vs. Visible Surface Pictures

The most important development in producing high quality visible surface pictures must be the development of the Visible Surface Algorithms [3]. Figure A.1 shows the Fokker triplane in both line drawing and Visible Surface form. Although some detail is lost in the Visible Surface picture due to occlusion, it has a greater Sense of Reality. The data base is courtesy of Bui Phong.

Figure A.1 - Line Drawing versus Visible Surface Pictures

A.2 The Jaggies

The Watkins Algorithm computes the length of a visible segment on a particular scan line and decides on which picture element to terminate the segment. This discrete decision causes the stairstep effect, called the Jaggies, apparent in the left half of Figure A.2. The effect of the Jaggies grows more severe as the angle subtended by a picture element gets larger. The Jaggies are most annoying in films; the steps move up and down the edges, distracting the observer.

Figure A.2 - The Jaggies

One technique tried to remove the Jaggies was to defocus the individual picture elements in order to create a fuzzy line. The result was the ultimate disappointment; the picture quality was reduced while the Jaggies were still visible. The procedure used in the right half of Figure A.2 was to create a transition segment whose length is dependent on the slope of the edge. An alternative to this method is presented in Section A.6. The Airport data base is courtesy of Evans and Sutherland Computer Corporation.

A.3 Smooth Shading

The Visible Surface algorithms use object descriptions consisting of a collection of polygons. The intensity computation described in Section 2.2.2 results in the lower picture in Figure A.3; all the facets of the polygons are visible. Gouraud [22] found that linearly interpolating the intensity between adjacent polygons produced objects whose interiors appeared to be curved. The boundary of an object still retains the flat characteristic.

Figure A.3 - Smooth Shading

A.4 Blurring

Normally the smooth shading can be computed from the data base coordinate information by averaging the normals of the polygons formed at a particular vertex. The average is then used as an approximate normal for that vertex. If care is not taken to allow some sharp edge boundaries in each object, it may appear out of focus. The dome of the Capitol Building in Figure A.4 exhibits this property.

Figure A.4 - Blurring
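The normal-averaging step itself is straightforward; a sketch, assuming planar, consistently ordered polygons indexed into a shared vertex array:

    import numpy as np

    def vertex_normals(vertices, polygons):
        # vertices: (n, 3) array; polygons: list of vertex-index tuples.
        # Each vertex normal is the average of the unit plane normals of
        # the polygons meeting at that vertex.
        normals = np.zeros_like(vertices, dtype=float)
        for poly in polygons:
            p = vertices[list(poly)]
            n = np.cross(p[1] - p[0], p[2] - p[0])   # plane normal
            normals[list(poly)] += n / np.linalg.norm(n)
        lengths = np.linalg.norm(normals, axis=1, keepdims=True)
        return normals / np.where(lengths == 0, 1.0, lengths)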

A.5 Smooth Shading Basis

A problem can arise in computing the averaged normal and using one coordinate of that normal as the basis for shading the object. This problem was first identified by Gouraud [22] in what seemed to be a pathological case. Gouraud, in his example, showed how the letter W would be shaded incorrectly. He suggested that flat areas be added to the W to improve its appearance.

The problem appeared later in an important data base. The left side of Figure A.5 shows the line drawing and Visible Surface picture of Newman Mountain in Arizona. The line drawing is an orthographic projection of the data as seen from above the mountain. The data were obtained by finding local maximum and minimum points on a topographical map. Many of the vertices' computed normals point nearly straight up. The result of smooth shading this data base is an extremely flat looking image.

The right half of Figure A.5 shows the result of algorithmically enhancing the data base by converting each edge and vertex in the original data base into a polygon. This process significantly increases the amount of data required (the new data base contains one polygon for each polygon, edge, and vertex in the original data base). The Visible Surface picture, however, has been significantly improved.

Figure A.5 - Smooth Shading Basis

A.6 Resolution

The Visible Surface algorithm samples the three-dimensional environment at each picture element and decides which polygon is visible at that element. The algorithm decides on one polygon even though more than one may be visible in that element. This may occur because an edge passed through the element or because a polygon fits entirely between picture elements. The result is that as resolution is reduced, detail is lost.

This phenomenon is particularly bothersome in movies, since small objects will be displayed in some frames but not in others. The accompanying film, Not Just Reality, demonstrates this phenomenon. The pictures on the left and in the center of Figure A.6 show the man at 1024, 512, and 256 element resolution, respectively; they are all antijaggied.

The two pictures on the right of Figure A.6 demonstrate a technique for making higher quality images. In both cases the Visible Surface Algorithm was executed at 1024 resolution but the picture was output at 512 and 256 resolution by averaging 4 and 16 elements, respectively. This technique helps soften the flickering in and out of small detail and reduces the jaggies problem without additional processing. The best choice of resolution is believed to be 2048 for the algorithm and 512 picture elements per scan line for output. This would provide 16 point averaging.

The averaging technique is certainly not without cost. It is estimated that 16 point averaging would slow the generation of a picture by a factor of 4 to 16. Its attractiveness is that it produces high quality pictures with a very small addition to the basic Watkins algorithm. Real time pictures of lower quality can be produced for feedback on a film's accuracy, impact, etc.; then a high quality film or video tape can be produced at slower speeds.

Figure A.6 - Resolution
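The averaging itself reduces to block-averaging the high resolution frame; a numpy sketch (render_at is a hypothetical renderer returning a square intensity array):

    import numpy as np

    def average_down(frame, factor):
        # Average factor x factor blocks of picture elements: 4 elements
        # for 1024 -> 512 output, 16 elements for 1024 -> 256.
        h, w = frame.shape
        return frame.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

    # picture = average_down(render_at(1024), 2)   # 1024-element render output at 512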

A.7 Gamma Correction

Much higher quality images are possible by carefully calibrating the output display and recording medium to correct for intensity distortions. The traditional method for correcting these distortions involves functionally correcting the intensity in order to make it look correct; e.g. a picture element of one-half the intensity looks one-half as bright. Generally an exponential function is used, such as i[out]=i[in]↑gamma. The top picture in Figure A.7 has no correction. The middle picture has a gamma of .333, which has empirically been found to be the best functional correction.

The digital nature of the Visible Surface algorithm permits much better control of intensity correction than functional correction. The technique used was to create a correction table entry for each available intensity value by measuring a test pattern and changing the table values until the test pattern is accurate. The bottom picture in Figure A.7 shows the image quality using the table correction technique. Data base courtesy of James Clark.

Figure A.7 - Gamma Correction
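The two corrections can be compared in a few lines. In the sketch below, the table entries merely start from the functional curve; in practice each entry would be adjusted until a displayed test pattern measures correctly:

    import numpy as np

    GAMMA = 0.333                 # best functional correction found empirically

    def functional_correction(i):
        # i: intensity (or array of intensities) in 0..1.
        return i ** GAMMA

    # Table correction: one entry per available intensity value (256 assumed).
    TABLE = functional_correction(np.linspace(0.0, 1.0, 256))

    def table_correction(levels):
        # levels: integer intensity values, 0..255.
        return TABLE[np.asarray(levels)]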

A.8 Variable Light Source

The technique described in Section 2.2.2 allows the position of the light source to be located under user control. The pictures in Figure A.8 show the Volkswagen with different light source positions. The images with light sources not at the eye have been found to be disturbing to some people because of the absence of the expected shadows.

Figure A.8 - Variable Light Source

APPENDIX B: THE FRAME GENERATOR

The frame generator module has two input channels and one output channel. The output channel is described elsewhere [29]. The purpose of this section is to describe the two input channels sufficiently to allow others to use the module. The Object description channel is specified first, then the Frame by Frame description channel. One caveat is in order: the system was written in SAIL [30], and the syntax of the two channels was greatly affected by the SAIL string scanning facilities.

B.1 Object Description

The purpose of this channel is to describe the objects that are to be manipulated by the Frame by Frame description channel. There are two types of objects that can be described: primitive objects and rigid objects. The primitive object is the simplest entity that can be manipulated by the system. It consists of the polygons and vertices that comprise the object. The rigid object is a construction of primitive objects and other rigid objects. All of the substructure of rigid objects is forgotten by the system after the definition is completed, thus making it impossible to modify its internal organization (hence it is rigid).

B1.1 Primitive Objects

The purpose of the primitive object is to describe the polygons that form the object by specifying the vertices that comprise the polygon. In addition, several attributes of the objects and the polygons can be specified.

The vertices in the object can have arbitrary six-character names and may have coordinate and/or intensity information provided. The vertex name (VN) can take two forms: simple and compound. The compound form is used to make creases in smooth shaded objects. It is specified by a period in the VN, e.g. VNC.VNI, where VNC specifies the coordinate name and VNI specifies an intensity name. All vertices with the name VNC will have identical coordinates, and all vertices with the name VNC.VNI will have identical intensity values. For example, a crease can be formed by using the vertex name VNC.1 for the polygons on one side of the crease and VNC.2 for the polygons on the other side. The simple form of the VN, such as VNC, is equivalent to the name VNC.NIL. A vertex has eight values associated with it: the four homogeneous coordinates x, y, z, w and the four intensity components a, b, c, d. If the intensity values are not provided by the user, they will be computed automatically by the system based on the geometry of the object. The user may describe a vertex by 3, 4, 6, 7, or 8 values. The meaning of these alternatives is shown in the table below.

Values    Simple                          Compound
3         Coordinates x,y,z (w=1)         Intensities a,b,c (d=0)
4         Coordinates x,y,z,w             Intensities a,b,c,d
5         Illegal                         Illegal
6         x,y,z (w=1); a,b,c (d=0)        Same as Simple
7         x,y,z (w=1); a,b,c,d            Same as Simple
8         x,y,z,w; a,b,c,d                Same as Simple
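The decoding implied by the table is mechanical; a sketch (a None intensity means the system computes it from the geometry):

    def decode_vertex(values):
        # Classify a vertex line by its value count, per the table above.
        n = len(values)
        if n == 3: return (*values, 1.0), None                  # x,y,z (w=1)
        if n == 4: return tuple(values), None                   # x,y,z,w
        if n == 6: return (*values[:3], 1.0), (*values[3:], 0.0)
        if n == 7: return (*values[:3], 1.0), tuple(values[3:])
        if n == 8: return tuple(values[:4]), tuple(values[4:])
        raise ValueError('illegal vertex: %d values' % n)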

Polygons are specified as an ordered sequence of vertex names. There can be as many polygons as desired in an object. A polygon has three attributes: Color, Priority, and Hole.

COLOR:color
Color is a description of color of the polygon as a mixture of the three primary colors: Red, Green, and Blue.
PRIORITY:α
The priority of a polygon allows layers of paint to be put on other polygons. It tells the system that if it cannot differentiate the depth of two polygons, it should paint the higher priority polygon.
HOLE:
If Hole is specified, the polygon after the attribute is cut out of the polygon preceding the attribute. Non-overlapping holes are valid polygons.

The overall object can also be given attributes such as: Inside, Facet, Power, Diffuse, and Basis.

INSIDE:α
Inside tells the system whether the polygons are described normally clockwise or normally counterclockwise, so that the system can discover backward facing polygons.
FACET:
The Facet command tells the system the object is not to be smooth shaded. A considerable amount of storage is saved if an object is faceted.
POWER:α;DIFFUSE:α
After the intensity of a vertex is transformed, a further computation is made according to the following formula:
I[out]←I[in]↑POWER*(1-DIFFUSE)+DIFFUSE
BASIS:Basis Name(16 numbers)
The basis operator specifies a new coordinate system for the object. A convenient coordinate system for specifying an object may not be convenient for controlling its motion. The basis matrix translates the specification coordinate system into the use coordinate system.
INTERNAL:lvn gvn
EXTERNAL:lvn gvn
The final commands in the primitive object description are used to specify semi-rigid objects. An INTERNAL point is a point to be made known globally so that other objects can use it as EXTERNAL. Internal points have their coordinates defined within that object; external points have their coordinates defined in some other object. In both commands, the first VN specified is the local name, while the second is the global name of the vertex. Complex object designers must take care to ensure global names are unique.

The complete syntax of primitive objects can be found in Figure B.1.

B1.2 Rigid Objects

The rigid object description was created to allow the construction of simple primitives into more complex structures. The rigid object description format is nearly identical to a subset of the frame by frame description format. Construction of a rigid object looks like the specification of one frame of a movie. The basic idea is to create a list structure of objects and instances of objects, then possibly modify some of the attributes of those objects and instances, e.g. color, transformation matrices, etc. The basic commands available are: ADD, TRANSF, OBASIS, OCOLOR, REFLECT, and SUBS.

ADD:Structure
The ADD operator is used to create the data structure in the frame generator. The structure is specified in linear form using nested parentheses. The object is specified within the parentheses. A name must be provided in front of the open parenthesis, which is the name of that particular instance of that object. The object itself is not named but can be referred to by any instance name that points to the object. Thus multiple references can be made to a single object.
Primitive and rigid objects can be added to the structure by the use of brackets. Within the brackets a file name and object selector are included to specify where to obtain the object definition. An example of an ADD statement is shown below:
ADD:AIRPLN (LFTSID(WNG[PL.HTL,WING],FUS[PL.HTL,FUS],
CANOPY[PL.HTL,CANP]),RTSID(LFTSID),TAIL[PL.HTL,EMPG]);
where WING, FUS, and CANP are primitive or rigid objects describing the left side only of the airplane, and EMPG describes the whole tail. RTSID is a second instance of the object pointed at by the instance named LFTSID. The mechanism for reflecting the RTSID instance will be described below.
TRANSF:INS SUBOP
The TRANSF operation forms a 4 × 4 transformation matrix and assigns it to the instance named INS. The transformation matrix is used to position, scale, or rotate the object pointed to by the instance. The transformation matrix is formed by the SUBOP expression which is described below:
ROTX α; ROTY α; ROTZ α : Rotate the object about its origin around the x, y, or z axis by α degrees.
SX α; SY α; SZ α; SCALE α: Scale the object from an axis by multiplying the distance from the origin by α. SCALE operates on all three axes.
ABOUT (3 numbers): Move the object's origin to the point specified for rotation or scaling.
AT (3 numbers): Move the object from the origin to the position specified.
BY (16 numbers): The sixteen numbers form a 4 × 4 transformation matrix.
WITH Basis Name: The named basis is inverted and premultiplied into the existing transformation matrix.
AXIS Basis Name: The existing matrix is premultiplied by the inverse and postmultiplied by the basis itself.
CALKAT (7 numbers): Forms a transformation matrix which will map the object space into the view space required by the Visible Surface processor. The seven numbers consist of: the three coordinates of the position the object space is viewed from; the three coordinates of a point in object space in the center of the desired picture; and the field of view desired (in degrees of aperture).
LTSRC (6 numbers): Forms a transformation matrix for finding the intensities of a light source at an arbitrary point in space. Two points are specified, forming a vector; all light is parallel to that vector.
OBASIS:Basis Name (16 numbers)
OBASIS creates a basis matrix for the rigid object.
OCOLOR:INS color
All of the polygons in the object pointed to by the instance INS which do not have a color are given the color specified.
REFLECT:INS axis
If an object is mirrored (by scaling by -1) a crease may appear along the mirror axis in a smooth shaded object. This command removes the crease.
SUBS:INS1 x y z zc INS2
The SUBS (substitution) operation allows more and more complex objects to be substituted for simpler ones. If, upon transforming (x,y,z,1), the transformed z is less than zc, then the second instance name will be used for picture generation rather than the first. This procedure can be continued indefinitely, using more complex objects as one gets closer to the object. The direction of the test was chosen because more time is taken in chaining down through objects, so that the simplest object is executed fastest.
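A sketch of the SUBS test, assuming the transformation is available as a 4 x 4 matrix in the numpy sense:

    import numpy as np

    def subs_select(transform, point, zc, ins1, ins2):
        # Transform the reference point (x, y, z, 1); if the transformed z
        # falls below the threshold zc, the viewer is close enough that the
        # more complex instance is used for picture generation.
        x, y, z, w = np.asarray(transform) @ np.array([*point, 1.0])
        return ins2 if z < zc else ins1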

B1.3 Library Files

Rigid object and primitive object definitions can be combined into a single file to reduce system overhead. A primitive object must start with BEGIN:Name and finish with END:Name. A rigid object must start with CONSTRUCT:Name and finish with CEND:Name. The Name in both cases is the object selector name described in the ADD operation. The creator of a library must be aware that the file is sequentially scanned each time a call is made, so that very long library files are inefficient. The file scanning mechanism is fast enough that most logical library organizations will be of reasonable size.

B.2 Frame by Frame Description

The Frame by Frame description is a superset of the rigid object format. The command set at this level is considered very open ended; more commands will be added as required. The ADD, OBASIS, TRANSF, OCOLOR, REFLECT, and SUBS commands are identical in syntax to their description in Figure B.2; the TRANSF command serves both for static positioning of rigid objects and for dynamic positioning between frames of a movie.

In addition to the Rigid Object commands several other commands are included for output and dynamic control. Below is listed each command and a short description of its uses.

DELETE: INS
The DELETE command frees storage in memory. DELETE should not be used to remove objects from the picture unless storage is at a premium; otherwise the selectiveness of the DISPLAY command should be used for removing objects.
INTERPOLATE: INS1 INS2 INS3 PERC
This command causes the points (coordinates and intensities) of INS1 to be replaced by the interpolation of the points of INS2 and INS3 by the following formula: INS1 ← INS2*PERC + INS3*(1-PERC). INTERPOLATE is used to modify fluid objects (a sketch of this blend appears after this command list).
PICPAR subcommands
PICPAR is used to set some variables that affect the picture.
HI:α LO:α:
Normally 1 and 0, respectively, these parameters can create special effects, such as fades; the spacing between HI and LO is a scaling constant for intensities, and LO is an offset.
BACKG:α:
Normally 0 (black). BACKG must lie between 0 (black) and 1 (white); it specifies the intensity of the screen wherever no polygons are present at a picture element.
XRES:α;YRES:α;RES:α:
Affects the resolution of the picture on the display. Normally set to 512 resolution in both axes.
REMOVE:α:
If α=1 the Frame Generator will remove all back facing polygons before transferring them to the Image Generator. Normally set to 0.
FILM: type:
Type can be COLOR or B & W; appropriate action will be taken to produce the correct image for that film.
FOG (3 numbers)
Sets the fog parameters.
HITHER:α:
Sets the Hither clipping plane between the camera (C) and look-at (L) point by: H=C(1-α)+Lα.
DISPLAY: INS
This command walks through the list structure starting at INS and outputs the data it finds to the image generator. No initialization is done, so successive calls to DISPLAY append information in the image generator.
FLASH: INTEGER
This command displays the picture contained in the visible surface processor without reinitializing it. This permits several copies of the same picture to be output without walking through the structure again. The movie camera is automatically advanced. The output is controlled by the FILM command; Black and White film is assumed.
FRAME: INTEGER
The FRAME command outputs the picture and reinitializes the Half Tone buffer. The integer value is used for frame count reference; it is optional.
FACET: INS
This command causes the selected INS to be output in facet form. The data is changed so that INS is permanently faceted.
PAUSE:
The PAUSE command waits for any integer to be input from the controlling TTY, then reinitializes the Half Tone buffer. This command is necessary for using the line drawing scope and for outputting single half tone pictures on film.
SAVE: File Name
The structure created is stored in image form in the specified file.
RESTORE: File Name
A structure that has been previously saved is restored. This operation is much more efficient than recreating the structure from primitive object descriptions.
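The INTERPOLATE blend mentioned above is simple enough to show concretely; a sketch, with hypothetical lip extremes as the usage example:

    import numpy as np

    def interpolate_points(points2, points3, perc):
        # INS1 <- INS2*PERC + INS3*(1-PERC), applied to every coordinate and
        # intensity value; this is the mechanism used for fluid objects.
        return np.asarray(points2) * perc + np.asarray(points3) * (1.0 - perc)

    # lips = interpolate_points(lips_open, lips_closed, 0.25)   # hypothetical extremes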
Figure B.1 - BNF for Primitive Object Description
<primitive>::=     <command><primitive>|<command>
<command>::=       TITLE: Any String <←>|
                   INTERNAL: N1 N2 <←>|
                   EXTERNAL: N1 N2 <←>|
                   POWER: α <←>|
                   DIFFUSE: α <←>|
                   BASIS: Name (16 numbers) |
                   FACET: <←>|
                   INSIDE: α <←>|
                   <c&p>|
                   POLYS: <←><polys>|
                   POINTS:  <←><points>|
                   / Any String <←>

<polys>::=         <vn><vn><mvn><←><polys>|
                   <c&p><polys>|
                   nil
<mvn>::=           <vn><mvn>|nil
<points>::=        <vn><c&i><←><points>|nil
<c&i>::=           (3,4,6,7, or 8 numbers)
<c&p>::=           COLOR:<color spec><←>|
                   PRIORITY: α<←>|
                   HOLE:

<vn>::=            Name | N1.N2
<color spec>::=    <clr nm>α|BLACK|BROWN|RED|
                   ORANGE|YELLOW|GREEN|BLUE|VIOLET|GREY
<clr nm>::=        R|G|B|W|S
<←>::=             Carriage Return|;
Figure B.2 - Rigid Object Syntax
<rigid>::=             ADD: <add expression><rcommands>
<rcommands>::=         <rcomm><rcommands>|nil
<rcomm>::=             ADD: Ins <add expression>;|
                       TRANSF:Ins <transf expression>|
                       OBASIS:Name (16 numbers)|
                       OCOLOR:Ins <color spec><←>|
                       REFLECT:Ins <axis>|
                       SUBS:Ins1 (4 numbers) Ins2 <←>
<axis>::=              SMX|SMY|SMZ
<transf expression>::= <transf>|<calkat>
<transf>::=            <transf factor><transf>|nil
<transf factor>::=     ABOUT (3 numbers)|
                       AT (3 numbers)|
                       ROTX α|ROTY α|ROTZ α|
                       SX α|SY α|SZ α|SCALE α|
                       BY (16 numbers)|
                       WITH <basis nm>|AXIS<basis nm>
<basis nm>::=          Basis Name|Ins.Basis Name
<calkat>::=            CALKAT (7 numbers)|
                       CALKAT (7 numbers) LTSRC (numbers)
<add expression>::=    <object>
<object>::=            <sub-objects>
<sub-objects>::=       <sub-object>,<sub-objects>|<sub-object>
<sub-object>::=        Ins<object>|Ins|Ins[<file-spec>]
<file-spec>::=         File Name | File Name Selector
Figure B.3 - Syntax for Library File
<library>::=           <primitive>|<lib>
<lib>::=               <prim><lib> | <rig><lib>
<prim>::=              BEGIN:Selector<←><primitive>END:Selector<←>
<rig>::=               CONSTRUCT:Selector<←><rigid>REND:Selector<←>
Figure B.4 - BNF for Frame by Frame Description
<frame by frame description>::= <fcommands>
<fcommands>::=         <rcomm><fcommands>|<fcomm><fcommands>
<fcomm>::=             DELETE:Ins<←>|
                       INTERP:Ins1 Ins2 Ins3 α<←>|
                       DISPLAY:Ins<←>|
                       FLASH:α<←>|
                       FRAME:α<←>|
                       PAUSE:<←>|
                       FACET:Ins<←>|
                       PICPAR:<subpar><←>|
                       SAVE: File Name <←>|
                       RESTORE: File Name <←>

<subpar>::=            HI α |
                       LO α |
                       BACKG α |
                       YRES α |
                       XRES α |
                       RES α |
                       REMOVE α |
                       FILM String |
                       FOG (3 numbers) |
                       HITHER α

APPENDIX C: LIP SYNCHRONIZATION

The lip synchronization problem remains barely touched by this thesis. The approach taken here was modeled after the traditional approach to lip synchronization. That is: a voice track was recorded on sprocketed magnetic tape; the tape was then manually analyzed with a sound reader on a film synchronizer; and the analysis was recorded on bar charts (see Figure C.1). The analysis was done using a model offered by Madsen of looking for special sounds like: B, M, P (lip closure); F, V (lower lip touches top teeth); O, W (oval mouth, no teeth showing); all other sounds; and no sound. The voice track was searched for these sounds and their time (i.e. frame number) of occurrence recorded.

(Figure: sample bar chart from the production "Not Just Reality", covering frames 2820-2868 (footage 144-146 ft) of the phrase "...informationally rich. But animation has always been very expensive.")
Figure C.1 - Bar Chart

The lip formation for each sound was designed for the narrator and intermediate positions checked. The lips were made Fluid objects permitting smooth interpolation between the extreme positions.

A program was written to graphically resemble the bar chart for easy input and editing of the lip synchronization data. The program displayed two seconds of data, representing the occurrence of a lip position at a particular time with a triangle (see Figure C.2). The program used the mouse to edit the data. The end result was a frame by frame description of the configuration of the lips.

(Figure: speech analysis input program display, showing triangles on the five lip position tracks LN, LV, LO, LBMP, and LFV for frames 2844-2856.)
Figure C.2 - Speech Analysis Input Program

The lip data was then used to produce a film with only a closeup of the narrator's face to verify the lip synchronization data. The result was less than exciting! The data was input to provide slow transitions, wherever possible, to display the interpolation technique. The result was an eerie, dumb look by the speaker. The data was edited considerably, with better but far from outstanding results.

One explanation is offered by Madsen [13]: "Realistic or semirealistic cartoon characters offer the greatest problem in precise lip movements, since they invite comparison with real people. The more fanciful the character, the greater latitude we have in animating facial expressions to suit our convenience and purposes." Further improvements could have been made (but weren't) by redesigning the extreme positions of the lips and by reanalyzing the voice track utilizing the experience of the first attempt.

It is more exciting to envision better ways of doing lip synchronization in the future. One way is to use the speech analysis research going on in the ARPA community [31] to obtain the lip synchronization data automatically. The task is far easier than the generalized analysis problem because the text of the speech can be given to the program beforehand. The program need only locate the sounds in the waveform.
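
In present-day terms this is a forced-alignment problem. The toy Python sketch below (invented frame labels stand in for the acoustic analysis; this illustrates the idea and is not a speech system) shows why knowing the text helps: the search reduces to a monotonic matching of frames to a known phoneme sequence, solvable by simple dynamic programming:

    # Toy forced alignment: given the known phoneme sequence of the text
    # and a (noisy) per-frame acoustic guess, find the monotonic
    # assignment of frames to phonemes with the fewest mismatches.
    def align(frames, phones):
        INF = float("inf")
        n, m = len(frames), len(phones)
        cost = [[INF] * m for _ in range(n)]
        back = [[0] * m for _ in range(n)]
        cost[0][0] = float(frames[0] != phones[0])
        for i in range(1, n):
            for j in range(m):
                stay = cost[i - 1][j]                        # phoneme continues
                advance = cost[i - 1][j - 1] if j else INF   # next phoneme starts
                best = min(stay, advance)
                if best == INF:
                    continue
                cost[i][j] = best + (frames[i] != phones[j])
                back[i][j] = j if stay <= advance else j - 1
        path, j = [], m - 1                                  # trace back
        for i in range(n - 1, -1, -1):
            path.append(j)
            j = back[i][j]
        return list(reversed(path))

    phones = ["B", "AH", "T"]                      # known text: "but"
    frames = ["B", "B", "AH", "AH", "AH", "T"]     # invented per-frame guesses
    print(align(frames, phones))                   # -> [0, 0, 1, 1, 1, 2]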

Another way to obtain the data is to have a puppet animator input the data in real time by sensing the animator's finger positions. Jim Henson's Muppets from Sesame Street have well-synchronized lip movements, as almost every pre-schooler can tell you. The information can then be reviewed and edited by existing techniques.

BIBLIOGRAPHY

1. Webster's Third New International Dictionary. Springfield, Mass.: Merriam Publishing Co., 1971.

2. Baecker, Ronald. Interactive Computer-Mediated Animation. Project MAC Report MAC-TR-61, June 1969 and UAIDE 1970.

3. Watkins, Gary S. A Real Time Visible Surface Algorithm. University of Utah Computer Science Technical Report UTEC-CSc-70-101. Salt Lake City, Utah, June 1970.

4. Winkless, N. and Honore, P. What Good is a Baby? AFIPS Conference Proceedings, Fall 1968.

5. Weiner, D., and Anderson, S. A Computer Animation Movie Language for Educational Motion Pictures. AFIPS Conference Proceedings, Fall 1968.

6. Baecker, Ronald. Picture-Driven Animation. AFIPS Conference Proceedings, Spring 1969.

7. Gombrich, E.H. Art and Illusion. New Jersey: Princeton University Press, 1961.

8. Knowlton, K.C. A Computer Technique for Producing Animated Movies. AFIPS Conference Proceedings, Spring 1964.

9. Bouknight, W.J. A Procedure for Generation of Three-Dimensional Half-Toned Computer Graphics Representations. CACM, September 1970.

10. Wild, E.C.; Rougelot, R.S.; and Schumacker, R.A. Computing Full Color Perspective Images. General Electric Technical Information Series R71ELS-26, May 1971.

11. Catmull, Edwin. A System for Computer Generated Movies. ACM 72 Conference Proceedings, July 1972.

12. Parke, Fredric. Computer Generated Animation of Faces. ACM 72 Conference Proceedings, July 1972.

13. Madsen, Roy. Animated Film: Concepts, Methods, Uses. New York: Interland Publishing, 1969.

14. Christiansen, H.N. Displays of Kinematic and Elastic Systems. Proceedings Matrix Methods in Structural Mechanics, Wright-Patterson AFB, Dayton, Ohio, October 1971.

15. Parke, Fredric. Computer Generated Animation of Faces. University of Utah Computer Science Technical Report UTEC-CSc-72-120, June 1972.

16. Stehl, W.A. Notes on the Preparation of Data for Use in Picture Generation. E&S Internal Memo, March 1971.

17. Archuleta, Michael. Generation of Objects for the Watkins Hidden Line Algorithm. University of Utah Computer Science Memo 7002, November 1970.

18. Roberts, L.G., and Wessler, B.D. Computer Network Development to Achieve Resource Sharing. AFIPS Conference Proceedings, May 1970.

19. Newman, William, and Sproull, Robert F. Principles of Interactive Computer Graphics. New York: McGraw-Hill, 1973.

20. Sutherland, I.E. Sketchpad: A Man-Machine Graphical Communication System. MIT Lincoln Laboratory Report 296, January 1963.

21. Roberts, L.G. Homogeneous Matrix Representation and Manipulation of N-Dimensional Constructs. MIT Lincoln Laboratories, MS 1405, May 1965.

22. Gouraud, Henri. Computer Display of Curved Surfaces. University of Utah Computer Science Technical Report UTEC-CSc-71-113, June 1971.

23. Green, C. Cordell. Application of Theorem Proving to Problem Solving. IFIP Conference Proceedings, 1968.

24. Hewitt, Carl. Procedural Embedding of Knowledge in PLANNER. Second International Conference on A.I., September 1971.

25. Riesenfeld, Richard. Applications of B-Spline Approximation to Geometric Problems of Computer-Aided Design. University of Utah Computer Science Technical Report UTEC-CSc-73-126, March 1973.

26. Greville, T.N.E. Introduction to Spline Functions. in "Theory and Applications of Spline Functions". New York: Academic Press, 1969.

27. Plagenhoef, Stanley. Patterns of Human Motion: A Cinematographic Analysis. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1971.

28. Burton, R.P. Real-Time Measurement of Multiple Three-Dimensional Positions. University of Utah Computer Science Technical Report UTEC-CSc-72-122, June 1973.

29. Catmull, Edwin. The SAIL Interface to the Watkins Processor. Internal Memorandum, July 1972.

30. Swinehart, D.C., and Sproull, R.F. SAIL Manual. SAILON No. 57.2, AI Project, Stanford University, 1971.

31. Newell, Allen. Speech Understanding Systems: Final Report of a Study Group. Carnegie Mellon University, May 1971.
