CS 335 Project Guideline

Goal

The purpose of the project is to gain hands-on experience with some aspects of multimedia systems and with primary research methods. Projects will usually involve some implementation and experimentation, as well as a written report (2-5 pages). It is important to clearly identify a specific problem you want to tackle and to present solutions, either by combining existing methods or by proposing new ones. The proposal is due in the 6th week of the semester. The project is due in the last week of class.

Group Project Policy

Group projects are allowed, but in that case a report must also be handed in describing the work that each student contributed to the overall project.

Possible Topics

The following project examples are meant to provide ideas and guidelines; you are free to develop your own topic. It is better to focus on a small, interesting problem than to try to accomplish everything.

1) Human Computer Interaction using Cameras and Gestures

Keyboards and mice are standard computer input devices. In many cases, these standard input schemes do not perform as well as other, more flexible methods. Controlling computers with cameras and gestures is one such approach, and it can be quite useful for many real-world applications. For example, using cameras and computer vision algorithms, people can control the cursor on a computer screen simply by moving their face or eyes. You can also write a game that is controlled by body gestures instead of a traditional game pad.
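
As a very rough starting point, here is a minimal Python/OpenCV sketch of face-driven cursor control. The webcam index, the Haar cascade face detector, and the assumed 1920x1080 screen are illustrative choices rather than requirements, and the sketch only prints the mapped cursor position instead of moving the real cursor.

import cv2

SCREEN_W, SCREEN_H = 1920, 1080          # assumed target screen resolution
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)                # assumed webcam index

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    if len(faces) > 0:
        # Use the largest detected face as the "pointer".
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
        cx, cy = x + w / 2, y + h / 2
        # Map image coordinates to screen coordinates (mirrored horizontally
        # so that moving left moves the cursor left).
        sx = SCREEN_W * (1 - cx / frame.shape[1])
        sy = SCREEN_H * (cy / frame.shape[0])
        print(f"cursor -> ({sx:.0f}, {sy:.0f})")
    cv2.imshow("camera", frame)
    if cv2.waitKey(1) == 27:             # Esc to quit
        break

cap.release()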

2) Video Database for Office Surveillance

In this project, you will design a small-scale automatic video archiving system for office (indoor) surveillance applications. The system may have functions such as detecting the number of people and their locations in the office, automatically indexing and storing video in a database, and allowing searching or browsing for specific events. Video processing does not have to be done online.
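
As a very rough starting point, here is a minimal Python/OpenCV sketch of the detection and indexing part. The input file office.avi, the SQLite file surveillance.db, and the built-in HOG person detector are illustrative assumptions; a real system would need a far more reliable people counter.

import cv2
import sqlite3

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

db = sqlite3.connect("surveillance.db")   # assumed local "database"
db.execute("CREATE TABLE IF NOT EXISTS events (frame INT, seconds REAL, people INT)")

cap = cv2.VideoCapture("office.avi")      # assumed input recording
fps = cap.get(cv2.CAP_PROP_FPS) or 25
frame_no = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_no % int(fps) == 0:          # sample roughly once per second
        boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
        db.execute("INSERT INTO events VALUES (?, ?, ?)",
                   (frame_no, frame_no / fps, len(boxes)))
    frame_no += 1
db.commit()

# Example query: moments when at least two people were in the office.
for row in db.execute("SELECT seconds, people FROM events WHERE people >= 2"):
    print(row)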

3) Automatic Lecture Archiving System

Videos and PowerPoint slides are often both available for lectures and conference talks. Some media players can in fact play synchronized video and PowerPoint slides, but aligning a video with PowerPoint slides is a hard problem when the two media sources are obtained separately. You will need to find possible solutions to automate this process, for example by capturing the video and the PowerPoint slide-show actions simultaneously. After data capture is complete, the system should be able to show the video and the PowerPoint slides with proper synchronization.
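
If the slides are also available as exported images, one crude way to recover the synchronization offline is to match sampled video frames against the slide images. The sketch below illustrates this with simple color-histogram matching; the file names lecture.mp4 and slide*.png, the 5-second sampling interval, and the similarity measure are all assumptions, and in practice a more robust frame-to-slide matching method would be needed.

import cv2
import glob

def hist(img):
    h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    return cv2.normalize(h, h).flatten()

# Assumed slide images exported as slide01.png, slide02.png, ...
slides = [(path, hist(cv2.imread(path))) for path in sorted(glob.glob("slide*.png"))]

cap = cv2.VideoCapture("lecture.mp4")     # assumed lecture recording
fps = cap.get(cv2.CAP_PROP_FPS) or 25
timeline, current, frame_no = [], None, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_no % int(5 * fps) == 0:      # sample every ~5 seconds
        hf = hist(frame)
        best = max(slides, key=lambda s: cv2.compareHist(hf, s[1], cv2.HISTCMP_CORREL))
        if best[0] != current:
            current = best[0]
            timeline.append((frame_no / fps, current))   # (seconds, slide file)
    frame_no += 1

for t, slide in timeline:
    print(f"{t:8.1f}s  ->  {slide}")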

4) Video Content Analysis for Sports

Multimedia has been widely applied to sports broadcasting and, to a lesser extent, to sports training and sports data archiving. In this project, you will study multimedia applications for different aspects of sports games. For example, one application is to track players on the field and automatically log information about their movements and actions.
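
As one possible starting point for player tracking with a fixed camera, the sketch below uses background subtraction to find moving blobs and logs their positions. The input file game.mp4 and the blob-size threshold are assumptions; distinguishing and following individual players would require real tracking on top of this.

import cv2

cap = cv2.VideoCapture("game.mp4")        # assumed fixed-camera recording
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
frame_no = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove small noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 300:      # ignore blobs too small to be a player
            x, y, w, h = cv2.boundingRect(c)
            print(f"frame {frame_no}: player-sized blob at ({x + w // 2}, {y + h // 2})")
    frame_no += 1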

5) Automatic Video Parsing (Movies or TV Programs)

TV programs are often shown in a linear fashion. If computers can automatically parse the content of a TV program, index the video, and generate a cross-linked structure, this will greatly reduce the time people spend finding useful information in these programs. Automatically recognizing the content of video and audio is still an unsolved problem, but some relatively simple tasks, such as video shot detection, audio classification into a small number of classes, and speaker identification, can be done reasonably well. There are also many other cues you can use, such as closed captions and text embedded in the images. The content analysis results do not have to be 100% accurate; partially correct results can still be used to produce a good video summary. You can also choose to work on other types of video, such as movies.
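
As an illustration of how simple basic shot detection can be, the sketch below compares color histograms of consecutive frames and reports an abrupt cut whenever the similarity drops. The input file program.mp4 and the 0.6 threshold are assumptions, and gradual transitions (fades, dissolves) would need additional handling.

import cv2

def color_hist(frame):
    h = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(h, h).flatten()

cap = cv2.VideoCapture("program.mp4")     # assumed input program
fps = cap.get(cv2.CAP_PROP_FPS) or 25
prev_hist, frame_no = None, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h = color_hist(frame)
    if prev_hist is not None:
        similarity = cv2.compareHist(prev_hist, h, cv2.HISTCMP_CORREL)
        if similarity < 0.6:              # threshold chosen empirically
            print(f"shot boundary at frame {frame_no} ({frame_no / fps:.1f}s)")
    prev_hist = h
    frame_no += 1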

6) Video Based Animation

Animating cartoon characters currently relies mostly on complex manual composition or on motion capture with expensive equipment. This project is to study the possibility of using widely available equipment, such as camcorders and simple markers, to automate the motion capture process.
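
As a rough illustration of marker-based capture with a camcorder, the sketch below tracks a single brightly colored marker by color thresholding. The input file actor.mp4 and the HSV color range are assumptions; a usable system would track several markers and associate each with a joint of the cartoon character.

import cv2
import numpy as np

LOWER = np.array([45, 120, 80])           # assumed HSV range of a green marker
UPPER = np.array([75, 255, 255])

cap = cv2.VideoCapture("actor.mp4")       # assumed camcorder recording
frame_no = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 50:       # ignore specks of the marker color
            (x, y), r = cv2.minEnclosingCircle(c)
            print(f"frame {frame_no}: marker at ({x:.0f}, {y:.0f})")
    frame_no += 1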

7) Remote Collaboration Based on Videos, Audio and Graphics

Remote collaboration enables us to work together even when we are not physically in the same place. Teleconferencing is one such application, in which people can talk to each other, see each other, and exchange ideas by writing or drawing on a virtual whiteboard. Current systems still have many problems. For example, users often complain that there is no eye contact: because of the position of the camera, mounted on a screen or a table, it is hard to look at the person on the screen and at the camera at the same time. Also, if there are several people in a teleconference, it is often quite hard to tell which two people are talking to each other. One current solution is to build a virtual environment in which people are shown as avatars around a virtual table; the problem then becomes how to convert eye-contact information into the actions of these avatars. We will study possible solutions to these problems.

8) Error Concealment or Inpainting for Videos

Over noisy communication channels, packets can be lost during transmission. Packet loss results in one or more blocks missing during image decoding. Instead of retransmitting the missing blocks, it is in fact possible to reconstruct them based only on the image blocks that were correctly received; such techniques are called error concealment. Image concealment has been studied intensively. Video concealment can be done better by using both temporal continuity and the structural constraints of images. Inpainting, a related technique, can be used to remove objects from videos (it has been used to make stunts in movies look more appealing).
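
The sketch below illustrates spatial (single-frame) concealment only: lost blocks are marked in a mask and filled in with OpenCV's inpainting function. The frame file name and the block positions are assumptions; a video solution should also exploit motion-compensated data from neighboring frames, as noted above.

import cv2
import numpy as np

frame = cv2.imread("frame.png")           # assumed decoded frame
lost_blocks = [(64, 128), (80, 128)]      # assumed (row, col) of lost 16x16 blocks

mask = np.zeros(frame.shape[:2], dtype=np.uint8)
for r, c in lost_blocks:
    frame[r:r + 16, c:c + 16] = 0         # simulate the missing data
    mask[r:r + 16, c:c + 16] = 255        # tell the inpainter what to fill

concealed = cv2.inpaint(frame, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
cv2.imwrite("concealed.png", concealed)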

9) Compression of Multiple View Video Data

Multi-view video compression can achieve a higher compression ratio because of the redundancy among multiple images taken from similar view angles. In this project, we study methods for efficiently compressing such videos. Besides the compression ratio, the encoder also needs to support features such as fast random access to individual video frames.
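
The sketch below is only meant to show where the redundancy comes from: it compares the size of coding one view independently against coding its pixel-wise residual with respect to the other view. The file names, the JPEG quality, and the absence of any disparity compensation are assumptions; a real multi-view encoder would do considerably better.

import cv2
import numpy as np

left = cv2.imread("left.png")             # assumed frames taken at the same
right = cv2.imread("right.png")           # instant from two nearby cameras

# Residual of the second view with respect to the first, shifted into uint8 range.
residual = ((right.astype(np.int16) - left.astype(np.int16)) // 2 + 128).astype(np.uint8)

def jpeg_size(img):
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 85])
    return len(buf)

print("independent coding of right view:", jpeg_size(right), "bytes")
print("residual coding of right view:   ", jpeg_size(residual), "bytes")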

10) Object Tagging and Applications

Recognizing tags (special patterns or markers) has become a mature technique. Many different kinds of tags can be used to embed the identities of objects, which can then be extracted in real time using cameras and image processing methods. In this project, we study possible applications of image-based tagging.
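
As a rough illustration, the sketch below uses OpenCV's built-in QR code detector as the tag reader and prints the identity embedded in each tag seen by a webcam. The webcam index and the choice of QR codes as the tag type are assumptions; other tag families (e.g., ArUco-style markers) could be used instead.

import cv2

detector = cv2.QRCodeDetector()
cap = cv2.VideoCapture(0)                 # assumed webcam index
while True:
    ok, frame = cap.read()
    if not ok:
        break
    data, points, _ = detector.detectAndDecode(frame)
    if data:
        print("object tag seen:", data)   # the decoded string identifies the object
    cv2.imshow("camera", frame)
    if cv2.waitKey(1) == 27:              # Esc to quit
        break
cap.release()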

11) Embedded Multimedia Applications

Most hand-held devices are now equipped with cameras and audio input devices. Their processing power is also increasing rapidly, which makes it possible to develop multimedia applications on these small devices. For instance, most current hand-held devices can connect to wireless networks and be used as universal remote controls. Other multimedia applications on hand-held devices include face recognition, voice recognition, etc.