
CAP6412 – Spring 2024

Advanced Computer Vision (3 Credit Hours)

Course Content

This is an advanced computer vision course that will expose graduate students to cutting-edge research in computer vision. We will discuss research papers on visual-language models (VLMs) and cover different vision foundation models, including textually prompted and visually prompted models; different architectural styles, e.g. dual encoder, encoder-decoder, and adapted LLM; and CLIP and its variants, SAM (Segment Anything), LLaVA, Video-ChatGPT, Video-LLaVA, instruction tuning, etc.

Computer vision has been a very active area of research for many decades, and researchers have been working on important, challenging problems. During the last few years, deep learning based on artificial neural networks has been a disruptive force in computer vision. Using deep learning, tremendous progress has been made in a very short time on difficult problems, and very impressive results have been obtained in image and video classification, localization, semantic segmentation, etc. New techniques, datasets, hardware, and software libraries are emerging almost every day. Deep computer vision is impacting research in robotics, natural language understanding, computer graphics, multi-modal analysis, etc. One of the most important and impactful works of the last year has been ChatGPT, which is widely used and is affecting our daily lives. The input and output of a conversational Large Language Model (LLM) are text. The new frontier in this context is the Large Multimodal Model (LMM), where, besides text, the input and output can be images, videos, speech, music, etc. In this course, we will mainly focus on visual-language models involving images and videos.

Grading Policy

  • Reports (you have to do only 50% of the papers): 10%
  • Replication of papers (5 papers): 15%
  • Presentation (roughly two): 25%
  • Attendance and Discussion: 5%
  • Projects: 45%

Late Policy

  • 0 for late Reports
  • 20% off per day, up to 4 days, for Presentations/Projects
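
As a rough illustration of how the grading weights and the late deduction could combine, here is a minimal Python sketch. The weights come from the Grading Policy list; everything else is an assumption made for illustration: the function names `late_penalized` and `final_grade` are hypothetical, component scores are taken to be on a 0-100 scale, the 20% per day is assumed to be deducted from the earned score, and work more than 4 days late is assumed to receive 0. The instructor's actual calculation may differ.

```python
# Weights from the Grading Policy list (percent of the final grade).
WEIGHTS = {
    "reports": 10,
    "replication": 15,
    "presentation": 25,
    "attendance": 5,
    "projects": 45,
}


def late_penalized(score, days_late):
    """Apply the late deduction for Presentations/Projects.

    Assumptions (not stated in the syllabus): 20% of the earned score
    is deducted per day late, and anything more than 4 days late gets 0.
    """
    if days_late <= 0:
        return score
    if days_late > 4:
        return 0.0
    # Integer percent arithmetic avoids floating-point drift.
    return score * (100 - 20 * days_late) / 100


def final_grade(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student: presentation delivered one day late.
example = {
    "reports": 100,
    "replication": 90,
    "presentation": late_penalized(90, 1),
    "attendance": 100,
    "projects": 80,
}
print(final_grade(example))  # prints 82.5
```

Late Reports are not modeled by `late_penalized`, since they receive 0 regardless of how late they are.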

Student Learning Outcomes

After the completion of the course, the students should be able to:

  • Read and understand a research paper.
  • Write a comprehensive review of the paper.
  • Identify the strong and weak points of a paper.
  • Generate their own ideas to solve the same problem.
  • Work on a research project and write a research paper.

Review/Rehearsal of PowerPoint Presentations

  • For a Monday presentation:
    • Slide review: Wednesday at 4:15, a week before the scheduled presentation
    • Rehearsal: Friday at 1:00 PM during office hours, a week before the scheduled presentation
  • For a Wednesday presentation:
    • Slide review: Friday at 1:00 PM during office hours, a week before the scheduled presentation
    • Rehearsal: Monday at 2:00 PM during office hours, the week of the presentation

Important Dates:


Statement on Academic Integrity:

The UCF Golden Rule will be observed in the class. Plagiarism and cheating of any kind on an examination, quiz, or assignment will result in at least an “F” for that assignment (and may, depending on the severity of the case, lead to an “F” for the entire course) and may be subject to appropriate referral to the Office of Student Conduct for further action. I will assume for this course that you will adhere to the academic creed of this University and will maintain the highest standards of academic integrity. In other words, don’t cheat by giving answers to others or taking them from anyone else. I will also adhere to the highest standards of academic integrity, so please do not ask me to change (or expect me to change) your grade illegitimately or to bend or break rules for one person that will not apply to everyone.