
CAP6411 – Fall 2024

Computer Vision Systems (3 Credit Hours)

Course Content

Hi students:

Welcome to the Fall 2024 semester, and thank you for choosing this class.

As a start, I am asking that you reply to these questions:

  1. Have you taken CAP5415? It is a hard prerequisite because the goal of this class is not to go through all the computer vision topics, which are covered in CAP5415, but to hone your skills in building computer vision systems. Anyone found not to have taken CAP5415 WILL NOT be given a passing grade. Please drop this class if you have not taken CAP5415. The only exceptions are as follows:
    • You are enrolled in a PhD program, and your dissertation is on computer vision. In that case, please have your PhD advisor send an email vouching for your ability to do well in this course.
    • You have 2 years of industrial/research experience in computer vision.
      • You have contributed significantly to at least two GitHub repositories on computer vision, or,
      • You are the first author of a published paper at a CSRankings-listed venue (arXiv preprints do not count).
      • Please send links to the above if this applies to you.
    • You have taken another graduate computer vision class (5000 level and above) and received an A- or above.
    • NO OTHER EXCEPTIONS WILL BE ACCEPTED.
  2. Do you have any experience in coding and training deep learning models, including Convolutional Neural Networks (CNN), transformers, etc.?
  3. By signing up for this class, you agree to the following (reply YES; if you do not agree, please drop the class):
    1. You agree that assignments are to be submitted on or before the deadline indicated. You agree that you will not email/text/ping the professor or the grader to ask for leniency on a late submission under any circumstances. All assignments are submitted to webcourse, so you need to allow for the possibility that the system is down; this will not be a reason to ask for leniency for any late submission.
      • Any such email automatically means 0 points for the assignment.
    2. You agree that all assignments and team projects are to be submitted to webcourse and not directly to the professor or the grader.
      • Any assignments emailed to the professor or grader directly will receive a 0 grade immediately.
    3. You agree that any dishonesty or plagiarism, once discovered, will result in an F grade for the course immediately. This includes copying individual assignments from other students in the course.
    4. You agree to pull your weight in any team project. You agree that if you are found to be riding the coattails of your teammates in a team project, you deserve a 0 for the team project even if the other team members get higher scores, i.e., team members will not necessarily all get the same score on a team project.
    5. You agree to complete this assignment 0 by submitting a PDF with your answers to this list of questions before the first class.

Syllabus

The goal of this course is to equip students to understand state-of-the-art techniques in computer vision, replicate results that have been reported in papers (mostly using open-source GitHub code), and make recommendations on how to improve the techniques and/or deploy them in real-world settings. Note that we won’t be able to cover all the topics in computer vision, because the goal of this course is to learn how to build computer vision systems.

Topics covered (subject to some changes):

  1. Model efficiency:
    • One of the key issues in building CV systems is that many current models are too big to deploy on edge devices.
  2. Foundational Models, e.g.:
  3. Self Supervised Learning (SSL), e.g.:
  4. Generative Models, e.g.:

Class Format

  1. There are about 16 weeks of classes, twice a week on Tue and Thu, 3-4:15pm. Classes will be a mix of online and face-to-face sessions. I will be traveling to the ECCV conference from Sep 28 to Oct 5, so that week will be online or used for working on the team project and assignments. Any other weeks that have to go online due to unforeseen circumstances will be announced.
  2. Reading papers before class is beneficial but not required. Before each class, a paper will be posted for you to read ahead of time. In class, I will go through the paper's technique (with slides) and the corresponding results that were reported.
  3. Individual assignments (70% of your grade):
    • After we go through a paper (which can take 1-3 classes), each student will submit a video demo one week after the paper is taught, recording how the code is run on example inputs. Each student will also provide a short report (1-3 pages) on what was learned and what issues came up when trying to run the code. Finally, the report should also contain some thoughts on how the model can be improved and/or deployed to the real world.
    • This is a computer vision systems class, which means that each assignment places heavy emphasis on the speed and memory requirements of the model. Each assignment will thus require you to make modifications that speed up the models we are discussing and lower their memory requirements. To be specific, credit will be given for faster and more memory-efficient inference in the assignment, while points can be taken off for slow inference. We will spend our first or second class (depending on how much time we take in the first class to clarify logistics) talking about model efficiency.
    • Each assignment will be given a score out of 100. We will then divide the total score by the number of assignments we managed to do multiplied by 100. This ratio, weighted at 70%, goes towards the final grade.
    • Plagiarism: Students should not share or copy code and/or reports. The goal of these assignments is to ensure each student gets first-hand experience running these state-of-the-art techniques.
    • NOTE: all submissions need to be uploaded to webcourse, and no assignment submitted to the professor or grader will be accepted or acknowledged. Submissions CANNOT contain links (e.g., Google Drive) to externally hosted submission files or anything else. Lastly, the assignment due date is a hard stop; all late submissions will be given a zero score.
  4. Team project (30% of your grade):
    • Due on Dec 4th. Each team will submit a detailed report and a demo video. Over the following two weeks, we will have 4 days (Tue and Thu of each week) on which each team will present its project and give a live demo. To ensure fairness (otherwise the team doing the last demo has the most time to finish the project), the live demo must be as close to the submitted video as possible. Projects will be judged according to:
      • Speed and memory-efficiency (50%)
      • Creativity and novelty (20%)
      • Code clarity and correctness (20%)
      • Clarity of report (10%)
        • Motivation of the idea is clearly articulated
        • Literature review is comprehensive
        • Experiments clearly show good performance
        • A section on what was accomplished each week and the designation of tasks
      • Credit will be given to projects that can execute the models efficiently, say on your phone.
      • The outcome of the team project can be an interesting system or a publishable paper at CVPR or ICCV. For the latter, there is potential to work with me further to develop it into a full paper.
      • In addition, we will also look at the size of the project (on a scale of 0 to 1, where 1 is worthy of about 3.5 months of effort from the start of the course to Dec 4th). This will be used as a multiplier: let’s say you score 80 on the weighted criteria above but your project size is graded at 0.6 because it does not need the full 3.5 months, then the final project score is 80 x 0.6 = 48.
    • We will have groups of 4-5 students for each project team.
    • Also, to ensure that each team member has carried his/her own weight, each team member must also submit an individual report, which will be very similar to the individual assignment report:
      • Report on the part you did, and how it impacts the team project. (25%)
      • Code zip of the part you coded. (25%)
      • Video of YOU live demoing, explaining and presenting the system. (25%)
      • New ideas/insights/hustle you provided to the team. (25%)
      • IN ADDITION, your individual report needs to be signed off by ALL team members, i.e., their signatures must be on your report. I usually use PDF signatures, but any form is accepted.
      • There will be zero tolerance for academic dishonesty. If we detect possible academic dishonesty, it will be immediately submitted to UCF’s academic integrity team to deal with.
    • The final team project grade will be 70% individual report, as described above, and 30% team-level grade. So if the team scores an 80 at the team level (report/video/demo/code) but you score a 70 at the individual level, then your score will be 80 x 0.3 + 70 x 0.7 = 73. Conversely, if the team did an amazing project and scores 100 but you have not been carrying your weight and score a 10 individually, then you will be given a team project score of 100 x 0.3 + 10 x 0.7 = 37.
    • Potential team project topics: model efficiency, video generation, AR/VR, etc.
    • Plagiarism: Teams are welcome to share code and knowledge.
  5. Attendance will be taken and my policy is that you should not miss more than 20% of the classes.
    • Please ensure you have the UCF Here mobile app, as attendance will be taken by having you scan a QR code with the app twice.
    • We are all grownups, so if there are extenuating circumstances, this is not a hard rule. Please come talk to me if you are facing difficulties.
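
Since the individual assignments in item 3 above reward faster and more memory-efficient inference, it may help to measure both in a repeatable way. The following is a minimal sketch using only the Python standard library; `benchmark` and `toy_inference` are hypothetical names, and `toy_inference` is just a stand-in for a real model's forward pass. Note that `tracemalloc` only tracks memory allocated through Python's allocator, so GPU memory or native tensor buffers would need framework-specific tools instead.

```python
import time
import tracemalloc
from statistics import median


def benchmark(fn, *args, warmup=3, runs=10):
    """Return (median latency in ms, peak Python-heap memory in MiB) for fn(*args)."""
    # Warm-up calls so one-time setup costs do not distort the timings.
    for _ in range(warmup):
        fn(*args)

    # Time several runs and report the median, which is robust to outliers.
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    # Measure memory in a separate call, because tracing slows execution.
    tracemalloc.start()
    fn(*args)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return median(latencies_ms), peak_bytes / (1024 ** 2)


def toy_inference(n):
    # Hypothetical stand-in for a model forward pass.
    return sum(i * i for i in range(n))


latency_ms, peak_mib = benchmark(toy_inference, 100_000)
print(f"median latency: {latency_ms:.2f} ms, peak memory: {peak_mib:.3f} MiB")
```

Reporting the same two numbers before and after an optimization makes the speed and memory improvement in a report easy to verify.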

Potential Valuable Resources

HuggingFace has a large repository of valuable models and code for computer vision, language, and multimodal tasks. It may be helpful for finding code snippets, models, etc. for your assignments and team projects.

Final Grading Scheme

Individual Assignments: The contribution of your assignments to the final grade will be S_{ind} = 0.7 x S_{total}/S_{max}. Here S_{total} = \sum_{i=1}^{N} S_i, where S_i is the score of the ith assignment and N is the total number of assignments, and S_{max} = \sum_{i=1}^{N} T_i, where T_i is the maximum possible score of the ith assignment (usually 100). Bonus points that you earn give you an edge. A student may do so well that he/she has a perfect score each time plus bonus, so that S_{total} > S_{max}; in that case the total is capped, S_{total} := S_{max}.

Team Project: The contribution of your team project to the final grade will be S_{proj} = 0.3 x S_{project}, where S_{project} is your final team project score (out of 100) as described under Class Format.
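
As a rough illustration of how the team project score is assembled, the sketch below reproduces the worked examples from the Class Format section. The function names are hypothetical, and applying the size multiplier to the team-level score before blending it with the individual score is an assumption about how the two rules compose, not something the syllabus states explicitly.

```python
def scaled_team_score(criteria_score, size_multiplier):
    """Apply the project-size multiplier (0 to 1) to the score on the judging criteria."""
    if not 0.0 <= size_multiplier <= 1.0:
        raise ValueError("size_multiplier must be between 0 and 1")
    return criteria_score * size_multiplier


def project_score(team_level, individual):
    """Blend the team-level grade (30%) with the individual report grade (70%)."""
    return 0.3 * team_level + 0.7 * individual


# Worked examples taken from the syllabus text.
print(round(scaled_team_score(80, 0.6), 2))  # 48.0
print(round(project_score(80, 70), 2))       # 73.0
print(round(project_score(100, 10), 2))      # 37.0
```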

Final score: X = S_{ind} x 100 + S_{proj}
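
The formulas above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample scores are made up, and the project score passed in is assumed to be on a 0 to 100 scale.

```python
def individual_contribution(scores, max_scores):
    """S_ind = 0.7 * S_total / S_max, with S_total capped at S_max."""
    s_max = sum(max_scores)
    if s_max <= 0:
        raise ValueError("at least one assignment with a positive maximum is required")
    # Bonus points can push the total above the maximum, so cap it.
    s_total = min(sum(scores), s_max)
    return 0.7 * s_total / s_max


def final_score(scores, max_scores, project_score):
    """Final score = S_ind * 100 + S_proj, where S_proj = 0.3 * project score."""
    s_ind = individual_contribution(scores, max_scores)
    s_proj = 0.3 * project_score
    return s_ind * 100 + s_proj


# Hypothetical student: three assignments out of 100 each, project score 73.
print(round(final_score([90, 80, 100], [100, 100, 100], 73), 2))  # 84.9
```

With these made-up numbers, the assignments contribute 0.7 x 270/300 x 100 = 63 points and the project contributes 0.3 x 73 = 21.9 points.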

Grading Scheme:

A: X >= 95

A-: 91 <= X <= 94

B+: 87 <= X <= 90

B: 83 <= X <= 86

B-: 80 <= X <= 82

C+: 75 <= X <= 79

C: 70 <= X <= 74

D: 60 <= X <= 69

F: X <= 59
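
The bands above are stated for whole numbers, and the syllabus does not say how a fractional score such as 94.5 is handled. The sketch below maps a score to a letter under the assumption that the score is first rounded to the nearest integer (halves rounded up); that rounding rule and the names used here are illustrative assumptions only, not course policy.

```python
import math

# Lower bound of each band, taken from the grading scheme above.
GRADE_BANDS = [
    (95, "A"), (91, "A-"), (87, "B+"), (83, "B"),
    (80, "B-"), (75, "C+"), (70, "C"), (60, "D"),
]


def letter_grade(score):
    """Map a final score to a letter grade."""
    # ASSUMPTION: round to the nearest integer, halves up, before the lookup.
    rounded = math.floor(score + 0.5)
    for lower_bound, letter in GRADE_BANDS:
        if rounded >= lower_bound:
            return letter
    return "F"


print(letter_grade(84.9))  # B
```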
