CAP6412 – Spring 2013
Advanced Computer Vision (3 Credit Hours)
Instructor: Imran Saleemi
Email: Imran@eecs.ucf.edu
Office: HEC 256
Location: MAP 204
Office Hours: Tuesday & Thursday, 6:00 - 7:00pm
Course webpage: https://www.crcv.ucf.edu/courses/cap6412-spring-2013/
Course Description:
Review recent advances in computer vision.
Course Goals:
To prepare students for graduate research in computer vision.
Exam and Grading Policy:
Reports: 30%
Paper Presentations: 10%
Discussion and Attendance: 20%
Programming Projects + Presentation (30% + 10%): 40%
No exam!
Reports:
Summary, strengths, weaknesses, ideas, questions, tools employed.
Useful Links:
How to read a research paper (by Dr. Shah)
Lecture List | ||||
---|---|---|---|---|
Lectures 1 & 2 - Jan 8 & 10 | Course Introduction | Template for Paper Review | Boqing Gong [Slides] | |
Lecture 3 - Jan 15 H. Pirsiavash, D. Ramanan, C. Fowlkes, "Globally-optimal greedy algorithms for tracking a variable number of objects", CVPR 2011. Presenter: Wenhui Li | Fundamentals of CNN | Fareeha Irfan [Slides] | Preferred topics due 01/14, 1:00pm Sign up here for presentations | |
Lecture 4 - Jan 17 F. Yu, Rongrong Ji, Ming-Hen Tsai, Guangnan Ye, Shih-Fu Chang, "Weak attributes for large-scale image retrieval", CVPR 2012. Presenter: Kutalmis Akpinar | CNN & Object recognition | Dustin Morley [Slides] | ||
Lecture 5 - Jan 22 Zheng Wu, A. Thangali, Stan Sclaroff, M. Betke, "Coupling detection and data association for multiple object tracking", CVPR 2012. Presenter: Douglas Cooper | Understanding CNN | Jason Tiller [Slides] | Paper review of [Visualization] due 01/21, 3pm | |
Lecture 6 - Jan 24 Bangpeng Yao, Xiaoye Jiang, A. Khosla, A. Lin, L. Guibas, Li Fei-Fei, "Human action recognition by learning bases of action attributes and parts", ICCV 2011. Presenter: Yicong Tian | Detection proposals | {Major} [Detection Proposals] J. Hosang, R. Benenson, P. Dollár, and B. Schiele. What makes for effective detection proposals? PAMI 2015. {Major} [Faster R-CNN] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in Neural Information Processing Systems, pp. 91-99. 2015. | Samer Iskander [Slides] | Paper review of [Detection Proposals] due 01/26, 12pm |
Lecture 7 - Jan 29 Ali Borji, "Boosting bottom-up and top-down visual features for saliency estimation", CVPR 2012. Presenter: Dong Zhang | R-CNN | [Fast R-CNN] Girshick, Ross. "Fast R-CNN." arXiv preprint arXiv:1504.08083 (2015). | Syed Ahmed [Slides] | |
Lecture 8 - Jan 31 Lingqiao Liu, Lei Wang, "What has my classifier learned? Visualizing the classification rules of bag-of-feature model by support region detection", CVPR 2012. Presenter: Rui Hou | Image captioning | Mao, Junhua, Wei Xu, Yi Yang, Jiang Wang, and Alan L. Yuille. “Explain images with multimodal recurrent neural networks.” arXiv preprint arXiv:1410.1090 (2014). Donahue, Jeff, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. “Long-term recurrent convolutional networks for visual recognition and description.” arXiv preprint arXiv:1411.4389 (2014). Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. “Show and tell: A neural image caption generator.” arXiv preprint arXiv:1411.4555 (2014). Lebret, Rémi, Pedro O. Pinheiro, and Ronan Collobert. “Phrase-based image captioning.” arXiv preprint arXiv:1502.03671 (2015). Chen, Xinlei, and C. Lawrence Zitnick. “Mind’s eye: A recurrent visual representation for image caption generation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9, no. 8 (1997): 1735-1780. Kiros, R., Salakhutdinov, R. and Zemel, R.S., 2014. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539. | Harish Ravi Prakash [Slides] | Paper review of [Image captioning] due 02/02, 12pm Project I posted! Due: 02/28, 11:59pm |
Lecture 9 - Feb 5 Dong Liu, Xian-Sheng Hua, Linjun Yang, Meng Wang, Hong-Jiang Zhang, "Tag ranking", Proceedings of the 18th International Conference on World Wide Web, 2009. Presenter: Shervin Ardeshir | Attention modeling | Karan Daei-Mojdehi [Slides] | Paper review of [Attention Modeling] due 02/04, 12pm | |
Lecture 10 - Feb 7 Programming assignment # 1 | Low-level vision: Super-resolution | Riegler, Gernot, Samuel Schulter, Matthias Ruther, and Horst Bischof. "Conditioned Regression Models for Non-Blind Single Image Super-Resolution." In Proceedings of the IEEE International Conference on Computer Vision, pp. 522-530. 2015. Liao, Renjie, Xin Tao, Ruiyu Li, Ziyang Ma, and Jiaya Jia. "Video Super-Resolution via Deep Draft-Ensemble Learning." In Proceedings of the IEEE International Conference on Computer Vision, pp. 531-539. 2015. | Jose Sanchez [Slides] | Paper review of [Super-resolution] due 02/09, 12pm |
Lecture 11 - Feb 12 Reyes Rios Cabrera, Tinne Tuytelaars and Luc Van Gool, "Efficient Multi-Camera Detection, Tracking, and Identification using a shared set of Haar features", CVPR 2011. Presenter: Tung Khuc | | | | |
Lecture 12 - Feb 14 X. Zhu and D. Ramanan, "Face detection, pose estimation and landmark localization in the wild", CVPR 2012. Presenter: Andres Vargas | Optical flow | Fleet, David, and Yair Weiss. "Optical flow estimation." In Handbook of mathematical models in computer vision, pp. 237-257. Springer US, 2006. Revaud, Jerome, Philippe Weinzaepfel, Zaid Harchaoui, and Cordelia Schmid. "EpicFlow: Edge-preserving interpolation of correspondences for optical flow." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1164-1172. 2015. | Abdullah Jamal [Slides] | Algorithm sketch of [Optical flow] due 02/16, 12pm |
Lecture 13 - Feb 19 Jie Feng, Yichen Wei, Litian Tao, Chao Zhang, and Jian Sun, "Salient Object Detection by Composition", ICCV 2011. Presenter: Desmond Persaud | Pose estimation | Zhang, Dong, and Mubarak Shah. "Human Pose Estimation in Videos." In Proceedings of the IEEE International Conference on Computer Vision, pp. 2012-2020. 2015. Seguin, Guillaume, Karteek Alahari, Josef Sivic, and Ivan Laptev. "Pose estimation and segmentation of multiple people in stereoscopic movies." Pattern Analysis and Machine Intelligence, IEEE Transactions on 37, no. 8 (2015): 1643-1655. | Amar Kelu Nair [Slides] | Algorithm sketch of [Pose estimation] due 02/18, 12pm |
Lecture 14 - Feb 21 Lei Ding and Alper Yilmaz, "Inferring Social Relations from Visual Concepts", ICCV 2011. Presenter: Behnaz Nojavan | Visual question answering | Gao, Haoyuan, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, and Wei Xu. "Are you talking to a machine? dataset and methods for multilingual image question answering." arXiv preprint arXiv:1505.05612 (2015). | Suhas Nithyanandappa [Slides] | |
Lecture 15 - Feb 26 Liangliang Cao, Yadong Mu, Apostol Natsev, Shih-Fu Chang, Gang Hua, and John R. Smith, "Scene Aligned Pooling for Complex Video Recognition", ECCV 2012. Presenter: Venkatanagavalli Sidhamsetti | Visual question answering | Lin, Xiao, and Devi Parikh. "Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2984-2993. 2015. | Nandakishore Puttashamachar [Slides] | Project I due on 02/28, 11:59pm |
Lecture 16 - Feb 28 Recap of past papers; Potential ideas | Visual question answering | Sadeghi, Fereshteh, C. Lawrence Zitnick, and Ali Farhadi. "VISALOGY: Answering Visual Analogy Questions." In Advances in Neural Information Processing Systems, pp. 1873-1881. 2015. | Javier Lores [Slides] | |
Lecture 17 - Mar 12 Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, and Andrew Blake, "Real-Time Human Pose Recognition in Parts from Single Depth Images", CVPR 2011. Presenter: Oliver Nina | OCR in the wild | Zhu, Yingying, Cong Yao, and Xiang Bai. "Scene text detection and recognition: Recent advances and future trends." Frontiers of Computer Science 10, no. 1 (2016): 19-36. | Aisha Urooji [Slides] | |
Lecture 18 - Mar 14 Bolei Zhou, Xiaogang Wang, and Xiaoou Tang, "Understanding Collective Crowd Behaviors: Learning a Mixture Model of Dynamic Pedestrian-Agents", CVPR 2012. Presenter: Salman Khokhar | Spring break | | | |
_______________________________________________________________________________________________________________________________________________________
List of Papers to choose from:
This list will be updated throughout the semester. Email me to sign up.
Motion Patterns Modeling and Estimation:
Xuemei Zhao and Gerard Medioni, “Robust Unsupervised Motion Pattern Inference from Video and Applications”, ICCV 2011
Visual Saliency:
Ali Borji, “Boosting bottom-up and top-down visual features for saliency estimation”, CVPR 2012.
Detection/Categorization:
Yuning Jiang, Jingjing Meng, Junsong Yuan, “Randomized visual phrases for object search,” CVPR 2012
N. Payet, S. Todorovic, “From contours to 3D object detection and pose estimation,” ICCV 2011
Scene classification/segmentation, Image Retrieval:
Liujuan Cao, Rongrong Ji, Yue Gao, Yi Yang, Qi Tian, “Weakly Supervised Sparse Coding with Geometric Consistency Pooling”, CVPR 2012
Lingqiao Liu, Lei Wang, “What has my classifier learned? Visualizing the classification rules of bag-of-feature model by support region detection”, CVPR 2012
F. Yu, Rongrong Ji, Ming-Hen Tsai, Guangnan Ye, Shih-Fu Chang, “Weak attributes for large-scale image retrieval,” CVPR 2012
D. Parikh, K. Grauman, “Relative attributes”, ICCV 2011
Tracking:
Zheng Wu, A. Thangali, Stan Sclaroff, M. Betke, “Coupling detection and data association for multiple object tracking”, CVPR 2012
L. Leal-Taixe, G. Pons-Moll, B. Rosenhahn, “Branch-and-price global optimization for multi-view multi-target tracking,” CVPR 2012
One of
H. Pirsiavash, D. Ramanan, C. Fowlkes, “Globally-optimal greedy algorithms for tracking a variable number of objects”, CVPR 2011
or
J. Berclaz, F. Fleuret, E. Turetken, P. Fua, “Multiple Object Tracking Using K-Shortest Paths Optimization”, PAMI 2011
Action, Activity, Event Recognition:
Lixin Duan, Dong Xu, I. Tsang, Jiebo Luo, “Visual Event Recognition in Videos by Learning from Web Data”, PAMI 2012
Wen Li, Lixin Duan, Dong Xu, I. Tsang, “Text-based image retrieval using progressive multi-instance learning”, ICCV 2011
Lixin Duan, Dong Xu, Shih-Fu Chang, “Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach”, CVPR 2012
W. Brendel, S. Todorovic, “Learning spatiotemporal graphs of human activities,” ICCV 2011
Bangpeng Yao, Xiaoye Jiang, A. Khosla, A. Lin, L. Guibas, Li Fei-Fei, “Human action recognition by learning bases of action attributes and parts,” ICCV 2011