Adrian Ulges
PhD Student, Computer Science
IUPR (Image Understanding and Pattern Recognition) Research Group
DFKI (German Research Center for Artificial Intelligence)
My general fields of interest are pattern recognition, machine learning, and the processing of visual information. In the past, my work focused on document image processing, text recognition, and camera-based document capture. Now that I am working on my PhD, my interest has shifted towards video annotation and learning methods that improve recognition in video.
Table of Contents
CV
Research
Teaching
Publications
Other Stuff
Personal Links
Contact
Curriculum Vitae
A short résumé in PDF format.
Research
Automatic YouTube Video Tagging
Online video is taking off and becoming a serious competitor to traditional broadcasting. Supporting content-based retrieval in online video portals is an extraordinarily difficult challenge due to the enormous variability in the content and production of online video. On the other hand, online video is a highly interesting source for visual learning, since videos are tagged by users when they are uploaded. This makes it possible to learn automatic video tagging without manually annotating training sets.
We have developed a system that autonomously learns to tag videos with high-level semantic concepts by watching videos from online portals like http://youtube.com. Our system robustly fuses evidence from several visual features such as color, texture, motion, and patches. Results on a large-scale dataset with 22 tags can be viewed at http://demo.iupr.org/videotagging/.
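The fusion step can be illustrated with a minimal late-fusion sketch. It assumes, for illustration only, that each feature (color, texture, motion, patches) has its own classifier returning a score in [0, 1] per tag, and that the scores are combined by a weighted average. The actual fusion rule of the system is not described here, and the names fuse_scores and suggest_tags are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical late fusion: every visual feature has its own tag classifier
# that returns a score in [0, 1] per tag (an assumption for illustration).
FeatureScores = Dict[str, Dict[str, float]]  # feature -> (tag -> score)


def fuse_scores(per_feature: FeatureScores,
                weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Combine per-feature tag scores by a weighted average."""
    features = list(per_feature)
    if weights is None:
        weights = {f: 1.0 for f in features}  # equal weight per feature
    total = sum(weights[f] for f in features)
    tags = set().union(*(per_feature[f].keys() for f in features))
    return {
        tag: sum(weights[f] * per_feature[f].get(tag, 0.0) for f in features) / total
        for tag in tags
    }


def suggest_tags(fused: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k tags with the highest fused score."""
    return sorted(fused, key=fused.get, reverse=True)[:k]


scores = {
    "color": {"race": 0.8, "crash": 0.6, "soccer": 0.1},
    "motion": {"race": 0.9, "crash": 0.7, "soccer": 0.3},
}
print(suggest_tags(fuse_scores(scores), k=2))  # ['race', 'crash']
```

Averaging is only one plausible choice; a product of probabilities or a learned second-stage combiner would be equally reasonable ways to fuse the features.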
Our video tagging prototype in action: for a video showing a car race, it suggests the tags "crash", "race", and "soccer". To achieve this, the system was trained on videos downloaded from the online video portal youtube.com.
Learning from Weakly Labeled Videos: Telling Relevant from Irrelevant Material
Acquiring training data for automatic video tagging is difficult due to the large number of tags and their visual complexity. Ideally, the labeling effort for training a tagging system would be reduced to a single global label per video. The problem is that we do not know when in the video the target concept appears. I have developed a statistical framework for inferring this knowledge, i.e., a system that infers which training content is relevant and which is irrelevant. The method models relevance as a latent random variable that is estimated in an EM (expectation-maximization) fashion, and it was published at CIVR 2008.
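The idea of latent relevance can be sketched as a two-component mixture fitted with EM. This is a simplified illustration, not the published CIVR 2008 model: frames are reduced to one-dimensional feature values (an assumption), irrelevant frames are modeled by a fixed Gaussian fitted to frames from videos without the tag, and EM estimates a Gaussian for relevant frames together with the prior probability of relevance. The function em_relevance is hypothetical.

```python
import math
from typing import List, Tuple


def gaussian(x: float, mean: float, var: float) -> float:
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gaussian(xs: List[float]) -> Tuple[float, float]:
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, max(var, 1e-6)  # floor avoids a degenerate zero variance


def em_relevance(pos: List[float], neg: List[float], iters: int = 50):
    """EM for a latent 'relevant' flag per frame of positively tagged videos.

    pos: 1-D features of frames from videos tagged with the concept.
    neg: 1-D features of frames from videos without the tag; they define a
         fixed background (irrelevant) model.
    Returns (P(relevant | frame) per frame, prior, mean, variance).
    """
    mu_b, var_b = fit_gaussian(neg)   # irrelevant model, kept fixed
    mu_r, var_r = fit_gaussian(pos)   # relevant model, initial guess
    prior = 0.5                       # initial P(relevant)
    resp = [prior] * len(pos)
    for _ in range(iters):
        # E-step: posterior probability that each frame is relevant.
        resp = []
        for x in pos:
            a = prior * gaussian(x, mu_r, var_r)
            b = (1 - prior) * gaussian(x, mu_b, var_b)
            resp.append(a / (a + b) if a + b > 0 else 0.0)
        # M-step: re-estimate the prior and the relevant-frame Gaussian.
        weight = sum(resp)
        if weight == 0:
            break
        prior = weight / len(pos)
        mu_r = sum(r * x for r, x in zip(resp, pos)) / weight
        var_r = max(sum(r * (x - mu_r) ** 2 for r, x in zip(resp, pos)) / weight, 1e-6)
    return resp, prior, mu_r, var_r


neg = [-0.2, -0.1, 0.0, 0.1, 0.2]        # frames without the concept
pos = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]    # tagged video: 3 irrelevant, 3 relevant frames
resp, prior, mu_r, _ = em_relevance(pos, neg)
print([round(r, 2) for r in resp])
```

Keeping the background model fixed is what ties the latent variable to "irrelevant content": frames that the negative videos already explain well receive a low relevance posterior and contribute little to the concept model.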
The system distinguishes relevant training content for the concept "basketball" (left) from irrelevant content (center). This makes automatic video taggers more robust to irrelevant training content and allows training on weakly labeled videos (right), for example videos downloaded from online video portals like youtube.com.
Motion Segmentation and Object Recognition
Another interest of mine is the interaction between motion segmentation and object recognition. While segmentation of still images cannot be expected to yield meaningful object regions, motion segmentation can separate moving objects from their background even in cases where segmentation based on color and texture fails. Yet, motion segmentation has not been used to its full potential for recognition. We have developed a simple recognition approach that uses motion segmentation to discard background clutter and to improve the generalization capabilities of object recognition. Combined with patch-based voting, the approach achieves excellent robustness.
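The interplay of motion segmentation and voting can be illustrated by letting only patches on moving pixels vote. The sketch assumes that a binary motion mask and per-patch class votes are already available; vote_with_motion_mask and the vote format are hypothetical, and neither the motion segmentation nor the patch classifier is shown.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# (row, col, {class label: vote weight}) for one image patch; the vote format
# is a hypothetical stand-in for the output of a patch classifier.
Patch = Tuple[int, int, Dict[str, float]]


def vote_with_motion_mask(patches: Sequence[Patch],
                          mask: Sequence[Sequence[bool]]) -> str:
    """Let only patches on moving (foreground) pixels vote for a class."""
    votes: Counter = Counter()
    for row, col, class_votes in patches:
        if not mask[row][col]:
            continue  # static background patch: discarded before voting
        for label, weight in class_votes.items():
            votes[label] += weight
    if not votes:
        raise ValueError("no patch lies on the moving foreground")
    return votes.most_common(1)[0][0]


patches = [
    (0, 0, {"car": 1.0}),
    (0, 1, {"car": 0.5, "person": 0.4}),
    (1, 0, {"tree": 1.0}),   # background clutter
    (1, 1, {"tree": 1.0}),   # background clutter
]
motion_mask = [[True, True], [False, False]]  # top row moves, bottom row is static
print(vote_with_motion_mask(patches, motion_mask))  # car
```

In this toy example the static background patches would outvote the object ("tree" with 2.0 against "car" with 1.5) if they were not discarded, which is the kind of voting instability the motion mask is meant to prevent.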
An object recognition approach combining motion segmentation and patch-based voting. Background is discarded based on motion information, which increases voting stability.
Motion-Based Video Retrieval
I am also interested in retrieval that exploits the motion aspects of video. I developed a compressed-domain motion descriptor for video retrieval based on spatiograms. The method collects statistics of the joint spatial and frequency distribution of motion vectors. It works well in an interview detection scenario using nearest-neighbor (NN) classification and was introduced at TRECVID 2006.
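A spatiogram extends a histogram by also recording, for each bin, where in the image its elements occur. The sketch bins macroblock motion vectors by direction (plus one bin for near-static vectors), keeps the relative count and mean position per bin, and compares shots with a simple distance before applying nearest-neighbor classification. The binning scheme, the thresholds, and shot_distance are illustrative assumptions rather than the descriptor from the TRECVID 2006 paper; full spatiograms may additionally store a spatial covariance per bin, which is omitted here.

```python
import math
from typing import List, Sequence, Tuple

# (x, y, dx, dy): macroblock position normalised to [0, 1] and its motion vector,
# as it might be read from an MPEG stream without full decoding (assumed input).
MotionVector = Tuple[float, float, float, float]
Bin = Tuple[float, float, float]  # (relative count, mean x, mean y)


def motion_bin(dx: float, dy: float, n_dirs: int = 8, static_thresh: float = 0.5) -> int:
    """Bin 0 holds near-static vectors; bins 1..n_dirs quantise the direction."""
    if math.hypot(dx, dy) < static_thresh:
        return 0
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return 1 + int(angle / (2 * math.pi / n_dirs)) % n_dirs


def spatiogram(vectors: Sequence[MotionVector], n_dirs: int = 8) -> List[Bin]:
    """Per bin: share of vectors plus the mean position of those vectors."""
    n_bins = n_dirs + 1
    counts = [0] * n_bins
    sum_x = [0.0] * n_bins
    sum_y = [0.0] * n_bins
    for x, y, dx, dy in vectors:
        b = motion_bin(dx, dy, n_dirs)
        counts[b] += 1
        sum_x[b] += x
        sum_y[b] += y
    total = max(len(vectors), 1)
    return [(c / total, sum_x[i] / c, sum_y[i] / c) if c else (0.0, 0.5, 0.5)
            for i, c in enumerate(counts)]


def shot_distance(a: Sequence[Bin], b: Sequence[Bin]) -> float:
    """Hypothetical distance: count difference plus position shift of shared mass."""
    d = 0.0
    for (na, xa, ya), (nb, xb, yb) in zip(a, b):
        d += abs(na - nb) + min(na, nb) * math.hypot(xa - xb, ya - yb)
    return d


def nn_classify(query: Sequence[Bin], labelled: Sequence[Tuple[Sequence[Bin], str]]) -> str:
    """Nearest-neighbor classification of a shot descriptor."""
    return min(labelled, key=lambda item: shot_distance(query, item[0]))[1]


interview = spatiogram([(0.5, 0.5, 0.1, 0.0), (0.4, 0.5, 0.0, 0.1), (0.6, 0.5, 0.2, 0.0)])
pan = spatiogram([(0.2, 0.3, 3.0, 0.0), (0.8, 0.7, 3.0, 0.1), (0.5, 0.5, 2.5, 0.0)])
query = spatiogram([(0.5, 0.45, 0.0, 0.2), (0.55, 0.5, 0.1, 0.1)])
print(nn_classify(query, [(interview, "interview"), (pan, "other")]))  # interview
```

Working on motion vectors that the compressed stream already contains avoids decoding and running optical flow, which is the main appeal of a compressed-domain descriptor.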
The spatial and frequency distribution of motion in video can be captured using our compressed-domain spatiogram descriptor. In this scenario, the characteristics of motion are used to identify interview scenes in TRECVID 2006 data.
Document Image Dewarping
Document images delivered by digital cameras suffer from low resolution, noise, and geometric distortion. This distortion degrades OCR accuracy, a problem that becomes particularly severe for curled book surfaces. To make information extraction from such images possible, we developed two novel approaches for dewarping text into an upright representation. For a proper dewarp, a depth model of the paper surface is extracted using either stereo vision or text-based constraints.
For a demo of the dewarping software, see http://demo.iupr.org
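For the stereo variant, depth follows from disparity by triangulation: for a rectified camera pair with focal length f (in pixels) and baseline B, a point with disparity d lies at depth Z = f·B/d. The sketch converts a disparity map to depth and collapses it into one depth value per image column, under the simplifying assumption (made here for illustration) that the page bends only along the horizontal axis, like a cylinder. The function names and parameters are hypothetical, and stereo matching itself is not shown.

```python
from statistics import median
from typing import List, Optional, Sequence

DepthMap = List[List[Optional[float]]]


def disparity_to_depth(disparity: Sequence[Sequence[float]],
                       focal_px: float, baseline_m: float) -> DepthMap:
    """Triangulate depth Z = f * B / d for a rectified stereo pair.

    Pixels without a valid (positive) disparity get None.
    """
    return [[focal_px * baseline_m / d if d > 0 else None for d in row]
            for row in disparity]


def column_depth_profile(depth: DepthMap) -> List[Optional[float]]:
    """Median depth per image column.

    Assumes (for illustration) that the page curls only along the horizontal
    axis, so one depth value per column describes the surface.
    """
    profile: List[Optional[float]] = []
    for col in range(len(depth[0])):
        values = [row[col] for row in depth if row[col] is not None]
        profile.append(median(values) if values else None)
    return profile


disparity = [[40.0, 32.0, 40.0],
             [40.0, 0.0, 40.0],   # 0.0: no stereo match found at this pixel
             [40.0, 32.0, 40.0]]
print(column_depth_profile(disparity_to_depth(disparity, focal_px=800.0, baseline_m=0.1)))
# [2.0, 2.5, 2.0]
```

The median makes the profile tolerant of isolated mismatches, which stereo matching on low-texture paper tends to produce.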
A book surface captured with a digital camera, showing strong perspective distortion (left). We build a depth model of the book surface (right) based on stereo vision or text constraints and dewarp the text using a mesh aligned to the text. The result is a dewarped version (center) that is visually more pleasing and can be fed to a standard OCR system.
OCR for Camera-based Document Capture
The dewarping approach to information extraction from camera-captured document images compensates for geometric distortion, but not for other shortcomings such as low resolution and photometric distortion. To address these, we developed an OCR system for noisy, warped, and touching characters as they occur in document images captured by digital consumer cameras. The approach uses a neural network classifier and borrows methods from handwriting recognition to solve the segmentation problem.
The method was integrated with a user interface called DIVR (Document Image Viewing and Retrieval) that allows users to browse their camera-captured document snapshots and search for text in them.
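A common handwriting-recognition technique for the segmentation problem is over-segmentation followed by dynamic programming: the text line is cut at many candidate positions, and the classifier scores every group of up to a few adjacent pieces as one character. The sketch assumes this scheme and a hypothetical scorer that returns a label and a log-probability for a segment; whether the OCR described above uses exactly this search is an assumption, and the neural network is replaced by a toy lookup table.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical classifier interface: scores the image slice [start, end) as one
# character and returns (best label, log-probability).
Scorer = Callable[[int, int], Tuple[str, float]]


def segment_by_recognition(cuts: Sequence[int], score: Scorer,
                           max_merge: int = 3) -> List[str]:
    """Pick the cut subset whose character segments maximise total log-probability.

    cuts: candidate cut positions from an over-segmentation, first = line start,
    last = line end. A character may span up to max_merge adjacent pieces.
    """
    n = len(cuts)
    best = [-math.inf] * n           # best[j]: best score of a path ending at cut j
    back: List[Tuple[int, str]] = [(-1, "")] * n
    best[0] = 0.0
    for j in range(1, n):
        for i in range(max(0, j - max_merge), j):
            if best[i] == -math.inf:
                continue
            label, logp = score(cuts[i], cuts[j])
            if best[i] + logp > best[j]:
                best[j] = best[i] + logp
                back[j] = (i, label)
    if best[-1] == -math.inf:
        raise ValueError("no valid segmentation")
    labels = []
    j = n - 1
    while j > 0:                      # follow back-pointers to the line start
        i, label = back[j]
        labels.append(label)
        j = i
    return labels[::-1]


# Toy scorer: pieces 0-20 and 20-40 look like "m" and "n"; single narrow
# pieces look weakly like "i"; anything else is unlikely.
TOY = {(0, 20): ("m", 0.9), (20, 40): ("n", 0.8)}


def toy_score(start: int, end: int) -> Tuple[str, float]:
    label, p = TOY.get((start, end), ("i", 0.3) if end - start == 10 else ("?", 0.01))
    return label, math.log(p)


print(segment_by_recognition([0, 10, 20, 30, 40], toy_score))  # ['m', 'n']
```

Letting the classifier decide where characters end is what makes this robust to touching characters: a wrong merge or split simply scores poorly and loses to a better path.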
Our OCR for camera document images in action: integrated with the DIVR user interface, the approach lets users search for text in snapshots of documents.
Teaching
Lab Course "Pattern Recognition Systems", Winter 07/08
Lecture "Pattern Recognition", Summer 07 (tutorials, some lectures)
Lecture "Image and Video Processing", Summer 07 (tutorials, some lectures)
Lecture "Pattern Recognition", Summer 06 (tutorials)
Lecture "Image and Video Processing", Summer 06 (tutorials)
Lab Course "Pattern Recognition Systems", Winter 06/07
Internship on camera-supported HCI, Winter 06/07
Master's Thesis on Document Dewarping, Winter 06/07
Bachelor's Thesis on Text Extraction from Online Videos, Winter 07/08
For more information, see the IUPR courseware site at http://courses.iupr.org.
Publications
You can find all publications at http://pubs.iupr.org
A. Ulges, C. Schulze, T. Breuel. The Challenge of Tagging Online Video. Computer Vision and Image Understanding (submitted for publication).
D. Borth, C. Schulze, A. Ulges, T. Breuel. Navidgator - Similarity Based Browsing for Image & Video Databases. KI '08 (accepted for publication), 2008.
A. Ulges, T. Breuel. Segmentation by Combining Optical Flow with a Color Model. ICPR (accepted for publication), 2008.
A. Ulges, C. Schulze, T. Breuel. Identifying Relevant Frames in Weakly Labeled Videos for Training Concept Detectors. CIVR, 2008.
A. Ulges, T. Breuel. A Local Discriminative Model for Background Subtraction. DAGM-Symposium, 2008.
A. Ulges, C. Schulze, D. Keysers, T. Breuel. A System that Learns to Tag Videos by Watching Youtube. ICVS, 2008.
D. Borth, A. Ulges, C. Schulze, T. Breuel. Keyframe Extraction for Video Tagging and Summarization. GI-Informatiktage, 2008.
A. Ulges, C. Schulze, D. Keysers, T. Breuel. Content-based Video Tagging for Online Video Portals. MUSCLE/ImageClef Workshop, 2007.
A. Ulges, C. Lampert, D. Keysers, T. Breuel. Dominant Motion Estimation using Adaptive Search of Transformation Space. DAGM07, 2007.
A. Ulges. Motion Interpretation using Adaptive Search of Transformation Space. Technical Report, TU Kaiserslautern, 2007.
A. Ulges, C. Lampert, D. Keysers. Spatiogram-Based Shot Distances for Video Retrieval. TRECVID Workshop (unreviewed workshop paper), 2006.
A. Ulges, C. Lampert, T. Breuel. Document Image Dewarping using Robust Estimation of Curled Text Lines. ICDAR05, 2005.
A. Ulges. Recognizing Objects in Still Images and Video Stream. Technical Report, TU Kaiserslautern, 2006.
A. Ulges. Indexing and Recognition of Documents Captured with a Handheld Camera. Diploma Thesis, Technical University of Kaiserslautern, 2005.
C. Lampert, T. Braun, A. Ulges, D. Keysers, T. Breuel. Oblivious Document Capture and Real-Time Retrieval. CBDAR05, 2005.
A. Ulges, C. Lampert, T. Breuel. Document Capture using Stereo Vision. ACM Symposium on Document Engineering, 2004.
A. Ulges. StereoBook - Document Capture using Stereo Vision. Project Thesis, TU Kaiserslautern, 2004.
Other Stuff
Honors
Award by Sparkasse Kaiserslautern for the diploma thesis "Indexing and Recognition of Documents Captured with a Handheld Camera"
Peer Reviewing
IEEE TPAMI, CVPR07, MVA07, VISAPP08, ELCVIA CVIA, ECCV08, CVPR08, DAS08
Personal Links
Handball Team TV Bad Ems (Oberliga Rheinland)
Google internship, 2005
Pics of California, 2005
Contact me
phone: (+49) 631-20575-419
fax: (+49) 631-20575-402
email: adrian.ulges at dfki.de
DFKI, Trippstadter Str. 122
67663 Kaiserslautern
Germany