Keynotes and Courses
Clustering is a fundamental problem in data science, used in a myriad of applications. Despite significant research across many fields, clustering remains a major challenge. Most traditional approaches to designing and analyzing clustering algorithms have focused on one-shot clustering, where the goal is to design an algorithm that clusters a one-time dataset well. Unfortunately, from a theoretical standpoint, there are major impossibility results for such scenarios: first, in most applications it is not clear what notion of similarity or what objective function to use in order to recover a good clustering for a given dataset; second, even in cases where the similarity function and the objectives can be naturally specified, optimally solving the underlying combinatorial clustering problems is typically intractable.
In this talk, I will describe a lifelong transfer clustering approach to address these challenges. Motivated by the fact that in many modern settings we often need to solve not one, but many clustering problems arising in a given application domain, we consider algorithms that adaptively learn to cluster. In particular, given a series of clustering instances to be solved from the same domain, we show how to learn a good parameter setting for clustering algorithms so that they perform well on instances coming from that domain. We provide formal guarantees on the number of typical problem instances that are sufficient to ensure that a clustering algorithm that does well on these typical instances will also do well on new instances coming from the same domain, as a function of the complexity of the underlying parametrized family of clustering algorithms. We also demonstrate a significant benefit of our approach experimentally on datasets such as MNIST, CIFAR, and Omniglot.
Maria Florina Balcan is an Associate Professor in the School of Computer Science at Carnegie Mellon University. Her main research interests are machine learning, computational aspects in economics and game theory, and algorithms. Her honors include the CMU SCS Distinguished Dissertation Award, an NSF CAREER Award, a Microsoft Faculty Research Fellowship, a Sloan Research Fellowship, and several paper awards. She was a program committee co-chair for the Conference on Learning Theory in 2014 and for the International Conference on Machine Learning in 2016. She is currently a board member of the International Machine Learning Society (since 2011), a Tutorial Chair for ICML 2019, and a Workshop Chair for FOCS 2019.
There is a high global demand for the learning of English as an additional language. Automatic assessment systems can help meet this need by reducing human assessment effort and enabling learners to independently monitor their progress whenever and wherever they choose. To properly determine a candidate's spoken English proficiency, the auto-marker should be able to accurately assess the learner's ability level from spontaneous, prompted speech. This assessment should be independent of the learner's L1 and of audio recording quality, both of which vary considerably, making this a challenging task. This talk will look at the application of deep learning to spontaneous spoken English assessment. Examples of tasks that will be discussed include:
These tasks make use of a range of deep-learning techniques including: recurrent sequence models; sequence ensemble distillation (teacher-student training); attention mechanisms; and Siamese networks.
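As a toy illustration of one of these techniques, teacher-student training (sequence ensemble distillation in its simplest, single-example form), the following NumPy sketch shows how a teacher's logits are softened with a temperature and how the distillation loss a student would be trained on is computed. All logits and the temperature are made up for illustration; this is not code from the talk.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a higher T yields a softer distribution."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy of the student against the teacher's softened targets,
    scaled by T**2 as is conventional in knowledge distillation."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return float(-(T ** 2) * np.sum(p_teacher * log_p_student, axis=-1).mean())

teacher = np.array([[10.0, 2.0, 1.0]])  # a confident teacher's logits (made up)
student = np.array([[5.0, 1.5, 1.0]])   # the smaller student's logits (made up)
loss = distillation_loss(student, teacher)
print(loss)
```

The high temperature exposes the teacher's relative preferences among the non-argmax classes, which is the extra signal the student learns from.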
Mark Gales is a Professor of Information Engineering in the Machine Intelligence Laboratory (formerly the Speech Vision and Robotics (SVR) group) and a Fellow of Emmanuel College. He is a member of the Speech Research Group together with faculty members Phil Woodland and Bill Byrne. Recent past members of the group include Milica Gasic and Steve Young.
In this talk, I will discuss recent machine learning and AI theory, methods, algorithms and systems that we have developed in our lab to understand the basis of health and disease, to catalyze clinical research, to support clinical decisions through individualized medicine, to inform clinical pathways, to better utilize resources and reduce costs, and to inform public health.
To do this, we are creating what I call Learning Engines for Healthcare (LEHs). An LEH is an integrated ecosystem that uses machine learning, AI and operations research to provide clinical insights and healthcare intelligence to all the stakeholders (patients, clinicians, hospitals, administrators). In contrast to an Electronic Health Record, which provides a static, passive, isolated display of information, an LEH provides a dynamic, active, holistic and individualized display of information, including alerts.
In this talk I will focus on three steps in the development of LEHs:
Professor van der Schaar is John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Turing Faculty Fellow at The Alan Turing Institute in London, where she leads the effort on data science and machine learning for personalized medicine. Prior to this, she was a Chancellor's Professor at UCLA and MAN Professor of Quantitative Finance at the University of Oxford. She is an IEEE Fellow (2009). She has received the Oon Prize on Preventative Medicine from the University of Cambridge (2018). She has also been the recipient of an NSF CAREER Award, 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award. She holds 35 granted US patents. Her current research focus is on data science, machine learning, AI and operations research for medicine.
Artificial intelligence (AI) and algorithms based on machine/deep learning (ML/DL) are witnessing a tremendous resurgence in healthcare, with the promise of transforming the practice of medicine by reducing its cost and improving treatment outcomes. Biomedicine is considered the leading venue for AI/ML/DL efforts in the foreseeable future, with applications ranging from process automation to diagnosis and prognosis using imaging and genetic information. However, progress has thus far been slower than anticipated. In this course, we will discuss the current applications of DL in biomedicine and draw attention to its specific challenges and prospects. We will present example applications and highlight future potential.
Issam El Naqa received his B.Sc. (1992) and M.Sc. (1995) in Electrical and Communication Engineering from the University of Jordan, Jordan. He worked as a software engineer at the Computer Engineering Bureau (CEB), Jordan, 1995-1996. He was awarded a DAAD scholarship to Germany, where he was a visiting scholar at RWTH Aachen, 1996-1998. He completed his Ph.D. (2002) in Electrical and Computer Engineering at the Illinois Institute of Technology, Chicago, IL, USA, receiving the highest academic distinction award for his PhD work. He completed an M.A. (2007) in Biology at Washington University in St. Louis, St. Louis, MO, USA, where he pursued a post-doctoral fellowship in medical physics and was subsequently hired as an Instructor (2005-2007) and then an Assistant Professor (2007-2010) in the departments of Radiation Oncology and the Division of Biomedical and Biological Sciences, and was adjunct faculty in the Department of Electrical Engineering. He became an Associate Professor at the McGill University Health Centre/Medical Physics Unit (2010-2015) and an associate member of the departments of Physics, Biomedical Engineering, and Experimental Medicine, where he was a designated scholar. He is currently an Associate Professor of Radiation Oncology at the University of Michigan at Ann Arbor and an associate member in Applied Physics. He is a Medical Physicist certified by the American Board of Radiology. He is a recognized expert in the fields of image processing, bioinformatics, computational radiobiology, and treatment outcomes modeling and has published extensively in these areas, with more than 150 peer-reviewed journal publications and 3 edited textbooks. He has been an active member of several academic and professional societies. His research has been funded by several federal and private grants, and he serves as a peer reviewer and editorial board member for several leading international journals in his areas of expertise.
Deep learning, and machine learning in general, has become one of the most widely used tools in modern science and engineering, leading to breakthroughs in a number of areas and disciplines ranging from computer vision to natural language processing to medical outcome analysis. This mini-course will introduce the basics of machine learning theory and classification theory based on statistical learning, and describe two classes of popular algorithms in depth: decision-based methods (decision trees, decision rules, bagging and boosting, random forests) and deep neural network-based models of various types. The course will focus on practical applications in the analysis of large scientific data, interpretability, uncertainty estimation, how to best extract meaningful features with autonomous feature extraction and feature engineering, and how to implement real-time deep learning in software and hardware. No previous machine learning background is required.
Sergei Gleyzer is a particle physicist and university professor, working at the interface of particle physics and machine learning towards more intelligent systems to extract meaningful information from the data collected by the Large Hadron Collider (LHC), the world's highest-energy particle physics experiment located at the CERN laboratory near Geneva, Switzerland. He is a co-discoverer of the Higgs boson and founder of several major machine learning initiatives such as the Inter-experimental Machine Learning Working Group and the Compact Muon Solenoid experiment's Machine Learning Forum. Professor Gleyzer is working on applying advanced machine learning methods to searches for new physics, such as dark matter.
Deep learning has become a major enabling technology for computer vision. By exploiting its multi-level representation and the availability of big data, deep learning has led to dramatic performance improvements for certain tasks. Despite this significant progress, existing deep learning methods are deterministic and cannot effectively quantify their prediction uncertainty. Uncertainty quantification is not only important to improve the algorithm's performance but is also essential for many practical applications. Furthermore, existing deep learning methods perform point-based prediction. Point-based prediction not only requires a time-consuming and heuristic training process but also suffers from overfitting and poor adaptation to novel conditions. Through this lecture, I will introduce probabilistic deep learning, where the deep models capture the probabilistic distribution of inputs and outputs, and produce not only a prediction but also its probability distribution. The lecture consists of four parts. In part 1, I will review basic probability calculus and fundamental concepts in machine learning. Part 2 will cover deep probabilistic neural networks and deep Bayesian neural networks. In part 3, I will discuss deep probabilistic graphical models. The lecture will conclude with a discussion of applications of probabilistic deep learning to different computer vision and image processing tasks.
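A minimal numeric sketch of this idea (the values are made up and this is not material from the lecture): a probabilistic regression head predicts a mean and a log-variance and is trained with the Gaussian negative log-likelihood, so that confident wrong predictions are penalized more heavily than appropriately hedged ones.

```python
import numpy as np

def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var)).
    Training a network head with this loss makes it report its own uncertainty."""
    return float(0.5 * (log_var + (y - mean) ** 2 / np.exp(log_var)
                        + np.log(2 * np.pi)))

# A hypothetical regression head outputs (mean, log_var) per input; the numbers
# below illustrate how the loss trades off error against claimed variance.
y_true = 2.0
confident = gaussian_nll(y_true, mean=2.1, log_var=np.log(0.05))      # right and sure
overconfident = gaussian_nll(y_true, mean=3.0, log_var=np.log(0.05))  # wrong and sure
hedged = gaussian_nll(y_true, mean=3.0, log_var=np.log(1.0))          # wrong but unsure
print(confident, overconfident, hedged)
```

The same prediction error costs far more when paired with a small claimed variance, which is exactly the incentive that makes the model's predicted distribution meaningful.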
Prior knowledge in probability, calculus, linear algebra, and optimization methods. Familiarity with basic machine learning and computer vision techniques.
Qiang Ji received his Ph.D degree in Electrical Engineering from the University of Washington. He is currently a Professor with the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic Institute (RPI), Troy, NY, USA. He previously served as a program director at the US National Science Foundation (NSF), where he managed NSF’s computer vision and machine learning programs. He also held teaching and research positions with the Beckman Institute at University of Illinois at Urbana-Champaign, Urbana, IL, USA; the Robotics Institute at Carnegie Mellon University, Pittsburgh, PA, USA; the Dept. of Computer Science at University of Nevada, Reno, Nevada, USA; and the Air Force Research Laboratory, Rome, NY, USA. Prof. Ji currently serves as the director of the Intelligent Systems Laboratory (ISL) at RPI.
Prof. Ji's research interests are in computer vision, probabilistic graphical models, machine learning, and their applications in various fields. He has published over 300 papers in peer-reviewed journals and conferences, and has received multiple awards for his work. Prof. Ji has served as an editor of several related IEEE and international journals and as a general chair, program chair, technical area chair, and program committee member for numerous international conferences/workshops. Prof. Ji is a fellow of the IEEE and the IAPR.
Deep neural networks have been hugely successful in various domains, such as computer vision, speech recognition, and natural language processing. Though powerful, the large number of network weights leads to space and time inefficiencies in both training and storage. Recently, attempts have been made to reduce the model size. These include sparsification using pruning and sparsity regularization, quantization to represent the weights and activations with a smaller number of bits, low-rank approximation, distillation, and the use of more compact structures. These attempts greatly reduce the network size and make it possible to deploy deep models in resource-constrained environments, such as embedded systems, smart phones and other portable devices.
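Two of these compression steps, magnitude pruning and uniform quantization, can be sketched in a few lines of NumPy (the weight matrix, pruning ratio and bit-width below are illustrative, not from any particular model):

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.normal(size=(64, 64))  # stand-in for a trained layer's dense weights

# Magnitude pruning: zero out the 90% of weights with the smallest |w|.
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Uniform 8-bit quantization of the surviving weights.
scale = np.abs(W_pruned).max() / 127.0
W_q = np.round(W_pruned / scale).astype(np.int8)  # 1 byte per weight instead of 4-8
W_deq = W_q.astype(np.float64) * scale            # reconstruction used at inference

sparsity = float((W_pruned == 0).mean())
err = float(np.abs(W_pruned - W_deq).max())
print(f"sparsity={sparsity:.2f}, max dequantization error={err:.4f}")
```

The sparse structure can then be stored in a compressed format, and the int8 weights shrink memory roughly fourfold relative to float32, at the cost of a bounded rounding error per weight.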
A general knowledge of machine learning and neural networks is required.
Prof. Kwok is a Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He received his B.Sc. degree in Electrical and Electronic Engineering from the University of Hong Kong and his Ph.D. degree in computer science from the Hong Kong University of Science and Technology. Prof. Kwok served/is serving as an Associate Editor for the IEEE Transactions on Neural Networks and Learning Systems, Neurocomputing and the International Journal of Data Science and Analytics. He has also served as Program Co-chair of a number of international conferences, and as an Area Chair for conferences such as NIPS, ICML, ECML, AAAI and IJCAI. He is an IEEE Fellow.
In this tutorial, we will describe the basics of machine learning and artificial neural networks as applied to natural language processing tasks. The tutorial will have three parts: first, we will introduce neural networks and discuss popular training algorithms such as stochastic gradient descent, as well as related topics such as the learning rate, regularization, backpropagation, and various neural architectures.
In the second part, we will cover representation learning from text: this includes algorithms such as word2vec and fastText. We will describe the differences between various algorithms for learning representations. Efficient supervised text classification with the fastText algorithm will also be discussed. In the last part of the tutorial, statistical language models based on neural networks will be introduced, and certain advanced topics, such as vanishing and exploding gradients and learning longer-term memory in recurrent networks, will be explained. We will also talk about the limitations of the current learning algorithms, and discuss the limitations of generalization in the context of sequential data and learning from language in general.
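As a rough sketch of the representation-learning idea behind word2vec (this is not the actual word2vec/fastText implementation; the toy corpus and hyperparameters are made up), the following NumPy example trains skip-gram embeddings with negative sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus and vocabulary (illustrative only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, dim = len(vocab), 16

W_in = rng.normal(scale=0.1, size=(V, dim))   # input ("word") vectors
W_out = rng.normal(scale=0.1, size=(V, dim))  # output ("context") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window, neg = 0.05, 2, 3
for epoch in range(200):
    for pos, word in enumerate(corpus):
        w = idx[word]
        for off in range(-window, window + 1):
            cpos = pos + off
            if off == 0 or not 0 <= cpos < len(corpus):
                continue
            # One positive (word, context) pair plus `neg` random negatives
            # (a negative may coincide with a real context; fine for a sketch).
            pairs = [(idx[corpus[cpos]], 1.0)]
            pairs += [(int(rng.integers(V)), 0.0) for _ in range(neg)]
            for t, label in pairs:
                g = sigmoid(W_in[w] @ W_out[t]) - label  # gradient of log loss
                g_in = g * W_out[t]
                W_out[t] -= lr * g * W_in[w]
                W_in[w] -= lr * g_in

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words used in similar contexts tend to end up with similar vectors.
print(cos(W_in[idx["cat"]], W_in[idx["dog"]]))
```

The real implementations add subsampling of frequent words, a unigram-power negative-sampling distribution, and (in fastText) subword n-gram vectors, but the gradient updates above are the core of the algorithm.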
Basic knowledge in linear algebra and programming.
Tomas Mikolov: I have been a research scientist at Facebook AI Research since May 2014. Previously I was a member of the Google Brain team, where I developed and implemented efficient algorithms for computing distributed representations of words (the word2vec project). I obtained my PhD from Brno University of Technology (Czech Republic) for my work on recurrent neural network based language models (RNNLM). My long-term research goal is to develop intelligent machines capable of learning and communicating with people using natural language.
Armand Joulin: I am a research scientist at Facebook Artificial Intelligence Research. I obtained my PhD in 2012 from the INRIA and the Ecole Normale Superieure. My advisors were Francis Bach and Jean Ponce. Before joining Facebook, I was a postdoctoral fellow at Stanford University, working with Daphne Koller and Fei-Fei Li.
Piotr Bojanowski: I am a research scientist at Facebook AI Research, working on machine learning applied to computer vision and natural language processing. My main research interests revolve around large-scale unsupervised learning. Before joining Facebook in 2016, I received a PhD in Computer Science at the Willow team (INRIA Paris) under the supervision of Jean Ponce, Cordelia Schmid, Ivan Laptev and Josef Sivic. I graduated from Ecole polytechnique in 2013 and received a Master's Degree in Mathematics, Machine Learning and Computer Vision (MVA).
The last 40 years have seen dramatic progress in machine learning and statistical methods for speech and language processing tasks such as speech recognition, handwriting recognition and machine translation. Many of the key statistical concepts were originally developed for speech recognition and language translation. Examples of such key concepts are the Bayes decision rule for minimum error rate and sequence-to-sequence processing using approaches such as the alignment mechanism based on hidden Markov models and the attention mechanism based on neural networks. Recently, the accuracy of speech recognition and machine translation has been improved significantly by the use of artificial neural networks, such as deep feedforward multi-layer perceptrons and recurrent neural networks (including the long short-term memory extension). We will discuss these approaches in detail and how they form part of the probabilistic approach.
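The attention mechanism mentioned above can be sketched compactly: each decoder query scores all encoder positions and the output is a softmax-weighted average of their values, a soft alignment. The following NumPy example uses hand-constructed keys and an illustrative query (not code from the talk):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query scores all keys and
    returns a softmax-weighted ("soft alignment") average of the values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V, w

rng = np.random.default_rng(1)
K = np.eye(5, 8)                      # 5 source positions with orthogonal keys
V = rng.normal(size=(5, 8))           # the values ("encoder states") to mix
Q = np.zeros((1, 8)); Q[0, 2] = 5.0   # a query aligned with source position 2

out, w = attention(Q, K, V)
print(np.round(w, 3))  # the weights peak at position 2: a soft alignment
```

Where an HMM alignment commits to one hard source position per target position, the attention weights spread probability mass over all source positions and are learned end to end.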
Hermann Ney is a full professor of computer science at RWTH Aachen University, Germany. His main research interests lie in the areas of statistical classification, machine learning, neural networks and human language technology, with specific applications to speech recognition, machine translation and handwriting recognition.
In particular, he has worked on dynamic programming and discriminative training for speech recognition, on language modelling and on machine translation. His work has resulted in more than 700 conference and journal papers (h-index 95, 50000+ citations; estimated using Google scholar). He and his team contributed to a large number of European (e.g. TC-STAR, QUAERO, TRANSLECTURES, EU-BRIDGE) and American (e.g. GALE, BOLT, BABEL) large-scale joint projects.
Hermann Ney is a fellow of both the IEEE and ISCA (Int. Speech Communication Association). In 2005, he was the recipient of the Technical Achievement Award of the IEEE Signal Processing Society. In 2010, he was awarded a senior DIGITEO chair at LIMSI/CNRS in Paris, France. In 2013, he received the award of honour of the International Association for Machine Translation. In 2016, he was awarded an advanced grant of the European Research Council (ERC).
I - Requisites for a Cognitive Architecture (intermediate)
II - Putting it all together (intermediate)
III - Current work (advanced)
Jose C. Principe is a Distinguished Professor of Electrical and Computer Engineering at the University of Florida, where he teaches advanced signal processing, machine learning and artificial neural networks (ANNs). He is Eckis Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL), www.cnel.ufl.edu. The CNEL Lab has innovated signal and pattern recognition principles based on information theoretic criteria, as well as filtering in functional spaces. His secondary area of interest is applications to computational neuroscience, brain-machine interfaces and brain dynamics. Dr. Principe is a Fellow of the IEEE, AIMBE, and IAMBE. Dr. Principe received the Gabor Award from the INNS, the Career Achievement Award from the IEEE EMBS and the Neural Network Pioneer Award of the IEEE CIS. He has more than 38 patents awarded and over 800 publications in the areas of adaptive signal processing, control of nonlinear dynamical systems, machine learning and neural networks, and information theoretic learning, with applications to neurotechnology and brain-computer interfaces. He has directed 97 Ph.D. dissertations and 65 Master's theses. In 2000 he wrote an interactive electronic book entitled "Neural and Adaptive Systems", published by John Wiley and Sons, and more recently co-authored several books: "Brain Machine Interface Engineering" (Morgan and Claypool), "Information Theoretic Learning" (Springer), "Kernel Adaptive Filtering" (Wiley) and "System Parameter Adaption: Information Theoretic Criteria and Algorithms" (Elsevier). He has received four Honorary Doctor degrees, from Finland, Italy, Brazil and Colombia, and routinely serves on international scientific advisory boards of universities and companies. He has received extensive funding from NSF, NIH and DOD (ONR, DARPA, AFOSR).
This tutorial aims to introduce the fundamentals of adversarial machine learning, presenting a well-structured review of recently-proposed techniques to assess the vulnerability of machine-learning algorithms to adversarial attacks (both at training and at test time), and some of the most effective countermeasures proposed to date. We consider these threats in different application domains, including object recognition in images, biometric identity recognition, spam and malware detection.
This tutorial motivates and explains a topic of emerging importance for AI, and it is particularly devoted to:
No knowledge of the tutorial topics is assumed. A basic knowledge of machine learning and statistical pattern classification is required.
Fabio Roli is a Professor of Computer Engineering at the University of Cagliari, Italy, and Director of the Pattern Recognition and Applications laboratory, which he founded in 1995 and which is now a world-class research lab with 30 staff members, including five tenured faculty members. He is the R&D manager of the company Pluribus One, which he co-founded. He has been doing research on the design of pattern recognition systems for 30 years. Prof. Roli has published 86 journal articles and more than 250 conference articles on pattern recognition and machine learning, and many of his papers are frequently cited. His current h-index is 59 according to Google Scholar (March 2019). He has been appointed a Fellow of the IEEE and a Fellow of the IAPR. He was President of the Italian Group of Researchers in Pattern Recognition and Chairman of the IAPR Technical Committee on Statistical Techniques in Pattern Recognition. He was a member of the NATO advisory panel for Information and Communications Security, NATO Science for Peace and Security (2008-2011). Prof. Roli is one of the pioneers of the use of pattern recognition and machine learning for computer security. He is often invited to give keynote speeches and tutorials on adversarial machine learning and data-driven technologies for security applications. He is (or has been) the PI of dozens of R&D projects, including the leading European projects on Security & Privacy CyberRoad and ILLBuster.
This course will deal with deep learning algorithms for multimodal and multisensorial signal analysis, covering signals such as audio, video, text, and physiological signals. The methods shown will, however, be applicable to a broad range of further signal types. We will first deal with pre-processing for denoising or dereverberation. This will be followed by representation learning, such as by convolutional neural networks or sequence-to-sequence encoder-decoder architectures, as a basis for end-to-end learning from raw signals or symbolic representations. Then, we shall discuss modelling for decision making, such as by recurrent neural networks with long short-term memory or gated recurrent units, including compensation of dynamics by connectionist temporal classification. This will also include discussion of the usage of attention on different levels. We will also elaborate on the impact of topologies, including multiple targets with shared layers and bottlenecks, and how to move towards self-shaping networks in the sense of Automatic Machine Learning. In a last part, we will deal with data efficiency, such as by weak supervision with the human in the loop, based on data augmentation, active and semi-supervised learning, transfer learning, or generative adversarial networks. The content shown will be accompanied by open-source implementations of the corresponding toolkits available on GitHub. Application examples will come from the domains of Affective Computing, Multimedia Retrieval, and mHealth.
Attendees should be generally familiar with Machine Learning and Neural Networks. They should further have basic knowledge of Signal Processing.
Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professorship in Machine Intelligence and Signal Processing, all in EE/IT, from TUM in Munich, Germany. He is Full Professor of Artificial Intelligence and Head of GLAM - the Group on Language Audio & Music - at Imperial College London, UK; Full Professor and ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany; co-founding CEO and current CSO of audEERING - an audio intelligence company based near Munich and in Berlin, Germany; and permanent Visiting Professor at HIT, China, amongst other professorships and affiliations. Before that, he was Full Professor at the University of Passau, Germany, and worked with Joanneum Research in Graz, Austria, and the CNRS-LIMSI in Orsay, France. He is a Fellow of the IEEE, President Emeritus of the AAAC, and a Senior Member of the ACM. He has (co-)authored 800+ publications (23,000+ citations, h-index = 70), was Editor in Chief of the IEEE Transactions on Affective Computing, is General Chair of ACII 2019, ACII Asia 2018, and ACM ICMI 2014, and a Program Chair of Interspeech 2019, ACM ICMI 2019/2013, ACII 2015/2011, and IEEE SocialCom 2012, amongst manifold further commitments and service to the community. His 30+ awards include having been honoured as one of 40 extraordinary scientists under the age of 40 by the WEF in 2015. He has served as Coordinator/PI in 10+ European projects, is an ERC Starting Grantee, and a consultant to companies such as Barclays, GN, Huawei, and Samsung.
Dive into Deep Learning
For details see http://www.d2l.ai and the associated UC Berkeley STAT-157 class at http://courses.d2l.ai. The course will cover a set of chapters from the book/website. All slides, notebooks and text are available on GitHub at https://github.com/diveintodeeplearning/d2l-en.
Alex Smola is Distinguished Scientist/VP at Amazon Web Services in Palo Alto and Adjunct Professor at UC Berkeley. Prior to that he was full professor at Carnegie Mellon University. He worked at Google, Yahoo, the Australian National University and National ICT Australia. Alex received his PhD at TU Berlin. His research interests are Kernel Methods, Bayesian Nonparametrics, Deep Learning, Systems and Machine Learning, and algorithms on Graphs. He has written (or edited) 5 books and has published over 200 papers.
Today's AI approaches based on deep learning perform perceptual and other tasks exceedingly well. However, these methods optimize the solution to each task without considering the interpretability of the solution by humans. In tasks where human judgment is a necessary component, as in medicine and the courtroom, it is necessary for the decision by the AI system to be accompanied by an explanation. We will describe different types of explainability, then go over several approaches to explainability in AI, emphasizing probabilistic approaches. We will take the example of forensic comparison and show how a high-performance deep learning system and an explainable system can coexist.
A course on introductory machine learning covering the main topics described in https://cedar.buffalo.edu/~srihari/CSE574/index.html
Srihari is a SUNY Distinguished Professor in the Department of Computer Science and Engineering at the University at Buffalo, The State University of New York. He teaches a sequence of three courses in artificial intelligence and machine learning: (i) introduction to machine learning, (ii) probabilistic graphical models and (iii) deep learning. Srihari's work led to the world's first automated system for reading handwritten postal addresses. It was deployed by the United States Postal Service, saving hundreds of millions of dollars in labor costs. A side effect was that it led to the task of recognizing handwritten digits being considered the fruit fly of AI methods. Srihari also spent a decade developing AI and machine learning methods for forensic pattern evidence such as latent prints, handwriting and footwear impressions, in particular quantifying the value of handwriting evidence to allow such testimony to be presented in US courts. Srihari's honors include: Fellow of the IEEE, Fellow of the International Association for Pattern Recognition, and distinguished alumnus of the Ohio State University College of Engineering. Srihari received a B.Sc. from Bangalore University, a B.E. from the Indian Institute of Science and a Ph.D. in Computer and Information Science from the Ohio State University.
This presentation will primarily focus on learning algorithms with reduced iterations or no iterations at all. Some of these algorithms have closed-form solutions, while others do not adjust their structures once constructed. The main algorithms considered in this talk are randomized neural networks, kernel ridge regression and random forests. These non-iterative methods have attracted the attention of researchers due to their high accuracy as well as their fast training, which stems from their non-iterative nature or closed-form solutions. For example, the random forest delivers top classification performance. The presentation will also cover the basic methods as well as their state-of-the-art variants. These algorithms will be benchmarked using classification, time series forecasting and visual tracking datasets. Future research directions will also be suggested.
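Kernel ridge regression illustrates the closed-form, non-iterative idea well: the coefficients come from solving a single linear system rather than from gradient iterations. The NumPy sketch below fits noisy sin(x) on toy data (the kernel width and regularizer are illustrative choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy 1-D regression: fit noisy sin(x) in closed form, no iterative training.
X = np.linspace(0, 2 * np.pi, 40)[:, None]
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=40)

lam = 1e-3                                             # ridge regularizer
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # (K + lam*I) alpha = y

X_test = np.linspace(0, 2 * np.pi, 100)[:, None]
y_pred = rbf_kernel(X_test, X) @ alpha                 # f(x) = sum_i alpha_i k(x, x_i)
max_err = float(np.abs(y_pred - np.sin(X_test[:, 0])).max())
print(max_err)
```

Training cost is one dense solve, O(n^3) in the number of samples, which is the trade-off these non-iterative methods make against iterative stochastic training.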
(Additional References will be included in the lecture materials)
Basic knowledge of neural networks, pattern classification, kernel methods, and decision trees will be advantageous.
Ponnuthurai Nagaratnam Suganthan (or P N Suganthan) received the B.A. degree, Postgraduate Certificate and M.A. degree in Electrical and Information Engineering from the University of Cambridge, UK in 1990, 1992 and 1994, respectively. After completing his PhD research in 1995, he served as a pre-doctoral Research Assistant in the Dept of Electrical Engineering, University of Sydney in 1995-96 and as a lecturer in the Dept of Computer Science and Electrical Engineering, University of Queensland in 1996-99. He moved to NTU in 1999. He was an Editorial Board Member of the Evolutionary Computation Journal, MIT Press (2013-2018) and an associate editor of the IEEE Trans on Cybernetics (2012-2018). He is an associate editor of the IEEE Trans on Evolutionary Computation (2005 - ), Information Sciences (Elsevier) (2009 - ), Pattern Recognition (Elsevier) (2001 - ) and the Int. J. of Swarm Intelligence Research (2009 - ). He is a founding co-editor-in-chief of Swarm and Evolutionary Computation (2010 - ), an SCI-indexed Elsevier journal. His co-authored SaDE paper (published in April 2009) won the "IEEE Trans. on Evolutionary Computation outstanding paper award" in 2012. His former PhD student, Dr Jane Jing Liang, won the IEEE CIS Outstanding PhD dissertation award in 2014. His research interests include swarm and evolutionary algorithms, pattern recognition, big data, deep learning and applications of swarm, evolutionary & machine learning algorithms. He was selected as one of the highly cited researchers by Thomson Reuters in 2015, 2016, 2017 and 2018 in computer science. He served as General Chair of the IEEE SSCI 2013. He has been a member of the IEEE since 1991 and a Fellow since 2015. He is an IEEE CIS Distinguished Lecturer (DLP) for 2018-2020. He was an elected AdCom member of the IEEE Computational Intelligence Society (CIS) in 2014-2016. Google Scholar: http://scholar.google.com.sg/citations?hl=en&user=yZNzBU0AAAAJ&view_op=list_works&pagesize=100
Neural networks and deep learning, together with support vector machines and kernel methods, have been among the most powerful and successful techniques in machine learning and data-driven modelling. Initially, in artificial neural networks, the use of single-hidden-layer feedforward networks was common because of their universal approximation property. However, the existence of many local minima in the training process proved to be a drawback. Support vector machines and kernel methods therefore became widely used, relying on solving convex optimization problems in classification and regression. In the meantime, computing power has increased and data have become abundantly available in many applications. As a result, one can currently afford to train deep models consisting of (many) more layers and interconnection weights. Examples of successful deep learning models are convolutional neural networks, residual neural networks, stacked autoencoders, deep Boltzmann machines, deep generative models and generative adversarial networks. However, recent work on understanding generalization, efficient training and adversarial networks indicates that achieving complementary insights from kernel-based approaches and deep learning will become increasingly important.
In this course we will explain several synergies between neural networks, deep learning, least squares support vector machines and kernel methods. A key role at this point is played by primal and dual model representations and different duality principles. Recent developments on Restricted Kernel Machines will be highlighted, revealing new insights between neural networks, deep learning and kernel methods. In this way the bigger and unifying picture will be obtained and future perspectives will be outlined.
The material is organized into 3 parts:
In Part I a basic introduction is given to support vector machines (SVM) and kernel methods with emphasis on their artificial neural networks (ANN) interpretations. The latter can be understood in view of primal and dual model representations, expressed in terms of the feature map and the kernel function, respectively. Related to least squares support vector machines (LS-SVM), such characterizations exist for supervised and unsupervised learning, including classification, regression, kernel principal component analysis (KPCA), kernel spectral clustering (KSC), kernel canonical correlation analysis (KCCA), and others. Primal and dual representations are also relevant in order to obtain efficient training algorithms, tailored to the nature of the given application (high-dimensional input spaces versus large data sizes). Application examples are given e.g. in black-box weather forecasting, pollution modelling, prediction of energy consumption, and community detection in networks.
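As a concrete illustration of the dual model representation mentioned above, the following is a minimal NumPy sketch of an LS-SVM in its regression-style formulation: training reduces to solving one linear system in the bias b and the dual variables alpha, and prediction uses the dual form f(x) = Σᵢ αᵢ k(xᵢ, x) + b. The toy data, kernel bandwidth, and regularization constant are illustrative choices, not part of the course material.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gram matrix of the RBF kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # Dual problem: solve the linear system
    #   [ 0        1^T       ] [b]     [0]
    #   [ 1   K + I / gamma  ] [alpha] [y]
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[1:], sol[0]           # alpha, b

def lssvm_predict(X_train, alpha, b, X_test, sigma=1.0):
    # Dual representation: f(x) = sum_i alpha_i k(x_i, x) + b
    return rbf_kernel(X_test, X_train, sigma) @ alpha + b

# Toy binary problem: two well-separated Gaussian blobs with labels +/-1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
alpha, b = lssvm_fit(X, y)
pred = np.sign(lssvm_predict(X, alpha, b, X))
print((pred == y).mean())            # training accuracy
```

Note how, in contrast with the standard SVM quadratic program, the LS-SVM solution requires only linear algebra, which is what makes the primal/dual machinery of the course directly computable.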
In Part II we explain how to obtain a so-called restricted kernel machine (RKM) representation for least squares support vector machine related models. By using a principle of conjugate feature duality it is possible to obtain a similar representation as in restricted Boltzmann machines (RBM) (with visible and hidden units), which are used in deep belief networks (DBN) and deep Boltzmann machines (DBM). The principle is explained both for supervised and unsupervised learning. Related to kernel principal component analysis a generative model is obtained within the restricted kernel machine framework. In such a generative model the trained model is able to generate new data examples. The use of tensor-based models is also very natural within this new RKM framework.
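The generative model in the RKM framework builds on kernel principal component analysis. As a reference point for that building block (not for the RKM formulation itself, which uses conjugate feature duality), here is a minimal kernel PCA sketch in plain NumPy following Schölkopf, Smola and Müller (1998): form the Gram matrix, center it, and read off component scores from its eigendecomposition. Kernel choice and data are illustrative.

```python
import numpy as np

def kernel_pca(X, n_components=2, sigma=1.0):
    # Kernel PCA as an eigenvalue problem on the centered Gram matrix
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = H @ K @ H
    eigval, eigvec = np.linalg.eigh(Kc)          # ascending order
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]
    # scores of the training points on the leading components
    return eigvec[:, :n_components] * np.sqrt(np.maximum(eigval[:n_components], 0))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
Z = kernel_pca(X, n_components=2)
print(Z.shape)
```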
In Part III deep restricted kernel machines (Deep RKM) are explained, which consist of restricted kernel machines taken in a deep architecture. In these models a distinction is made between depth in a layer sense and depth in a level sense. Links and differences with stacked autoencoders and deep Boltzmann machines are given. The framework makes it possible to conceive both deep feedforward neural networks (DNN) and deep kernel machines, through primal and dual model representations. In this case one has multiple feature maps over the different levels, together with multiple kernel functions. By fusing the objectives of the different levels (e.g. several KPCA levels followed by an LS-SVM classifier) in the deep architecture, the training process becomes faster and gives improved solutions. Different training algorithms and methods for large data sets will be discussed.
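To make the "KPCA levels followed by a classifier level" idea tangible, the following is a didactic two-level stand-in in NumPy: extract leading kernel principal components as level-one features, then fit a least-squares (ridge) classifier on them. This sequential pipeline is only a simplification; the Deep RKM training discussed in the course fuses the objectives of all levels rather than training them one by one.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (25, 2)), rng.normal(1, 0.3, (25, 2))])
y = np.array([-1.0] * 25 + [1.0] * 25)

# Level 1: kernel PCA features (RBF kernel, top 3 components)
n = len(y)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 2.0)
H = np.eye(n) - np.ones((n, n)) / n
w, V = np.linalg.eigh(H @ K @ H)
Z = V[:, ::-1][:, :3] * np.sqrt(np.maximum(w[::-1][:3], 0))

# Level 2: ridge classifier on the extracted features
Zb = np.hstack([Z, np.ones((n, 1))])                       # add bias column
theta = np.linalg.solve(Zb.T @ Zb + 1e-3 * np.eye(4), Zb.T @ y)
acc = (np.sign(Zb @ theta) == y).mean()
print(acc)
```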
Finally, based on the newly obtained insights, future perspectives and challenges will be outlined.
Belkin M., Ma S., Mandal S., To understand deep learning we need to understand kernel learning, Proceedings of Machine Learning Research, 80:541-549, 2018.
Bengio Y., Learning deep architectures for AI, Boston: Now, 2009.
Bietti A., Mialon G., Chen D., Mairal J., A Kernel Perspective for Regularizing Deep Neural Networks, arXiv:1810.00363.
Binkowski M., Sutherland D.J., Arbel M., Gretton A., Demystifying MMD GANs, ICLR 2018.
Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y., Generative Adversarial Networks, pp. 2672-2680, NIPS 2014.
Goodfellow I., Bengio Y., Courville A., Deep learning, Cambridge, MA: MIT Press, 2016.
Hinton G.E., What kind of graphical model is the brain?, In Proc. 19th International Joint Conference on Artificial Intelligence, pp. 1765-1775, 2005.
Hinton G.E., Osindero S., Teh Y.-W., A fast learning algorithm for deep belief nets, Neural Computation, 18, 1527-1554, 2006.
Houthuys L., Suykens J.A.K., Tensor Learning in Multi-View Kernel PCA, in Proc. of the 27th International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece, pp. 205-215, Oct. 2018.
LeCun Y., Bengio Y., Hinton G., Deep learning, Nature, 521, 436-444, 2015.
Mall R., Langone R., Suykens J.A.K., Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks, PLOS ONE, e99966, 9(6), 1-18, 2014.
Mehrkanoon S., Suykens J.A.K., Deep hybrid neural-kernel networks using random Fourier features, Neurocomputing, Vol. 298, pp. 46-54, July 2018.
Mhaskar H., Liao Q., Poggio T., Learning Functions: When is Deep Better than Shallow, CBMM Memo No. 045, 2016.
Montavon G., Müller K.-R., Cuturi M., Wasserstein Training of Restricted Boltzmann Machines, pp. 3718-3726, NIPS 2016.
Salakhutdinov R., Hinton G.E., Deep Boltzmann machines, Proceedings of Machine Learning Research, 5:448-455, 2009.
Salakhutdinov R., Learning deep generative models, Annu. Rev. Stat. Appl., 2, 361-385, 2015.
Schölkopf B., Smola A., Müller K.-R., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10:1299-1319, 1998.
Schölkopf B., Smola A., Learning with kernels, Cambridge, MA: MIT Press, 2002.
Schreurs J., Suykens J.A.K., Generative Kernel PCA, ESANN 2018.
Suykens J.A.K., Vandewalle J., Training multilayer perceptron classifiers based on a modified support vector method, IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 907-911, Jul. 1999.
Suykens J.A.K., Vandewalle J., Least squares support vector machine classifiers, Neural Processing Letters, vol. 9, no. 3, pp. 293-300, Jun. 1999.
Suykens J.A.K., Van Gestel T., De Brabanter J., De Moor B., Vandewalle J., Least squares support vector machines, Singapore: World Scientific, 2002.
Suykens J.A.K., Alzate C., Pelckmans K., Primal and dual model representations in kernel-based learning, Statistics Surveys, vol. 4, pp. 148-183, Aug. 2010.
Suykens J.A.K., Deep Restricted Kernel Machines using Conjugate Feature Duality, Neural Computation, vol. 29, no. 8, pp. 2123-2163, Aug. 2017.
Vapnik V., Statistical learning theory, New York: Wiley, 1998.
Zhang C., Bengio S., Hardt M., Recht B., Vinyals O., Understanding deep learning requires rethinking generalization, ICLR 2017.
Basics of linear algebra
Johan A.K. Suykens was born in Willebroek, Belgium, on May 18, 1966. He received the master's degree in Electro-Mechanical Engineering and the PhD degree in Applied Sciences from the Katholieke Universiteit Leuven in 1989 and 1995, respectively. In 1996 he was a Visiting Postdoctoral Researcher at the University of California, Berkeley. He has been a Postdoctoral Researcher with the Fund for Scientific Research FWO Flanders and is currently a full Professor with KU Leuven. He is author of the books "Artificial Neural Networks for Modelling and Control of Non-linear Systems" (Kluwer Academic Publishers) and "Least Squares Support Vector Machines" (World Scientific), co-author of the book "Cellular Neural Networks, Multi-Scroll Chaos and Synchronization" (World Scientific), and editor of the books "Nonlinear Modeling: Advanced Black-Box Techniques" (Kluwer Academic Publishers), "Advances in Learning Theory: Methods, Models and Applications" (IOS Press) and "Regularization, Optimization, Kernels, and Support Vector Machines" (Chapman & Hall/CRC). In 1998 he organized an International Workshop on Nonlinear Modelling with Time-series Prediction Competition. He has served as associate editor for the IEEE Transactions on Circuits and Systems (1997-1999 and 2004-2007), the IEEE Transactions on Neural Networks (1998-2009) and the IEEE Transactions on Neural Networks and Learning Systems (from 2017). He received an IEEE Signal Processing Society 1999 Best Paper Award and several Best Paper Awards at international conferences. He is a recipient of the International Neural Networks Society INNS 2000 Young Investigator Award for significant contributions in the field of neural networks.
He has served as a Director and Organizer of the NATO Advanced Study Institute on Learning Theory and Practice (Leuven 2002), as a program co-chair for the International Joint Conference on Neural Networks 2004 and the International Symposium on Nonlinear Theory and its Applications 2005, as an organizer of the International Symposium on Synchronization in Complex Networks 2007, a co-organizer of the NIPS 2010 workshop on Tensors, Kernels and Machine Learning, and chair of ROKS 2013. He was awarded an ERC Advanced Grant in 2011 and 2017, and was elevated to IEEE Fellow in 2015 for developing least squares support vector machines.
Neuroscience and artificial intelligence have long benefited from strong mutual interactions. Neuroscience has provided initial models of neural architectures to solve cognitive problems, such as vision, sound or language processing. Conversely, the spectacular development of artificial architectures in the last decade has yielded candidate models for analysing the functional architecture of several brain systems. Importantly, leveraging neuroimaging data requires the analysis of large amounts of brain images or signals, making it a canonical example of large-scale structured signal analysis. In this course, we will review the interactions between neuroscience and AI, then discuss current challenges regarding the interactions of these fields.
Bertrand Thirion is the leader of the Parietal team at the Inria research institute in Saclay, France, which develops statistics and machine learning techniques for brain imaging. He contributes both algorithms and software, with a special focus on functional neuroimaging applications. He is involved in the Neurospin (CEA) neuroimaging center, one of the leading centers for the use of high-field MRI for brain imaging. Bertrand Thirion is also leader of the DATAIA initiative, which coordinates data science and AI research on the main French campus (Paris-Saclay).
The success of deep learning hinges on intermediate representations: transformations of the data on which statistical learning is easier. Deep architectures can extract very rich and powerful representations, but they need huge volumes of data. In this course, we will study the fundamentals of simple representations. Simple representations are interesting because they can be learned in limited-data settings. We will also use them as didactic cases to understand how to build statistical models from data. The goal of the course is to provide the basic mathematical concepts that underlie successful representations extracted in limited-data settings.
— Shallow representations: what and why?
— Matrix factorizations and their variants
— Fisher kernels: vector representations from a data model
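For the matrix-factorization item above, a minimal NumPy sketch: a rank-k factorization X ≈ U Vᵀ obtained from the truncated SVD, which by the Eckart–Young theorem is the best rank-k approximation in Frobenius norm. The synthetic near-low-rank data is purely illustrative of the limited-data regime where such simple representations shine.

```python
import numpy as np

def low_rank_factorize(X, k):
    # Rank-k factorization X ~ U V^T via truncated SVD
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k].T        # factors U_k, V_k

rng = np.random.default_rng(0)
A, B = rng.normal(size=(30, 3)), rng.normal(size=(20, 3))
X = A @ B.T + 0.01 * rng.normal(size=(30, 20))   # near rank-3 matrix
U, V = low_rank_factorize(X, 3)
err = np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
print(err < 0.05)  # True: the rank-3 factorization recovers X almost exactly
```

Variants mentioned in the course (sparse, non-negative, dictionary-learning style) replace the SVD with constrained or penalized optimization, but the X ≈ U Vᵀ template stays the same.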
Gaël Varoquaux is a computer-science researcher at Inria. His research focuses on statistical learning tools for data science and scientific inference. He has pioneered the use of machine learning on brain images to map cognition and pathologies. More generally, he develops tools to make machine learning easier, with statistical models suited for real-life, uncurated data, and software for data science. He co-founded scikit-learn, one of the reference machine-learning toolboxes, and helped build various central tools for data analysis in Python. Varoquaux has contributed key methods for learning on spatial data, matrix factorizations, and modeling covariance matrices. He has a PhD in quantum physics and is a graduate of the École Normale Supérieure, Paris.
The past few years have seen a dramatic increase in the performance of recognition systems thanks to the introduction of deep networks for representation learning. However, the mathematical reasons for this success remain elusive. For example, a key issue is that the neural network training problem is nonconvex, hence optimization algorithms are not guaranteed to return a global minimum. The first part of this tutorial will overview recent work on the theory of deep learning that aims to understand how to design the network architecture, how to regularize the network weights, and how to guarantee global optimality. The second part of this tutorial will present sufficient conditions to guarantee that local minima are globally optimal and that a local descent strategy can reach a global minimum from any initialization. Such conditions apply to problems in matrix factorization, tensor factorization and deep learning. The third part of this tutorial will present an analysis of dropout for matrix factorization, and establish connections.
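A small numerical illustration of the phenomenon the second part studies: the objective f(U, V) = ||X − U Vᵀ||²_F is nonconvex in (U, V), yet plain gradient descent from a generic random initialization routinely drives it to a global minimum when X is exactly low rank. This is only a sketch of the empirical behaviour, not of the tutorial's formal sufficient conditions; sizes, step size, and iteration count are arbitrary choices.

```python
import numpy as np

# Nonconvex objective f(U, V) = ||X - U V^T||_F^2, minimized by plain
# gradient descent from a small random initialization.
rng = np.random.default_rng(0)
k = 2
X = rng.normal(size=(10, k)) @ rng.normal(size=(8, k)).T   # exactly rank-2

U = 0.1 * rng.normal(size=(10, k))
V = 0.1 * rng.normal(size=(8, k))
lr = 0.01
for _ in range(5000):
    R = U @ V.T - X                                        # residual
    U, V = U - lr * (R @ V), V - lr * (R.T @ U)            # gradient steps

rel_err = np.linalg.norm(U @ V.T - X) / np.linalg.norm(X)
print(rel_err < 1e-2)   # local descent reached a (near-)global minimum
```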
Basic understanding of sparse and low-rank representation and non-convex optimization.
Rene Vidal is a Professor of Biomedical Engineering and the Inaugural Director of the Mathematical Institute for Data Science at The Johns Hopkins University. His research focuses on the development of theory and algorithms for the analysis of complex high-dimensional datasets such as images, videos, time-series and biomedical data. Dr. Vidal has been Associate Editor of TPAMI and CVIU, Program Chair of ICCV and CVPR, co-author of the book "Generalized Principal Component Analysis" (2016), and co-author of more than 200 articles in machine learning, computer vision, biomedical image analysis, hybrid systems, robotics and signal processing. He is a fellow of the IEEE, IAPR and Sloan Foundation, an ONR Young Investigator, and has received numerous awards for his work, including the 2012 J.K. Aggarwal Prize for "outstanding contributions to generalized principal component analysis (GPCA) and subspace clustering in computer vision and pattern recognition" as well as best paper awards in machine learning, computer vision, controls, and medical robotics.
Big data holds the potential to solve many challenging problems, and one of them is natural language understanding. As an example, big data has enabled the breakthrough in machine translation. However, natural language understanding still faces tremendous challenges. It has been shown that in areas such as question answering and conversation, domain knowledge is indispensable. Thus, how to acquire, represent, and apply domain knowledge for text understanding is of critical importance. In this short course, I will focus on understanding short text, which is crucial to many applications. First, short texts do not always observe the syntax of a written language; as a result, traditional natural language processing methods cannot be easily applied. Second, short texts usually do not contain sufficient statistical signals to support many state-of-the-art approaches for text processing, such as topic modeling. Third, short texts are usually more ambiguous. I will go over various techniques in knowledge acquisition, representation, and inference that have been proposed for text understanding, and will describe the massive structured and semi-structured data made available in the recent decade that directly or indirectly encode human knowledge, turning knowledge representation into a computational grand challenge with feasible solutions in sight.
Haixun Wang is an IEEE fellow and a VP of Engineering and Distinguished Scientist at WeWork, where he leads the Research and Applied Science division. He was Director of Natural Language Processing at Amazon. Before Amazon, he led the NLP Infra team at Facebook working on Query and Document Understanding. From 2013 to 2015, he was with Google Research, working on natural language processing. From 2009 to 2013, he led research in semantic search, graph data processing systems, and distributed query processing at Microsoft Research Asia. His knowledge base project Probase has created significant impact in industry and academia. He was a research staff member at IBM T. J. Watson Research Center from 2000 to 2009, serving as Technical Assistant to Stuart Feldman (Vice President of Computer Science of IBM Research) from 2006 to 2007, and Technical Assistant to Mark Wegman (Head of Computer Science of IBM Research) from 2007 to 2009. He received the Ph.D. degree in Computer Science from the University of California, Los Angeles in 2000. He has published more than 150 research papers in refereed international journals and conference proceedings. He served as PC Chair of conferences such as CIKM'12, and he is on the editorial boards of journals such as IEEE Transactions on Knowledge and Data Engineering (TKDE) and Journal of Computer Science and Technology (JCST). He won the best paper award at ICDE 2015, the 10-year best paper award at ICDM 2013, and the best paper award of ER 2009.
Complex semantic meaning in natural language is hard to mine using computational approaches. Deep language models that learn a hierarchical representation have proved to be a powerful tool for natural language processing, text mining and information retrieval. This course will cover models for word embedding and for learning representations of text for information retrieval and text mining. The topics include an introduction to language models for word embedding, followed by a presentation of recent multi-resolution models that represent documents at multiple resolutions in terms of abstraction levels. More specifically, we first form a mixture of weighted representations across the whole hierarchy of a given word embedding model, so that all resolutions of the hierarchical representation are preserved for the downstream model. In addition, we combine all mixture representations from various models as an ensemble representation. Finally, applications to information retrieval and other text mining tasks are presented in the course.
1.1. Vector space model
2.1. Multi-resolution word embedding
2.2. Ensemble models
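As a starting point for the vector space model in the outline above, here is a minimal pure-Python sketch: documents become term-count vectors over a shared vocabulary, and relevance is measured by cosine similarity. The toy documents are illustrative; real systems would add TF-IDF weighting or the embedding-based representations covered later in the course.

```python
import math
from collections import Counter

def tf_vector(doc, vocab):
    # term-frequency vector of a document over a fixed vocabulary
    counts = Counter(doc.split())
    return [counts[t] for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["deep learning for text", "kernel methods for text", "stock market news"]
vocab = sorted({w for d in docs for w in d.split()})
vecs = [tf_vector(d, vocab) for d in docs]
# the first two documents share terms, so their similarity is higher
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```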
Basic knowledge of linear algebra and machine learning.
Xiaowei Xu, a professor of Information Science at the University of Arkansas at Little Rock (UALR), received his Ph.D. degree in Computer Science from the University of Munich in 1998. Before his appointment at UALR, he was a senior research scientist at Siemens, Munich, Germany. His research spans data mining, machine learning, bioinformatics, database management systems and high-performance computing. Dr. Xu is a recipient of the 2014 ACM SIGKDD Test of Time Award for his contribution to the density-based clustering algorithm DBSCAN.
The goal is to introduce the recent advances in object tracking based on deep learning and related approaches. Performance evaluation and challenging factors in this field will be discussed.
Y. Wu, J. Lim, and M.-H. Yang, Object Tracking Benchmark, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
H. Nam and B. Han, Learning Multi-domain Convolutional Neural Networks for Visual Tracking, CVPR, 2016.
M. Danelljan, G. Bhat, F. Khan, M. Felsberg, ECO: Efficient Convolution Operators for Tracking, CVPR, 2017.
Basic knowledge in computer vision and intermediate knowledge in deep learning
Ming-Hsuan Yang is a Professor of Electrical Engineering and Computer Science at University of California, Merced, and a Research Scientist at Google Cloud. He serves as a program co-chair of the IEEE International Conference on Computer Vision (ICCV) in 2019, and served as program co-chair of the Asian Conference on Computer Vision (ACCV) in 2014 and general co-chair of ACCV 2016. He served as an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) from 2007 to 2011, and currently serves as an associate editor of the International Journal of Computer Vision (IJCV), Computer Vision and Image Understanding (CVIU), Image and Vision Computing (IVC) and Journal of Artificial Intelligence Research (JAIR). Yang received the Google Faculty Award in 2009 and the Faculty Early Career Development (CAREER) award from the National Science Foundation in 2012. He received paper awards from UIST 2017, CVPR 2018 and ACCV 2018. He is an IEEE Fellow.
This course gives the audience an introduction to the fundamental theories and advanced methods for knowledge discovery from complex data with deep learning. It is well accepted that today we are drowning in data, and the data we face are typically complex. Complex data refer to the most comprehensive data formats we encounter. The data may be non-structural media data such as text, imagery, video, audio, and graphics/animation. They may also be in other modalities, including time-series, sequential, and relational data that violate the i.i.d. assumption, such as social network data, e-commerce data, financial interaction/transaction data, and cyber communication/attack data, where the data can be represented as a multi-type node graph involving multiple players. Further, the data may be noisy: not only can the data per se be noisy, but the given training labels, if there are any, can also be noisy, as in image annotation or classification scenarios where the given training labels can be imperfect (incorrect and/or incomplete). Consequently, complex data represent the most commonly encountered data in our daily life and in almost all real-world applications, and it is extremely challenging to develop theories for learning from complex data.
The course begins with an extensive introduction to the fundamental concepts and theories of knowledge discovery from complex data, as well as the relevant deep learning theories required for it; the course then showcases several important real-world applications as case studies of knowledge discovery from complex data with deep learning.
The course consists of three one-and-a-half-hour sessions. The syllabus is as follows:
College mathematics and fundamentals of computer science
Zhongfei (Mark) Zhang is a full professor of Computer Science at the State University of New York (SUNY) at Binghamton, where he directs the Multimedia Research Computing Laboratory. He has also served as a QiuShi Chair Professor at Zhejiang University, China, and as the Director of the Data Science and Engineering Research Center at that university, and as a CNRS Chair Professor at the University of Lille 1, France, while on leave from SUNY Binghamton, USA. He received a B.S. in Electronics Engineering (with Honors) and an M.S. in Information Sciences, both from Zhejiang University, China, and a PhD in Computer Science from the University of Massachusetts at Amherst, USA. His research interests include machine learning and artificial intelligence, data mining and knowledge discovery, multimedia information indexing and retrieval, computer vision, and pattern recognition. He is the author and co-author of the first monograph on multimedia data mining and the first monograph on relational data clustering, respectively. His research is sponsored by a wide spectrum of government funding agencies, industrial labs, and private agencies. He has published over 200 papers in premier venues in his areas and is an inventor on more than 30 patents. He has served on several journal editorial boards and has received several professional awards, including best paper awards at the premier conferences in his areas.