Videos uploaded by user “KDD2016 video”
KDD2016 paper 573
 
02:55
Title: "Why Should I Trust You?": Explaining the Predictions of Any Classifier Authors: Marco Túlio Ribeiro*, University of Washington Sameer Singh, University of Washington Carlos Guestrin, University of Washington Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust in a model. Trust is fundamental if one plans to take action based on a prediction, or when choosing whether or not to deploy a new model. Such understanding further provides insights into the model, which can be used to turn an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We further propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). The usefulness of explanations is shown via novel experiments, both simulated and with human subjects. Our explanations empower users in various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and detecting why a classifier should not be trusted. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 51262 KDD2016 video
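The LIME recipe summarized above — perturb the instance, weight perturbations by proximity, and fit an interpretable model locally — can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: `black_box` is a made-up classifier, binary features stand in for interpretable components, and a weighted least-squares fit plays the role of the sparse linear explainer.

```python
import numpy as np

def black_box(X):
    # Hypothetical opaque classifier: its output depends only on
    # features 0 and 2 (the explainer does not get to see this).
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] + 1.5 * X[:, 2] - 1.0)))

def lime_explain(x, predict_fn, n_samples=5000, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = len(x)
    # Perturb the instance by randomly switching its binary features off.
    mask = rng.integers(0, 2, size=(n_samples, d))
    Z = mask * x
    y = predict_fn(Z)
    # Weight each perturbed sample by its proximity to the original instance.
    dist = (d - mask.sum(axis=1)) / d
    w = np.exp(-(dist ** 2) / sigma ** 2)
    # Fit a weighted linear model locally; its coefficients are the explanation.
    A = np.hstack([Z, np.ones((n_samples, 1))])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return coef[:d]

weights = lime_explain(np.array([1, 1, 1, 1]), black_box)
```

Here features 0 and 2 should receive the largest local weights, since only they drive `black_box` — which is exactly the kind of faithfulness check the paper's experiments formalize.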
KDD2016 paper 819
 
02:20
Title: DopeLearning: A Computational Approach to Rap Lyrics Generation Authors: Eric Malmi*, Aalto University Pyry Takala, Aalto University Hannu Toivonen, University of Helsinki Tapani Raiko, Aalto University Aristides Gionis, Aalto University Abstract: Writing rap lyrics requires both creativity to construct a meaningful, interesting story and lyrical skills to produce complex rhyme patterns, which form the cornerstone of good flow. We present a rap lyrics generation method that captures both of these aspects. First, we develop a prediction model to identify the next line of existing lyrics from a set of candidate next lines. This model is based on two machine-learning techniques: the RankSVM algorithm and a deep neural network model with a novel structure. Results show that the prediction model can identify the true next line among 299 randomly selected lines with an accuracy of 17%, i.e., over 50 times more likely than by random. Second, we employ the prediction model to combine lines from existing songs, producing lyrics with rhyme and a meaning. An evaluation of the produced lyrics shows that in terms of quantitative rhyme density, the method outperforms the best human rappers by 21%. The rap lyrics generator has been deployed as an online tool called DeepBeat, and the performance of the tool has been assessed by analyzing its usage logs. This analysis shows that machine-learned rankings correlate with user preferences. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 18866 KDD2016 video
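The rhyme quality discussed above is typically measured via assonance: matching vowel sounds at the ends of lines. As a deliberately crude, letter-level stand-in (the paper works with phonetic representations, not raw letters), rhyme length between two candidate lines might be computed as:

```python
VOWELS = set("aeiouy")

def vowel_skeleton(line):
    # Keep only the vowels of the line, in order.
    return [c for c in line.lower() if c in VOWELS]

def rhyme_length(a, b):
    # Length of the longest common vowel suffix of two lines — a rough
    # letter-level proxy for the assonance-based rhyme density idea.
    va, vb = vowel_skeleton(a), vowel_skeleton(b)
    n = 0
    while n < min(len(va), len(vb)) and va[-1 - n] == vb[-1 - n]:
        n += 1
    return n
```

A next-line ranker in this spirit would score each candidate line against the previous one and prefer longer matches, alongside semantic features.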
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
 
24:26
Author: Marco Tulio Ribeiro, Department of Computer Science and Engineering, University of Washington Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 10287 KDD2016 video
KDD2016 paper 461
 
03:01
Title: Kam1n0: MapReduce-based Assembly Clone Search for Reverse Engineering Authors: Steven H. H. Ding, McGill University Benjamin C. M. Fung*, McGill University Philippe Charland, Defence Research and Development Canada Abstract: Assembly code analysis is one of the critical processes for detecting and justifying software plagiarism and software patent infringements when the source code is unavailable. It is also a common practice to discover exploits and vulnerabilities in existing software. However, it is a manually intensive and time-consuming process even for experienced reverse engineers. An effective and efficient assembly code clone search engine can greatly reduce the effort of this process, since it can identify the cloned parts that have been previously analyzed. The assembly code clone search problem belongs to the field of software engineering. However, it strongly depends on practical nearest neighbor search techniques in data mining and database. By closely collaborating with reverse engineers and Defence Research and Development Canada (DRDC), we study the concerns and challenges that make existing assembly code clone approaches not practically applicable from the perspective of data mining. We propose a new variant of LSH scheme and incorporate it with graph matching to address these challenges. We implement an integrated assembly clone search engine called Kam1n0. It is the first clone search engine that can efficiently identify the given query assembly function's subgraph clones from a large assembly code repository. Kam1n0 is built upon the Apache Spark computation framework and Cassandra-like key-value distributed storage. The deployed system is publicly available and readers can try out its beta version on Google Cloud. Extensive experimental results suggest that Kam1n0 is accurate, efficient, and scalable for handling large volumes of assembly code.
More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 3282 KDD2016 video
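The nearest-neighbor machinery mentioned above builds on locality-sensitive hashing. As a sketch of the general idea only (Kam1n0 proposes its own LSH variant combined with graph matching; plain MinHash shown here is the textbook version), one can fingerprint an assembly function by hashing its instruction set — the example instructions are invented:

```python
import zlib

def minhash_signature(tokens, num_hashes=64):
    # One MinHash signature: for each salted hash function, keep the
    # minimum hash over the token set. The fraction of equal positions
    # across two signatures estimates the Jaccard similarity of the sets.
    sig = []
    for salt in range(num_hashes):
        sig.append(min(zlib.crc32(f"{salt}:{t}".encode()) for t in tokens))
    return sig

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

s1 = minhash_signature({"push ebp", "mov ebp, esp", "sub esp, 8", "ret"})
s2 = minhash_signature({"push ebp", "mov ebp, esp", "sub esp, 8", "leave"})
s3 = minhash_signature({"xor eax, eax", "inc eax", "jmp loop"})
```

Bucketing functions by (bands of) such signatures is what lets a clone search engine avoid comparing the query against every function in the repository.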
KDD2016 paper 683
 
03:29
Title: MANTRA: A Scalable Approach to Mining Temporally Anomalous Sub-trajectories Authors: Prithu Banerjee*, UBC Pranali Yawalkar, IIT Madras Sayan Ranu, IIT Madras Abstract: In this paper, we study the problem of mining temporally anomalous sub-trajectory patterns from trajectory streams. Given the prevailing road conditions, a sub-trajectory is temporally anomalous if its travel time deviates significantly from the expected time. Mining these patterns requires us to delve into the sub-trajectory space, which is not scalable for real-time analytics. To overcome this scalability challenge, we design a technique called MANTRA. We study the properties unique to anomalous sub-trajectories and utilize them in MANTRA to iteratively refine the search space into a disjoint set of sub-trajectory islands. The expensive enumeration of all possible sub-trajectories is performed only on the islands to compute the answer set of maximal anomalous sub-trajectories. Extensive experiments on both real and synthetic datasets establish MANTRA as more than 3 orders of magnitude faster than baseline techniques. Moreover, through trajectory classification and segmentation, we demonstrate that the proposed model conforms to human intuition. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 3828 KDD2016 video
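A much-simplified version of the island idea above — isolating the stretches of a trajectory whose travel time deviates from expectation before doing any expensive enumeration — could look like this. The per-edge times and the fixed deviation ratio are illustrative assumptions, not the paper's statistical model:

```python
def anomalous_islands(observed, expected, ratio=1.5):
    # Return maximal runs of consecutive edges whose observed travel time
    # exceeds `ratio` times the expected time — a crude stand-in for the
    # disjoint sub-trajectory islands on which enumeration is restricted.
    runs, start = [], None
    for i, (o, e) in enumerate(zip(observed, expected)):
        if o > ratio * e:
            if start is None:
                start = i
        else:
            if start is not None:
                runs.append((start, i - 1))
                start = None
    if start is not None:
        runs.append((start, len(observed) - 1))
    return runs
```

Only within each returned island would a full search over sub-trajectories then be necessary, which is the source of the scalability gain.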
KDD2016 paper 392
 
02:48
Title: Large-Scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks Authors: Jung-Woo Ha*, NAVER LABS Hyuna Pyo, NAVER LABS Jeonghee Kim, NAVER LABS Abstract: Precise item categorization is a key issue in e-commerce domains. However, it still remains a challenging problem due to data size, category skewness, and noisy metadata. Here, we report on the successful deployment of a deep learning-based item categorization method, i.e., deep categorization network (DeepCN), in an e-commerce website. DeepCN is an end-to-end model using multiple recurrent neural networks (RNNs) dedicated to metadata attributes for generating features from text metadata and fully connected layers for classifying item categories from the generated features. The categorization errors are propagated back through the fully connected layers to the RNNs for weight update in the learning process. This deep learning-based approach allows diverse attributes to be integrated into a common representation, thus overcoming sparsity and scalability problems. We evaluate DeepCN on large-scale real-world data including more than 94 million items with approximately 4,100 leaf categories from a Korean e-commerce website. Experimental results show that our method improves the categorization accuracy compared to the model using a single RNN as well as a standard classification model using unigram-based bag-of-words. Furthermore, we investigate how much the model parameters and the used attributes influence categorization performances. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 1569 KDD2016 video
KDD2016 paper 958
 
02:19
Title: Scalable Betweenness Centrality Maximization via Sampling Authors: Ahmad Mahmoody*, Brown University Eli Upfal, Brown University Charalampos Tsourakakis, Harvard Abstract: Betweenness centrality is a fundamental centrality measure in social network analysis. Given a large-scale network, how can we find the most central nodes? This question is of great importance to many key applications that rely on BWC, including community detection and understanding graph vulnerability. Despite the large amount of work on scalable approximation algorithm design for BWC, estimating BWC on large-scale networks remains a computational challenge. In this paper, we study the Centrality Maximization problem. We present an efficient randomized algorithm that provides approximation with high probability. Our results improve the current state-of-the-art result. Furthermore, we provide the first theoretical evidence for the validity of a crucial assumption in betweenness centrality estimation, namely that in real-world networks shortest paths pass through the top-k central nodes, where k is a constant. This also explains why our algorithm runs in near linear time on real-world networks. We also show that our algorithm and analysis can be applied to a wider range of centrality measures, by providing a general analytical framework. On the experimental side, we perform an extensive experimental analysis of our method on real-world networks, demonstrate its accuracy and scalability, and study different properties of central nodes. Then, we compare the sampling method used by the state-of-the-art algorithm with our method. Furthermore, we perform a study of BWC in time evolving networks, and see how the centrality of the central nodes in the graphs changes over time. Finally, we compare the performance of the stochastic Kronecker model to real data, and observe that it generates a similar growth pattern.
More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 1907 KDD2016 video
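The sampling idea behind such estimators can be conveyed with a toy version: sample node pairs, recover one shortest path by BFS, and credit the path's interior nodes. The paper's actual algorithm and its high-probability guarantees are considerably more refined; this sketch only shows why sampling makes betweenness estimation scale:

```python
import random
from collections import deque

def sample_betweenness(adj, n_samples=2000, seed=1):
    # Estimate betweenness by sampling node pairs, taking one shortest
    # path (via BFS) between them, and crediting its interior nodes.
    rng = random.Random(seed)
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    for _ in range(n_samples):
        s, t = rng.sample(nodes, 2)
        parent = {s: None}           # BFS tree for path reconstruction
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            continue                 # t unreachable from s
        v = parent[t]
        while v is not None and v != s:
            score[v] += 1.0 / n_samples
            v = parent[v]
    return score
```

On a star graph the center lies on every leaf-to-leaf shortest path, so it dominates the estimate, matching the intuition that few central nodes carry most shortest paths.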
KDD2016 paper 303
 
03:15
Title: Multi-layer Representation Learning for Medical Concepts Authors: Edward Choi*, Georgia Institute of Technology Mohammad Taha Bahadori, Georgia Institute of Technology Elizabeth Searles, Children's Healthcare of Atlanta Catherine Coffey, Children's Healthcare of Atlanta Jimeng Sun, Georgia Institute of Technology Abstract: Learning efficient representations for concepts has been proven to be an important basis for many applications such as machine translation or document classification. Proper representations of medical concepts such as diagnosis, medication, procedure codes and visits will have broad applications in healthcare analytics. However, in Electronic Health Records (EHR) the visit sequences of patients include multiple concepts (diagnosis, procedure, and medication codes) per visit. This structure provides two types of relational information, namely sequential order of visits and co-occurrence of the codes within each visit. In this work, we propose Med2Vec, which not only learns distributed representations for both medical codes and visits from a large EHR dataset with over 3 million visits, but also allows us to interpret the learned representations, which were positively confirmed by clinical experts. In the experiments, Med2Vec displays significant improvement in key medical applications compared to popular baselines such as Skip-gram, GloVe and stacked autoencoder, while providing clinically meaningful interpretation. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 1863 KDD2016 video
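The second kind of relational information described above — co-occurrence of codes within a visit — is the raw signal such representation learners consume. Counting it is straightforward; the code names below are invented placeholders, not real diagnosis or medication codes:

```python
from collections import Counter
from itertools import combinations

def code_cooccurrence(visits):
    # Count how often two medical codes appear in the same visit.
    # This pairwise signal (plus the sequential order of visits) is
    # what Med2Vec-style models learn their embeddings from.
    counts = Counter()
    for visit in visits:
        for a, b in combinations(sorted(set(visit)), 2):
            counts[(a, b)] += 1
    return counts
```

A Skip-gram-style objective would then push codes with high co-occurrence counts toward nearby points in the embedding space.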
KDD2016 paper 351
 
02:49
Title: Transferring Knowledge between Cities: A Perspective of Multimodal Data and A Case Study in Air Quality Authors: Ying Wei*, Hong Kong University of Science and Technology Yu Zheng, Microsoft Research Qiang Yang, Hong Kong University of Science and Technology Abstract: The rapid urbanization has motivated extensive research on urban computing. It is critical for urban computing tasks to unlock the power of the diversity of data modalities generated by different sources in urban spaces, such as vehicles and humans. However, we are more likely to encounter the label scarcity problem and the data insufficiency problem when solving an urban computing task in a city where services and infrastructures are not ready or just built. In this paper, we propose a FLexible multimOdal tRAnsfer Learning (FLORAL) method to transfer knowledge from a city where there exist sufficient multimodal data and labels to such cities, to fully alleviate the two problems. FLORAL learns semantically related dictionaries for multiple modalities from a source domain, and simultaneously transfers the dictionaries and labelled instances from the source into a target domain. We evaluate the proposed method with a case study of air quality prediction. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 563 KDD2016 video
KDD2016 paper 283
 
02:18
Title: When Social Influence Meets Item Inference Authors: Hui-Ju Hung*, The Pennsylvania State University Hong-Han Shuai, Academia Sinica De-Nian Yang, Academia Sinica Liang-Hao Huang, Academia Sinica Wang-Chien Lee, The Pennsylvania State University Jian Pei, Simon Fraser University Ming-Syan Chen, National Taiwan University Abstract: Research issues and data mining techniques for product recommendation and viral marketing have been widely studied. Existing works on seed selection in social networks do not take into account the effect of product recommendations in e-commerce stores. In this paper, we investigate the seed selection problem for viral marketing that considers both effects of social influence and item inference (for product recommendation). We develop a new model, Social Item Graph (SIG), that captures both effects in the form of hyperedges. Accordingly, we formulate a seed selection problem, called Social Item Maximization Problem (SIMP), and prove the hardness of SIMP. We design an efficient algorithm with performance guarantee, called Hyperedge-Aware Greedy (HAG), for SIMP and develop a new index structure, called SIG-index, to accelerate the computation of diffusion process in HAG. Moreover, to construct realistic SIG models for SIMP, we develop a statistical inference based framework to learn the weights of hyperedges from data. Finally, we perform a comprehensive evaluation on our proposals with various baselines. Experimental results validate our ideas and demonstrate the effectiveness and efficiency of the proposed model and algorithms over baselines. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 2217 KDD2016 video
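Seed-selection problems like SIMP are typically attacked with greedy algorithms driven by marginal gain, which is the template HAG follows (HAG itself adds hyperedge-aware acceleration and the SIG-index, both omitted here). With a toy coverage-style influence function standing in for the actual diffusion computation:

```python
def greedy_seeds(candidates, influence, k):
    # Pick k seeds one at a time, always taking the candidate with the
    # largest marginal gain under the given influence function.
    chosen = set()
    for _ in range(k):
        best = max((c for c in candidates if c not in chosen),
                   key=lambda c: influence(chosen | {c}) - influence(chosen))
        chosen.add(best)
    return chosen

# Toy influence: number of distinct users reached by the chosen seeds.
reach = {"a": {1, 2}, "b": {2, 3}, "c": {1}}

def influence(seeds):
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)
```

For monotone submodular influence functions this greedy template carries the classic (1 - 1/e) approximation guarantee; the expensive part in practice is evaluating `influence`, which is exactly what the SIG-index accelerates.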
KDD2016 paper 679
 
04:29
Title: Structural Neighborhood Based Classification of Nodes in a Network Authors: Sharad Nandanwar*, Indian Institute of Science Musti Narasimha Murty, Indian Institute of Science Abstract: Classification of entities based on the underlying network structure is an important problem. Networks encountered in practice are sparse and have many missing and noisy links. Even though statistical learning techniques have been used for intra-network classification based on local neighborhood, they perform poorly as they exploit only local information. In this paper, we propose a novel structural neighborhood based learning using a random walk. For classifying a node we take a random walk from the corresponding node, and make a decision based on how nodes in the respective k-th level neighborhood are classified. We observe that random walks of short length are helpful in classification. Emphasizing the role of longer random walks may cause the underlying Markov chain to converge towards the stationary distribution. Considering this, we take a lazy random walk based approach with variable termination probability for each node, based on its structural properties including degree. Our experimental study on real world datasets demonstrates the superiority of the proposed approach over the existing state-of-the-art approaches. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 2497 KDD2016 video
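A minimal sketch of classifying a node via short lazy random walks follows. It fixes the laziness at 1/2 and votes with every labelled node the walk touches; the paper's per-node, degree-dependent termination probability is the part this toy omits:

```python
import random
from collections import Counter

def classify_by_lazy_walk(adj, labels, node, steps=4, n_walks=500, seed=3):
    # Classify an unlabeled node by the labels met on short lazy random
    # walks; laziness (staying put with probability 1/2) keeps the walk
    # local instead of mixing toward the stationary distribution.
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_walks):
        cur = node
        for _ in range(steps):
            if rng.random() < 0.5:   # lazy step: stay where we are
                continue
            cur = rng.choice(adj[cur])
            if cur in labels:
                votes[labels[cur]] += 1
    return votes.most_common(1)[0][0]
```

On a small star graph where most of a node's neighbors carry label "A", the walk-based vote recovers "A", illustrating why short, local walks suffice.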
KDD2016 paper 1036
 
02:10
Title: Gemello: Creating a Detailed Energy Breakdown from just the Monthly Electricity Bill Authors: Nipun Batra*, Indraprastha Institute of Information Technology, Delhi Amarjeet Singh, Indraprastha Institute of Information Technology, Delhi Kamin Whitehouse, University of Virginia Abstract: The first step to saving energy in the home is often to create an energy breakdown: the amount of energy used by each individual appliance in the home. Unfortunately, current techniques that produce an energy breakdown are not scalable: they require hardware to be installed in each and every home. In this paper, we propose a more scalable solution called Gemello that estimates the energy breakdown for one home by matching it with similar homes for which the breakdown is already known. This matching requires only the monthly energy bill and household characteristics such as square footage of the home and the size of the household. We evaluate this approach using 57 homes and results indicate that the accuracy of Gemello is comparable to or better than existing techniques that use sensing infrastructure in each home. The information required by Gemello is often publicly available and, as such, it can be immediately applied to many homes around the world. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 8412 KDD2016 video
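The matching step described above is essentially a nearest-neighbor lookup over household characteristics. A minimal sketch, with invented feature vectors and appliance breakdowns (the real system uses monthly bills plus characteristics like square footage and household size):

```python
def estimate_breakdown(target, homes, k=2):
    # Gemello-style idea: find the k homes most similar to the target
    # (by bill and household features) and average their known
    # per-appliance energy breakdowns.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a["features"], b["features"]))
    nearest = sorted(homes, key=lambda h: dist(target, h))[:k]
    appliances = nearest[0]["breakdown"].keys()
    return {a: sum(h["breakdown"][a] for h in nearest) / k for a in appliances}

homes = [
    {"features": [100], "breakdown": {"fridge": 30.0}},
    {"features": [110], "breakdown": {"fridge": 34.0}},
    {"features": [300], "breakdown": {"fridge": 90.0}},
]
target = {"features": [105]}
```

No sensing hardware is needed in the target home, which is what makes the approach immediately applicable wherever the matching features are publicly available.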
Learning to learn and compositionality with deep recurrent neural networks
 
01:23:45
Author: Nando de Freitas, Department of Computer Science, University of Oxford Abstract: Deep neural network representations play an important role in computer vision, speech, computational linguistics, robotics, reinforcement learning and many other data-rich domains. In this talk I will show that learning-to-learn and compositionality are key ingredients for dealing with knowledge transfer so as to solve a wide range of tasks, for dealing with small-data regimes, and for continual learning. I will demonstrate this with three examples: learning learning algorithms, neural programmers and interpreters, and learning communication. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 15018 KDD2016 video
KDD2016 paper 890
 
02:46
Title: Just One More: Modeling Binge Watching Behavior Authors: William Trouleau*, EPFL Azin Ashkan, Technicolor Research Weicong Ding, Technicolor Research Brian Eriksson, Technicolor Research Abstract: Easy accessibility can often lead to over-consumption, as seen in food and alcohol habits. On video on-demand (VOD) services, this has recently been referred to as “binge watching”, where potentially entire seasons of TV shows are consumed in a single viewing session. While a user viewership model may reveal this binging behavior, creating an accurate model has several challenges, including censored data, deviations in the population, and the need to consider external influences on consumption habits. In this paper, we introduce a novel statistical mixture model that incorporates these factors and presents a “first of its kind” characterization of viewer consumption behavior using a real-world dataset that includes playback data from a VOD service. From our modeling, we tackle various predictive tasks to infer the consumption decisions of a user in a viewing session, including estimating the number of episodes they watch and classifying if they continue watching another episode. Using these insights, we then identify binge watching sessions based on deviation from normal viewing behavior. We observe different types of binging behavior, that binge watchers often view certain content out-of-order, and that binge watching is not a consistent behavior among our users. These insights and our findings have application in VOD revenue generation, consumer health applications, and customer retention analysis. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 561 KDD2016 video
Serving a Billion Personalized News Feeds
 
39:50
Author: Lars Backstrom, Facebook Abstract: Feed ranking's goal is to provide people with over a billion personalized experiences. We strive to provide the most compelling content to each person, personalized to them so that they are most likely to see the content that is most interesting to them. Similar to a newspaper, putting the right stories above the fold has always been critical to engaging customers and interesting them in the rest of the paper. In feed ranking, we face a similar challenge, but on a grander scale. Each time a person visits, we need to find the best piece of content out of all the available stories and put it at the top of feed where people are most likely to see it. To accomplish this, we do large-scale machine learning to model each person, figure out which friends, pages and topics they care about and pick the stories each particular person is interested in. In addition to the large-scale machine learning problems we work on, another primary area of research is understanding the value we are creating for people and making sure that our objective function is in alignment with what people want. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 2984 KDD2016 video
Attribute Extraction from Product Titles in eCommerce
 
05:22
Author: Ajinkya More, Wal-Mart Stores, Inc. Abstract: This paper presents a named entity extraction system for detecting attributes in product titles of eCommerce retailers like Walmart. The absence of syntactic structure in such short pieces of text makes extracting attribute values a challenging problem. We find that combining sequence labeling algorithms such as Conditional Random Fields and Structured Perceptron with a curated normalization scheme produces an effective system for the task of extracting product attribute values from titles. To keep the discussion concrete, we will illustrate the mechanics of the system from the point of view of a particular attribute - brand. We also discuss the importance of an attribute extraction system in the context of retail websites with large product catalogs, compare our approach to other potential approaches to this problem and end the paper with a discussion of the performance of our system for extracting attributes. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 1032 KDD2016 video
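The paper pairs sequence labelling (Conditional Random Fields, Structured Perceptron) with a curated normalization scheme. As a deliberately naive stand-in, a gazetteer lookup mapped onto canonical forms shows what normalization buys for the brand attribute; the alias table is invented:

```python
BRAND_ALIASES = {
    # alias (lowercased) -> canonical brand value; purely illustrative
    "hp": "HP",
    "hewlett-packard": "HP",
    "hewlett packard": "HP",
    "sony": "Sony",
}

def extract_brand(title):
    # Toy stand-in for the paper's sequence labeller + normalization:
    # scan the product title for known aliases, return the canonical form.
    t = title.lower()
    for alias, canon in BRAND_ALIASES.items():
        if alias in t:
            return canon
    return None
```

A real labeller also handles unseen brands and ambiguous tokens, which is precisely where the learned models outperform a lookup table like this one.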
Plenary Panel: Is Deep Learning the New 42?
 
01:46:20
Authors: moderator: Andrei Broder, Yahoo! Research panelist: Pedro Domingos, Dept. of Computer Science & Engineering, University of Washington panelist: Nando de Freitas, Department of Computer Science, University of Oxford panelist: Isabelle Guyon, Clopinet panelist: Jitendra Malik, UC Berkeley panelist: Jennifer Neville, Computer Science Department, Purdue University Abstract: The history of deep learning goes back more than five decades but in the marketplace of ideas its perceived value went through booms and busts. We are no doubt at an all time high: in the last couple of years we witnessed extraordinary advances in vision, speech recognition, game playing, translation, and so on, all powered by deep networks. At the same time companies such as Amazon, Apple, Facebook, Google, and Microsoft are making huge bets on deep learning research and infrastructure, ML competitions are dominated by deep learning approaches, open source deep learning software is proliferating, and the popular press both cheerleads the progress and raises the dark specter of unintended consequences. So is deep learning the answer to everything? According to Douglas Adams's famous "Hitchhiker's Guide to the Galaxy" after 7.5 million years of work the "Deep Thought" computer categorically found out that 42 is the "Answer to the Ultimate Question of Life, the Universe, and Everything" (although unfortunately, no one knows exactly what that question was). Rather than wait another 7.5 million years for "Deep Thought" to answer our quest we have assembled a distinguished panel of experts to give us their opinion on deep learning and its present and future impact. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 18084 KDD2016 video
KDD2016 paper 519
 
03:33
Title: A Multiple Test Correction for Streams and Cascades of Statistical Hypothesis Tests Authors: Francois Petitjean*, Monash University Geoff Webb, Monash University Abstract: Statistical hypothesis testing is a popular and powerful tool for inferring knowledge from data. For every such test performed, there is always a non-zero probability of making a false discovery, i.e. rejecting a null hypothesis in error. Familywise error rate (FWER) is the probability of making at least one false discovery during an inference process. The expected FWER grows exponentially with the number of hypothesis tests that are performed, almost guaranteeing that an error will be committed if the number of tests is big enough and the risk is not managed; a problem known as the multiple testing problem. State-of-the-art methods for controlling FWER in a multiple comparison setting require the set of hypotheses to be pre-determined. This renders statistical testing virtually unusable for many modern applications of statistical inference such as model selection, because neither the set of hypotheses that will be tested, nor even the number of hypotheses, can be known in advance. This paper introduces a multiple-testing correction that can be used in applications for which there are repeated pools of null hypotheses, from each of which a single null hypothesis is to be rejected, and neither the specific hypotheses nor their number are known in advance. To demonstrate the importance and relevance of this work to current machine learning problems, we further refine the theory to the problem of model selection and show how to use Stepwise Multiple Testing for learning graphical models. We assess its ability to discover graphical models on more than 7,000 datasets, studying the ability of Stepwise Multiple Testing to outperform the state of the art on data with varying size, dimensionality, as well as with varying density and power of the present correlations.
Stepwise Multiple Testing provides a significant improvement in statistical efficiency, often requiring only half as much data to reach the same discovery, while strictly controlling FWER. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 5195 KDD2016 video
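For contrast with the streaming setting the paper targets, classical FWER control over a fixed, pre-determined hypothesis set looks like Holm's step-down procedure — note that it requires knowing the number of tests m up front, which is exactly the assumption the paper removes:

```python
def holm_reject(pvals, alpha=0.05):
    # Holm's step-down procedure: sort p-values ascending and compare the
    # r-th smallest against alpha / (m - r); stop at the first failure.
    # Controls FWER at level alpha for a fixed family of m hypotheses.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject
```

Because the denominator depends on m, none of this machinery applies when hypotheses arrive as an open-ended stream of pools, motivating the paper's correction.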
KDD2016 paper 450
 
02:16
Title: CaSMoS: A Framework for Learning Candidate Selection Models over Structured Queries and Documents Authors: Fedor Borisyuk*, LinkedIn Corporation Krishnaram Kenthapadi, LinkedIn Corporation David Stein, LinkedIn Corporation Bo Zhao, LinkedIn Corporation Abstract: User experience at social media and web platforms such as LinkedIn is heavily dependent on the performance and scalability of its products. Applications such as personalized search and recommendations require real-time scoring of millions of structured candidate documents associated with each query, with strict latency constraints. In such applications, the query incorporates the context of the user (in addition to search keywords if present), and hence can become very large, comprising thousands of Boolean clauses over hundreds of document attributes. Consequently, candidate selection techniques need to be applied since it is infeasible to retrieve and score all matching documents from the underlying inverted index. We propose CaSMoS, a machine learned candidate selection framework that makes use of Weighted AND (WAND) query. Our framework is designed to prune irrelevant documents and retrieve documents that are likely to be part of the top-k results for the query. We apply a constrained feature selection algorithm to learn positive weights for feature combinations that are used as part of the weighted candidate selection query. We have implemented and deployed this system to be executed in real time using LinkedIn's Galene search platform. We perform extensive evaluation with different training data approaches and parameter settings, and investigate the scalability of the proposed candidate selection model. Our deployment of this system as part of LinkedIn's job recommendation engine has resulted in significant reduction in latency (up to 30%) without sacrificing the quality of the retrieved results, thereby paving the way for more sophisticated scoring models.
More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 419 KDD2016 video
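The WAND idea the abstract builds on is upper-bound pruning: skip any document whose best possible score cannot reach the current threshold. A minimal sketch (documents and per-term upper bounds are invented; in CaSMoS the weights are learned by constrained feature selection rather than fixed by hand):

```python
def wand_prune(docs, term_upper_bounds, query_terms, threshold):
    # WAND-style candidate selection: a document whose summed per-term
    # score upper bounds cannot reach the threshold is skipped without
    # ever being fully scored.
    survivors = []
    for doc_id, terms in docs.items():
        ub = sum(term_upper_bounds[t] for t in terms if t in query_terms)
        if ub >= threshold:
            survivors.append(doc_id)
    return survivors
```

Only the surviving candidates proceed to the expensive ranking model, which is how the deployed system meets its latency budget.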
Making Strides in Quantifying and Understanding Soccer
 
35:49
Author: Sarah Rudd, StatDNA, LLC Abstract: Soccer has a rich history of people using data in an attempt to gain a better understanding of what happened in a game. However, due to its fluid nature, the sport is often assumed to be difficult to quantify and analyze. This talk will highlight some of the progress that has been made in soccer analytics in recent years, including some of the advances being made thanks to rich, full-tracking datasets. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 2780 KDD2016 video
KDD2016 paper 403
 
02:39
Title: Accelerating Online CP Decompositions for Higher Order Tensors Authors: Shuo Zhou*, University of Melbourne Nguyen Vinh, University of Melbourne James Bailey, University of Melbourne Yunzhe Jia, University of Melbourne Ian Davidson, University of California-Davis Abstract: Tensors are a natural representation for multidimensional data. In recent years, CANDECOMP/PARAFAC (CP) decomposition, one of the most popular tools for analyzing multi-way data, has been extensively studied and widely applied. However, today’s datasets are often dynamically changing over time. Tracking the CP decomposition for such dynamic tensors is a crucial but challenging task, due to the large scale of the tensor and the velocity of new data arriving. Traditional techniques, such as Alternating Least Squares (ALS), cannot be directly applied to this problem because of their poor scalability in terms of time and memory. Additionally, existing online approaches have only partially addressed this problem and can only be deployed on third-order tensors. To fill this gap, we propose an efficient online algorithm that can incrementally track the CP decompositions of dynamic tensors with an arbitrary number of dimensions. In terms of effectiveness, our algorithm demonstrates comparable results with the most accurate algorithm, ALS, whilst being computationally much more efficient. Specifically, on small and moderate datasets, our approach is tens to hundreds of times faster than ALS, while for large-scale datasets, the speedup can be more than 3,000 times. Compared to other state-of-the-art online approaches, our method shows not only significantly better decomposition quality, but also better performance in terms of stability, efficiency and scalability. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 375 KDD2016 video
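The baseline this paper accelerates, batch CP-ALS, fits in a few lines of numpy. The sketch below is an illustrative toy restricted to 3-way tensors (function names and defaults are mine, not the authors'); it shows why naive refitting is expensive, since every sweep reads the full tensor.

```python
import numpy as np

def unfold(T, mode):
    """Matricize a 3-way tensor along `mode` (remaining modes row-major)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of two factor matrices."""
    r = A.shape[1]
    return (A[:, None, :] * B[None, :, :]).reshape(-1, r)

def cp_als(T, rank, iters=200, seed=0):
    """Batch CP-ALS: cycle through the modes, solving a least-squares
    problem for one factor while the other two are held fixed."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((s, rank)) for s in T.shape]
    for _ in range(iters):
        for mode in range(3):
            m1, m2 = (m for m in range(3) if m != mode)
            kr = khatri_rao(factors[m1], factors[m2])
            gram = (factors[m1].T @ factors[m1]) * (factors[m2].T @ factors[m2])
            factors[mode] = unfold(T, mode) @ kr @ np.linalg.pinv(gram)
    return factors

def reconstruct(factors):
    """Rebuild the tensor from its CP factors."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

Re-running this from scratch whenever new slices arrive touches the whole tensor each time, which is exactly the cost the paper's incremental tracking avoids.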
KDD2016 paper 567
 
03:00
Title: Graph Wavelets via Sparse Cuts Authors: Arlei Lopes da Silva*, University of California, Santa Barbara Xuan-Hong Dang, University of California, Santa Barbara Prithwish Basu, Raytheon BBN Technologies Ambuj Singh, University of California, Santa Barbara Ananthram Swami, Army Lab Abstract: Modeling information that resides on vertices of large graphs is a key problem in several real-life applications, ranging from social networks to the Internet-of-things. Signal Processing on Graphs and, in particular, graph wavelets can exploit the intrinsic smoothness of these datasets in order to represent them in a manner that is both compact and accurate. However, how to discover wavelet bases that capture the geometry of the data with respect to the signal as well as the graph structure remains an open question. In this paper, we study the problem of computing graph wavelet bases via sparse cuts in order to produce low-dimensional encodings of data-driven bases. This problem is connected to known hard problems in graph theory (e.g. multiway cuts) and thus requires an efficient heuristic. We formulate the basis discovery task as a relaxation of a vector optimization problem, which leads to an elegant solution as a regularized eigenvalue computation. Moreover, we propose several strategies in order to scale our algorithm to large graphs. Experimental results show that the proposed algorithm can effectively encode both the graph structure and signal, producing compressed and accurate representations for vertex values in a wide range of datasets (e.g. sensor and gene networks) and outperforming the best baseline by up to 8 times. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 734 KDD2016 video
KDD2016 paper 511
 
02:05
Title: Firebird: Predicting Fire Risk and Prioritizing Fire Inspections in Atlanta Authors: Michael Madaio*, Carnegie Mellon University Shang-Tse Chen, Georgia Tech Oliver L. Haimson, University of California, Irvine Wenwen Zhang, Georgia Tech Xiang Cheng, Emory University Matthew Hinds-Aldrich, Atlanta Fire Rescue Dept. Duen Horng Chau, Georgia Tech Bistra Dilkina, Georgia Tech Abstract: The Atlanta Fire Rescue Department (AFRD), like many municipal fire departments, actively works to reduce fire risk by inspecting commercial properties for potential hazards and fire code violations. However, AFRD’s fire inspection practices relied on tradition and intuition, with no existing data-driven process for prioritizing fire inspections or identifying new properties requiring inspection. In collaboration with AFRD, we developed the Firebird framework to help municipal fire departments identify and prioritize commercial property fire inspections, using machine learning, geocoding, and information visualization. Firebird computes fire risk scores for over 5,000 buildings in the city, with true positive rates of up to 71% in predicting fires. It has identified 6,096 new potential commercial properties to inspect, based on AFRD’s criteria for inspection. Furthermore, through an interactive map, Firebird integrates and visualizes fire incidents, property information and risk scores to help AFRD make informed decisions about fire inspections. Firebird has already begun to make positive impact at both local and national levels. It is improving AFRD’s inspection processes and Atlanta residents’ safety, and was highlighted by National Fire Protection Association (NFPA) as a best practice for using data to inform fire inspections. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 1400 KDD2016 video
Algorithmic Bias: From Discrimination Discovery to Fairness-Aware Data Mining (Part 1)
 
35:12
Authors: Carlos Castillo, EURECAT, Technology Centre of Catalonia Francesco Bonchi, ISI Foundation Abstract: Algorithms and decision making based on Big Data have become pervasive in all aspects of our daily lives (offline and online), as they have become essential tools in personal finance, health care, hiring, housing, education, and policies. It is therefore of societal and ethical importance to ask whether these algorithms can be discriminative on grounds such as gender, ethnicity, or health status. It turns out that the answer is positive: for instance, recent studies in the context of online advertising show that ads for high-income jobs are presented to men much more often than to women [Datta et al., 2015]; and ads for arrest records are significantly more likely to show up on searches for distinctively black names [Sweeney, 2013]. This algorithmic bias exists even when the developer of the algorithm has no intention to discriminate. Sometimes it may be inherent to the data sources used (software making decisions based on data can reflect, or even amplify, the results of historical discrimination), but even when the sensitive attributes have been suppressed from the input, a well trained machine learning algorithm may still discriminate on the basis of such sensitive attributes because of correlations existing in the data. These considerations call for the development of data mining systems which are discrimination-conscious by design. This is a novel and challenging research area for the data mining community. The aim of this tutorial is to survey algorithmic bias, presenting its most common variants, with an emphasis on the algorithmic techniques and key ideas developed to derive efficient solutions. The tutorial covers two main complementary approaches: algorithms for discrimination discovery and discrimination prevention by means of fairness-aware data mining. We conclude by summarizing promising paths for future research.
More on http://www.kdd.org/kdd2016/ KDD2016 conference is published on http://videolectures.net/
Views: 1737 KDD2016 video
Disaggregating Appliance-Level Energy Consumption: A Probabilistic Framework
 
09:15
Author: Sabina Tomkins, Jack Baskin School of Engineering, University of California Santa Cruz Abstract: In this work we propose a probabilistic disaggregation framework which can determine the energy consumption of individual electrical appliances from aggregate power readings. Our proposed framework uses probabilistic soft logic (PSL) to define a hinge-loss Markov random field (HL-MRF). Our method is novel in that it can integrate a diverse range of features, is highly scalable to any number of appliances, and makes fewer assumptions than existing methods. As the residential sector is responsible for over a third of all electricity demand, and delivering appliance-level energy consumption information to consumers has been demonstrated to reduce electricity consumption, our framework has the potential to make a significant impact on energy savings. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 361 KDD2016 video
Ranking Relevance in Yahoo Search
 
23:47
Author: Dawei Yin, Yahoo! Inc. Abstract: Search engines play a crucial role in our daily lives. Relevance is the core problem of a commercial search engine. It has attracted thousands of researchers from both academia and industry and has been studied for decades. Relevance in a modern search engine has gone far beyond text matching, and now involves tremendous challenges. The semantic gap between queries and URLs is the main barrier for improving base relevance. Clicks help provide hints to improve relevance, but unfortunately for most tail queries, the click information is too sparse, noisy, or missing entirely. For comprehensive relevance, the recency and location sensitivity of results is also critical. In this paper, we give an overview of the solutions for relevance in the Yahoo search engine. We introduce three key techniques for base relevance – ranking functions, semantic matching features and query rewriting. We also describe solutions for recency sensitive relevance and location sensitive relevance. This work builds upon 20 years of existing efforts on Yahoo search, summarizes the most recent advances and provides a series of practical relevance solutions. The reported performance is based on Yahoo’s commercial search engine, where tens of billions of URLs are indexed and served by the ranking system. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 1017 KDD2016 video
Dealing with Class Imbalance using Thresholding
 
18:40
Author: Rumi Ghosh, Robert Bosch LLC. Abstract: We propose thresholding as an approach to deal with class imbalance. We define thresholding as the process of determining a decision boundary in the presence of a tunable parameter; the threshold is the maximum value of this tunable parameter at which the conditions of a certain decision are satisfied. We show that thresholding is applicable not only to linear classifiers but also to non-linear ones, and that it is the implicit assumption behind many approaches to dealing with class imbalance in linear classifiers. We then extend this paradigm beyond linear classification and show how non-linear classification can be handled under the same umbrella framework of thresholding. The proposed method can be used for outlier detection in many real-life scenarios, such as manufacturing. In advanced manufacturing units, where the manufacturing process has matured over time, the instances (or parts) of the product that need to be rejected (based on a strict regime of quality tests) become relatively rare and are defined as outliers. How do we detect these rare parts or outliers beforehand? How do we detect the combinations of conditions leading to them? These are the questions motivating our research. This paper addresses the prediction of outliers, and of the conditions leading to outliers, using classification. The classes are good parts (those passing the quality tests) and bad parts (those failing the quality tests, which can be considered outliers). The rarity of outliers transforms this problem into a class-imbalanced classification problem. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 2240 KDD2016 video
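The core idea of the talk, tuning the decision boundary on classifier scores instead of accepting a default, can be illustrated with a minimal sketch. This is the generic technique rather than the paper's exact formulation, and all names below are mine:

```python
import numpy as np

def f1(y, yhat):
    """F1 score for boolean predictions `yhat` against 0/1 labels `y`."""
    tp = np.sum(yhat & (y == 1))
    fp = np.sum(yhat & (y == 0))
    fn = np.sum(~yhat & (y == 1))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def best_threshold(scores, labels, metric=f1):
    """Sweep every observed score as a candidate decision boundary (the
    tunable parameter) and return the one maximizing `metric`."""
    best_t, best_v = None, -np.inf
    for t in np.unique(scores):
        v = metric(labels, scores >= t)
        if v > best_v:
            best_t, best_v = t, v
    return best_t
```

On a 95:5 imbalanced set whose positive scores all sit below 0.5, the default boundary of 0.5 predicts nothing positive, while the tuned boundary recovers every rare part.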
KDD2016 paper 943
 
02:37
Title: Evaluating Mobile App Release Authors: Ya Xu*, LinkedIn Corporation Nanyu Chen, LinkedIn Corporation Abstract: We have seen an explosive growth of mobile usage, particularly on mobile apps. It is more important than ever to be able to properly evaluate mobile app release. A/B testing is a standard framework to evaluate new ideas, and we have seen many of its applications in the online world across the industry [9,10,12]. Running A/B tests on mobile apps turns out to be quite different, and much of that is attributed to the fact that we cannot ship code easily to mobile apps other than going through a lengthy build, review and release process. Mobile infrastructure and user behavior differences also contribute to how A/B tests are conducted differently on mobile apps, which will be discussed in detail in this paper. In addition to measuring features individually in the new app version through randomized A/B tests, we have a unique opportunity to evaluate the mobile app as a whole using the quasi-experimental framework [21]. Not all features can be A/B tested due to infrastructure changes and holistic product redesign. We propose and establish quasi-experiment techniques for measuring impact from mobile app release, with results shared from a recent major app launch at LinkedIn. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 974 KDD2016 video
Fast and Accurate Kmeans Clustering with Outliers
 
18:45
Author: Shalmoli Gupta, Department of Computer Science, University of Illinois at Urbana-Champaign More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 1602 KDD2016 video
KDD2016 paper 1081
 
03:10
Title: Recurrent Marked Temporal Point Process Authors: Nan Du*, Georgia Tech Hanjun Dai, Max Planck Institute Rakshit Trivedi, Max Planck Institute Utkarsh Upadhyay, Max Planck Institute Manuel Gomez-Rodriguez, MPI-SWS Le Song, MPI-SWS Abstract: Large volumes of event data are becoming increasingly available in a wide variety of applications, such as healthcare analytics, smart cities and social network analysis. The precise time interval or the exact distance between two events carries a great deal of information about the dynamics of the underlying systems. These characteristics make such data fundamentally different from independently and identically distributed data and from time-series data, where time and space are treated as indices rather than random variables. Marked temporal point processes and intensity functions are the mathematical framework for modeling such event data. However, typical point process models, such as Hawkes processes, continuous Markov chains, and autoregressive conditional duration processes, make strong assumptions about the generative processes of event data which may or may not reflect reality, and these parametric assumptions have also restricted the expressive power of temporal point process models. Can we obtain a more expressive model of marked temporal point processes? How can we learn such a model from massive data? In this paper, we propose a novel point process, referred to as the Recurrent Marked Temporal Point Process (RMTPP), to simultaneously model the event timings and markers. The key idea of our approach is to view the intensity function of a temporal point process as a nonlinear function of the history, and parameterize the function using a recurrent neural network. We develop an efficient stochastic gradient algorithm for learning RMTPP which can readily scale up to millions of events.
Using both synthetic and real world datasets, we show that, in the case that the true models are parametric, RMTPP can learn the dynamics of such models without knowing the actual parametric forms; and in the case that the true models are unknown, RMTPP can still learn the dynamics and achieve better predictive performance than parametric alternatives based on prior knowledge. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 1038 KDD2016 video
Graphons and Machine Learning: Modeling and Estimation of Sparse Massive Networks
 
01:12:17
Author: Jennifer Chayes, Microsoft Research Abstract: There are numerous examples of sparse massive networks, in particular the Internet, WWW and online social networks. How do we model and learn these networks? In contrast to conventional learning problems, where we have many independent samples, it is often the case for these networks that we can get only one independent sample. How do we use a single snapshot today to learn a model for the network, and therefore be able to predict a similar, but larger network in the future? In the case of relatively small or moderately sized networks, it’s appropriate to model the network parametrically, and attempt to learn these parameters. For massive networks, a non-parametric representation is more appropriate. In this talk, we first review the theory of graphons, developed over the last decade to describe limits of dense graphs, and the more recent theory describing sparse graphs of unbounded average degree, including power-law graphs. We then show how to use these graphons as non-parametric models for sparse networks. Finally, we show how to get consistent estimators of these non-parametric models, and moreover how to do this in a way that protects the privacy of individuals on the network. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 2841 KDD2016 video
Computational Social Science: Exciting Progress and Future Challenges
 
39:37
Author: Duncan Watts, Microsoft Research Abstract: The past 15 years have witnessed a remarkable increase in both the scale and scope of social and behavioral data available to researchers, leading some to herald the emergence of a new field: “computational social science.” Against these exciting developments stands a stubborn fact: that in spite of many thousands of published papers, there has been surprisingly little progress on the “big” questions that motivated the field in the first place—questions concerning systemic risk in financial systems, problem solving in complex organizations, and the dynamics of epidemics or social movements, among others. In this talk I highlight some examples of research that would not have been possible just a handful of years ago and that illustrate the promise of CSS. At the same time, they illustrate its limitations. I then conclude with some thoughts on how CSS can bridge the gap between its current state and its potential. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 1408 KDD2016 video
Decoding Fashion Contexts Using Word Embeddings
 
24:23
Author: Deepak Warrier, Myntra Designs Private Ltd. Abstract: Personalisation in e-commerce hinges on dynamically uncovering the user’s context via his/her interactions on the portal. The harder the context identification, the less effective the personalisation. Our work attempts to uncover and understand the user’s context to effectively render personalisation for fashion e-commerce. We highlight fashion-domain-specific gaps with typical implementations of personalised recommendation systems and present an alternate approach. Our approach hinges on user sessions (clickstream) as a proxy to the context and explores the “session vector” as an atomic unit for personalisation. The approach to learning the context vector incorporates both the fashion product (style) attributes and the users’ browsing signals. We establish various possible user contexts (product clusters), and a style can have a fuzzy membership in multiple contexts. We predict the user’s context using the skip-gram model with negative sampling introduced by Mikolov et al [1]. We are able to decode the context with high accuracy even for non-coherent sessions. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 533 KDD2016 video
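Skip-gram with negative sampling, which the talk applies to sessions of viewed styles, can be sketched end-to-end in numpy. Everything below is an illustrative toy under assumed hyperparameters, not Myntra's pipeline: items co-occurring in the same "sessions" are pushed toward nearby vectors, items that never co-occur drift apart.

```python
import numpy as np

def train_sgns(sessions, dim=16, window=2, neg=3, lr=0.05, epochs=20, seed=0):
    """Tiny skip-gram with negative sampling over item 'sessions'."""
    rng = np.random.default_rng(seed)
    vocab = sorted({it for s in sessions for it in s})
    idx = {it: i for i, it in enumerate(vocab)}
    V = len(vocab)
    W = rng.normal(0, 0.1, (V, dim))   # item ("input") vectors
    C = rng.normal(0, 0.1, (V, dim))   # context vectors

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for _ in range(epochs):
        for s in sessions:
            ids = [idx[it] for it in s]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx in ids[lo:pos] + ids[pos + 1:hi]:
                    # one positive pair plus `neg` uniformly drawn negatives
                    pairs = [(ctx, 1.0)] + [(int(rng.integers(V)), 0.0)
                                            for _ in range(neg)]
                    for tgt, label in pairs:
                        g = (sigmoid(W[center] @ C[tgt]) - label) * lr
                        W[center], C[tgt] = W[center] - g * C[tgt], C[tgt] - g * W[center]
    return {it: W[idx[it]] for it in vocab}
```

In the session-vector setting, a user's current context would then be summarized by pooling the vectors of the items in the live session and matching it against the learned product clusters.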
MXNet (Part 2)
 
59:45
Authors: Mu Li, Computer Science Department, Carnegie Mellon University Tianqi Chen, Department of Computer Science and Engineering, University of Washington Abstract: This hands-on tutorial will work through the pipeline of developing, training and deploying deep learning applications by using MXNet. Multiple applications, including recommendation and word embedding, will be covered. The participants will learn how to write a deep learning program in a few lines of code in their favorite language, such as Python, Scala, or R, and train it on one or multiple GPUs. They will also learn how to deploy a deep learning application in the cloud or on mobile phones. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 656 KDD2016 video
Node Representation in Mining Heterogeneous Information Networks
 
34:26
Author: Yizhou Sun, Computer Science Department, University of California, Los Angeles, UCLA Abstract: One of the challenges in mining information networks is the lack of an intrinsic metric for representing nodes in a low-dimensional space, which is essential in many mining tasks, such as recommendation and anomaly detection. Moreover, when it comes to heterogeneous information networks, where nodes belong to different types and links represent different semantic meanings, it is even more challenging to represent nodes properly. In this talk, we will focus on two mining tasks, i.e., (1) content-based recommendation and (2) anomaly detection in heterogeneous categorical events, and introduce (1) how to represent nodes when different types of nodes and links are involved; and (2) how heterogeneous links play different roles in these tasks. Our results have demonstrated the superiority as well as the interpretability of these new methodologies. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 1991 KDD2016 video
Deep Learning for Financial Sentiment Analysis
 
14:44
Author: Sahar Sohangir, Florida Atlantic University More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 1085 KDD2016 video
KDD2016 paper 801
 
02:32
Title: Robust Large-Scale Machine Learning in the Cloud Authors: Steffen Rendle*, Google, Inc. Dennis Fetterly, Google, Inc. Eugene Shekita, Google, Inc. Bor-Yiing Su, Google, Inc. Abstract: The convergence behavior of many distributed machine learning (ML) algorithms can be sensitive to the number of machines being used or to changes in the computing environment. As a result, scaling to a large number of machines can be challenging. In this paper, we describe a new scalable coordinate descent (SCD) algorithm for generalized linear models whose convergence behavior is always the same, regardless of how much SCD is scaled out and regardless of the computing environment. This makes SCD highly robust and enables it to scale to massive datasets on low-cost commodity servers. Experimental results on a real advertising dataset in Google are used to demonstrate SCD’s cost effectiveness and scalability. Using Google’s internal cloud, we show that SCD can provide near linear scaling using thousands of cores for 1 trillion training examples on a petabyte of compressed data. This represents 10,000x more training examples than the ‘large-scale’ Netflix prize dataset. We also show that SCD can learn a model for 20 billion training examples in two hours for about $10. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 547 KDD2016 video
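The serial building block behind SCD, plain coordinate descent for a regularized linear model, looks like the sketch below (a ridge-regression toy with illustrative names; the paper's contribution is the distributed, scale-invariant version, which this single-machine sketch does not attempt). Each step solves exactly for one weight while keeping the residual vector up to date incrementally.

```python
import numpy as np

def ridge_cd(X, y, lam=1.0, iters=200):
    """Cyclic coordinate descent for ridge regression:
    minimize 0.5*||y - Xw||^2 + 0.5*lam*||w||^2, one weight at a time."""
    n, d = X.shape
    w = np.zeros(d)
    r = y.copy()                    # residual y - Xw (w starts at zero)
    col_sq = (X ** 2).sum(axis=0)   # precomputed x_j^T x_j
    for _ in range(iters):
        for j in range(d):
            r += X[:, j] * w[j]     # remove coordinate j's contribution
            w[j] = X[:, j] @ r / (col_sq[j] + lam)
            r -= X[:, j] * w[j]     # restore with the updated value
    return w
```

Because each coordinate update is an exact minimization, the trajectory depends only on the data and the sweep order, which is the deterministic-convergence property SCD preserves when scaling out.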
The Wisdom of Crowds: Best Practices for Data Prep & Machine Learning
 
38:21
Author: Ingo Mierswa, Rapid-I GmbH Abstract: With hundreds of thousands of users, RapidMiner is the most frequently used visual workflow platform for machine learning. It covers the full spectrum of analytics from data preparation to machine learning and model validation. In this presentation, I will take you on a tour of machine learning which spans the last 15 years of research and industry applications and share key insights with you about how data scientists perform their daily analysis tasks. These patterns are extracted from mining millions of analytical workflows that have been created with RapidMiner over the past years. This talk will address important questions around the data mining process such as: What are the most frequently used solutions for typical data quality problems? How often are analysts using decision trees or neural networks? And does this behavior change over time or depend on the user's experience level? More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 786 KDD2016 video
KDD2016 paper 12
 
02:36
Title: Lexis: An Optimization Framework for Discovering the Hierarchical Structure of Sequential Data Authors: Payam Siyari*, Georgia Institute of Technology Bistra Dilkina, Georgia Institute of Technology Constantine Dovrolis, Georgia Institute of Technology Abstract: Data represented as strings abounds in biology, linguistics, document mining, web search and many other fields. Such data often have a hierarchical structure, either because they were artificially designed and composed in a hierarchical manner or because there is an underlying evolutionary process that creates repeatedly more complex strings from simpler substrings. We propose a framework, referred to as “Lexis”, that produces an optimized hierarchical representation of a given set of “target” strings. The resulting hierarchy, “Lexis-DAG”, shows how to construct each target through the concatenation of intermediate substrings, minimizing the total number of such concatenations or DAG edges. The Lexis optimization problem is related to the smallest grammar problem. After we prove its NP-Hardness for two cost formulations, we propose an efficient greedy algorithm for the construction of Lexis-DAGs. We also consider the problem of identifying the set of intermediate nodes (substrings) that collectively form the “core” of a Lexis-DAG, which is important in the analysis of Lexis-DAGs. We show that the Lexis framework can be applied in diverse applications such as optimized synthesis of DNA fragments in genomic libraries, hierarchical structure discovery in protein sequences, dictionary-based text compression, and feature extraction from a set of documents. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 2070 KDD2016 video
KDD2016 paper 1146
 
02:36
Title: Approximate Personalized PageRank on Dynamic Graphs Authors: Hongyang Zhang*, Stanford University Peter Lofgren, Stanford University Abstract: We propose and analyze two algorithms for maintaining approximate Personalized PageRank (PPR) vectors on a dynamic graph, where edges are added or deleted. Our algorithms are natural dynamic versions of two known local variations of power iteration. One, Forward Push, propagates probability mass forwards along edges from a source node, while the other, Reverse Push, propagates local changes backwards along edges from a target. In both variations, we maintain an invariant between two vectors, and when an edge is updated, our algorithm first modifies the vectors to restore the invariant, then performs any needed local push operations to restore accuracy. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 489 KDD2016 video
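The static Forward Push variation the abstract builds on is short enough to sketch in full (pure Python; the graph representation and parameter names are mine). It maintains the invariant the abstract mentions: the true PPR of any node equals its estimate plus a residual-weighted correction, so the leftover residuals bound the approximation error.

```python
from collections import defaultdict

def forward_push(graph, source, alpha=0.15, eps=1e-6):
    """Local Forward Push for approximate personalized PageRank.
    `graph` maps each node to its list of out-neighbours."""
    p, r = defaultdict(float), defaultdict(float)  # estimates, residuals
    r[source] = 1.0
    queue = [source]
    while queue:
        u = queue.pop()
        if r[u] < eps * len(graph[u]):
            continue                      # residual already below threshold
        mass, r[u] = r[u], 0.0
        p[u] += alpha * mass              # keep the restart fraction here
        share = (1 - alpha) * mass / len(graph[u])
        for v in graph[u]:                # push the rest along out-edges
            r[v] += share
            if r[v] >= eps * len(graph[v]):
                queue.append(v)
    return dict(p), dict(r)
```

The dynamic algorithms in the paper start from exactly this state: when an edge changes, they first repair the invariant locally and then resume these same push operations, rather than recomputing from scratch.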
Opportunities and Challenges for Remote Sensing in Agricultural Applications of Data Science
 
45:38
Author: Melba Crawford, College of Engineering, Purdue University Abstract: Increases in global population, coupled with challenges of climate change, require development of technologies to support increased food production throughout the entire supply chain – from plant breeding to delivery of agricultural products. Developments in remote sensing from space-based, airborne, and proximal sensing platforms, coupled with advanced capabilities in computational platforms and data analytics, are providing new opportunities for contributing solutions to address grand challenges related to food, energy, and water. Spaceborne platforms carrying new active and passive sensors are moving from complex, multi-purpose missions to lower cost, measurement-specific constellations of small satellites. Advances in materials are leading to miniaturization and mass production of sensors and supporting instrumentation, resulting in advanced sensing from affordable autonomous vehicles. New algorithms to exploit the massive, multi-modality data sets and provide actionable information for agricultural applications from phenotyping to crop mapping and monitoring are being developed. An overview of recent contributions, as well as opportunities and challenges for data science in the analysis of multi-temporal, multi-scale, multi-sensor remotely sensed data, will be presented. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 1241 KDD2016 video
Gameplay First: Data Science at Blizzard Entertainment
 
39:07
Author: Chaitanya Chemudugunta, Blizzard Entertainment Inc. Abstract: With a focus on gameplay first, Blizzard Entertainment is known for developing premium games like World of Warcraft, Starcraft, Diablo, Hearthstone, Heroes of the Storm and Overwatch. Tens of millions of players log in daily and interact with a variety of game features, generating massive amounts of rich and diverse data streams. In this talk, I will provide a general overview of data science challenges at Blizzard and discuss two challenges in the area of game design. Specifically, I will discuss challenges and solutions for matchmaking in competitive games, and discuss how gameplay and player segmentation can be used to inform game balance. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 651 KDD2016 video
KDD2016 paper 585
 
02:55
Title: Overcoming key weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Authors: Kai Ming Ting*, Federation University Ye Zhu, Monash University Mark Carman, Monash University Yue Zhu, Nanjing University Zhi-Hua Zhou, Nanjing University Abstract: This paper introduces the first generic version of data dependent dissimilarity and shows that it provides a better closest match than distance measures for three existing algorithms in clustering, anomaly detection and multi-label classification. For each algorithm, we show that by simply replacing the distance measure with the data dependent dissimilarity measure, it overcomes a key weakness of the otherwise unchanged algorithm. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 592 KDD2016 video
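What "data dependent" means here can be made concrete with a minimal sketch: a simple per-dimension mass estimate in the spirit of the paper, not the authors' exact measure (all names below are mine). Two pairs at the same geometric distance get different dissimilarities depending on how much data lies between them.

```python
import numpy as np

def mass_dissimilarity(X):
    """Pairwise data-dependent dissimilarity over dataset X (n x d).
    For each pair (x, y) and each dimension, take the fraction of data
    points whose value falls in [min(x_i, y_i), max(x_i, y_i)], then
    average over dimensions. Unlike a fixed geometric distance, the
    result depends on the local density of the data."""
    n, d = X.shape
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            lo = np.minimum(X[i], X[j])
            hi = np.maximum(X[i], X[j])
            mass = ((X >= lo) & (X <= hi)).mean(axis=0)  # per-dimension mass
            D[i, j] = mass.mean()
    return D
```

Dropping such a measure into a k-NN-style clustering, anomaly detection or multi-label algorithm in place of Euclidean distance is exactly the kind of one-line substitution the abstract describes.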
A Bayesian Network approach to County-Level Corn Yield Prediction
 
14:58
Author: Vikas Chawla, Department of Computer Science, Iowa State University More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 319 KDD2016 video
CityBES: A Web-based Platform to Support City-Scale Building Energy Efficiency
 
17:26
Author: Tianzhen Hong, Lawrence Berkeley National Laboratory Abstract: Buildings in cities consume 30 to 70% of the cities’ total primary energy. Retrofitting the existing building stock to improve energy efficiency and reduce energy use is a key strategy for cities to reduce greenhouse-gas emissions and mitigate climate change. Planning and evaluating retrofit strategies for buildings requires a deep understanding of the physical characteristics, operating patterns, and energy use of the building stock. This is a challenge for city managers as data and tools are limited and disparate. This paper introduces a web-based data and computing platform, City Building Energy Saver (CityBES), which focuses on energy modeling and analysis of a city’s building stock to support district- or city-scale efficiency programs. CityBES uses an international open data standard, CityGML, to represent and exchange 3D city models. CityBES employs EnergyPlus to simulate building energy use and savings from energy-efficient retrofits. CityBES provides a suite of features for urban planners, city energy managers, building owners, utilities, energy consultants and researchers. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 273 KDD2016 video
A Framework of Combining Deep Learning and Survival Analysis for Asset Health Management
 
14:32
Author: Linxia Liao, General Electric Company Abstract: We propose a method to integrate feature extraction and prediction as a single optimization task by stacking a three-layer model as a deep learning structure. The first layer of the deep structure is a Long Short Term Memory (LSTM) model which deals with the sequential input data from a group of assets. The output of the LSTM model is followed by mean-pooling, and the result is fed to the second layer. The second layer is a neural network layer, which further learns the feature representation. The output of the second layer is connected to a survival model as the third layer for predicting asset health condition. The parameters of the three-layer model are optimized together via stochastic gradient descent. The proposed method was tested on a small dataset collected from a fleet of mining haul trucks. The model produced an “individualized” failure probability representation for assessing the health condition of each individual asset, which clearly separates the in-service and failed trucks. The proposed method was also tested on a large open source hard drive dataset, and it showed promising results. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 795 KDD2016 video
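The three-layer stack described in the abstract (LSTM over a sensor sequence, mean-pooling, a dense feature layer, then a survival-style output) can be sketched forward-pass-only in NumPy. This is a minimal illustration with made-up dimensions and random weights, and a logistic hazard standing in for the paper's survival model; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_forward(xs, params):
    """Single-layer LSTM over a sequence; returns hidden states, shape (T, H)."""
    Wx, Wh, b = params                  # Wx: (D, 4H), Wh: (H, 4H), b: (4H,)
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hs = []
    for x in xs:
        z = x @ Wx + h @ Wh + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g               # cell state update
        h = o * np.tanh(c)              # hidden state
        hs.append(h)
    return np.stack(hs)

D, H = 3, 4                             # sensor channels, hidden size (illustrative)
params = (rng.normal(size=(D, 4 * H)) * 0.1,
          rng.normal(size=(H, 4 * H)) * 0.1,
          np.zeros(4 * H))

# Layer 1: LSTM over one asset's sensor sequence, then mean-pooling over time.
sequence = rng.normal(size=(10, D))                    # 10 time steps
pooled = lstm_forward(sequence, params).mean(axis=0)   # (H,)

# Layer 2: a dense feature layer.
W2, b2 = rng.normal(size=(H, H)) * 0.1, np.zeros(H)
features = np.tanh(pooled @ W2 + b2)

# Layer 3: a survival-style output -- a logistic hazard giving an
# individualized failure probability for this asset.
w3, b3 = rng.normal(size=H) * 0.1, 0.0
failure_prob = 1.0 / (1.0 + np.exp(-(features @ w3 + b3)))
print(f"failure probability: {failure_prob:.3f}")
```

Training the stack end-to-end, as the abstract describes, would backpropagate the survival loss through all three layers via stochastic gradient descent.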
Large Scale Machine Learning at Verizon: Theory and Applications
 
34:56
Author: Jeff Stribling, Verizon Communications Abstract: This talk will cover recent innovations in large-scale machine learning and their applications on massive, real-world data sets at Verizon. These applications power new revenue generating products and services for the company and are hosted on a massive computing and storage platform known as Orion. We will discuss the architecture of Orion and the underlying algorithmic framework. We will also cover some of the real-world aspects of building a new organization dedicated to creating new product lines based on data science. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 1131 KDD2016 video
Towards Conversational Recommender Systems
 
23:28
Author: Konstantina Christakopoulou, Department of Computer Science and Engineering, University of Minnesota Abstract: People often ask others for restaurant recommendations as a way to discover new dining experiences. This makes restaurant recommendation an exciting scenario for recommender systems and has led to substantial research in this area. However, most such systems behave very differently from a human when asked for a recommendation. The goal of this paper is to begin to reduce this gap. In particular, humans can quickly establish preferences when asked to make a recommendation for someone they do not know. We address this cold-start recommendation problem in an online learning setting. We develop a preference elicitation framework to identify which questions to ask a new user to quickly learn their preferences, taking advantage of latent structure in the recommendation space via a probabilistic latent factor model. Our experiments with both synthetic and real-world data compare different types of feedback and question-selection strategies. We find that our framework can make very effective use of online user feedback, improving personalized recommendations over a static model by 25% after asking only 2 questions. Our results demonstrate dramatic benefits of starting from offline embeddings, and highlight the benefit of bandit-based explore-exploit strategies in this setting. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 692 KDD2016 video
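The question-selection idea above can be sketched as a simple uncertainty-driven elicitation loop over latent factors. Everything here is illustrative, not the paper's method: random item embeddings stand in for offline embeddings, a ridge/Bayesian linear update stands in for the latent factor model, and "ask about the item with the most uncertain predicted rating" stands in for the paper's bandit strategies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical offline item embeddings (K latent factors per item).
K, n_items = 5, 50
items = rng.normal(size=(n_items, K))

true_user = rng.normal(size=K)      # unknown preference vector to elicit
A = np.eye(K)                       # posterior precision (ridge prior)
b = np.zeros(K)

asked = set()
for _ in range(10):
    cov = np.linalg.inv(A)
    # Ask about the item whose predicted rating is currently most
    # uncertain (a simple explore-style question-selection rule).
    variances = np.einsum('ik,kl,il->i', items, cov, items)
    variances[list(asked)] = -np.inf
    q = int(np.argmax(variances))
    asked.add(q)
    answer = items[q] @ true_user + rng.normal(scale=0.1)  # noisy feedback
    A += np.outer(items[q], items[q])                      # posterior update
    b += answer * items[q]

estimate = np.linalg.solve(A, b)
cos = estimate @ true_user / (np.linalg.norm(estimate) * np.linalg.norm(true_user))
print(f"cosine(estimate, true user): {cos:.3f}")
```

After a handful of well-chosen questions the estimated preference vector aligns closely with the true one, which is the intuition behind the abstract's "25% improvement after only 2 questions".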
Making fashion recommendations with human-in-the-loop machine learning
 
20:53
Author: Brad Klingenberg, Stitch Fix, Inc. Abstract: Most recommendation algorithms produce results without human intervention. Especially in hard-to-quantify domains like fashion, combining algorithms with expert human curation can make recommendations more effective. But it can also complicate traditional approaches to training and evaluating algorithms. In this talk I will share lessons from making personalized fashion recommendations with humans in the loop at Stitch Fix, where we commit to our recommendations through the physical delivery of merchandise to clients. More on http://www.kdd.org/kdd2016/ KDD2016 Conference is published on http://videolectures.net/
Views: 634 KDD2016 video
KDD2016 paper 427
 
02:12
Title: Fast Unsupervised Online Drift Detection Using Incremental Kolmogorov-Smirnov Test Authors: Denis Dos Reis*, Universidade de São Paulo Gustavo Batista, Universidade de Sao Paulo at Sao Carlos Peter Flach, University of Bristol Stan Matwin, Dalhousie University Abstract: Data stream research has grown rapidly over the last decade. Two major features distinguish data stream from batch learning: stream data are generated on the fly, possibly at a fast and variable rate; and the underlying data distribution can be non-stationary, leading to a phenomenon known as concept drift. Therefore, most research on data stream classification focuses on proposing efficient models that can adapt to concept drift and maintain a stable performance over time. However, specifically for the classification task, the majority of such methods rely on the instantaneous availability of true labels for all already classified instances. This is a strong assumption that is rarely fulfilled in practical applications. Hence there is a clear need for efficient methods that can detect concept drifts in an unsupervised way. One possibility is the well-known Kolmogorov-Smirnov test, a statistical hypothesis test that checks whether two samples were drawn from the same distribution. This work has two main contributions. The first one is an Incremental Kolmogorov-Smirnov algorithm that allows performing the Kolmogorov-Smirnov hypothesis test instantly using two samples that change over time, where a change is an insertion and/or removal of an observation. Our algorithm employs a randomized tree and is able to perform the insertion and removal operations in O(log N) with high probability and calculate the Kolmogorov-Smirnov test in O(1), where N is the number of sample observations. This is a significant speed-up compared to the O(N log N) cost of the non-incremental implementation. The second contribution is the use of the Incremental Kolmogorov-Smirnov test to detect concept drifts without true labels. 
Classification algorithms adapted to use the test rely on a limited portion of those labels just to update the classification model after a concept drift is detected. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 223 KDD2016 video
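The drift-detection idea in the abstract can be sketched with the naive, non-incremental two-sample KS statistic: compare a reference window of the stream against the current window and flag drift when the statistic is large. This recomputes the statistic from scratch in O(N log N); the paper's contribution is maintaining the same statistic under insertions and removals in O(log N) with a randomized tree. The Gaussian windows below are made-up stand-ins for stream data.

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs (naive recomputation)."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect.bisect_right(sa, x) / len(sa)
        fb = bisect.bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d

random.seed(0)
reference = [random.gauss(0, 1) for _ in range(200)]   # reference window
no_drift  = [random.gauss(0, 1) for _ in range(200)]   # same distribution
drifted   = [random.gauss(2, 1) for _ in range(200)]   # mean-shifted stream

print(ks_statistic(reference, no_drift))  # small: no drift signal
print(ks_statistic(reference, drifted))   # large: flag concept drift
```

Because the statistic needs no class labels, this check runs unsupervised, and labels are only requested to update the classifier once a drift is flagged, as the abstract describes.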
