Applied Predictive Modeling by Max Kuhn | (PDF) Free Download


Ebook Info

  • Published: 2013
  • Number of pages: 613 pages
  • Format: PDF
  • File Size: 12.89 MB
  • Authors: Max Kuhn, Kjell Johnson

Description

Winner of the 2014 Technometrics Ziegel Prize for Outstanding Book. Applied Predictive Modeling covers the overall predictive modeling process, beginning with the crucial steps of data preprocessing, data splitting and foundations of model tuning. The text then provides intuitive explanations of numerous common and modern regression and classification techniques, always with an emphasis on illustrating and solving real data problems. Addressing practical concerns extends beyond model fitting to topics such as handling class imbalance, selecting predictors, and pinpointing causes of poor model performance, all of which are problems that occur frequently in practice. The text illustrates all parts of the modeling process through many hands-on, real-life examples, and every chapter contains extensive R code for each step of the process. The data sets and corresponding code are available in the book’s companion AppliedPredictiveModeling R package, which is freely available on the CRAN archive. This multi-purpose text can be used as an introduction to predictive models and the overall modeling process, a practitioner’s reference handbook, or a text for advanced undergraduate or graduate level predictive modeling courses. To that end, each chapter contains problem sets that help solidify the covered concepts and that use data available in the book’s R package. Readers and students interested in implementing the methods should have some basic knowledge of R, and a handful of the more advanced topics require some mathematical knowledge.

User’s Reviews

Reviews from Amazon users, collected at the time this book was published on the website:

⭐tl;dr: A brilliant book covering predictive modelling in R. With a strong practical bent, it walks the reader through the application of modern classification and regression techniques to a broad range of varied and interesting data sets. It uses existing packages where possible so you can jump straight in (great for Kagglers), but there is a lot here to master. It is especially strong on preprocessing (both unsupervised and supervised), model tuning and model assessment. It should not be your first book on R or data analytics, but it has the best balance of practical application without forgoing theory that I have seen. It is wonderful to see how professional data analysts approach predictive modelling tasks. The data sets are not toy models to highlight approaches but interesting and complex problems from a wide variety of disciplines. (Note that this book does not cover time series, generalised additive models or ensembles of different models.)

Review: Data science has become very popular due to the increase in computing power (including things like AWS), the amount of data that is accessible on the internet and a number of open-source tools (R and Python, for example) that allow even relative beginners to complete quite sophisticated models. Coursera lets anyone complete courses on machine learning for free, and sites like Kaggle have even turned it into something of a sport where people compete to create predictive models for money or even job interviews. Part of the excitement is that predictive models can be applied to almost any field you can think of.

Given how easy it has become to make predictions using sophisticated techniques, the number of books on machine learning, data mining and predictive data analytics has grown to meet the demand of people looking to learn about the field. As data science is itself a combination of many different disciplines (statistics, computer science, artificial intelligence etc.) there are many different points of entry. For this reason books can often be placed on a spectrum from straightforward examples of already constructed programs to theoretical textbooks with lots of mathematical background that construct approaches from scratch. “Applied Predictive Modeling” tries to find a middle ground between these two approaches, though it unashamedly sides with the practical. In contrast to many other works, though, it utilises existing packages (notably caret) rather than having the reader construct the approaches themselves in code.

Applied Predictive Modeling contains 20 chapters set out to be quasi-independent whilst still forming a coherent book. An abstract opens each chapter, followed by sections discussing the approaches used. The writing is excellent, very easy to follow and wonderfully informative, with an excellent choice of example data sets. The discussions are not afraid to highlight the problems of different approaches; in one of the later chapters noise is deliberately added to a data set so the differing impact can be seen on a range of models. Theory is discussed insofar as it is useful for understanding the use of certain approaches, and references to further reading are clearly given. Each chapter concludes with a summary, followed by a computing section that contains all of the R code used in the chapter, with some discussion where important. Finally, most chapters have a section containing exercises. Usefully, these exercises use different data sets, so they are not merely regurgitation of what one has just read. The chapters also have independent bibliographies, which is a little annoying when reading the book cover to cover but makes it excellent as a reference book.

After a few chapters of overview, the chapters largely work through the components of the data analytics process: the data-splitting and pre-processing chapters cover transforming, centering, dealing with missing values and setting up the data for the application of models. The next section of the book covers regression models. It utilises a pharmaceutical dataset and works through the creation of models of increasing complexity. A chapter then works through an example of concrete strength prediction based on ingredients to show clearly how regression applications work end to end. A number of chapters then look at classification algorithms, using the construction of a model for a Kaggle competition from late 2010 on university grants. This highlights what this book offers that I have not seen in other comparable books: real-life examples of the steps a professional analyst takes in the construction of a model. The reader is almost always watching the construction of a real model throughout the discussion of the differing approaches. The book does discuss theory where it is useful. But rather than going into the minutiae of constructing things directly in code to highlight the underlying structure, existing packages are used where possible. This lowers the barrier to getting started with the techniques. Finally, the book is rounded out with chapters on model tuning, detecting variable importance, how to handle class imbalances and some broader issues in modelling, all again using real data sets from different fields.

The authors have created an R package for the book containing the code and data sets used, as well as an excellent website and blog. The book ranges broadly across disciplines and includes separate data sets for the exercises; in all, I count 21 data sets, ranging from concrete strength to caravan insurance, that are either covered in the book or given as exercises in the chapters.

In short, I congratulate the authors on an excellent book that I look forward to working through in depth over the coming months. If you are looking to improve your predictive modelling and are not yet at a professional standard, this is the book you are looking for. Whilst there is loads to learn and master, you can jump in and use things from the book very quickly thanks to its use of impressive packages. One area I would love to see added to future editions would be the ensembling of different models.

⭐There are many fine math-oriented predictive modeling books, such as Hastie. Kuhn et al consider them “sister texts” and begin immediately to differentiate: their approach is hands on and practical, for the express purpose of demonstrating HOW to sort, structure and predict via Python or R, for the purpose of accuracy and understanding of the DATA and trends, NOT learning the underlying math.

For a couple of pharmaceutical guys (who BTW use R extensively; I’ve been an analyst in that industry), you’d think the examples would be new chemical or biological entities. Not so! The cases are fun and exciting, ranging from the nontrivial compression strength of concrete (want that bridge to hold when you cross?) to fuel economy, credit scoring, success in grant applications (boy, their colleagues will love that one!), and cognitive impairment. I evaluate technology for patents at payroy dot com, and we have a log likelihood model using Bayesian and Monte Carlo that their grant section helped translate seamlessly to R! We’re NOT talking pie in the sky pseudo code here, but real life, real results recipes.

The authors talk about the “scholarly veil”, meaning we general workers and researchers don’t always “deserve” to see the underlying process, software and data (and, other than open source, often can’t afford it). Wow, do they pop that myth! These authors are relentless in giving every detail, from design and binning to sorting and stacking to ANOVA, regressions, trees, error methods: the whole ball of wax with live data and live R coding, all on a shoestring budget! I guarantee you can start with basic stats and run a very well designed predictive model with the methods they detail, without having to pop for SAP/IBM or SPSS.

One caveat: even though they don’t assume advanced partial differential equations or even probability theory, the R code and methods go at a fast clip. I’d say they are assuming you either have R basics, or will fill them in with practice or experience. This is NOT a “how to use R” manual, even though it is in a sense; it is a “how to apply R correctly and robustly in a way that will pass a juried look at your methods and conclusions.” Again, REAL WORLD. For comparison, I’d put the math at advanced undergrad and the R at grad level/professional practice levels. This will make the title excellent both for learning and professional reference. At this writing, the book is hard to find and is being marked up by resellers, a tribute to its value and demand right out of the gate.

Springer is never cheap, but also never shabby: the book is typically gorgeous, well edited, combed for errors (the code ran fine on my antique R download; even though it’s free, I’m hesitant to have to learn a new version!), and pedagogically awesome if you’re considering this for a class. We recommend books for our library purchasers and of the 25 actively screened in this category (including a focus on prediction, not just data mining), this is in the top three with Hastie above! Highly recommended for research, augmentation, reference, as well as deep study. Lots of insights, too, about where big data, ML, mining and prediction are now and where they are going: predicting prediction’s future.

Library Picks reviews only for the benefit of Amazon shoppers and has nothing to do with Amazon, the authors, manufacturers or publishers of the items we review. We always buy the items we review for the sake of objectivity, and although we search for gems, are not shy about trashing an item if it’s a waste of time or money for Amazon shoppers. If the reviewer identifies herself, her job or her field, it is only as a point of reference to help you gauge the background and any biases.

⭐I cannot recommend this book enough; it is outstanding. I honestly wish someone had handed it to me about 2 years ago and told me to read it cover to cover. I am very new to the world of predictive modeling, but I have a decent maths/stats background. I have found the book easy to pick up, but I do think it is a book that needs to be read from start to finish, as the later chapters often refer back to examples in earlier chapters. For example, I was keen to just read about classification, but there were many references to earlier chapters on regression. Given this, I went back to read it from the start. So glad I did, as the data pre-processing chapter alone is extremely helpful. The types of examples used are also good and well explained (e.g. health, insurance). The examples used in the book are by no means basic (i.e. a lot of pre-processing is required to handle things like missing values). I am glad the writer has done this because this is reflective of what real data is like... I get annoyed when books present you with a neat, tidy dataset without telling you how they got to that! I also like the fact that it goes into a lot more depth about REAL issues you will face during model building, e.g. class imbalance. This is not something I have found covered well in any other book... and let’s be honest, rooting through articles and scholarly papers on the Internet is not ideal if you’re pushed for time! The style of writing is also excellent; it’s formal, but honestly, I prefer that to a more informal style of writing. There is enough R code at the end of each chapter if you want to do the book’s exercises or apply the techniques to your own work. I use this a lot with ISLR and I think they complement one another, although this book covers more advanced stuff. If you are new to predictive modeling, then I would recommend getting both and using them together.

I was reluctant to buy this book, but it has been so worth it. I know I will refer to it in years to come!

⭐This book fills a useful gap between the basic “cookbooks” and the more advanced theoretical textbooks. The authors do a good job in taking you through realistic case studies to show the issues involved with data analysis. It definitely helped me get a good feel for both how to apply a range of models and how to use a range of R packages. The book is not perfect, and some of the data pre-processing work was too tedious for my liking. It helped to have the extended code that is included in the package handy, as it is not the same as that published in the book. The latter has some gaps once in a while.

The book pairs well with The Elements of Statistical Learning (which is what the publisher probably intended), as it addresses similar methods.

I hope that someone writes a similar book but focused on Bayesian machine learning methods.

⭐I dip in and out of this book regularly when solving problems with machine learning. The techniques, and especially their motivations, are very well described. I find the mathematics is at a level I can comfortably comprehend. The early section on preparing data is really good. That’s 80% of the effort involved in getting great regression or classification results.

I also have The Elements of Statistical Learning, which I love, but I’m not yet fluent enough in some of the maths for that book to read as easily as this one.

⭐Applied Predictive Modeling by Max Kuhn and Kjell Johnson is a complete examination of essential machine learning models with a clear focus on making numeric or factorial predictions. Over nearly 600 pages, the authors discuss all topics from data engineering to modeling and performance evaluation.

The core of Applied Predictive Modeling consists of four distinct parts:

1. General Strategies on how to manipulate and re-sample data.
2. Regression Models for making numeric predictions.
3. Classification Models for making factor predictions.
4. Other Considerations concerning model quality.

Overall, Applied Predictive Modeling is a very informative course on machine learning. It assumes some prior knowledge and might be difficult to access for someone without any, despite leaving out unnecessary equations (Introduction to Statistical Learning by Robert Tibshirani and Trevor Hastie would be a good read before starting this book). Some of the book’s examples are taken from the field of medicine and pharmaceuticals, which makes them hard to understand for people outside the realm of the health sciences.

However, the book does a very good job of making machine learning in R much more systematic. It clearly shows the advantages of using the caret package (written by the book’s author) and how to evaluate and tune your model’s performance.

If you are not entirely new to data science, this book will yield a high return for you. It makes your process of training a model more straightforward and thorough.

⭐Can’t fault the content as a very good introduction to modelling, suitable for advanced undergraduate-level teaching. However, like many of these Springer texts (the series with the yellow covers) the binding is poor quality and won’t last through one subject-worth of use let alone constant reference. Hard to see where your money goes on these. And it’s not like they don’t know how to make a good book – e.g. Bishop’s Pattern Recognition and Machine Learning also published by Springer is really solidly bound.
