Spoken Language Processing: A Guide to Theory, Algorithm, and System Development 1st Edition by Xuedong Huang (PDF)

Ebook Info

  • Published: 2001
  • Number of pages: 980 pages
  • Format: PDF
  • File Size: 10.35 MB
  • Authors: Xuedong Huang, Alex Acero, Hsiao-Wuen Hon

Description

Remarkable progress is being made in spoken language processing, but many powerful techniques have remained hidden in conference proceedings and academic papers, inaccessible to most practitioners. In this book, the leaders of the Speech Technology Group at Microsoft Research share these advances, presenting not just the latest theory but practical techniques for building commercially viable products.

KEY TOPICS: Spoken Language Processing draws upon the latest advances and techniques from multiple fields: acoustics, phonology, phonetics, linguistics, semantics, pragmatics, computer science, electrical engineering, mathematics, syntax, psychology, and beyond. The book begins by presenting essential background on speech production and perception, probability and information theory, and pattern recognition. The authors demonstrate how to extract useful information from the speech signal, then present a variety of contemporary speech recognition techniques, including hidden Markov models, acoustic and language modeling, and techniques for improving resistance to environmental noise. Coverage includes decoders, search algorithms, large vocabulary speech recognition techniques, text-to-speech, spoken language dialog management, user interfaces, and interaction with non-speech interface modalities. The authors also present detailed case studies based on Microsoft's advanced prototypes, including the Whisper speech recognizer, the Whistler text-to-speech system, and the MiPad handheld computer.

MARKET: For anyone involved with planning, designing, building, or purchasing spoken language technology.

User’s Reviews

Editorial Reviews

From the Inside Flap

Preface

Our primary motivation in writing this book is to share our working experience to bridge the gap between the knowledge of industry gurus and newcomers to the spoken language processing community. Many powerful techniques hide in conference proceedings and academic papers for years before becoming widely recognized by the research community or the industry. We spent many years pursuing spoken language technology research at Carnegie Mellon University before we started spoken language R&D at Microsoft. We fully understand that it is by no means a small undertaking to transfer a state-of-the-art spoken language research system into a commercially viable product that can truly help people improve their productivity. Our experience in both industry and academia is reflected in the context of this book, which presents a contemporary and comprehensive description of both theoretic and practical issues in spoken language processing. This book is intended for people of diverse academic and practical backgrounds. Speech scientists, computer scientists, linguists, engineers, physicists, and psychologists all have a unique perspective on spoken language processing. This book will be useful to all of these special interest groups.

Spoken language processing is a diverse subject that relies on knowledge of many levels, including acoustics, phonology, phonetics, linguistics, semantics, pragmatics, and discourse. The diverse nature of spoken language processing requires knowledge in computer science, electrical engineering, mathematics, syntax, and psychology. There are a number of excellent books on the subfields of spoken language processing, including speech recognition, text-to-speech conversion, and spoken language understanding, but there is no single book that covers both theoretical and practical aspects of these subfields and spoken language interface design.

We devote many chapters systematically introducing fundamental theories needed to understand how speech recognition, text-to-speech synthesis, and spoken language understanding work. Even more important is the fact that the book highlights what works well in practice, which is invaluable if you want to build a practical speech recognizer, a practical text-to-speech synthesizer, or a practical spoken language system. Using numerous real examples in developing Microsoft's spoken language systems, we concentrate on showing how the fundamental theories can be applied to solve real problems in spoken language processing.

From the Back Cover

New advances in spoken language processing: theory and practice

  • In-depth coverage of speech processing, speech recognition, speech synthesis, spoken language understanding, and speech interface design
  • Many case studies from state-of-the-art systems, including examples from Microsoft's advanced research labs

Spoken Language Processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond. Starting with the fundamentals, it presents all this and more:

  • Essential background on speech production and perception, probability and information theory, and pattern recognition
  • Extracting information from the speech signal: useful representations and practical compression solutions
  • Modern speech recognition techniques: hidden Markov models, acoustic and language modeling, improving resistance to environmental noises, search algorithms, and large vocabulary speech recognition
  • Text-to-speech: analyzing documents, pitch and duration controls, trainable synthesis, and more
  • Spoken language understanding: dialog management, spoken language applications, and multimodal interfaces

To illustrate the book's methods, the authors present detailed case studies based on state-of-the-art systems, including Microsoft's Whisper speech recognizer, Whistler text-to-speech system, Dr. Who dialog system, and the MiPad handheld device. Whether you're planning, designing, building, or purchasing spoken language technology, this is the state of the art, from algorithms through business productivity.

About the Author

XUEDONG HUANG is founder and head of the Speech Technology Group at Microsoft Research. He received his Ph.D. from the University of Edinburgh. He is an IEEE Fellow.

ALEX ACERO and HSIAO-WUEN HON are Senior Researchers at Microsoft Research and Senior Members of IEEE. Both received doctorates from Carnegie Mellon University.

Foreword by Dr. Raj Reddy, Carnegie Mellon University

Reviews from Amazon users, collected from the website at the time this book was published:

⭐This book is a comprehensive overview of most of the major topics associated with speech processing. Divided into five main sections, the book is well structured with a clear division of concerns. The title, "Spoken Language Processing", may be misleading to some, as language processing topics account for only one section of the book.

The first two sections cover the fundamental theories that should be understood before embarking on an in-depth study of speech processing. This may seem an obvious approach, but many texts do not follow this pattern, which limits their use as reference tomes. Separating background theory from its use is also useful in that it allows a rigorous description of that theory. Too often texts give a hurried, imprecise overview of the theories used before launching into a long and complex application of them, instantly losing the reader in a quagmire of formulae.

The first two sections of the book deal with background material, material whose key concepts the reader should at least understand. The first section concentrates on speech in general (including production and perception), probability and statistics, and pattern classification. The last two of these topics are both important parts of the book and are dealt with in their own chapters. Both are well written, with the right amount of explanation and background. Much of the remainder of the book expects at least some familiarity with the material presented here. These chapters, like all chapters in the book, finish with a section entitled "Historical Perspective and Further Reading". The inclusion of recommended further reading, in addition to the vast number of references appearing in each chapter, makes the book as a whole a very good starting point for any work in speech processing.

The second section concerns itself with the DSP topics that relate to speech processing. In this section the reader will find everything from FFTs to multirate signal processing, and from speech signal representations to speech coding. Again, the section is well written, and the reader is not forced to refer to other texts to understand what is written. If a topic is not expanded upon here, that indicates it is not dealt with in any great depth in the remainder of the book.

The third section of the book covers speech recognition and is probably the section that will see the most use by many readers. This section is very thorough in its treatment of the subject. It starts immediately with a discussion of hidden Markov models, which are almost exclusively the method employed in the pattern-matching stage of speech recognition. Any algorithms that are mentioned are also detailed, which makes the book genuinely useful. In fact, algorithms are presented throughout the book, making it as much a practical reference as a theoretical one. This is important because there is a big jump from understanding a theory to being able to implement an algorithm that exploits it. Other topics covered include an excellent chapter on environmental robustness, with one of the best discussions of microphones I have seen. Language modelling and search algorithms are given a thorough treatment. I would like to have seen more detailed information on front-end processing and endpoint detection, as this remains a critical stage of the recognition process. Perhaps the level of detail reflects the fact that this is currently a hot research topic with potential for significant advancement.

Section four, on text-to-speech processing, is a good overview of the field and better than any book I've seen on the subject. It shows numerous block diagrams of what you need to build such a system and gives numerous algorithms in pseudocode. It also dedicates a subsection to each block of the text-to-speech system block diagram, discussing in detail what you would need to do to implement that particular block. Since many of the individual blocks have been discussed earlier in the book, it refers you back to specific earlier sections for details. The fifth section is a short one on complete systems and presents some case studies, concentrating on what Microsoft was doing at the time this book was published, since that is where the authors' research came from. I would highly recommend that anyone anticipating getting into speech processing have a copy of this classic nearby.

⭐A good, comprehensive overview of everything you need to know about speech processing as of 2001. A once-in-a-decade book. It makes a great senior- or graduate-level textbook.

Errata: on p. 38, the "ax" schwa sound is the FIRST syllable of "ago", not the second. Also, high/(neutral)/low and front/(neutral)/back cannot be four binary systems, because it's impossible to have a [+high, +low] or a [+front, +back] vowel. They are two trinary systems. You can represent them as four binary systems if you want to pervert the notation, but you should not call them that.

⭐What a wonderful book. Whether you are a computer scientist or mathematician with limited exposure to the discipline of speech processing, or alternatively you are a dedicated expert in this field, you will find everything you are looking for in this book. For two weeks, I couldn’t put this thing down. And that’s an extraordinary testimony to a book that’s 800+ pages of technical detail. If you want a high level understanding of how speech processing works, or if you want to dig in and build your own speech engine, everything you need is right here.

⭐Quick shipment, flawless textbook

⭐This is, to the best of my knowledge, still the most comprehensive textbook in speech technology, 12 years after its publication. So if you're a student of the field, you probably want it. That is not to say it is all that good. I find the writing dull and the notation messy. Also, it is impossible to find in hardback, and the paperback is of pretty low quality. It doesn't look like it will get an update, so I'm hoping a new standard textbook will come along.

For newcomers to the field I would recommend starting with Jurafsky & Martin's excellent "Speech and Language Processing" instead. That book covers NLP as well, so it is more superficial, but it does a much better job of explaining the basics. If you need more detail on speech technology than that one provides, then consider this one.

For details, you might also want to have a look at "The Springer Handbook of Speech Processing", preferably at a library. And Holmes and Holmes's "Speech Synthesis and Recognition" gives a very good non-technical account of the speech technology field.

⭐A thorough and complete review of the subject, in which many disciplines (language, computing, probability, statistics, numerical analysis) converge. As a non-practitioner, I found it an enjoyable opportunity to refresh my knowledge of signal processing, and a source of many ideas I have been able to develop in other fields. Despite notation and methodologies (e.g., Bayesian) somewhat different from what I am used to, the nearly one thousand pages never seemed excessive relative to the meaning compressed into them, ranging from basic theory to advanced applications.

⭐I love this complete and practical book on speech technologies and applications. It always ties in real-world practice. While also covering most theoretical and academic results, it always points out what is used in daily practice. This feature can help newcomers identify promising directions for solving real problems. The only thing I don't like is that it emphasizes work done at Microsoft Research too much, although this is understandable, and MS is becoming a power player in this arena.

⭐A beautifully written book covering almost all areas of spoken language processing. However, despite its relative ease of reading, the beginning reader should be warned that in some sections a sufficiently deep treatment of the topic is glossed over. This concerns, for example, the definition and application of the delta function concept in Chapter 5 (Digital Signal Processing).