Can someone recommend good books on Statistics, and also on Linear Algebra and Analytic Geometry, that will provide enough background for understanding machine learning algorithms?
The answers below focus on general background knowledge, rather than specifics of Mahout and associated Apache tooling. Feel free to add useful resources (books, but also videos, online courseware, tools), particularly those that are available free online.
This page originated in an email thread, and its different contributors might not all agree on the best approach (and they might not know what's best for any given learner), but the resources here should give some idea of suitable background reading. Check the mailing list archives if you care to figure out who-said-what, or find other suggestions.
Don't be overwhelmed by all the maths; you can do a lot in Mahout with some basic knowledge. The resources given here will help you understand your data better, and ask better questions both of Mahout's APIs and of the Mahout community. And unlike learning some particular software tool, these are skills that will remain useful decades later.
Gilbert Strang's Introduction to Linear Algebra (full text online, highly recommended by several on the
His lectures are also available online and are strongly recommended. See http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/
"Mathematical Tools for Applied Multivariate Analysis" by J.Do
Stanford Machine Learning online courseware (cs229.stanford.edu):
"It's a very nicely taught course with super helpful lecture notes - and you can get all the videos on YouTube or iTunesU"
"The section notes for this course will give you enough review material on linear algebra and probability theory to get you going."
MIT Machine Learning online courseware (6.867) has Lecture notes in PDF online.
As a prerequisite to probability and statistics, you'll need basic calculus. A maths-for-scientists text might be useful here, such as 'Mathematics for Engineers and Scientists', Alan Jeffrey, Chapman
One of the best writers in the probability/statistics world is Sheldon R
'A First Course in Probability (8th Edition), Pearson' (amazon) and then move on to his 'Introduction to Probability Models (9th Edition), Academic Press' (amazon)
Some good introductory alternatives here are:
Khan Academy – videos on stats, probability, linear algebra
Probability and Statistics (7th Edition), Jay L. Devore, Chapman.
Probability and Statistical Inference (7th Edition), Hogg and Tanis, Pea
Once you have a grasp of the basics, there are a slew of great texts that you might consult, for example:
Statistical Inference, Casella and Berger, Duxbury/Thomson Learning.
Most statistics books will have some sort of introduction to Bayesian methods; consider a specialist text, e.g.:
Introduction to Bayesian Statistics (2nd Edition), William H. Bolstad, W
Then for the computational side of Bayesian statistics (predominantly Markov chain Monte Carlo), e.g.:
Bolstad's Understanding Computational Bayesian Statistics, Wiley.
Then you might try Bayesian Data Analysis, Gelman et al., Chapman & Hall/CRC
On top of the books, R is an indispensable software tool for visualizing distributions and doing calculations
For statistics related to machine learning, I would avoid normal statistical texts and go with these instead:
Pattern Recognition and Machine Learning by Chris Bishop
Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, Jerome Friedman
Also http://research.microsoft.com/en-us/um/people/cmbishop/PRML/index.htm
matrix computations/decomposition/factorization etc.?
How's this one?
any idea? any other suggestion?
I found the one by Peter V. O'Neil, "Introduction to Linear Algebra", to be a great book for beginners (with some knowledge of calculus). It is not comprehensive but, I believe, it will be a good place to start, and the author starts by explaining the concepts with regard to vector spaces, which I found to be a more natural way of explaining. http://www.amazon.com/Introduction-Linear-Algebra-Theory-Applications/dp/053400606X
David S. Watkins, "Fundamentals of Matrix Computations (Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts)"
http://www.amazon.com/Fundamentals-Matrix-Computations-Applied-Mathematics/dp/0470528338/
The Golub / Van Loan text you mention is the classic text for numerical linear algebra. Can't go wrong with it. However, I'd also suggest you look at Nick Trefethen's "Numerical Linear Algebra". It's a bit more approachable for practitioners – GVL is better suited for researchers.
http://people.maths.ox.ac.uk/trefethen/text.html (with some online lecture notes)
I think this is the most relevant book for matrix math on distributed systems:
Many chapters on SVD, there are even chapters on Lanczos
BTW what about R? There are literally tons of books in the R series devoted to rather isolated problems, but what would be a good crash course
"I have found that learning about R is a difficult thing. The best introduction I have seen is, paradoxically, not really a book about R and assumes a statistical mind-set that I disagree with. That introduction is in MASS http://www.stats.ox.ac.uk/pub/MASS4/. Other references also
In addition, you should see how to plot data well:
Generally, I learn more about R by watching people and reading code than reading books. There are many small tricks like how to format data optimally, how to restructure data.frames, common ways to plot data, which libraries do what and so on that an introductory book cannot convey general principles that will see you through to success."