A Professor leaves a cushy job in academia for a job @ Google.
Here’s why.
Posted at 01:01 PM in Reflections | Permalink | Comments (0) | TrackBack (0)
Analyzing data is a little like fixing a motorbike, but in reverse: it consists of breaking a data set into its parts (e.g., covariate effects and variances), whereas fixing a bike means putting all the parts into the right place. One way to convince yourself that you really understand how a bike works is to first dismantle it and then reassemble it into a functioning vehicle. Similarly, for data analysis, by first assembling a data set and then breaking it apart into recognizable parts by analyzing it, you can prove to yourself that you really understand the analysis. - Marc Kéry
Posted at 04:40 AM in Reflections | Permalink | Comments (0) | TrackBack (0)
This book is indeed a joy to read. There were many “aha” moments, some of which are:
The book begins with the natural numbers that made counting and tallying easy. It ends with the subject of infinity, where everything is on slippery ground. In this journey from natural numbers to infinity, the book explores various subfields of mathematics.
This book is a pleasure to read as the author connects some basic math stuff with everyday life, in a way that I will never forget.
Posted at 02:59 AM in Books, Math | Permalink | Comments (0) | TrackBack (0)
Cal Newport wants to answer a nagging question, “Why do some people end up loving what they do, while so many others fail at this goal?” Researching this question leads him down a path where he finds rather unconventional answers. Through this book, he shares his findings.
Most of us equate “passion” with an intense love affair with one’s work. There is also a belief that “passion” is a necessary condition for finding THE RIGHT work. In the first part of the book, the author debunks the Passion Hypothesis. What is the Passion Hypothesis? The key to occupational happiness is to first figure out what you’re passionate about and then find a job that matches this passion.
After interviewing many people, watching interviews of successful people and doing a lot of field work, the author comes to the conclusion that the “passion hypothesis” is a myth and says,
If “follow your passion” is wrong advice, and people who love their work usually follow non-linear paths, what makes people love what they do?
In the second part of the book, the author argues that it is skill that matters, and passion follows automatically. Some of the points mentioned in this section are:
The book then talks about a very important idea, the “Control Trap”. We often see people saying that they are going to start a firm because they are frustrated with their work. The author says it is a trap. Starting your company without developing skill sets, most often, doesn’t work. We are all blinded by survivorship bias. We see the firms started by people who take control of their lives. However, there is a huge unseen cemetery of failures who started companies without developing valuable and marketable skills. This kind of advice is very useful for people who are “over-enthusiastic to start a company and take control of their lives”. In fact, most of the success stories we get to hear are somewhat biased in their narrative. Suddenly someone decides that enough is enough, starts a firm, and becomes successful. However, what is often left out of the story is the background preparation the person would have done, the non-linear paths the person would have taken to develop a certain skill set, etc. So sometimes turning that promotion down might be a good idea, as it will give you time to hone your skill sets. Instead of exercising control in the wrong environment, you are preparing diligently to take control in the right environment.
The book ends with the author applying these points to his own life. So it’s not just preaching; he has applied the fundas to his own life.
What I really found interesting about this book is the careful and well-built argument against the “Passion Hypothesis”. Most of us see successful entrepreneurs and think that one should start a venture, believing it will give us control, solve all the problems we face at work, and somehow magically transform us from “cubicle dwellers” to visionaries. It’s a fairy tale. In reality, unless you have acquired some valuable and marketable skill set, finding the work that you love will be a mirage!
Posted at 02:01 AM in Books, Philosophy, Reflections, Startup Gyan | Permalink | Comments (2) | TrackBack (0)
The month of September vanished from my life as my erratic eating habits and my work schedule took a toll on my health. Having recovered now, one of the biggest changes I have made to my life is my diet. Thanks to a colleague of mine, Gautam, who suggested this book, I have started changing a few things in my daily schedule.
Usually I don’t even cast a glance at books with such titles. But for this one, I took some time out to go over it. To my surprise, the content was refreshing. The author dispels a lot of myths about nutrition, diet and weight loss. Here are a few points that I will keep in mind:
Posted at 04:34 PM in Books | Permalink | Comments (1) | TrackBack (0)
I liked Daniel Coyle’s “Talent Code”, which talks about the importance of “deep practice” in achieving mastery in any field: not for the message of deep practice, which has already been repeated in many books and articles, but for the varied examples in the book.
Here comes another book along the same lines by the same author. This book is a collection of thoughts and ideas from the author’s field work, packaged as “TIPS” to improve one’s skill set. These tips are grouped into three categories: “Getting Started”, “Improving Skills”, and “Sustaining Progress”.
I will just list down some of the tips from each of the sections, mostly from the perspective of someone wanting to improve his programming skills.
Getting Started:
Improving Skills:
Sustaining Progress:
Out of the 52 tips mentioned in the book, I am certain that at least a few will resonate with anyone who is serious about improving his skills.
Posted at 02:51 PM in Books, Ideas, Philosophy, Reflections | Permalink | Comments (2) | TrackBack (0)
One of the reasons for going over this book is to shuttle between the macro and micro worlds of modeling. One can immerse oneself in specific techniques/algos in stats forever. But I can’t. I typically tend to take a break and go over the macro aspects of modeling from time to time. Books like this one give an intuitive sense of “What are the types of models that one builds?” I like such books as they make me aware of the inductive uncertainty associated with building models. Let me summarize the main points of the book.
Chapter 1 - Introduction
Chapter 3 - Visualizing and Exploring Data
Reading this chapter brought back old memories of the multivariate data analysis I had done. I need to revisit the multivariate stuff soon; it needs remedial work from me as I have been spending time on other things.
Chapter 4 - Data Analysis and Uncertainty
This chapter covers Frequentist inference and Bayesian inference. The content is just enough to get a fair idea of the principles behind them. In doing so, it touches upon the following principles/concepts/terms/ideas:
As rightly pointed out, Bayesian inference has skyrocketed in popularity in the last decade or so because of the computing power available to everyone. Thanks to Gibbs sampling, MCMC, and BUGS, one can do a preliminary Bayesian analysis on a desktop.
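As a toy illustration of how little machinery a desktop Bayesian analysis needs, here is a minimal random-walk Metropolis sampler for the posterior of a normal mean with known unit variance; the data, the flat prior, and the step size are all hypothetical choices of mine, not from the book.

```python
import math
import random

# Hypothetical setup: 200 observations from N(5, 1); with a flat prior on
# the mean mu (sd known to be 1), the posterior mode sits at the sample mean.
random.seed(42)
data = [random.gauss(5.0, 1.0) for _ in range(200)]

def log_post(mu):
    # log posterior up to an additive constant
    return -0.5 * sum((x - mu) ** 2 for x in data)

def metropolis(n_iter=5000, step=0.3):
    mu, samples = 0.0, []
    for _ in range(n_iter):
        proposal = mu + random.gauss(0.0, step)
        # accept with probability min(1, posterior ratio)
        if math.log(random.random()) < log_post(proposal) - log_post(mu):
            mu = proposal
        samples.append(mu)
    return samples

samples = metropolis()
post_mean = sum(samples[1000:]) / len(samples[1000:])
```

After discarding a burn-in, the posterior mean lands near the sample mean of the simulated data (about 5 here). BUGS and friends automate exactly this kind of loop, at scale and for far richer models.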
One thing this chapter makes very clear is that there is little difference between sampling with replacement and sampling without replacement in the data mining world. Why? Because of the huge amount of data available, you can just take a big enough sample and get a fairly good estimate of the parameters of the assumed distribution. The chapter also says that some topics from the traditional statistics discipline, such as experimental design and population parameter estimation, are of little use in the big data / data mining world. Instead, issues like data cleaning and choosing the right kind of sampling procedure become critical.
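A quick sketch of the with/without replacement point, on made-up data: once the sample is large relative to the precision you need, and small relative to the population, the two schemes give practically the same estimate.

```python
import random

# A hypothetical "large" population with mean near 10; compare the sample
# mean under sampling with replacement vs. without replacement.
random.seed(0)
population = [random.gauss(10.0, 2.0) for _ in range(100_000)]
true_mean = sum(population) / len(population)

n = 5_000
with_repl = [random.choice(population) for _ in range(n)]  # with replacement
without_repl = random.sample(population, n)                # without replacement

mean_with = sum(with_repl) / n
mean_without = sum(without_repl) / n
```

Both estimates sit within a few hundredths of the population mean; the distinction between the two schemes only starts to matter when the sample is a sizeable fraction of the population.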
Chapter 5- A Systematic Overview of Data Mining Algorithms
I like this chapter as it gives an overall structure to the DM principles using five components. The first component, the task for a given problem, is usually straightforward to agree upon. The next four components have varying degrees of importance depending on the problem at hand and on the person working on it. For a statistician, the structure of the model and the score function might dominate his thinking and efforts. For a computer scientist or a machine learning expert, the search/optimization and DB techniques are where his thinking and efforts are focused. A few pointers:
So, the component that becomes critical depends on the problem at hand.
This tuple {model structure, score function, search method, database management} is a good way to look at things in the modeling world. However, depending on the domain, the relative importance of these components varies. Also, the components in the tuple are not independent; they have a correlation structure, so to speak.
This chapter gives three examples where various components become important relative to the others and drive the modeling effort.
I have never looked at models and implementation from this perspective. I think this is by far the biggest learning from the book. It has given me a schema to think about various models and their implementation. It is also going to change the way I look at various articles and books in stats. For example, a book on GLM, typically written by statisticians, is going to focus more on the structure and score function; it is not going to focus on search algos. That’s fine for toy datasets. But let’s say you want to fit a GLM to some signal in high-frequency tick data. Search and DB management will become far more important and might even drive the modeling process. Maybe, when choosing between a Gamma link function and a Gaussian link function, you might end up choosing the Gaussian one for computational efficiency, despite its showing a higher deviance than the Gamma.
Having a tuple-structure mindset helps in moving out of silos and thinking broadly about the overall task at hand. If you are a stats guy, it is important to keep in mind the search and database management components before reading/building anything. If you are a CS guy, it is important to keep in mind the models and score functions, etc.
I think this chapter more than justifies my decision to go over this book. I have learnt a parsimonious and structured language for description, analysis and synthesis of data mining algorithms.
Chapter 6 - Models and Patterns
The chapter begins with the example of data compression because it is a useful way to think of the difference between a model and a pattern: transmitting a lower-resolution image is like a model, while transmitting high-resolution local structure is like a pattern.
The chapter then goes on to talk systematically about various models. The following models are covered in the chapter:
Each chapter ends with a “Further Reading” section that contains information about various books that an interested reader can refer to. This is valuable, as it serves as an experienced guide to the data mining literature.
Chapter 7 - Score functions for Data Mining Algorithms
Scoring functions are used to select the right model. In the regression framework, the least squares function can be used as a scoring function; in the case of GLM, the deviance can be used. The chapter starts off by saying that one needs to distinguish between various types of score functions.
There exist numerous scoring functions for patterns, but none has gained popularity. This is mainly because there is a lot of subjectivity in deciding whether a pattern is valuable or not.
The theory behind scoring function for models is well developed and applicable to the real world.
This type of overview of scoring functions is very valuable info. These terms crop up in various modeling techniques. For example, Cross Validation and Cp pop up in local regression, Deviance in GLM, etc. Some of these functions are empirical in nature whereas some have an asymptotic distribution.
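For instance, cross-validation is a purely empirical score function. Here is a minimal leave-one-out sketch on hypothetical data, scoring a constant model against a straight-line model; the data and both fitters are my own toy choices.

```python
import random

# Hypothetical noisy linear data: y = 1 + 0.5x + noise.
random.seed(7)
xs = [float(i) for i in range(20)]
ys = [1.0 + 0.5 * x + random.gauss(0.0, 0.5) for x in xs]

def fit_constant(train):
    # constant model: predict the training mean regardless of x
    c = sum(y for _, y in train) / len(train)
    return lambda x: c

def fit_line(train):
    # least-squares straight line on the training pairs
    n = len(train)
    xb = sum(x for x, _ in train) / n
    yb = sum(y for _, y in train) / n
    b = sum((x - xb) * (y - yb) for x, y in train) / \
        sum((x - xb) ** 2 for x, _ in train)
    a = yb - b * xb
    return lambda x: a + b * x

def loocv(fit):
    # leave-one-out mean squared prediction error
    pairs = list(zip(xs, ys))
    err = 0.0
    for i, (x, y) in enumerate(pairs):
        pred = fit(pairs[:i] + pairs[i + 1:])
        err += (y - pred(x)) ** 2
    return err / len(pairs)

cv_const = loocv(fit_constant)
cv_line = loocv(fit_line)
```

The cross-validated error of the line is far below that of the constant model, so the empirical score picks the right structure without any distributional theory.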
Other takeaways from this chapter
Chapter 8 - Search and Optimization Methods
Given a score function, it is important to find the best model and the best parameters for that model. In one sense, there are two loops that need to run: an outer loop over a set of candidate models and an inner loop over the parameters of each model.
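Those two loops can be sketched directly. In this hypothetical example, the outer loop runs over model structures y = a * x**d and the inner loop does a crude grid search over each structure’s single parameter, scored by SSE; the data and grid are made up for illustration.

```python
# Hypothetical data generated by y = 2 * x**2; candidate model structures
# are y = a * x**d for d in {1, 2, 3}, each with one free parameter a.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 8.0, 18.0, 32.0]

best = None
for d in (1, 2, 3):                              # outer loop: model structures
    for a in [i / 100 for i in range(1, 501)]:   # inner loop: parameter grid
        sse = sum((y - a * x ** d) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, d, a)

best_sse, best_degree, best_coef = best
```

The search recovers d = 2 with coefficient 2.0; in practice the inner loop would of course be a proper optimizer rather than a grid, which is exactly what the rest of the chapter is about.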
Parameter spaces can be discrete or continuous or mixed. The chapter starts off with general search methods for situations where there is no notion of continuity in the model space or parameter space being searched. This section includes discussion of the combinatorial problems that typically prevent exhaustive examination of all solutions.
Typically, if one comes from a stats background, model fitting involves starting with a null model or a saturated model and then comparing the model of interest against them. If there are p parameters to be fitted, there can be 2^p model evaluations. However, this approach becomes extremely cumbersome in data mining problems. Although mathematically correct, this viewpoint is often not the most useful way to think about the problem, since it can obscure important structural information about the models under consideration.
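The 2^p blow-up is easy to see with a few lines of code; the covariate names below are hypothetical.

```python
from itertools import combinations

# With p candidate covariates there are 2**p subsets (models) to evaluate.
def all_subsets(covariates):
    subs = []
    for k in range(len(covariates) + 1):
        subs.extend(combinations(covariates, k))
    return subs

p = 10
subsets = all_subsets(["x%d" % i for i in range(p)])
n_models = len(subsets)  # 2**10 = 1024 models for just 10 covariates
```

Ten covariates already give 1,024 candidate models; at p = 30 exhaustive evaluation is over a billion fits, which is why the chapter moves on to heuristic search.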
Since the score function is not a smooth function in the model space, many traditional optimization techniques are out. There are some situations where the optimization can be feasible, i.e., when the score function is decomposable, or when the model is somewhat linear or quasi-linear so that the inner loop of finding the best parameters for a model is not computationally expensive. Given the computational complexity, the obvious way out is heuristic-based search. The section then gives an overview of the state-space method, greedy search, and systematic search methods (breadth-first, depth-first, branch-and-bound).
If the score function happens to be continuous, then the math and optimization are pleasing, as you can use calculus to get estimates. As an aside, the score function here should not be confused with the score function in the MLE context, where it is the first derivative of the log-likelihood with respect to the parameters. The section starts off by describing the simple Newton-Raphson method and then shows the parameter estimation for the univariate and multivariate cases. Gradient descent methods, momentum-based methods, bracketing methods, back-propagation, iteratively reweighted least squares, and simplex are some of the algos discussed. The chapter then talks about constrained optimization, with linear/nonlinear score functions and linear/nonlinear constraints.
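A minimal Newton-Raphson sketch on hypothetical count data: maximizing the Poisson log-likelihood in the rate, whose closed-form MLE is the sample mean, so the iteration can be checked against a known answer.

```python
# Hypothetical count data. The Poisson log-likelihood in the rate lam has
# first derivative s/lam - n and second derivative -s/lam**2 (s = sum of
# counts, n = number of observations); its maximum is the sample mean.
data = [3, 1, 4, 1, 5, 9, 2, 6]
n, s = len(data), sum(data)

lam = 1.0  # starting value
for _ in range(50):
    score = s / lam - n       # first derivative of the log-likelihood
    hessian = -s / lam ** 2   # second derivative
    lam -= score / hessian    # Newton-Raphson update

sample_mean = s / n
```

The iterate converges quadratically to the sample mean (3.875 here) in a handful of steps; the multivariate version just swaps in a gradient vector and a Hessian matrix.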
The EM algo is covered in detail. If you are handling financial data, this is a useful technique to have in your toolbox. The algo is a clever extension of the likelihood function to depend on hidden variables; thus the likelihood function involves the true parameters as well as the hidden variables. Iterating between the Expectation and Maximization steps does the job. Probably the clearest explanation of this algo is in Yudi Pawitan’s book, In All Likelihood.
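A bare-bones EM sketch for a two-component Gaussian mixture with known unit variances; the data and starting values are made up. The E-step computes the responsibilities, i.e., the posterior of the hidden component labels, and the M-step re-estimates the means and mixing proportion from them.

```python
import math
import random

# Hypothetical data: a 50/50 mixture of N(0, 1) and N(5, 1). Variances are
# treated as known, so EM only updates the two means and the mixing weight.
random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(150)] + \
       [random.gauss(5.0, 1.0) for _ in range(150)]

def norm_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

mu1, mu2, pi1 = -1.0, 6.0, 0.5  # starting guesses
for _ in range(30):
    # E-step: responsibility of component 1 (posterior of the hidden label)
    r = [pi1 * norm_pdf(x, mu1) /
         (pi1 * norm_pdf(x, mu1) + (1 - pi1) * norm_pdf(x, mu2))
         for x in data]
    # M-step: responsibility-weighted means and mixing proportion
    mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
    mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / sum(1 - ri for ri in r)
    pi1 = sum(r) / len(data)
```

A few iterations recover means near 0 and 5 with a mixing weight near 0.5, without ever observing the component labels directly.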
The chapter ends with stochastic search and optimization techniques. The basic problem with non-stochastic search is that the solution depends on the initial point; it might get stuck at a local minimum. Hence methods such as genetic algos and simulated annealing can be used to avoid getting trapped in local minima.
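A toy simulated-annealing sketch: the made-up score function below has a local minimum near x = +1 and a global one near x = -1, and the chain, started in the local minimum, is free to escape while the temperature is high and ends up frozen near the bottom of one of the two basins (the function and cooling schedule are my own assumptions, not from the book).

```python
import math
import random

# Made-up multimodal score: (x^2 - 1)^2 is symmetric with minima at +-1,
# and the 0.3*x term tilts it so the minimum near -1 is the global one.
def score(x):
    return (x * x - 1) ** 2 + 0.3 * x

random.seed(3)
x = 1.0      # start in the *local* minimum
temp = 2.0
for _ in range(20_000):
    proposal = x + random.gauss(0.0, 0.3)
    delta = score(proposal) - score(x)
    # always accept downhill moves; accept uphill ones with prob e^(-delta/T)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        x = proposal
    temp = max(temp * 0.9995, 1e-3)  # geometric cooling with a floor

final_score = score(x)
```

A pure greedy search started at x = 1 would stay there forever; the temperature-controlled uphill moves are what give annealing a chance to cross the barrier between basins.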
This chapter has made me realize that I have not implemented some of the algos mentioned in the book, even on a toy dataset. I should somehow find time to work on this aspect.
Chapter 9 - Descriptive Modeling
Chapter 10 - Predictive Modeling for Classification
Having a tuple [model or pattern, score function, search and optimization, database] viewpoint is very important while building and testing models. Off-the-shelf software has only specific combinations of the tuple built into it. Even in the data mining literature, certain combos of model, score function, and optimization algo have become famous. This does not mean that they are the only ones. At the same time, it is not prudent to experiment with all combinations of the tuple.
Chapter 11 - Predictive Modeling for Regression
The basic models are linear, generalized linear and other types of extended models.
Here is a map of models mentioned in the book :
Throughout the book, the tuple structure {model structure, score function, search method, database management} is used to organize the reader’s thought process. Using this structure, the book gives a bird’s-eye view of all the possible models that can be built in the data mining world. Awareness of these possibilities will equip a modeler to take inductive uncertainty into account in their modeling efforts.
Posted at 09:59 AM in Books, Statistics | Permalink | Comments (0) | TrackBack (0)
This graphic novel talks about Steve Jobs and the Zen Buddhist priest Kobun, who acted as Jobs’ spiritual guru. Hardcore Apple fans might like to know the kind of conversations that Jobs had with Kobun. However, I felt the book was pointless. I think it is merely trying to cash in on two things: 1) the increasing popularity of graphic novels among adults and 2) Steve Jobs’ death in Oct 2011.
Posted at 01:38 AM in Books, Technology | Permalink | Comments (0) | TrackBack (0)
The likelihood function is a very useful mathematical object in statistics. With it, you can perform the two main tasks of statistics, i.e., estimation and inference. If you can get the distribution right, or the overall structural equation right, you can do all types of stats: univariate stats, multivariate stats, linear models, generalized linear models, mixture modeling, mixed effects models, and even non-parametric statistics to an extent. All of this can be done from scratch with one math object, the “likelihood function”, plus pen & paper and a plain vanilla optimization routine.
In these days of readily available functions and packages that do everything, the modeler is often left with only a 10,000 ft. view of things. For example, if you are doing a Poisson regression, the modeling of the dispersion parameter is close to automatic in R, SAS, and SPSS. It almost looks like magic. What’s going on under the hood? If one goes to the Frequentist side of the world and explores things, one often finds heavy reliance on asymptotics and heaps of formulae. If you go to the Bayesian world, there is some learning curve in terms of setting up the right infra to get to the bottom of the stuff: you need to know BUGS and also a way to invoke BUGS from your programming environment. So sometimes back-of-envelope parameter estimation and inference become elusive. Having said that, knowledge of the Bayesian world is definitely better than living ONLY in the Frequentist world.
But there is an alternate world between the Frequentist and Bayesian modes of thinking: the Fisherian world. This is a very appealing world to inhabit from time to time. In this world, all one needs to know is just one object, the “likelihood function”. That’s it. Once you have the likelihood function for whatever data you have, estimation and inference are largely computational. What I mean by computational is a plain vanilla optimization routine. Nothing fancy.
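To make the point concrete, here is a sketch of “likelihood function + plain vanilla optimization” on hypothetical data: estimating an exponential rate by brute-force grid search over the log-likelihood, and checking it against the closed-form MLE 1/mean(x). Everything here (data, grid, model) is my own toy choice.

```python
import math

# Hypothetical waiting-time data, assumed exponential with unknown rate.
data = [0.8, 1.2, 0.5, 2.0, 1.5, 0.9, 1.1, 0.7]

def log_lik(rate):
    # exponential log-likelihood: sum over x of log(rate) - rate * x
    return sum(math.log(rate) - rate * x for x in data)

# "plain vanilla optimization": evaluate the likelihood on a crude grid
grid = [i / 1000 for i in range(1, 5000)]   # rates from 0.001 to 4.999
mle = max(grid, key=log_lik)

closed_form = len(data) / sum(data)         # the textbook MLE, 1/mean(x)
```

The grid maximizer agrees with the closed-form answer to grid precision. That is the whole Fisherian workflow in miniature: write down the likelihood, then hand it to any optimizer you like.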
I like bootstrapping because it gets me out of the Frequentist world. However, bootstrapping takes me only so far. If I have to find relationships between variables, hypothesize a model and test it, I have to eventually fall back on a non-bootstrapping world.
Till date, I had come across Fisherian concepts only in bits and pieces. “Maximum likelihood estimation” looked nice and easy to apply; Fisher information was convenient for getting standard errors of estimates. However, I made the mistake of thinking that MLE and Fisher information are all there is to Fisher’s world. A grave mistake. This book opened my eyes to a completely new world of modeling and inference.
Brad Efron says in one of his papers that 21st-century stats will heavily rely on long-forgotten Fisherian concepts. Whether the prediction comes true or not, learning the Fisherian way of modeling and inference is going to change the way you think about many aspects of statistics.
This book is the main reason for my being thrilled about the whole Fisherian way of thinking. The book is extremely well written, and a diligent reader can reap massive benefits by spending time and effort on it. I think it is THE BEST book on statistics that I have ever read. When I worked through this book, it seemed like I was climbing a hillock at regular intervals, rather than a big mountain. The author introduces each concept with a seemingly challenging problem, i.e., a steep climb up a hillock, and then allows you to glide smoothly down. This type of presentation does not tire the reader. Maybe stats teachers/faculty can take a cue from this book in organizing their lectures for students.
A great quote from the book:
Understanding statistics is best achieved through a direct experience, in effect letting the knowledge pass through the fingers rather than the ears and the eyes only.
Indeed, this book makes a strong case for coding up stuff and letting the knowledge pass through fingers. Every chapter has concepts that become mightily clear, ONLY after coding. In fact there isn’t a single chapter in the book where one can merely read the contents and get the message.
Reading this book has been a delightful experience. Maybe the best thing to have happened to me in 2012.
Posted at 06:59 PM in Statistics | Permalink | Comments (0) | TrackBack (0)
Stumbled onto an interesting paper that connects Bayesian ideas to likelihood-based inference. The two are related in the sense that likelihood-based inference can be thought of as Bayesian inference with a uniform/vague prior. However, when you get down to estimating and inferring from data using these two philosophies, the math, the equations you use, and the code you need to write are completely different.
This paper by Steel talks about whether a hard-core Bayesian must accept the Likelihood Principle or not. It discusses two versions of the Likelihood Principle (LP) that can be easily connected to the posterior-prior Bayes framework. The first version (LP1) is where you see different sets of data but they don’t change the likelihood function, and the second version (LP2) is where you evaluate competing hypotheses against the same dataset.
The paper uses Bayesian confidence measures to show that Bayesians can accept LP1, whereas LP2 can be accepted ONLY when the competing hypotheses are mutually exhaustive (which is never the case in the real world). One usually comes across LP2 in many contexts, for example the log-likelihood ratio of the null and alternative hypotheses in GLM. In such cases, likelihood theory has a straightforward test: compute the deviance and check its realization against the relevant asymptotic distribution. This paper shows that this kind of reasoning is weak in the Bayesian world.
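That likelihood-ratio recipe can be sketched on hypothetical Poisson counts: fit a null model (one shared rate) and an alternative (separate rates per group), compute the deviance 2*(ll_alt - ll_null), and compare it to the 1-df chi-square 5% cutoff of 3.84. The data below are made up for illustration.

```python
import math

# Hypothetical Poisson counts for two groups. Null model: one shared rate;
# alternative: a separate rate per group (MLEs are the group means).
group_a = [2, 3, 1, 4, 2, 3]
group_b = [7, 8, 6, 9, 7, 8]

def pois_ll(data, lam):
    # Poisson log-likelihood: sum of x*log(lam) - lam - log(x!)
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

lam_null = (sum(group_a) + sum(group_b)) / (len(group_a) + len(group_b))
lam_a = sum(group_a) / len(group_a)
lam_b = sum(group_b) / len(group_b)

ll_null = pois_ll(group_a, lam_null) + pois_ll(group_b, lam_null)
ll_alt = pois_ll(group_a, lam_a) + pois_ll(group_b, lam_b)

deviance = 2 * (ll_alt - ll_null)
reject_null = deviance > 3.84  # 1-df chi-square cutoff at the 5% level
```

With these counts the deviance comes out around 15.7, comfortably past the cutoff, so the likelihood-theoretic recipe rejects the shared-rate model; the paper’s point is that this very natural comparison lacks an equally crisp Bayesian justification.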
Posted at 09:38 AM in Statistics | Permalink | Comments (0) | TrackBack (0)