Thanks to Ravi, I came to know about an old Wired article on the Netflix Prize, in which Gavin Potter used ideas from behavioral economics to crack the problem:
A deeper part of Potter's strategy is based on the work of Amos Tversky and Nobel Prize winner Daniel Kahneman, pioneers of the science now called behavioral economics. This new field incorporates into traditional economics those features of human life that are lost when you think of a person as a rational machine, or as a list of numbers representing cinematic taste.
One such phenomenon is the anchoring effect, a problem endemic to any numerical rating scheme. If a customer watches three movies in a row that merit four stars — say, the Star Wars trilogy — and then sees one that's a bit better — say, Blade Runner — they'll likely give the last movie five stars. But if they started the week with one-star stinkers like the Star Wars prequels, Blade Runner might get only a 4 or even a 3. Anchoring suggests that rating systems need to take account of inertia — a user who has recently given a lot of above-average ratings is likely to continue to do so. Potter finds precisely this phenomenon in the Netflix data; and by being aware of it, he's able to account for its biasing effects and thus more accurately pin down users' true tastes.
Couldn't a pure statistician have also observed the inertia in the ratings? Of course. But there are infinitely many biases, patterns, and anomalies to fish for. And in almost every case, the number-cruncher wouldn't turn up anything. A psychologist, however, can suggest to the statisticians where to point their high-powered mathematical instruments. "It cuts out dead ends," Potter says.
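To make the "inertia" idea concrete, here is a minimal sketch of one way a rating could be corrected for a user's recent streak. This is not Potter's actual method, which the article does not spell out; the function name and the `window` and `weight` parameters are hypothetical choices made for illustration only.

```python
def anchoring_adjusted(ratings, window=3, weight=0.5):
    """Correct each rating for the user's recent drift from their own mean.

    ratings: one user's ratings in chronological order.
    window, weight: hypothetical tuning knobs, not taken from the article.
    """
    overall_mean = sum(ratings) / len(ratings)
    adjusted = []
    for i, rating in enumerate(ratings):
        recent = ratings[max(0, i - window):i]  # the ratings just before this one
        if recent:
            # Positive drift: the user has recently been rating above their norm.
            drift = sum(recent) / len(recent) - overall_mean
        else:
            drift = 0.0  # nothing to anchor on for the first rating
        adjusted.append(rating - weight * drift)
    return adjusted


print(anchoring_adjusted([4, 4, 4]))
print(anchoring_adjusted([1, 5], window=1, weight=0.5))
```

A user who always gives the same rating is left untouched, while a rating that follows an unusually low or high streak is nudged in the opposite direction of that streak.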
Posted at 09:14 PM in Ideas, Statistics | Permalink | Comments (0) | TrackBack (0)
Dexy seems to offer an amazing, platform-independent solution to documentation. It’s a tool created by Ana Nelson, an open source software developer who happens to have a PhD in economics.
Obviously a better output than stargazing and making predictions in economics that go wrong anyway.
Dexy’s USP in Ana’s words:
There are many excellent tools for specific types of documentation, in particular many object-oriented languages have built-in tools for writing API reference documentation (e.g. JavaDocs, RDoc, pydoc), however these tools are very specialized and don’t tend to work well for other types of documentation, such as tutorials, nor do they deal well with the fact that projects these days almost always involve more than one programming language (it would be hard to find a project that didn’t include, at minimum, a few bash scripts or some JavaScript). Dexy lets you continue to use specialized tools for some parts of your documentation, and also gives you the flexibility to write other types of documentation in different formats (tutorials, user guides, lecture notes, presentation slides, posters, even blog posts).
I have started using it and it looks very promising!
Posted at 09:04 PM in Programming | Permalink | Comments (1) | TrackBack (0)
In the Mar 2012 issue of “Advanced Trading” magazine, I found these points worth noting down:
Posted at 07:34 AM in Magazines | Permalink | Comments (0) | TrackBack (0)
I like books that explain things visually, and this book falls in that category. I am reading it after reading and understanding the basics of Python from “Think Python” and “Learn Python the Hard Way”. This book serves as a nice visual recap of Python 101. I have listed some of the points from various chapters, mainly to ruminate over the learnings from the previous two books.
Chapter 1- Getting Started
Python environment variables
Chapter 2 - Expressions and Statements
Chapter 3 - Working With Numbers
Chapter 4 - Working With Strings
Chapter 5 - Working With Lists and Tuples
Chapter 6 - Working With Dictionaries
Chapter 7 - Control Flow Statements
Chapter 8 - Functions
Chapter 9 - Modules
Chapter 10 - Files
This chapter contains a list of functions that one would typically use to work with files stored on disk. The note on the pickle and cPickle modules covers something that I haven't tried to date. I have to work on it soon.
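As a note to self, here is a minimal sketch of what a pickle round trip through a file looks like. It is not taken from the book; the helper name `save_and_load` and the sample record are made up for illustration. In Python 3 there is no separate cPickle module: the standard `pickle` module uses the C implementation automatically when it is available.

```python
import os
import pickle
import tempfile


def save_and_load(obj):
    """Write obj to a temporary pickle file on disk and read it back."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    os.close(fd)  # we reopen the path ourselves below
    try:
        with open(path, "wb") as fh:  # pickle files must be opened in binary mode
            pickle.dump(obj, fh)
        with open(path, "rb") as fh:
            return pickle.load(fh)
    finally:
        os.remove(path)  # clean up the temporary file


record = {"chapter": 10, "topics": ["open", "read", "write"]}
restored = save_and_load(record)
print(restored == record)
```

The restored object is an equal but separate copy of the original. Only unpickle files you trust, since loading a pickle can execute arbitrary code.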
Chapter 12 - Classes
The following gives a schematic diagram of Python's built-in types:
With abundant screenshots scattered throughout, this book indeed offers a quick recap of all the basic elements of Python programming.
Posted at 09:48 PM in Books, Programming | Permalink | Comments (0) | TrackBack (0)
Prof. Hadley Wickham, the creator of ggplot2 and other useful packages like plyr, reshape etc., has one strong piece of advice for R programmers: “Read others’ code”. This comes from a person who has developed 30 packages to date. We all have an immense urge to program, code something up, view the results, tweak our code to make it work, etc. However, pausing to read somebody else’s source code requires a certain amount of hard work and a willingness to learn from others. In R particularly, where all the functions are documented really well, one hardly NEEDS to go into the code. But that’s exactly what Hadley Wickham recommends.
In that sense, this book by Zed A. Shaw has a similar message for Python. You have to read code on a regular basis, and it is typically hard work to read what other people have written. I guess that’s the reason why this book is titled “Learn Python The Hard Way”. The book introduces Python step by step in 52 exercises, in which the author gives pointers to various modules and websites so that the reader can figure stuff out. So all the exercises have one common structure: “introduce a topic and make the reader curious to check out things from other sources”.
As a newbie, I found this book interesting for a couple of reasons. Firstly, the author urges the reader to type out every single line of code in the book. No copy-pasting is allowed when you are learning something new. The other thing I liked is that the author gives the reader clear instructions to follow a directory structure for a Python project. For a long time I never followed any specific directory structure for my projects, in whatever language I coded. However, once I learnt Ruby on Rails, I understood the advantage of following a nice standardized directory structure for any task, project or library. Not all frameworks make strong recommendations like Ruby on Rails does, so the programmer has to figure out something that works. That’s usually a trial-and-error process. Starting from a well-thought-out directory structure in Python is going to be helpful in the long run, when you want to go back and review the project or commit it to a version control system.
Let me list down the things that I learnt from this book.
The author concludes the book with a superb reminder to any programmer:
Which programming language you learn and use doesn’t matter. Do not get sucked into the religion surrounding programming languages as that will only blind you to their true purpose of being your tool for doing interesting things.
Posted at 07:22 PM in Books, Programming | Permalink | Comments (0) | TrackBack (0)
If there is a lot of data parsing and cleaning that needs to be done before modeling, I tend to follow one of three paths:
Path 3 is something I take very often. However, Paths 1 and 2 are also interesting, as they give access to a ton of modules that one can use from Python. A few years back I used some of Python's data types, mainly the dictionary, and worked on something I don’t even remember properly. It was more of an ad hoc task, and since then I had never used Python in a big way except for some basic data cleaning tasks. Over the years I have slowly graduated to performing the entire data cleaning exercise in R itself and avoiding Python completely. Lately I have realized that I have followed a convenient path instead of the harder but more worthwhile Paths 1 and 2. So I picked up this book to get a decent understanding of data types and modules in Python. In this post, I will list all the points that I found relevant in this book for a newbie like me:
Posted at 09:21 PM in Books, Programming | Permalink | Comments (4) | TrackBack (0)
Via TP:
What is Big Data? A meme and a marketing term, for sure, but also shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions. There is a lot more data, all the time, growing at 50 percent a year, or more than doubling every two years, estimates IDC, a technology research firm. It’s not just more streams of data, but entirely new ones. For example, there are now countless digital sensors worldwide in industrial equipment, automobiles, electrical meters and shipping crates. They can measure and communicate location, movement, vibration, temperature, humidity, even chemical changes in the air.
Link these communicating sensors to computing intelligence and you see the rise of what is called the Internet of Things or the Industrial Internet. Improved access to information is also fueling the Big Data trend. For example, government data — employment figures and other information — has been steadily migrating onto the Web. In 2009, Washington opened the data doors further by starting Data.gov, a Web site that makes all kinds of government data accessible to the public.
Link : Big Data
Posted at 07:18 AM in Math, Statistics | Permalink | Comments (0) | TrackBack (0)
The connection between The Lady Tasting Tea and Statistics:
Posted at 09:44 PM in Statistics | Permalink | Comments (0) | TrackBack (0)
In the Feb 2012 issue of “Traders” magazine, I found these points worth noting down:
Posted at 12:21 PM in Magazines | Permalink | Comments (0) | TrackBack (0)
A popular ghazal based on Raag Yaman Kalyan, rendered by Rohini Ravada and Shankar Tucker
Posted at 10:12 AM in Music | Permalink | Comments (0) | TrackBack (0)
Alan Jacobs, the author of this book, is an English professor at Wheaton College, Illinois. Given his position, his students and other people often ask him, "What are the 10 best books on literature that every educated person must read?" or "Dear Prof, can you suggest some books to read this summer?" This book is written to answer all such questions. So one might think this book is basically a recommendation-type, instructional, didactic guide to reading. Far from it: this is a 150-page essay on reading at Whim, with no fixed pattern and with only one objective in mind, "Pleasure".
The book starts off with the author noticing that many people, including his son, are put off by books such as “How to Read a Book”, “How to Read Literature Like a Professor”, “The New Lifetime Reading Plan”, etc. The premise behind all these books is that reading needs to be carried out systematically and that certain books need to be read in order to appreciate and become good at understanding literature. Most of these books smell of Responsibility, Obligation and Virtue, the very attributes that make people run away from reading. So, he says, reading needs a model that works, i.e. "Read at Whim". The people who look out for such "10 best books to read" recommendations don't really want to read a book; they want to check things off a mental bucket list. They want to say, “Yes, now I am done with this book”. Reading at Whim means reading something that gives you pleasure, i.e. there is nobody that we are signaling to, nobody that we are trying to impress. One reads out of pure enthusiasm. One usually sees this in children when you give them a book: they read it for the pure joy of it. There are tons of authors out there who feel that reading must not be frivolous; Harry Potter is not a serious book, in their opinion. In fact they have an assumed checklist of books that "ought to be read" by a serious reader. The author states that this model is broken and that "Read at Whim" should be the new model.
Ok, fine. You should read at Whim, so pick up whatever you feel like reading, whatever you think will give you pleasure. Done deal. Twenty pages into the book, the author has made this abundantly clear. So, is there any point in going over the remaining 130-odd pages?
Well, the rest of the book does NOT reiterate this message over and over again. "Reading at Whim" is the foundation of the model that the author talks about in the book. If this were the only principle we followed, we would soon face situations such as these:
So, you see, reading at Whim can take us only so far. In this context, the author talks about the second element of the model, i.e. self-knowledge and discernment. These are crucial to develop while reading. They will help you chuck a book midway if you find that plodding through the text doesn't give you pleasure. They also make you aware of your tastes and preferences. “Self-knowledge and Discernment” are precisely the things that you will not develop if you tend to follow somebody else's recommendations, maintain a list of books to be read, etc.
“How to Read a Book” and similar guides offload accountability for our reading: they say, implicitly, that self-knowledge and discernment aren't needful because experts can take care of that for us. But if we reject that implicit claim, the next question that needs to be addressed is, “How to move from `blind propensity' to `informed consent' to `Whim's sovereignty'?” One of the author's suggestions is to “Read Upstream”, i.e. read the books that your favorite authors have read, as they give a peek into the characters, plots and imagination of your favorite books. This kind of upstream reading is also useful in math. You might come across a good application of a technique, but if you read upstream you get to read about all the trials and tribulations that went into developing it. For example, Baire's failure in categorizing functions helped Lebesgue in defining measurable and non-measurable functions. If you read ONLY about Lebesgue and don't look into the developments made by Baire, you are likely to miss a lot of the action. Reading upstream need not only be about the historical developments behind a technique. It might be about things that make you wonder and become curious about life in general. If you look at Cantor's math and read about Cantor's life (what shaped his ideas about infinite infinities, what drove him mad, what made him die alone in an asylum, what made his story tragic but his achievements a mathematical breakthrough), you will forever look upon Georg Cantor in a completely different light.
The author then makes a strong case for annotating the text, i.e. reading with a pencil. When we turn our passive reading style into an active one, the book tends to offer more than it might seem to at the beginning. There is also a warning against highlighting: highlighters allow you to mark a text very quickly and easily, but only by covering it with a bright color, and the very quickness and easiness of the process are inimical to the kind of active reading that is needed. This point is similar to Dr. Medina's finding mentioned in his book Brain Rules: by making the initial contact with an idea, phrase or character more elaborate, one is likely to remember it better. By reading fast we miss the opportunity for elaborate encoding. Obviously this does not apply to every book. One should not read Harry Potter with a pencil; such books are best when the reader goes with the momentum, and the fewer stoppages the better. This means that the reader's decision to annotate or to go with the flow of the book is important.
Reading slowly is the next aspect that the author focuses on. Most of us read fast because of the implicit thought that “time is too short to read all the books”. Yes, time IS short, but one crucial aspect gets neglected when reading is made into a race: books become better when they are reread, and unless you annotate and read slowly, your reread is equivalent to a new read. Reading fast is like uploading the content into your working memory, feeling good about it, checking the item off the list and moving on to the next book. Considering the short-term nature of working memory, it's as if all the content is in RAM. Once the application shuts down, RAM is erased. If you want the stuff to get stored in long-term memory, you have to read slowly, annotate and, MOST IMPORTANT, reread. Whenever you have the urge to read a set of blogs or books in quick succession, pause and ask yourself, “Do you want to read?” or “Do you want to have read?” An honest answer will keep you off the speed track.
Via a poem by W. H. Auden, the author makes a case for the "eye-on-the-object" look that is needed for getting pleasure from a book, i.e. we must cultivate attention while reading. We need to be attentive to words, phrases, characters, etc. so that we can lose ourselves in the process of reading. The poem mentioned in this context is very beautiful and goes like this:
You need not see what someone is doing to know if it is his vocation.
You have only to watch his eyes; a cook mixing a sauce, a surgeon
making a primary incision, a clerk completing a bill of lading,
wear the same rapt expression, forgetting themselves in a function.
How beautiful it is, that eye-on-the-object look.
There is a section on the advice that Hugh, a 12th-century abbot, gave to his monks. Even though the advice is centuries old, it is equally relevant for people whose motives for reading are far from monastic. Hugh's advice on humility is relevant to the book, as it says the reader should keep three aspects in mind:
These lessons mean that one should not only be attentive to what one studies, but also positively disposed towards it: friendly, even affectionate.
Amidst all this discussion about reading, the author takes a radical viewpoint, i.e. schools can never teach students to deep-read. Irrespective of which class a student is in, the feeling that "I will be graded" is always lurking in his mind. So the kind of attentiveness that is proper to school is more "hyper attention" than "deep attention". Look at any kid's curriculum and you will be amazed at the QUANTITY that is covered in the syllabus. With grades and competitive pressure, can a student deep-read? No, says the author, as reading textbooks and the like does not require extended unbroken focus. It requires discipline, not raptness. I don't agree with this point. Yes, a student probably can't deep-read all subjects, but I think focusing on a few subjects and understanding them really well might be better than knowing a bit about all of them. Yes, the student might fall behind on the average grade across subjects, but he will graduate from school or college with a better frame of mind. However, looking at the educational system in India, I think the author might be right: so much is taught to and tested from young minds that there is no choice but to cram.
One of the most important points that I found relevant to my reading habits is: reread. I tend to read a lot of math/stats books and I find it imperative to reread them. One benefit of summarizing books and posting the summaries to a blog is that the summaries serve as a starting point when I reread a book. The author makes a strong case for rereading, and I quote:
If most of us read too fast, most of us also read too many books and are unwisely reluctant to return to something we think we already know. I use "think" here advisedly, because , as my examples show, a first encounter with a worthwhile book is never a complete encounter and we are usually in error to make it a final one. But those who want to have read, who are checking books off their bucket list , will find the thought of rereading even more repulsive than the thought of reading slowly and ruminatively. And yet rereading a book can often be a more significant dramatic and new experience than encountering an unfamiliar work
This visual broadly gives the structure/model explained in the book:
We usually read for information, understanding or entertainment. Dismissing all the so-called expert recommendations that one receives on reading, the book has one central message, "Read at Whim". It warns the reader against turning reading into a 'have read' activity.
Posted at 06:11 PM in Books, Reflections | Permalink | Comments (0) | TrackBack (0)
Via BayesianBiologist: an excellent explanation of the most common misconception about the p-value:
What we really want is the probability of hypotheses given our data (written as P(H | D) ), which we can obtain by applying Bayes rule.
What we get from a p-value is the probability of observing something as extreme or more than our data, under the null hypothesis ( written as P(x>=D | Ho) ). Isn’t that awkward? No wonder it is so commonly misrepresented.
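A small numerical sketch makes the difference between the two quantities visible. It assumes a coin-flip experiment with 9 heads in 10 tosses, a null hypothesis of a fair coin, a single hypothetical alternative (heads probability 0.8) and a 50/50 prior. All of these numbers and function names are made up for illustration and do not come from the linked post.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_value_upper(k, n, p0):
    """P(X >= k | H0): data at least as extreme as observed, under the null."""
    return sum(binom_pmf(i, n, p0) for i in range(k, n + 1))


def posterior_h0(k, n, p0, p1, prior_h0=0.5):
    """P(H0 | data) via Bayes' rule, with H1 as the only alternative."""
    weighted_h0 = binom_pmf(k, n, p0) * prior_h0
    weighted_h1 = binom_pmf(k, n, p1) * (1 - prior_h0)
    return weighted_h0 / (weighted_h0 + weighted_h1)


print("p-value   P(X >= 9 | H0):", p_value_upper(9, 10, 0.5))
print("posterior P(H0 | X = 9): ", posterior_h0(9, 10, 0.5, 0.8))
```

The two numbers answer different questions and need not be close to each other. The posterior also moves with the choice of prior and alternative, while the p-value does not depend on either.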
Posted at 12:09 AM in Statistics | Permalink | Comments (0) | TrackBack (0)
“Unless one is happy, one cannot bestow happiness on others.
Happiness is born of Peace and can reign only when there is no disturbance.
Disturbance is due to thoughts, which arise in the mind.
When the (thinking) mind is absent there will be perfect Peace.”
--- Ramana Maharshi
Posted at 12:04 AM in Reflections | Permalink | Comments (0) | TrackBack (0)
In the Feb 2012 issue of “Advanced Trading” magazine, I found these points worth noting down:
Posted at 11:14 PM in Magazines | Permalink | Comments (0) | TrackBack (0)
This book contains most of the productivity hacks that one comes across in various articles, blogs and books. In one sense, it is a laundry list of hacks that one can try out to increase productivity. The big font size and the rich images scattered throughout make it a coffee-table book.
Some of the hacks that I found interesting are:
Posted at 09:15 AM in Books, Ideas | Permalink | Comments (0) | TrackBack (0)
Via Advanced Trading :
As regulators in the United States and Europe weigh the merits of new regulations to govern high-frequency trading, emerging markets have been methodically paving the way for the practice to expand within their borders.
In an interview with the BBC, Progress Software's chief technology officer John Bates said the practice of rapid-fire trading is quickly expanding in the so-called BRIC nations - Brazil, Russia, India and China.
From BBC:
"We've seen it grow very quickly in Brazil. It's done what happened in London and New York much more quickly. Now we're seeing the same trend in India and China and even, embryonically, in Russia." According to Dr. Bates, in the past two to three years, Brazil has already run through a cycle of development that took far longer in London and New York, with algorithm-based trading now available in equities, futures and foreign exchange markets. Brazil's Bovespa stock exchange has invested in new technology, boosting the proportion of algorithm-based equity trades from 4% to 12% in the past year. "The adaptation is faster and they can leapfrog the mistakes that have been made in other places," he says.
Brazil has cleared regulatory hurdles of its own to spur the growth of its marketplace. In December it lifted a financial transaction tax for foreign investors, a move that will undoubtedly create new opportunities for Dodd-Frank and Basel III escapees. And the recent moves by Bovespa - the nation's largest exchange - are likely to bring a dramatic lift to trading volumes and the level of liquidity it handles over the foreseeable future.
Meanwhile in India, the BBC noted that nearly a quarter of all trading is now done using algorithms, a number that's virtually assured of an exponential jump as well. The Bombay Stock Exchange, a $1.5 trillion marketplace, said it expects such trading to double over the next three years, which would put that nation on par with Europe and the U.S.
Now emerging markets aren't without their own set of serious challenges. But they've clearly caught HFT fever and the practice is poised for sharp growth this year, even as the old world wrestles with how to govern it.
Sometimes I get skeptical about articles that say HFT is going to happen soon in India. They make it sound as though everything is rosy and HFT is just around the corner. Well, just because 25% of the trading volume in India is done using algorithms, it does not mean those orders are necessarily HFT orders. They might just come from order routing algos. Since the stock exchanges (NSE + BSE) are tight-lipped about these orders and the actual numbers, one can only speculate here. I think they are the result of plain vanilla order routing algos, not even smart order routing algos. At least from whatever I have read about HFT, order routing algos do not qualify as HFT. I would be pleasantly surprised if my speculation were proved wrong and one of the exchanges actually gave out metrics and numbers indicating that the majority of these orders are HFT in nature.
One boost to HFT could be a drastic reduction in the Securities Transaction Tax. The Budget is just a few weeks away, and it will be interesting to see whether the govt. does something about it.
Posted at 12:07 AM in Finance | Permalink | Comments (0) | TrackBack (0)
An old NY Times article on the surprising ubiquity of Zipf’s law:
One of the pleasures of looking at the world through mathematical eyes is that you can see certain patterns that would otherwise be hidden. This week’s column is about one such pattern. It’s a beautiful law of collective organization that links urban studies to zoology. It reveals Manhattan and a mouse to be variations on a single structural theme.
The mathematics of cities was launched in 1949 when George Zipf, a linguist working at Harvard, reported a striking regularity in the size distribution of cities. He noticed that if you tabulate the biggest cities in a given country and rank them according to their populations, the largest city is always about twice as big as the second largest, and three times as big as the third largest, and so on. In other words, the population of a city is, to a good approximation, inversely proportional to its rank. Why this should be true, no one knows.
Even more amazingly, Zipf’s law has apparently held for at least 100 years. Given the different social conditions from country to country, the different patterns of migration a century ago and many other variables that you’d think would make a difference, the generality of Zipf’s law is astonishing.
Keep in mind that this pattern emerged on its own. No city planner imposed it, and no citizens conspired to make it happen. Something is enforcing this invisible law, but we’re still in the dark about what that something might be.
Many inventive theorists working in disciplines ranging from economics to physics have taken a whack at explaining Zipf’s law, but no one has completely solved it. Paul Krugman, who has tackled the problem himself, wryly noted that “the usual complaint about economic theory is that our models are oversimplified — that they offer excessively neat views of complex, messy reality. [In the case of Zipf’s law] the reverse is true: we have complex, messy models, yet reality is startlingly neat and simple.”
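The rank-size rule described above is easy to write down. Here is a minimal sketch, assuming an idealized country whose city sizes follow the rule exactly; the function names and the population figures are hypothetical and are not data from the article.

```python
def zipf_populations(largest, n_cities):
    """Populations predicted by the rank-size rule: size = largest / rank."""
    return [largest / rank for rank in range(1, n_cities + 1)]


def rank_times_size(populations):
    """Under Zipf's law, rank * population stays roughly constant."""
    ordered = sorted(populations, reverse=True)
    return [rank * pop for rank, pop in enumerate(ordered, start=1)]


# Hypothetical country whose largest city has 6 million people.
print(zipf_populations(6_000_000, 3))
print(rank_times_size([2_000_000, 6_000_000, 3_000_000]))
```

With real census figures, `rank_times_size` would give a quick check of how close a country comes to the rule: the products should all be in the same ballpark as the population of the largest city.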
After being stuck for a long time, the mathematics of cities has suddenly begun to take off again. Around 2006, scientists started discovering new mathematical laws about cities that are nearly as stunning as Zipf’s. But instead of focusing on the sizes of cities themselves, the new questions have to do with how city size affects other things we care about, like the amount of infrastructure needed to keep a city going.
For instance, if one city is 10 times as populous as another one, does it need 10 times as many gas stations? No. Bigger cities have more gas stations than smaller ones (of course), but not nearly in direct proportion to their size. The number of gas stations grows only in proportion to the 0.77 power of population. The crucial thing is that 0.77 is less than 1. This implies that the bigger a city is, the fewer gas stations it has per person. Put simply, bigger cities enjoy economies of scale. In this sense, bigger is greener.
The same pattern holds for other measures of infrastructure. Whether you measure miles of roadway or length of electrical cables, you find that all of these also decrease, per person, as city size increases. And all show an exponent between 0.7 and 0.9.
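The arithmetic behind "bigger is greener" can be sketched in a few lines. This assumes the simple form count ∝ population^0.77 quoted above; the function names are made up for illustration.

```python
def infrastructure_ratio(population_factor, exponent=0.77):
    """How much more infrastructure a city needs when its population is
    multiplied by population_factor, assuming count ~ population**exponent."""
    return population_factor ** exponent


def per_person_ratio(population_factor, exponent=0.77):
    """The same comparison per person: dividing by population lowers the exponent by one."""
    return population_factor ** (exponent - 1)


total = infrastructure_ratio(10)
per_person = per_person_ratio(10)
print(f"10x the people -> {total:.2f}x the gas stations, {per_person:.2f}x per person")
```

So a city with ten times the population needs only about 5.9 times as many gas stations, which works out to roughly 59% as many per person. With an exponent of exactly 1 both ratios would show no economy of scale at all.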
Now comes the spooky part. The same law is true for living things. That is, if you mentally replace cities by organisms and city size by body weight, the mathematical pattern remains the same.
For example, suppose you measure how many calories a mouse burns per day, compared to an elephant. Both are mammals, so at the cellular level you might expect they shouldn’t be too different. And indeed, when the cells of 10 different mammalian species were grown outside their host organisms, in a laboratory tissue culture, they all displayed the same metabolic rate. It was as if they didn’t know where they’d come from; they had no genetic memory of how big their donor was.
But now consider the elephant or the mouse as an intact animal, a functioning agglomeration of billions of cells. Then, on a pound for pound basis, the cells of an elephant consume far less energy than those of a mouse. The relevant law of metabolism, called Kleiber’s law, states that the metabolic needs of a mammal grow in proportion to its body weight raised to the 0.74 power.
This 0.74 power is uncannily close to the 0.77 observed for the law governing gas stations in cities. Coincidence? Maybe, but probably not. There are theoretical grounds to expect a power close to 3/4. Geoffrey West of the Santa Fe Institute and his colleagues Jim Brown and Brian Enquist have argued that a 3/4-power law is exactly what you’d expect if natural selection has evolved a transport system for conveying energy and nutrients as efficiently and rapidly as possible to all points of a three-dimensional body, using a fractal network built from a series of branching tubes — precisely the architecture seen in the circulatory system and the airways of the lung, and not too different from the roads and cables and pipes that keep a city alive.
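Exponents like 0.74 and 0.77 are typically read off as the slope of a straight line on a log-log plot. Here is a minimal sketch of that step using ordinary least squares on the logs. The body masses and the constant 3.4 below are synthetic, generated from the power law itself purely to show that the fit recovers the exponent; they are not measurements.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**b by least squares on (log x, log y); return (b, c)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y)) / sum(
        (a - mean_x) ** 2 for a in log_x
    )
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)


# Synthetic data: made-up masses in kg, rates generated from the law itself.
masses = [0.02, 1.0, 70.0, 5000.0]
rates = [3.4 * m**0.74 for m in masses]

exponent, constant = fit_power_law(masses, rates)
print(round(exponent, 6), round(constant, 6))
```

Real measurements scatter around the line, so the fitted slope comes with uncertainty. That is one reason a value of 0.74 for animals and 0.77 for gas stations can both be consistent with a 3/4-power law.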
These numerical coincidences seem to be telling us something profound. It appears that Aristotle’s metaphor of a city as a living thing is more than merely poetic. There may be deep laws of collective organization at work here, the same laws for aggregates of people and cells.
The numerology above would seem totally fortuitous if we hadn’t viewed cities and organisms through the lens of mathematics. By abstracting away nearly all the details involved in powering a mouse or a city, math exposes their underlying unity. In that way (and with apologies to Picasso), math is the lie that makes us realize the truth.
Posted at 08:57 PM in Math | Permalink | Comments (0) | TrackBack (0)
Posted at 08:55 PM in Talks | Permalink | Comments (0) | TrackBack (0)