The book serves as a nice intro to Bayes theory for an absolute newbie. There is minimal math in the book; whatever little math is mentioned is accompanied by figures and text so that a newbie to this subject “gets” the basic philosophy of Bayesian inference. The book is a short one, spanning 150-odd pages, and can be read in a couple of hours. The introductory chapter comprises a few examples that repeat the key idea of Bayes. The author says that he has deliberately chosen this approach so that a reader does not miss the core idea of Bayesian inference, which is:
Bayesian inference is not guaranteed to provide the correct answer. Instead, it provides the probability that each of a number of alternative answers is true, and these can then be used to find the answer that is most probably true. In other words, it provides an informed guess.
In all the examples cited in the first chapter, there are two competing models, and the likelihood of observing the data given each model is almost identical. So, how does one choose between the two models? Well, even without applying Bayes, it is abundantly obvious which of the two competing models one should go with. Bayes helps formalize this intuition and thus creates a framework that can be applied to situations where human intuition is misleading or vague. If you are coming from a frequentist world where “likelihood-based inference” is the mantra, then Bayes appears to be merely a tweak in which weighted likelihoods, instead of plain-vanilla likelihoods, are used for inference.
The second chapter of the book gives a geometric intuition to a discrete joint distribution table. Ideally a discrete joint distribution table between observed data and different models is the perfect place to begin understanding the importance of Bayes. So, in that sense, the author provides the reader with some pictorial introduction before going ahead with numbers.
The third chapter starts off with a joint distribution table of 200 patients, tabulated according to the number of symptoms and the type of disease. This table is then used to introduce the likelihood function, the marginal probability distribution, the prior probability distribution, the posterior probability distribution, and the maximum a posteriori (MAP) estimate. All these terms are explained in plain English, and the chapter thus serves as a perfect intro for a beginner. The other aspect this chapter makes clear is that it is easy to obtain the probability of the data given a model. The inverse problem, i.e., the probability of a model given the data, is a difficult one, and it is inference in that direction that makes the Bayesian approach powerful.
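To make the mechanics concrete, here is a minimal sketch of the posterior computation from a discrete joint count table; the disease names and counts below are made up for illustration and are not the book's actual table.

```python
import numpy as np

diseases = ["flu", "cold", "measles"]
# Hypothetical joint counts of 200 patients: rows = disease,
# columns = number of symptoms observed (0, 1, 2, 3).
counts = np.array([
    [10, 20, 30, 10],   # flu
    [30, 40, 20,  5],   # cold
    [ 5,  5, 10, 15],   # measles
], dtype=float)

prior = counts.sum(axis=1) / counts.sum()                 # P(disease)
likelihood = counts / counts.sum(axis=1, keepdims=True)   # P(#symptoms | disease)

observed = 3                                              # patient shows 3 symptoms
unnormalized = likelihood[:, observed] * prior            # Bayes numerator
posterior = unnormalized / unnormalized.sum()             # P(disease | #symptoms)

map_disease = diseases[int(np.argmax(posterior))]         # maximum a posteriori
```

With these made-up counts, the posterior given three symptoms is proportional to that column of joint counts, and "measles" comes out as the MAP estimate even though its prior probability is the smallest.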
The fourth chapter moves on to continuous distributions. The didactic method is similar to that of the previous chapter. A simple coin-toss example is used to introduce concepts such as the continuous likelihood function, the maximum likelihood estimate, sequential inference, uniform priors, reference priors, bootstrapping, and various loss functions.
The fifth chapter illustrates inference in a Gaussian setting and establishes a connection with the well-known regression framework. The sixth chapter talks about joint distributions in a continuous setting. Somehow I felt this chapter could have been removed from the book, but in keeping with the author's belief that “spaced repetition is good”, the content can be justified. The last chapter talks about the frequentist vs. Bayesian wars, i.e., the statisticians who believe that only one of the two is THE right approach. Which side one takes depends on how one views “probability”: is it a property of the physical world, or a measure of how much information an observer has about that world? Bayesians, and increasingly many practitioners in a wide variety of fields, have found the latter view a useful guide for doing statistical inference. More so, with the availability of software and computing power to do Bayesian inference, statisticians are latching on to Bayes like never before.
The author deserves praise for bringing out some of the main principles of Bayesian inference using just visuals and plain English. Certainly a nice intro book that can be read by any newbie to Bayes.
Posted at 06:41 PM in Books, Math, Statistics | Permalink | Comments (0)
Via boingboing:
Someone stole $90 million from a company I was involved in. I'm a poor judge of people. The company collapsed.
Some things I can't learn. I tend to like people too much.
So it's hard for me to be a good judge of people, no matter how much I try. So I find other people who are good at judging people and I ask them to help me.
Don't force yourself to learn something if you don't want to or it's not a natural talent.
What's the role of talent? Very small. But you have to start with it. Talent is the seed of skill.
How do you know if you are talented? If you loved it when you were ten years old. If you dream about it. If you like to read about it. Read the below and you'll know what you are talented at.
Trust me when I say: everyone is talented at many things.
In the past 20 years I've wanted to learn how to do some things really well. Writing, programming, business skills (leadership, sales, negotiating, decision-making), comedy, games.
So I developed a ten step technique for learning.
1. LOVE IT.
If you can't start with "love", then everyone who does love it will beat everyone who merely "likes" or "hates" it.
This is a rule of the universe. The first humans who crossed the arctic tundra from Siberia to Alaska in -60 degree temperatures had to love it. The rest stayed in the East Africa Savannah.
The very first day I wrote a "Hello, World" computer program I dreamed about computers. I woke up at 4am to get back to the "computer lab" and make even bigger programs.
When I first started to write every day, I would write all day. I couldn't stop. And all I wanted to talk about with people were different authors.
When I was 10 years old I wrote a gossip column about all my fellow 5th graders. I read every Judy Blume book. I read everything I could. I loved it.
Most of my friends got bored with me and soon I was very lonely. Except when I was writing.
2. READ IT.
Bobby Fischer wasn't that good at chess. He had talent but nobody thought much of him.
So around the age of 12-13 he disappeared for a year. He did this later in his 20s.
But at 13 when he came back on the scene he was suddenly the best chessplayer in the US, won the US championship, and became the youngest grandmaster in the world.
How did he do it? He barely played at all during his year of wandering.
Instead he did two things:
a) he studied every game played in the prior century. In the 1800s.
When he came back on the scene he was known for playing all of these antiquated openings, but he had improvements in each one. Nobody could figure out how to defeat these improvements.
In fact, in the final game of the World Championship many years later, in 1972 against Spassky, he brought out his 1800s arsenal to become World Champion.
Spassky desperately needed to win to keep the match going. Fischer needed to draw to win the title.
Spassky started with a very modern attacking opening ("The Sicilian"). But then, around 13 moves in, all of the commentators watching gasped.
Fischer had subtly changed the opening into an old-fashioned very drawish 1800s opening called "The Scotch Game." Spassky didn't have a chance after that.
b) He learned enough Russian to read the Russian chess magazines. At the time, the top 20 players in the world were all Russian. The Americans didn't really have a chance.
So Fischer would study the Russian games while all of the Americans were sitting around with openings and styles the Russians already knew how to defeat.
Consequently, when Fischer competed in the US championship in the early 60s it was the first complete shutout, all wins and not a single draw.
Studying the history, studying the best players, is the key to being the best player. Even if you started off with average talent.
3. TRY IT. BUT NOT TOO HARD.
If you want to be a writer, or a businessman, or a programmer, you have to write a lot, start a lot of businesses, and program a lot of programs.
Things go wrong. This is why quantity is more important than quality at first.
The learning curve that we all travel is not built by accomplishments. It's only built by quantity.
If you see something 1000 times, you'll see more than the person who sees the same thing only ten times.
Don't forget the important rule: the secret of happiness is not "being great" - the secret is "growth".
If you only "try" you'll get to your level that is natural for you. But growth will stop and you won't be happy.
4. GET A TEACHER (PLUS THE 10X RULE).
If I try to learn Spanish on my own, I get nowhere. But when I go out with (and have now married) someone from Argentina, I learn more Spanish.
With chess, writing, programming, business, I always find someone better than me, and I set a time each week to ask them tons of questions, have them give me assignments, look over my mistakes and tell me where I am wrong.
For everything you love, find a teacher and that makes you learn 10x faster.
In fact, everything I put on this list, makes you learn 10x faster. So if you do everything on this list you will learn 10 to the 10th power faster than anyone else.
That's how you become great at something.
5. STUDY THE HISTORY. STUDY THE PRESENT.
If you want to learn how to be a GREAT programmer (not just good enough to program an app but good enough to be GREAT), study machine language.
Study 1s and 0s. Study the history of the computer, learn how to make an operating system, and Fortran, Cobol, Pascal, Lisp, C, C++, all the way through the modern languages of Python, etc.
If you want to write better, read great books from the 1800s. Read Hemingway and Virginia Woolf and the Beats, and the works that have withstood the test of time.
They have withstood the test of time, versus millions of other books, for a reason. They are the best in the world.
Then study the current criticism of those books to see what you have missed. This is just as important as the initial reading.
If you want to study business, read biographies of Rockefeller, Carnegie, the first exchange in Amsterdam, the junk-bond boom, the 90s, the financial bust. Every Depression. All the businesses that flourished in every depression.
Read "Zero to One" by Peter Thiel. Watch "The Profit" on CNBC. Read about Steve Jobs. Read about the downfall of Kodak in "The End of Power".
Don't read self-help business books. They are nothing. You are about to enter a great field, the field of innovation that has created modern society. Don't read the average books that came out last year.
Step up your game and read about the people and inventions that changed the world into what it is today.
Read how Henry Ford had to start three car companies to get it right and why "three" was the important number for him.
Read about why Ray Kroc's technique for franchising created the world's largest restaurant chain. Read how Coca-Cola makes absolutely nothing but is the largest drink company in the world.
Write down the things you learn from each reading.
6. DO EASY PROJECTS FIRST.
Tony Robbins told me about when he was scared to death on his first major teaching job.
He had to teach a bunch of Marines how to improve their sharpshooting. "I had never shot a gun in my life," he said.
He studied quite a bit from professionals but then he came up with a technique that resulted in the best scores of any sharpshooting class before then.
He brought the target closer.
He put it just five feet from them. They all shot bullseyes. Then he moved it back bit by bit until it was the standard distance.
They were still shooting bullseyes.
Richard Branson started a magazine before he started an airline. Bill Gates wrote BASIC before his team wrote Windows.
E.L. James (and yes, I'm including her) wrote Twilight fan fiction, before she wrote "50 Shades of Grey".
Ernest Hemingway never thought he could write a novel. So he wrote dozens of short stories.
Programmers write "Hello, World" programs before they make their search engines.
Many chess grandmasters recommend you study the endgame first in chess (when there are few pieces left on the board) before you study the other parts of the game.
This gets you confidence, it teaches subtleties, it gives you greater feelings of growth and improvement - all steps on the path to success.
7. STUDY WHAT YOU DID.
The other day I threw everything out. Everything. I threw out all my books (donated). I threw out all my clothes.
I threw out old computers. I threw out plates I never used. I threw out sheets I would never have guests for. I threw out furniture (four book cases) and my TV and old papers and everything.
I wanted to clean up. And I did.
I found a novel I wrote in 1991. 24 years ago. It was horrible.
For the first time in those 24 years, I re-read it. I studied what I did wrong (character unrelatable. Plot too obvious. Deus ex machina all over the place).
Someone told me a story about Amy Schumer, one of my favorite comedians. She videotapes all her performances.
Then she goes back to her room and studies the performance second by second. "I should have paused another quarter-second here," she might say.
She wants to be the best at comedy. She studies her every performance.
When I play chess, if I lose, I run the game into the computer. I look at every move, what the computer suggests as better, I think about what I was thinking when I made the bad move, and so on.
A business I was recently invested in fell apart. It was painful for me. But I had to look at it and see what was wrong. Where did I make a mistake. At every level I went back and wrote what happened and where I might have helped better and what I missed.
If you aren't obsessed with your mistakes then you don't love the field enough to get better.
You ask lousy questions: "Why am I no good?" Instead of good questions: "What did I do wrong and how can I improve?"
When you consistently ask good questions about your own work, you become better than the people who freeze themselves with lousy questions.
Example: I hate watching myself after a TV appearance. I have never done it. So I will never get better at that.
8. YOU ARE THE AVERAGE OF THE FIVE PEOPLE AROUND YOU.
Look at every literary, art, and business scene. People seldom get better as individuals. They get better as groups.
The Beats: Jack Kerouac, Allen Ginsberg, William Burroughs and a dozen others.
The programmers: Steve Jobs, Bill Gates, Ted Leonsis, Paul Allen, Steve Wozniak and a dozen others all came out of the Homebrew Club.
The art scene in the 50s: Jasper Johns, De Kooning, Pollock, etc. all lived on the SAME STREET in downtown NYC.
YouTube, LinkedIn, Tesla, Palantir, and to some extent Facebook, and a dozen other companies came out of the so-called "PayPal mafia".
All of these people could've tinkered by themselves. But humans are tribal mammals. We need to work with groups to improve.
Find the best group, spend as much time with them, and as a "scene" you become THE scene.
You each challenge each other, compete with each other, love each other's work, become envious of each other, and ultimately take turns surpassing each other.
9. DO IT A LOT.
What you do every day matters much more than what you do once in a while.
I had a friend who wanted to get better at painting. But she thought she had to be in Paris, with all the conditions right.
She never made it to Paris. Now she sits in a cubicle under fluorescent lights, filling out paperwork all day.
Write every day, network every day, play every day, live healthy every day.
Measure your life in the number of times you do things. When you die: are you 2 writing sessions old? Or are you 50,000?
10. FIND YOUR EVIL PLAN.
Eventually the student passes the master.
The first hedge fund manager I worked for now hates me. I started my own fund and his fund went out of business. My evil plan was ultimately to be better than him.
But how?
After all of the above, you find your unique voice. And when you speak in that voice, the world hears something it has never heard before.
Your old teachers and friends might not want to hear that voice. But if you continue to be around people who love and respect you, then they will encourage that new voice.
There's that saying, "there are no new ideas." But there are.
There are all the ideas in the past combined with the new beautiful you. You're the butterfly.
Now it's your turn to teach, to mentor, to create, to innovate, to change the world. To make something nobody has ever seen before and perhaps will never see again.
Posted at 01:19 AM in Reflections | Permalink | Comments (0)
“Write your code as though you are releasing it as a package” - this kind of thinking forces one to standardize the directory structure, abandon ad hoc scripts in favor of well-thought-out functions, and leverage the devtools functionality to write efficient, extensible, and shareable code.
Posted at 05:07 PM in Programming | Permalink | Comments (0)
I’m not interested in daily chores. We have now swapped information for knowledge, which is not the same thing. I do not want to know. I’m not online. I don’t even have a computer.
- Brunello Cucinelli
Posted at 10:27 AM in Reflections | Permalink | Comments (0)
The paper titled, “Discerning Information from Trade Data” by David Easley, Marcos Lopez de Prado, and Maureen O'Hara, gives a Bayesian framework for trade classification. The most popular method for classifying a trade as a buy or a sell is the “tick test”. The authors introduce Bulk Volume Classification (BVC) and empirically test its performance vis-à-vis the tick test. In this post, I will briefly summarize the paper:
Introduction
With the advent of HFT, the markets have changed completely. Order cancellations and modifications have shot up compared to yesteryear. Execution-side algos chop orders and send them across to the exchange, and hence it is order flow, rather than individual orders, that relates to trade motivation. Also, with strategies such as the “persistent bidder”, where an aggressive trader uses limit orders to trade, the vital link between informed traders and aggressive traders is lost. All these implications severely undermine algos that infer trade direction from a single trade.
If one thinks about trading intention, it is clearly unobservable, and hence a Bayesian approach seems logical: have a prior on the unobservable, look at the data, and then formulate a posterior on the unobservable. This sounds good on paper but is inherently difficult in a practitioner's world. Why?
Knowing the above problems, avoiding any distributional assumptions, which is what the tick test does, might not be a reasonable idea either. The authors reason that in a noisy-data world, the Bayes approach is likely to yield a better solution despite its crude assumptions. This paper is about testing the performance of BVC and the tick test on HF datasets. There are obvious problems in testing such classification schemes: no one knows the true intention. Hence the authors use three proxies:
What’s the idea behind BVC?
BVC is a coarser estimate in the sense that it works on a time interval or volume interval: it computes the standardized price change over the interval and uses the t distribution to split the total volume in the time bar or volume bar into buy volume and sell volume. Why the t distribution? The authors reason that it is far more flexible and parsimonious than other distribution functions. The BVC procedure splits the volume in a bar equally between buy and sell volume if there is no price change from the beginning to the end of the bar. If the price increases, the volume is weighted more towards buys than sells, depending on how large the price change is in absolute terms relative to the distribution of price changes.
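A minimal sketch of that bar-classification rule follows; the degrees-of-freedom value is an illustrative assumption on my part, not the paper's estimate.

```python
from scipy.stats import t

def bvc_buy_volume(bar_volume, price_change, sigma, df=3.0):
    """Buy volume in a bar per the BVC idea: V * F_t(dP / sigma).

    bar_volume:   total volume in the time/volume bar
    price_change: price change from start to end of the bar
    sigma:        std. deviation of bar price changes
    df:           t-distribution degrees of freedom (illustrative)
    """
    z = price_change / sigma          # standardized price change
    return bar_volume * t.cdf(z, df)  # sell volume = bar_volume - buys
```

A zero price change splits the bar 50/50; a large positive standardized change pushes most of the bar's volume into the buy bucket, and a large negative change into the sell bucket.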
How does one test a coarse estimate vs. a fine estimate?
BVC is based on a time or volume interval; the tick test is based on single trades. How does one check the performance of one against the other? The authors employ two strategies here:
Using simple distributional assumptions, the authors make the following remarks:
Data:
The authors use E-mini S&P 500 futures, Gold futures, and WTI Crude Oil futures to test the performance of BVC. The dataset for the E-mini S&P 500 futures contract runs from November 7th, 2010 to November 6th, 2011 and has 128 million trades; the WTI Crude Oil futures dataset has ~78 million trades; the Gold futures dataset has ~27 million trades. All these trades are characterized by small lot sizes. These are big datasets, and the authors do not shy away from sharing their experience of dealing with them:
In the context of this paper, we will always refer to version 2.19, dated 12/09/11. This level 3 data was purchased directly from the CME, and was delivered as 357 zip files containing 2272 flat files. This represents about 21.6GB of compressed data, and about 220GB uncompressed. We mention these numbers to signal the difficulty of working with this data using standard commercial package
Problems arising out of using aggressor proxy
Test results:
Posted at 09:25 PM in Finance, Statistics | Permalink | Comments (0)
The paper titled, “Spectra of Some Self-Exciting and Mutually Exciting Point Processes”, is one of the most widely cited papers in the marked point process literature. I guess this was the first paper that explored the complete covariance density function of point processes, in particular of self-exciting and mutually exciting processes. In the time series literature, the covariances of a stationary process at various lags have a special meaning: if you take the generating function of the covariances and evaluate it at a complex exponential, you arrive at the population spectrum of the time series. The population spectrum thus obtained has several interesting applications.
In this paper, the author starts off by deriving the spectral density of a general point process and then applies this generic form to univariate self-exciting processes and to multivariate self-exciting and mutually exciting processes. In each of the cases explored, the author writes an equation for the covariance density that looks like a renewal equation. By applying Laplace transforms to this equation, the author obtains the Laplace transform of the covariance density in terms of the Laplace transforms of the propagator function and the baseline rate. For simple exponentials and univariate processes, one can apply the inverse Laplace transform to recover the covariance density in closed form. For multivariate processes, the math is tedious but the procedure is straightforward. Once the covariance density is obtained, it is straightforward to obtain the spectral density of the process.
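For the exponential-kernel case, the closed-form spectrum that this line of derivation leads to is commonly quoted and easy to sketch (parameter names are mine): with intensity lambda(t) = nu + sum alpha*exp(-beta*(t - t_i)), the Bartlett spectrum is f(w) = (m/2pi) * |1 - G(w)|^-2, where G(w) = alpha/(beta + iw) is the transform of the kernel and m = nu*beta/(beta - alpha) is the stationary mean rate.

```python
import math

def hawkes_spectrum(w, nu, alpha, beta):
    """Bartlett spectrum f(w) of a stationary Hawkes process with
    exponential kernel g(t) = alpha*exp(-beta*t); requires alpha < beta."""
    m = nu * beta / (beta - alpha)      # stationary mean event rate
    G = alpha / complex(beta, w)        # transform of the kernel g
    return (m / (2.0 * math.pi)) / abs(1.0 - G) ** 2
```

Self-excitation shows up as extra power at low frequencies: f(0) exceeds the flat level m/2pi that a Poisson process with the same mean rate would have, and f(w) decays to that level as w grows.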
Posted at 09:07 PM in Math | Permalink | Comments (0)
I had to read this paper again after ~1.5 years as I had forgotten the basic idea behind the classification. My understanding this time was far better than on the previous encounter. In this post, I will list down a few points from the paper.
One can think of three ways to classify trades as “buy” or “sell”:
What are the findings?
Based on these findings, the authors propose a trade classification algorithm:
Lee and Ready algorithm:
There are two takeaways from this paper. One is that the “tick test” is highly accurate. The second is that classification based on the current quote needs to be looked into carefully: one might have to consider a delayed quote as the “prevailing quote” in order to classify a trade correctly.
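A minimal sketch of the two pieces discussed above: a tick test over a price series, and a quote rule that falls back to the tick direction when a trade prints exactly at the prevailing midpoint. The function names and the +1/-1 convention are mine.

```python
def tick_test(prices):
    """Classify trades as +1 (buy) / -1 (sell) from price changes.
    A zero tick inherits the last non-zero direction; the first
    trade has no predecessor and is left unclassified."""
    signs, last = [], 0
    for prev, cur in zip(prices, prices[1:]):
        if cur > prev:
            last = 1
        elif cur < prev:
            last = -1
        signs.append(last)        # zero tick: keep previous direction
    return signs

def quote_rule(price, bid, ask, tick_sign):
    """Above the midpoint -> buy, below -> sell; at the midpoint,
    fall back to the tick-test direction."""
    mid = (bid + ask) / 2.0
    if price > mid:
        return 1
    if price < mid:
        return -1
    return tick_sign
```

Feeding a delayed quote into `quote_rule`, rather than the quote in force at the trade's reported timestamp, is exactly the adjustment the takeaway above is pointing at.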
Posted at 10:31 AM in Finance | Permalink | Comments (0)
I have stumbled onto a few mini-projects that revolve around fitting univariate and bivariate Hawkes processes. In this post, I will briefly summarize the write-ups:
High Frequency Trade Prediction with Bivariate Hawkes Process
The authors start with an SDE for the intensity process and formulate its solution as a univariate Hawkes process. A visual depiction of the self-excited intensity process is obtained via simulation. The time-change theorem is stated, and a QQ plot of the compensator increments is shown to follow an exponential inter-arrival distribution. The same is repeated for a bivariate mutually exciting Hawkes process. Expressions for the log-likelihood of the bivariate Hawkes process are stated, and MLE results are shown on a simulated dataset so that the estimates can be compared to the true values. The TAQ database is used to obtain tick data for the DELL, YHOO and ORCL stocks. Since the data is discretized in whole seconds, timestamps that share the same second are uniformly redistributed within that second. Using the Lee and Ready tick algo, the trades are categorized as buys or sells, and a bivariate Hawkes process is fit to the buy and sell trades. The model is put to the test in a strategy where 1) the stock is bought if the buy intensity exceeds 8 times the sell intensity, and 2) the stock is shorted if the ratio drops below 1/8.
The authors look at order arrivals for a period of 10 days between March 9, 2009 and March 20, 2009 for QQQ and AAPL. Summary stats of the number of limit orders, market orders and cancellations show that the percentage of market orders and cancellations is much higher than the 10% reported in previous papers. A density plot of the number of orders at different ticks away from the best bid/best ask shows that the plot does not follow a power law. This result is common across QQQ and AAPL and across limit buys, limit sells, and cancellations. In a paper by Bouchaud and others, a power law was found to best describe the order book; however, Bouchaud's paper was written in 2002. In the last decade or so, HFT has taken over the market, and HF traders are much more active near the best bid/best ask than at ticks away from it. Based on these empirical findings, the authors concentrate on modeling the order arrivals at the bid and ask, using a univariate Hawkes process. Since the gradient and Hessian of the univariate Hawkes likelihood are well known, the authors use an MLE procedure, randomize the initial values over a range, and estimate parameters for blocks of 10,000 arrivals each. Standard diagnostic tests, such as checking whether the time-changed process based on the compensator yields a standard Poisson process, are carried out. They find that the Hawkes process deviates the most from the data when there are few orders in a period of time. The authors come up with a simple HFT strategy based on the order intensity of limit buys and limit sells. The problems of implementing this strategy in a pre-trade setting/lab are also mentioned towards the end of the paper. One of their recommendations for future researchers is to build a regime-based Hawkes model, i.e., different models for high- and low-intensity order arrivals.
Exciting times for Trade Arrivals
The authors start by briefly explaining the math behind discrete- and continuous-time Hawkes processes, following it up by casting the MLE as a non-convex optimization problem. Given that it is non-convex, there is no efficient way of solving it. The nice thing about this project is that the authors cast the difficult non-convex optimization problem into a solvable convex optimization problem by making a few assumptions about the propagator function, calling the result a "Generalized Hawkes process". Subsequently, the authors use the exponential and generalized Hawkes models to test an HFT strategy where one goes long or short based on the buy/sell intensity ratio.
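All three write-ups lean on simulated Hawkes data at some point; a minimal univariate simulator via Ogata's thinning algorithm looks something like this (parameter values are illustrative, and stationarity needs alpha < beta):

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon] for a Hawkes process with
    intensity lambda(t) = mu + sum alpha*exp(-beta*(t - t_i)), via
    Ogata's thinning: propose from a dominating rate, then accept
    with probability lambda(t) / lambda_bar."""
    rng = random.Random(seed)
    events, t = [], 0.0

    def intensity(s):
        return mu + sum(alpha * math.exp(-beta * (s - ti)) for ti in events)

    while True:
        lam_bar = intensity(t)            # dominates lambda until next event
        t += rng.expovariate(lam_bar)     # candidate arrival time
        if t >= horizon:
            return events
        if rng.random() <= intensity(t) / lam_bar:
            events.append(t)              # accepted: intensity jumps by alpha
```

A simulator like this is what makes the MLE sanity checks in these write-ups possible: estimate parameters on a simulated path and compare them with the values used to generate it.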
Posted at 08:14 PM in Finance, Statistics | Permalink | Comments (0)
The paper by Ioane Muni Toke and Fabrizio Pomponio, titled "Modeling Trades-Through in a Limit Order Book Using Hawkes Processes", uses Hawkes processes to examine microstructure behavior.
This paper uses a multivariate Hawkes process to model trades-through. The best thing about this paper is that the authors have made the dataset available so that readers can work through the numbers and get a feel for model inference. The dataset is available at dataverse. I have used the dataset from the repository, crunched the numbers, and managed to replicate most of the results in the paper. I hope this feature of "Reproducible Research" becomes more widespread and authors start disseminating their datasets along with their papers. In this blog post, I will summarize the main points of the paper.
Introduction
The authors model trades-through, i.e., transactions that reach at least the second level of limit orders in an order book. Trades-through are very important in price formation and microstructure: any big order is usually chunked before execution, and hence trades-through may contain information. The paper has three sections. In the first section, basic summary statistics of the dataset are given. There are 1,296,707 timestamps in the dataset.
Trades-through Summary Statistics
If you spend some time watching the order book, it becomes abundantly clear that trades-through stand outside the usual trading pattern. What is a trade-through? An n-th limit trade-through is any trade that consumes at least one share at the n-th limit available in the order book. The paper describes the trades-through statistics of BNP Paribas stock for 109 trading days (June 2010 to October 2010). The empirical findings lead one to infer the following:
Modeling and Calibration
The authors fit a bivariate Hawkes process for the trades-through on the ask side and the bid side. There are 4 variants of the model tested in the paper:
The parameters for each of the above four models are estimated for each trading day and aggregated across the 109 days, and the major finding is that there is no cross-excitation effect.
Goodness-of-fit
The authors perform the following two goodness-of-fit tests for each of the trades-through processes (ask and bid) for each day:
The authors conclude that univariate Hawkes process with piecewise-linear function is a better fit to the trades-through on the bid and ask side, than the other models considered in the paper.
The paper models the trades-through for BNP Paribas stock for a period of 109 days. An empirical analysis of self-excitation and cross-excitation motivates the authors to test out multivariate Hawkes model for the trades-through on the bid and ask side. There are four variants of Hawkes model fitted to the data. For each of the four models, for each day, two diagnostic tests are applied to ask trades-through process and bid trades-through process, thus obtaining 4 tests per day per model. These diagnostic tests are aggregated across 109 days. The authors find that, out of the four models, the univariate Hawkes process with a piecewise linear function for base intensity seems to fit the data better.
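The time-change diagnostic used in model checks like these is easy to sketch: under a correctly specified model, the compensator increments Lambda(t_i) - Lambda(t_{i-1}) should be i.i.d. Exp(1). Below is a minimal version for the exponential-kernel case; an O(n^2) illustration under my own parameter names, not the authors' code.

```python
import math

def compensator(t, events, mu, alpha, beta):
    """Lambda(t) = mu*t + sum_{t_i < t} (alpha/beta)*(1 - exp(-beta*(t - t_i)))
    for the intensity lambda(s) = mu + sum alpha*exp(-beta*(s - t_i))."""
    total = mu * t
    for ti in events:
        if ti < t:
            total += (alpha / beta) * (1.0 - math.exp(-beta * (t - ti)))
    return total

def residuals(events, mu, alpha, beta):
    """Compensator increments between consecutive events; these should
    look like i.i.d. Exp(1) draws if the fitted model is adequate."""
    vals = [compensator(t, events, mu, alpha, beta) for t in events]
    return [b - a for a, b in zip([0.0] + vals[:-1], vals)]
```

A QQ plot of these residuals against Exp(1) quantiles, or a Kolmogorov-Smirnov test on them, is then the goodness-of-fit check applied per day and per model.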
Posted at 04:48 PM in Finance, Math, Probability, Statistics | Permalink | Comments (1)
The paper titled, “Self-Exciting Point Process Models of Civilian Deaths in Iraq”, deals with fitting point processes to civilian deaths from March 2003 to December 2007. In this post, I will summarize the main points from the paper.
First, what is “Operation Iraqi Freedom”? Here's a wiki blurb:
The 2003 invasion of Iraq lasted from 19 March to 1 May 2003 and signaled the start of the conflict that later came to be known as the Iraq War, which was dubbed Operation Iraqi Freedom by the United States. The invasion consisted of 21 days of major combat operations, in which a combined force of troops from the United States, the United Kingdom, Australia and Poland invaded Iraq and deposed the Ba'athist government of Saddam Hussein. The invasion phase consisted primarily of a conventionally fought war which concluded with the capture of the Iraqi capital of Baghdad by American forces.
The devastating war has had dire consequences: over 100,000 Iraqi civilians have died. The authors hypothesize that the temporal pattern of events is driven by two components:
The main aspect in which the models considered in the paper depart from the usual self-exciting models is that they assume a non-stationary baseline rate. Three different non-stationary processes are tested out:
Since there is a non-stationary part to the baseline rate, the authors do not propose a Poisson null model; that would be comparing apples to oranges. Instead, their null model is one without the self-exciting component.
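To make the departure concrete, here is a minimal sketch of a Hawkes intensity with a piecewise-linear baseline; the knot locations, knot values, and kernel parameters are invented for illustration and are not the ones fitted in the paper:

```python
import numpy as np

def intensity(t, events, knots_t, knots_mu, alpha=0.5, beta=1.0):
    """lambda(t) = mu(t) + alpha * sum_{t_i < t} exp(-beta * (t - t_i)),
    where mu(t) is piecewise linear between the given knots."""
    mu_t = np.interp(t, knots_t, knots_mu)          # non-stationary baseline
    past = events[events < t]                       # only earlier events excite
    return mu_t + alpha * np.exp(-beta * (t - past)).sum()

events = np.array([1.0, 2.5, 2.7])
lam = intensity(3.0, events, knots_t=[0, 5, 10], knots_mu=[0.2, 1.0, 0.4])
print(lam)  # ~1.4213: baseline 0.68 plus excitation from the three past events
```

A constant-baseline null model is the special case where `knots_mu` is flat, which is why the authors compare against a model stripped of the self-exciting sum rather than a plain Poisson process.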
The authors analyze 15,977 deaths and make the following assumptions in the analysis:
The analysis examines temporal patterns of violent deaths for four different regions in Iraq: Karkh, Najaf, Mosul, and Fallujah. For each region, the following aspects are reported:
The following are the findings across all regions:
Towards the end of the paper, the authors argue for the importance of their results:
Our results also raise the possibility that intervention strategies can be designed to counteract self-excitation in patterns of Iraqi violence. If it is known that a large fraction of events generate daughter events, then it may be possible to strategically detect this mechanism.
For instance, if daughter events are generated out of a desire to replicate recent successes, then recognizing and altering the environmental or situational characteristics that facilitated success in the first place may help to decrease the chance of self-excitation.
Alternatively, if daughter events are driven by cycles of reprisals, then intervening with the impacted parties may decrease the chance of self-excitation. While there may be general strategies that are applicable across both types of self-excitation, such events are inherently situational and will require a situational response.
Posted at 04:16 PM in Math, Probability, Statistics | Permalink | Comments (0)
Via TP (TechCrunch):
Today, Silicon Valley is the hottest place for quants to be – though people with this skill set are often referred to now as data scientists. A similar confluence of factors — data, technology and algorithms — has combined to enable a new class of transformational opportunities. These opportunities are not limited to just financial services; they are showing up in every sector of the economy.
The volume and variety of data sources has exploded, with companies now regularly directly capturing all manner of user web and mobile traffic, e-commerce and real-world transactions, social profile information, location and even sensor data. In addition, there are vast pools of third-party data available through APIs for everything from advertising and beauty to yellow pages and ZIP codes.
Consequently, tech companies are lusting after all manners of data scientists with the biggest companies (Google, Facebook, Baidu, Microsoft, etc.) already having made early acquisitions in machine/deep learning. However, these acquisitions may largely be used just to fuel their existing businesses in search, social and other applications inside their enterprises.
There are many, many more real-world problems to solve. The hunt is on across both horizontal business functions like sales, marketing, finance and security, as well as vertical industries such as retail, manufacturing, healthcare and even transportation. The Holy Grail here is finding patterns and insights where we didn’t know they existed.
Posted at 03:21 PM in Math | Permalink | Comments (0)
The following note is motivated by the blog post “Bitcoin Trade Arrival as Self-Exciting Process”. Since the author has shared the data and code, I wanted to check some of the numbers from the post. The author uses “ptproc”, a deprecated library that has been removed from CRAN. In this note I have used the trades dataset from the author’s GitHub directory and fit a self-exciting model to the trade arrivals. My analysis shows that the data do not fit a Hawkes process, contradicting the blog post’s conclusion. In fact, when I looked at the code on GitHub, I found that the step of adding random milliseconds was missing, and the sanitized dataset produced after the randomized addition is double the size of the original trades data. Clearly there is something wrong with the data.
Even though the math in the blog post seems fine, the dataset and the analysis look flawed. The post’s conclusions are based on a visual fit, and we know how deceptive our eyes can sometimes be. I have used the Box-Ljung test and the KS test, and both reject any presence of self-exciting behavior.
Posted at 03:15 PM in Finance, Probability, Statistics | Permalink | Comments (0)