I spent the past two years writing an algorithmic trading system that ran in my bedroom while I left the house, worked my day job, and lived my life. The idea took several months to derive, implement, and perfect while overcoming more obstacles than I could have imagined. By far, the largest problem I have encountered is finding good, free stock data.
Stock Tick Data
The smallest increment of free data I could find is on a Norwegian website that can be read in English by searching “NetFonds” in Google and selecting “Translate this page.” Along with currency and commodity data, NetFonds also has tick data for all NASDAQ, NYSE, and AMEX stocks. Getting the data for free is easy, but takes some effort to understand.
To start, enter the following URL into your browser:
http://hopey.netfonds.no/tradedump.php?date=[date]&paper=[stock]&csv_format=txt
Here, there are two parameters that you need to alter.
[Date] – Should be replaced with a date in the form YYYYMMDD, so for example 20130919 would be the data obtained from Thursday, September 19, 2013. In my experience, data goes back around 15 days, but I can’t guarantee this for every stock. Generally, I take and store yesterday’s data today.
[Stock] – This is where you replace the name of the ticker to collect. The catch is that you must know the exchange code.
NYSE code is ‘N’ — for example, to collect Macy’s, [stock] = M.N
NASDAQ code is ‘O’ — for example, to collect Google, [stock] = GOOG.O
AMEX code is ‘A’ — for example, you get the picture
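The two substitutions above can be sketched in a few lines of Python. The helper name netfonds_trade_url and the EXCHANGE_CODES table are made up for illustration; only the URL template and the exchange suffixes come from the text.

```python
from datetime import date

# Exchange suffixes as described in the text above.
EXCHANGE_CODES = {"NYSE": "N", "NASDAQ": "O", "AMEX": "A"}

# URL template copied from the article.
TRADE_URL = (
    "http://hopey.netfonds.no/tradedump.php"
    "?date={date}&paper={paper}&csv_format=txt"
)


def netfonds_trade_url(ticker: str, exchange: str, day: date) -> str:
    """Build the trade tick URL for one ticker on one day (hypothetical helper)."""
    paper = f"{ticker.upper()}.{EXCHANGE_CODES[exchange.upper()]}"
    return TRADE_URL.format(date=day.strftime("%Y%m%d"), paper=paper)


print(netfonds_trade_url("GOOG", "NASDAQ", date(2013, 9, 19)))
```

Keeping the exchange codes in one table means a typo in the suffix shows up as a KeyError instead of an empty download.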
The data displayed has a time, price, and quantity in .txt format. Everything looks self-explanatory, except for the time column. I will explain it using the first entry I see.
time = 20130919T153000
Translated as 2013, 09 (sept), 19 (day), Time, 15:30:00
which seems weird, but remember, you are collecting data from a Norwegian website and Oslo is six hours ahead of New York City time. In 24-hour format, 15:30:00 is 3:30 PM in Norway, which is 9:30 AM Eastern time and the market open. Notice that under this logic, the last data point during open market hours is represented by the string
time = 20130919T220000
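As a rough sketch, the time stamp can be parsed and shifted to New York time in Python. The six-hour gap is taken straight from the text and hard-coded here, which is an assumption: the gap is briefly five hours in the weeks when the US and Europe are on different sides of a daylight saving change. The function name to_new_york is made up.

```python
from datetime import datetime, timedelta

# Assumption from the text: Oslo runs six hours ahead of New York.
OSLO_AHEAD_OF_NEW_YORK = timedelta(hours=6)


def to_new_york(stamp: str) -> datetime:
    """Parse a stamp such as '20130919T153000' and shift it to New York time."""
    oslo_time = datetime.strptime(stamp, "%Y%m%dT%H%M%S")
    return oslo_time - OSLO_AHEAD_OF_NEW_YORK


print(to_new_york("20130919T153000"))  # 2013-09-19 09:30:00
print(to_new_york("20130919T220000"))  # 2013-09-19 16:00:00
```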
You will also notice that some rows have identical time stamps. This should be interpreted chronologically with the logic that the price is changing several times per second. Recall how price changes.
Finally, I want to note that all times outside 15:30 and 22:00 are after-hours transactions. You can always see after-market activity on Google Finance. Try searching for Apple, and check “Extended Hours” under the settings link under the given chart. Grey prices are transactions that occurred after hours.
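One way to separate regular session rows from after-hours rows is to compare the time part of each stamp against the 15:30 and 22:00 bounds given above. This is only a sketch: the function name is_regular_hours is hypothetical and the sample stamps are invented for the example, not taken from a real download.

```python
from datetime import datetime, time

# Regular session bounds in Oslo time, as given in the text.
SESSION_OPEN = time(15, 30)
SESSION_CLOSE = time(22, 0)


def is_regular_hours(stamp: str) -> bool:
    """Return True when the stamp falls within the regular session (inclusive)."""
    clock = datetime.strptime(stamp, "%Y%m%dT%H%M%S").time()
    return SESSION_OPEN <= clock <= SESSION_CLOSE


# Invented sample stamps, for illustration only.
stamps = ["20130919T140512", "20130919T153000", "20130919T220000", "20130919T221530"]
regular = [s for s in stamps if is_regular_hours(s)]
print(regular)  # ['20130919T153000', '20130919T220000']
```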
Order Book Tick Data
The best free tick order book data I could find displays only the best bid and ask quotes for a given time. Nevertheless, there are endless ways this information could be used to improve a system.
Again on NetFonds, try pasting the following URL into your browser:
http://hopey.netfonds.no/posdump.php?date=[date]&paper=[stock]&csv_format=txt
with the same date and stock convention used above. Notice you have a few extra columns corresponding to volume and best bid/ask in the market.
For this data set, you will see that extended hours quotes extend far more than in tick data, though the spread widens considerably. Extended hours trading is considered risky due to this lack of liquidity, but this is a topic of its own.
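To put a number on how much the spread widens, one simple measure is the difference between the best ask and the best bid, optionally divided by the midpoint. The sketch below assumes you have already read a bid and an ask from a row; the function names and the sample quotes are made up, and the column layout of the file is not assumed here.

```python
def quoted_spread(bid: float, ask: float) -> float:
    """Absolute spread: best ask minus best bid."""
    return ask - bid


def relative_spread(bid: float, ask: float) -> float:
    """Spread as a fraction of the midpoint price."""
    midpoint = (bid + ask) / 2
    return (ask - bid) / midpoint


# Invented quotes, for illustration only.
print(quoted_spread(10.0, 10.5))    # 0.5
print(relative_spread(10.0, 10.5))
```

The relative version is the more useful one when comparing stocks at very different price levels.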
Google Finance Data
Google Finance data follows very similar conventions to NetFonds, though the data comes in every minute. The URL is
http://www.google.com/finance/getprices?i=[PERIOD]&p=[DAYS]d&f=d,o,h,l,c,v&df=cpct&q=[TICKER]
[PERIOD] – Time interval in seconds
[DAYS] – Historical data period in days. The URL already appends the “d”, so for example [DAYS] = 10 asks for the last ten days
[TICKER] – The stock symbol. No codes necessary, so AAPL works just fine
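Filling in the three parameters can be sketched as follows. The function name google_prices_url and its default values are assumptions for illustration; the URL template is the one quoted above, and the sketch does not check whether the endpoint still responds.

```python
# URL template copied from the article; the "d" after {days} is part of it.
GOOGLE_URL = (
    "http://www.google.com/finance/getprices"
    "?i={period}&p={days}d&f=d,o,h,l,c,v&df=cpct&q={ticker}"
)


def google_prices_url(ticker: str, period_seconds: int = 60, days: int = 10) -> str:
    """Build the price URL for one ticker (hypothetical helper)."""
    return GOOGLE_URL.format(period=period_seconds, days=days, ticker=ticker.upper())


print(google_prices_url("AAPL", period_seconds=60, days=10))
```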
Yahoo Finance Data
Similar to Google Finance and NetFonds, the general URL is given by
http://chartapi.finance.yahoo.com/instrument/1.0/[TICKER]/chartdata;type=quote;range=1d/csv
Frequency is seconds, and historical range available is 5 days.
Obtaining the Data
Programming languages have an age-old trade-off. If you want a fast language, you have to put in the time to learn non-trivial languages and concepts. If you want code that downloads the above data sets, and you want it to work tomorrow, you have to accept a slower language.
For me, Mathematica and Python were extremely intuitive to use on day one, and both have built-in functions to browse and download data. I also learned to use AppleScript on my Mac with very little effort. This was nice because I could program my computer to wake up in the morning, go to a website, and download the latest data.
The speed trade off from not using a language like C++ was assumed away for me. Unless you pay top dollar, you have to assume that the data you are downloading is somewhat perturbed and there is nothing you can do about it.
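The routine of taking and storing yesterday's data each day can be sketched in Python with the standard library alone. This is a minimal sketch and not my actual script: the function names, the ticks folder, and the file naming scheme are all assumptions, and previous_weekday ignores market holidays.

```python
from datetime import date, timedelta
from pathlib import Path
from urllib.request import urlopen

# NetFonds trade tick template from the article.
TRADE_URL = (
    "http://hopey.netfonds.no/tradedump.php"
    "?date={date}&paper={paper}&csv_format=txt"
)


def previous_weekday(today: date) -> date:
    """Return the most recent weekday before `today` (ignores market holidays)."""
    day = today - timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day


def output_path(folder: Path, paper: str, day: date) -> Path:
    """Hypothetical naming scheme: one file per paper per day."""
    return folder / f"{paper}_{day:%Y%m%d}.txt"


def download(url: str, destination: Path, timeout: float = 30.0) -> None:
    """Fetch `url` and write the raw bytes to `destination`."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=timeout) as response:
        destination.write_bytes(response.read())


def fetch_yesterday(paper: str, folder: Path = Path("ticks")) -> Path:
    """Download the previous weekday's trades for `paper`, e.g. 'GOOG.O'."""
    day = previous_weekday(date.today())
    url = TRADE_URL.format(date=f"{day:%Y%m%d}", paper=paper)
    target = output_path(folder, paper, day)
    download(url, target)
    return target
```

Calling fetch_yesterday("GOOG.O") from a scheduled job each morning would mirror the wake-up-and-download routine described above.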
larry says
nice article, I would like to add http://www.quandl.com/ What I am always looking for and never was able to source are intraday futures prices, even 5 min would be great.
Shaun Overton says
That’s brilliant. Thanks for sharing the link! I think quandl is my new best friend.