Walter's DConf 2014 Talks - Topics in Finance
Laeeth Isharc via Digitalmars-d
digitalmars-d at puremagic.com
Mon Dec 22 19:07:09 PST 2014
Hi.
Sorry if this is a bit long, but perhaps it may be interesting to
one or two.
On Monday, 22 December 2014 at 22:00:36 UTC, Daniel Davidson
wrote:
> On Monday, 22 December 2014 at 19:25:51 UTC, aldanor wrote:
>> On Monday, 22 December 2014 at 17:28:39 UTC, Daniel Davidson
>> wrote:
>> I don't see D attempting to tackle that at this point.
>>> If the bulk of the work for the "data sciences" piece is the
>>> maths, which I believe it is, then the attraction of D as a
>>> "data sciences" platform is muted. If the bulk of the work is
>>> preprocessing data to get to an all numbers world, then in
>>> that space D might shine.
>> That is one of my points exactly -- the "bulk of the work", as
>> you put it, is quite often the data processing/preprocessing
>> pipeline (all the way from raw data parsing, aggregation,
>> validation and storage to data retrieval, feature extraction,
>> and then serialization, various persistency models, etc).
>
> I don't know about low frequency which is why I asked about
> Winton. Some of this is true in HFT but it is tough to break
> that pipeline that exists in C++. Take live trading vs
> backtesting: you require all that data processing before
> getting to the math of it to be as low latency as possible for
> live trading which is why you use C++ in the first place. To
> break into that pipeline with another language like D to add
> value, say for backtesting, is risky not just because the
> duplication of development cost but also the risk of live not
> matching backtesting.
>
> Maybe you have some ideas in mind where D would help that data
> processing pipeline, so some specifics might help?
I have been working as a PM for quantish buy side places since
98, after starting in a quant trading role on sell side in 96,
with my first research summer job in 93. Over time I have become
less quant and more discretionary, so I am less in touch with the
techniques the cool kids are using when it doesn't relate to what
I do. But more generally there is a kind of silo mentality where
in a big firm people in different groups don't know much about
what the guy sitting at the next bank of desks might be doing,
and even within groups the free flow of ideas might be a lot less
than you might think.
Against that, firms with a pure research orientation may be a
touch different, which just goes to show again that from the
outside it may be difficult to make useful generalisations.
A friend of mine who wrote certain parts of the networking stack
in Linux is interviewing with HFT firms now, so I may soon have a
better idea about whether D might be of interest. He has heard
of D but suggests Java instead (as a general option, not for
HFT). Even smart people can fail to appreciate beauty ;)
I think it's public that GS use a Python-like language internally,
JPM do use Python for what you would expect, and so do AHL (one
of the largest lower-frequency quant firms). More generally, in
every field, but especially in finance, it seems like the data
processing aspect is going to be key - not just a necessary evil.
Yes, once you have it up and running you can tick it off, but it
is going to be some years before you start to tick off items
faster than they appear. Look at what Bridgewater are doing with
gauging real-time economic activity (and look at Google Flu
prediction if one starts to get too giddy - it worked, and then
it didn't).
There is a spectrum of different qualities of data. What is
most objective is not necessarily what is most interesting. Yet
work on affect, media, and sentiment analysis is in its very
early stages. One can do much better than just "affect bad; buy
stocks once they stop going down"... Someone who asked me to
help with something is close to Twitter, and I have heard the
number of firms, and the rough breakdown by sector, taking their
full feed. It is shockingly small in the financial services
field, and that's probably in part just that it takes people
time to figure out something new.
Ravenpack do interesting work from a practitioner's point of
view. I heard a talk by their former technical architect, and he
really seemed to know his stuff. Not sure what they use as a
platform.
I can't see why the choice of language will affect your
backtesting results (except that it is painful to write good
algorithms in a clunky language, and the risk of bugs is higher -
but that isn't what you meant).
Anyway, back to D and finance. I think this mental image people
have of backtesting as being the originating driver of research
may be mistaken. It's funny, but sometimes it seems the moment
you take a scientist out of his lab and put him on a trading
floor he wants to know if such and such beats transaction costs.
But what you are trying to do is understand certain dynamics,
and one needs to understand that markets are non-linear and have
highly unstable parameters. So one must be careful about jumping
straight to a backtest. (And then, of course, questions of risk
management and transaction costs really matter also.)
To a certain extent one must recognise that the asset management
business has a funny nature. (This does not apply to many HFT
firms that manage partners' money.) It doesn't take an army to
make a lot of money with good people, because of the intrinsic
intellectual leverage of the business. But to do that one needs
capital, and investors expect to see something tangible for the
fees if you are managing size. Warren Buffett gets away with
having a tiny organisation because he is Buffett, but that may be
harder for a quant firm. So since intelligent enough people are
cheap, and investors want you to hire people, it can be tempting
to hire that army after all and set them to work on projects that
certainly cover their costs but really may not be big
determinants of variations in investment outcomes. I.e. one
shouldn't mistake the number of projects for what is truly
important.
I agree that it is setting up and keeping everything in
production running smoothly that creates a challenge. So it's
not just a question of doing a few studies in R. And the more
ways of looking at the world, the harder you have to think about
how to combine them. Spreadsheets don't cut the mustard anymore -
they haven't for years - yet it emerged even recently with the
JPM whale that a lack of integrity in the spreadsheets worsened
communication problems between departments (risk especially).
Maybe PyPy and numpy will pick up all of the slack, but I am not
so sure.
In spreadsheet world (where one is a user, not a pro), one never
finishes and says "finally, I am done building sheets". One
question leads to another in the face of an unfolding and
generative reality. It's the same with quant tools for trading.
Perhaps that means there is value in tooling suited to rapid
iteration and the building of robust code that won't need to be
totally rewritten from scratch later.
At one very big US hedge fund I worked with, the tools were
initially written in Perl (some years back). They weren't
pretty, but they worked, and were fast and robust enough. I had
many new features I needed for my trading strategy. But the
owner - who liked to read about ideas on the internet - came to
the conclusion that Perl was not institutional quality and that
we should therefore cease new development and rewrite everything
in C++. Two years later a new guy took over the larger group,
and one way or the other everyone left. I never got my new
tools, and that certainly didn't help on the investment front. A
year after he left, they scrapped the entire code base and
bought Murex, as nobody could understand what they had.
If we had had D then, its possible the outcome might have been
different.
So in any case, it is hard to generalise, and better to pick a
few sympathetic people that see in D a possible solution to
their pain; use patterns will emerge organically out of that. I
am happy to help where I can, and that is somewhat my own
perspective - maybe D can help me solve my pain of tools not
being up to scratch, because good investment tool design
requires investment and technology skills to be combined in one
person, whereas each of these two is rarely found on its own.
(D makes a vast project closer to brave than foolhardy.)
It would certainly be nice to have matrices, but I also don't
think it would be right to say D is dead in the water here
because it is so far behind. It also seems like the cost of
writing such a library is very small versus the possible benefit.
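To illustrate how small the core of such a thing really is: a
naive dense matrix multiply is only a few lines. This is a
sketch in Python for brevity (a D version would look much the
same, with templates and ranges); a real library would of course
add storage layouts, slicing, and BLAS bindings on top.

```python
def matmul(a, b):
    """Naive dense matrix multiply.

    a is an m x n matrix, b is an n x p matrix, both given as
    lists of rows; the result is the m x p product.
    """
    n = len(b)          # inner dimension
    p = len(b[0])       # columns of the result
    assert all(len(row) == n for row in a), "inner dimensions must match"
    # result[i][j] is the dot product of row i of a with column j of b
    return [[sum(a[i][k] * b[k][j] for k in range(n))
             for j in range(p)]
            for i in range(len(a))]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# -> [[19, 22], [43, 50]]
```

The hard part of a matrix library is never this kernel; it is
the API design and the performance work, which is where D's
templates could plausibly earn their keep.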
One final thought: it's very hard to hire good young people. We
had 1500 CVs for one job, with very impressive backgrounds -
French grandes écoles, and the like. But ask a chap how he would
sort a list of books without a library, and the results were
shocking. It seems like looking amongst D programmers is a nice
heuristic, although perhaps the pool is too small for now. Not
hiring now, but I was thinking about it for the future.
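For what it's worth, the kind of answer one would hope for is
nothing exotic - just an elementary sort written by hand. A
minimal sketch (in Python here purely for illustration; any
language would do, and the book titles are made up):

```python
def sort_books(titles):
    """Insertion sort by title - no library sort calls."""
    result = list(titles)  # don't mutate the caller's list
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # shift larger titles one slot right until key fits
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result

print(sort_books(["Ulysses", "Dune", "Emma"]))
# -> ['Dune', 'Emma', 'Ulysses']
```

Nothing clever; the shock was that candidates with impressive
CVs could not produce even this much at a whiteboard.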