i am a research scientist at microsoft research in new york city, where my work in computational social science applies statistics and machine learning to large-scale social data. i was previously a member of the social dynamics group at yahoo! research. i received my ph.d. from columbia university's physics department, and i am an adjunct professor in columbia's applied math department. please see my resume for more project and background information.
this site serves several purposes, from presenting and organizing my current research and teaching efforts to publishing code and tips that i hope others will find useful.
i bookmark lots of references on delicious, occasionally tweet things, post random tidbits on tumblr, and share photos on flickr.
2013.01.20: the course website is up for computational social science (columbia, spring 2013)
2012.05.03: looking forward to starting microsoft research new york city
... 2010.09.28: our paper on predicting consumer activity with web search is released (bbc, ars technica)
2010.05.23: slides and code for my icwsm 2010 tutorial, large-scale social media analysis with hadoop
2009.12.10: our recent paper, "what can search predict?" is posted (blog post, slate article)
2009.10.02: slides for my hadoopworld nyc talk: social network analysis with hadoop
2009.09.11: the course website is up for data-driven modeling (columbia applied math, fall 2009)
2009.05.13: our kdd paper has been covered by mit, slashdot, lifehacker, the chicago tribune, and wired
2009.04.22: our centmail paper was presented and demonstrated at the www 2009 developers track
2008.09.25: we have posted the call for papers for the NIPS 2008 workshop on analyzing graphs
2008.07.04: received "best student presentation award" at mlg 2008 (mining and learning with graphs)
2008.06.24: "a bayesian approach to network modularity" available at physical review letters online
2008.03.18: a scientific american article on my ph.d. advisor, chris wiggins; also a summary from ams
2008.03.14: coverage from nature blogs of an analysis of the 2008 aps march meeting co-authorship network
my latest geek tips, also available on twitter, tumblr, or as plain text:
20.08.21.14. git: remove sensitive data from a git repo with filter-branch or bfg http://bit.ly/1oh0EBM
20.08.18.14. rstats: normalize histograms with ggplot http://bit.ly/VA4D5z
20.07.30.14. shell: use watch to run and periodically monitor a process (h/t pablo) http://bit.ly/UBnlJE
20.07.29.14. python: use html.fromstring in the lxml package for fast webpage parsing http://bit.ly/1tnvCjk
20.06.24.14. rstats: overlay a histogram with empirical and normal density estimates in ggplot http://bit.ly/1o02FFu