Using large-scale data and quantitative methods to understand how people use language and why languages look the way they do