Some literature and websites useful for understanding statistics and the analysis of data:
Daniel Kunin's superb Seeing Theory - a very visual and comprehensive introduction to probability theory (distributions, central limit theorem, expectation and variance) and statistics (including linear models, point estimates, confidence intervals and hypothesis testing)
UCLA's Institute for Digital Research and Education - comprehensive, practical and concrete advice on performing almost all kinds of analysis in popular stats software (R, Stata, SPSS)
Andrew Gelman's website
Frank Harrell's blog and website - also, the book (with James Slaughter) "Biostatistics for Biomedical Research"
Centre for Evidence Based Medicine recommended reading list including critical appraisal and systematic reviews
Cochrane Handbook for Systematic Reviews and the Cochrane Library
Kass, R.E. et al. (2016) Ten Simple Rules for Effective Statistical Practice, PLoS Computational Biology, June 9th. Rules 1, 2 and 4 are particularly relevant; Rules 8 and 10 are more relevant now than ever.
When a linear relationship between the response (dependent) and predictor (independent) variables doesn't hold: Harrell, Lee and Pollock (1988) recommend splines or piecewise polynomials, which are simple to build and use with standard software.
Avoid the multiple comparisons problem by using multi-level modelling/analysis of your data (from Andrew Gelman, Jennifer Hill and Masanao Yajima)
The TRIPOD consensus papers parts one and two: How to report analyses and results from predictive models (i.e. machine learning algorithms as well as more traditional models in predictive applications).
Assessing predictive model performance: Steyerberg et al., "Assessing the performance of prediction models: a framework for some traditional and novel measures", covers binary and survival outcomes.
Assessment of predictive discrimination and calibration for prognostic models (Harrell et al, 1996) available here
On the topic of statistical power, sample sizes, and the promise of "Big Data" by Rexplorations
Frank Harrell explains why using ordinal (not binary) outcomes can increase power in analysis.
Gelman explains why you need "16 times the sample size to estimate an interaction than to estimate a main effect"
Danielle Navarro's blog/autobiography of model selection methods
Critique of leave-one-out for Bayesian model selection by Dan Simpson
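The "16 times the sample size" claim in the Gelman item above can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions, not a reproduction of Gelman's own working: a balanced two-by-two design, a common residual standard deviation `sigma`, and an interaction that is half the size of the main effect. The function names are made up for this example.

```python
import math


def se_main_effect(sigma, n):
    """Standard error of a difference between two group means,
    assuming n observations split evenly into two groups of n/2.
    Simplifies to 2 * sigma / sqrt(n)."""
    return math.sqrt(2 * sigma**2 / (n / 2))


def se_interaction(sigma, n):
    """Standard error of a difference of differences,
    assuming n observations split evenly into four cells of n/4.
    Simplifies to 4 * sigma / sqrt(n)."""
    return math.sqrt(4 * sigma**2 / (n / 4))


def sample_size_multiplier(effect_ratio):
    """How many times more data the interaction needs to reach the same
    effect / standard-error ratio as the main effect.

    effect_ratio is (interaction size) / (main effect size); this is an
    assumption supplied by the analyst, not something the data tell you.
    The standard error is twice as large (a factor of 4 in n), and a
    smaller effect costs a further 1 / effect_ratio**2.
    """
    return 4 / effect_ratio**2


if __name__ == "__main__":
    print(se_main_effect(1.0, 100))     # about 0.2
    print(se_interaction(1.0, 100))     # about 0.4, twice as large
    print(sample_size_multiplier(0.5))  # 16.0
```

If the interaction were assumed to be as large as the main effect, the same arithmetic gives a factor of 4 rather than 16, so the headline number depends heavily on the assumed effect ratio.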