Statistics is the science of learning from experience.

We provide consulting to help researchers prepare their experimental design before they collect data.

Do not wait until your data have been collected to ask a statistician to analyse them.

As R.A. Fisher put it, consulting a statistician after the experiment is finished most often amounts to a *post-mortem* examination.

It is always good practice to meet a **biostatistician** when designing your experiments and
be ready to discuss the following:

What is your primary research question? What is the nature of your expected outcome (continuous, categorical, binary, count, survival/censored or ranked values)? What should be your quantitative measure of success? What are the sources of *variability*? What do you expect the outcome to depend on? What is your list of covariates (predictors)? Are your covariates possibly correlated? Pay attention to confounding and possible multicollinearity issues. Do you have a statistical model to fit? How well does the model fit the data? Are you looking for outliers?

To determine a **sample size**, *n*, you will need to know the variability (σ) of your outcome, fix the **effect size** (δ) you want to detect, set a value for the *risk of false positive results (type I error, α)* and require a minimal *power* of your setup (i.e. *the probability of detecting an effect if there truly is one, 1-β*). Might it be advisable to carry out a pilot test before proceeding with the full study?

Can you trust a panel of raters to monitor the quality of a manufactured food or beverage product? How do you assess the consistency of a panel of raters, or the objectivity of the jury's members? This is where ranked outcomes and *non-parametric* statistical methods come into play. Permutation and *bootstrap* methods can help you obtain the empirical distribution of your outcome variable, at least under particular assumptions. You might also need *simulated datasets* to test the performance of your statistical analytical toolbox.

Will you suffer the **curse of dimensionality** with your big data?
If the number of variables is much larger than the number of experimental subjects, you certainly will.
Should you filter out and prune some possibly irrelevant variables?
There are *unsupervised* and *supervised machine learning* techniques that can help you gain better insight into your big data: *classification and regression trees*, *hierarchical clustering*, *nearest neighbours (kNN)*, *principal components analysis (PCA)*, *support vector machines (SVM)*, *random forests* (RF) and the *Lasso* (Least Absolute Shrinkage and Selection Operator), to mention just a few.

We illustrate hereafter, in a few selected examples, some of the above issues and how they are dealt with:

- *Logistic regression with generalized estimating equations (GEE)* to select the best possible chemical additive to increase a product's shelf life (download this report here);
- *Support Vector Machine (SVM)* to build a classifier for a lung cancer metabolomic signature in patients' blood samples (download this report here);
- *Unsupervised and supervised machine learning* and data mining methods in breast cancer diagnostics (download this report here).
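As a worked illustration of the sample-size ingredients listed earlier (σ, δ, α and power 1-β), here is a minimal sketch of the classical normal-approximation formula for comparing two group means, n = 2σ²(z₁₋α/₂ + z₁₋β)²/δ² per group. It assumes a two-sided test, equal group sizes and a known, common σ; a calculation based on the t-test gives a slightly larger n. In the spirit of testing your toolbox on *simulated datasets*, the result is then checked by simulation. The function names and the numbers used (σ = 1, δ = 0.5) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_size_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample z-test.

    Assumes equal group sizes and a known, common standard deviation sigma:
    n = 2 * sigma**2 * (z_{1-alpha/2} + z_{1-beta})**2 / delta**2, rounded up.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the type I error risk alpha
    z_beta = z(power)           # quantile matching the target power 1 - beta
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


def simulated_power(n, sigma, delta, alpha=0.05, n_sim=2000, seed=1):
    """Monte Carlo power: share of simulated studies whose z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n)  # standard error of the difference in means
    rejections = 0
    for _ in range(n_sim):
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(delta, sigma) for _ in range(n)]
        diff = sum(treated) / n - sum(control) / n
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_sim


n = sample_size_two_means(sigma=1.0, delta=0.5)
print(n)                                         # 63 per group
print(simulated_power(n, sigma=1.0, delta=0.5))  # roughly 0.8
```

Because n grows with (σ/δ)², halving the effect size you want to detect roughly quadruples the required sample size, which is why δ deserves careful discussion before any data are collected.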

Should you consider **Bayesian methods** instead of the frequentist approach? How reliable is the prior expert knowledge?
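How much the prior matters can be seen in the simplest conjugate case: a Beta(a, b) prior on a success probability, updated with binomial data, gives a Beta(a + k, b + n - k) posterior. The minimal sketch below uses hypothetical data (7 successes in 20 trials) and three hypothetical priors; the function names are illustrative only.

```python
def beta_binomial_update(a, b, successes, trials):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + successes, b + trials - successes


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


data = (7, 20)  # hypothetical: 7 successes out of 20 trials (observed rate 0.35)
priors = {
    "flat Beta(1, 1)": (1, 1),                  # posterior mean 8/22, about 0.364
    "sceptical Beta(2, 8)": (2, 8),             # posterior mean 9/30 = 0.300
    "confident expert Beta(30, 10)": (30, 10),  # posterior mean 37/60, about 0.617
}
for label, (a, b) in priors.items():
    a_post, b_post = beta_binomial_update(a, b, *data)
    print(f"{label:30s} posterior mean = {beta_mean(a_post, b_post):.3f}")
```

With only 20 observations, a confident but possibly misplaced expert prior pulls the posterior far from the observed rate, so the reliability of that prior knowledge is a question worth settling before the analysis.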

Two examples are given below providing a flavour of the Bayesian approach to statistical analysis.

The full power of Bayesian methods has emerged in the electronic computing era over the last three decades.

We present hereafter, in a very intuitive way, the main sampling algorithms used in advanced Bayesian analysis to evaluate the posterior probability density when a problem is not amenable to a closed-form analytical solution. They are known as:

- MCMC (Markov Chain Monte Carlo) methods, which include the following two:
    - Metropolis algorithm
    - Gibbs sampler
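A minimal sketch of both samplers, under illustrative assumptions of our own. The Metropolis part targets a hypothetical posterior known only up to its normalising constant (7 successes in 20 binomial trials with a Beta(2, 2) prior); its exact answer, Beta(9, 15), makes the result easy to check. The Gibbs part samples a standard bivariate normal with correlation ρ = 0.8, chosen because both full conditionals are normal and can be drawn exactly. Function names, step size and chain lengths are hypothetical choices.

```python
import math
import random


def log_target(theta, k=7, n=20, a=2.0, b=2.0):
    """Unnormalised log posterior of a success probability theta.

    Hypothetical example: k successes in n binomial trials with a Beta(a, b)
    prior, so the exact posterior is Beta(a + k, b + n - k) = Beta(9, 15).
    """
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero density outside the support
    return (k + a - 1) * math.log(theta) + (n - k + b - 1) * math.log(1.0 - theta)


def metropolis(log_p, start, n_iter=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current, current_lp = start, log_p(start)
    chain = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(current));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain.append(current)  # a rejected move repeats the current state
    return chain


def gibbs_bivariate_normal(rho, n_iter=20000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is normal: x | y ~ N(rho * y, 1 - rho**2), and
    symmetrically for y | x, so we alternate exact draws from the two.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x, y = 0.0, 0.0
    draws = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)  # draw x given the current y
        y = rng.gauss(rho * x, sd)  # draw y given the new x
        draws.append((x, y))
    return draws


chain = metropolis(log_target, start=0.5)[2000:]  # discard burn-in
print(sum(chain) / len(chain))  # should be close to the exact mean 9/24 = 0.375

draws = gibbs_bivariate_normal(rho=0.8)[1000:]
print(sum(x * y for x, y in draws) / len(draws))  # should be close to E[xy] = 0.8
```

The Metropolis sampler only ever needs ratios of the target density, which is why the normalising constant of the posterior never has to be computed; the Gibbs sampler instead relies on being able to sample each full conditional distribution directly.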