Data Science


I am a de facto Data Scientist because:

During my PhD (2007-2014) and later as a postdoc at CERN (2014-2021), I developed and deployed several analytical tools that were critical to the success of the ATLAS experiment, including the discovery of the Higgs boson in 2012. These tools covered data collection, data selection, Monte Carlo simulation, machine learning algorithms, statistical analysis, and data visualization. In addition to delivering advanced analytical solutions in a cutting-edge research environment, I successfully managed technical projects: I led groups of researchers that provided services to hundreds of other researchers, and I have published several results in peer-reviewed journals. Although the tools I developed and the projects I managed were carried out in the context of particle physics, the underlying programming, statistical, and management skills are general and broadly applicable.

At Washington College, I teach PHY252/MAT252, a focused introduction to programming for scientists and engineers. The course is heavily project-based, and its topics include Python programming, numerical calculation, algorithm development, dynamics simulation, fitting models to data, statistical tests, machine learning, and data visualization. Students also build models and validate them against experimental data and fundamental principles of physics. They analyze data from special experiments, such as the muon lifetime measurement, the TimePix particle detector, and the e/m ratio measurement, as well as publicly available open data.
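As an illustration of the kind of fit-to-data exercise described above, here is a minimal sketch of a muon lifetime fit; it is not the actual course material. It assumes the decay times follow a pure exponential with no background and no finite measurement window, and it uses simulated times instead of real detector data. The function names are hypothetical. The sketch compares an unbinned maximum-likelihood estimate (for a pure exponential, simply the sample mean) with a weighted straight-line fit to the logarithm of the histogram counts.

```python
import numpy as np

# Muon mean lifetime in microseconds; used here only to simulate data.
TRUE_TAU_US = 2.197


def simulate_decay_times(n, tau, seed=0):
    """Draw n hypothetical decay times from an exponential distribution."""
    rng = np.random.default_rng(seed)
    return rng.exponential(scale=tau, size=n)


def fit_lifetime_mle(times):
    """Unbinned maximum-likelihood estimate of tau for a pure exponential.

    For p(t) = exp(-t/tau) / tau on [0, inf), the MLE is the sample mean,
    with statistical uncertainty tau_hat / sqrt(N).
    """
    times = np.asarray(times, dtype=float)
    tau_hat = times.mean()
    return tau_hat, tau_hat / np.sqrt(times.size)


def fit_lifetime_binned(times, bins=40, t_max=10.0):
    """Weighted least-squares line fit to log(counts) versus bin center.

    Since log N(t) = log N0 - t / tau, the lifetime is tau = -1 / slope.
    Empty bins are skipped; each bin gets weight sqrt(counts), because the
    Poisson uncertainty on log(counts) is roughly 1 / sqrt(counts).
    """
    counts, edges = np.histogram(times, bins=bins, range=(0.0, t_max))
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = counts > 0
    slope, _intercept = np.polyfit(
        centers[mask], np.log(counts[mask]), deg=1, w=np.sqrt(counts[mask])
    )
    return -1.0 / slope


if __name__ == "__main__":
    times = simulate_decay_times(10_000, TRUE_TAU_US)
    tau_mle, tau_err = fit_lifetime_mle(times)
    tau_binned = fit_lifetime_binned(times)
    print(f"MLE:    {tau_mle:.3f} +/- {tau_err:.3f} us")
    print(f"Binned: {tau_binned:.3f} us")
```

In a real measurement, background counts and the finite time window of the apparatus would typically call for a constant term in the model and a nonlinear least-squares or likelihood fit instead.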

# | Title | Data Files | Jupyter Notebook
--|-------|------------|-----------------
1 | ATLAS_OpenData_13-TeV_python_full_HyyAnalysis_5min | run_ttbar_allhad, run_WWbb_allhad | Get Jupyter Notebook
2 | ATLAS_OpenData_13-TeV_python_simple_two_samples_comparison | run_ttbar_allhad, run_WWbb_allhad | Get Jupyter Notebook
3 | ATLAS_OpenData_13-TeV_simple_python_example_histogram | run_ttbar_allhad, run_WWbb_allhad | Get Jupyter Notebook
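The notebooks listed above work with ATLAS Open Data. The sketch below only illustrates what a simple two-sample comparison can look like, using made-up Gaussian samples in place of the run_ttbar_allhad and run_WWbb_allhad files. Both samples are histogrammed with the same binning and normalized to unit area so that their shapes can be compared, and a two-sample Kolmogorov-Smirnov statistic summarizes the largest difference between their cumulative distributions. Computing the statistic with plain NumPy is a choice made for this sketch, not necessarily what the notebooks do, and the function names are hypothetical.

```python
import numpy as np


def normalized_histograms(a, b, bins=30, value_range=None):
    """Histogram two samples with identical bins, each normalized to unit area."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if value_range is None:
        # Use a common range so that both histograms share the same bin edges.
        value_range = (min(a.min(), b.min()), max(a.max(), b.max()))
    h_a, edges = np.histogram(a, bins=bins, range=value_range, density=True)
    h_b, _ = np.histogram(b, bins=bins, range=value_range, density=True)
    return edges, h_a, h_b


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|.

    The empirical CDFs are step functions that only change at data points,
    so it is enough to evaluate them at every observed value.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample_a = rng.normal(loc=0.0, scale=1.0, size=5000)  # hypothetical sample A
    sample_b = rng.normal(loc=0.5, scale=1.0, size=5000)  # hypothetical sample B
    edges, h_a, h_b = normalized_histograms(sample_a, sample_b, bins=30)
    for lo, ha, hb in zip(edges[:-1], h_a, h_b):
        print(f"{lo:6.2f}  A={ha:.3f}  B={hb:.3f}")
    print("KS statistic:", round(ks_statistic(sample_a, sample_b), 3))
```

Normalizing to unit area compares only the shapes of the two distributions; if the absolute yields matter, as when comparing a simulated sample with data, the histograms would instead be scaled to the expected event counts.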