An introduction to Python for Data Science

Language: English

Timeline

Session I: 9.00–12.00

Break

Session II: 13.00–16.00

Description

On day one we will build a working knowledge of simple processing of tabular data and of generating data visualizations using the Python programming language. The course requires no knowledge of Python, but basic programming proficiency is required (you have programmed before). We will first cover basic programming in Python and how to work with the Jupyter Notebook tool. This foundation will then be extended with data processing and visualization based on a dedicated data analytics tool named Pandas. The day is organized as lectures and small exercises to be solved individually or in small groups.

Method

The day is based on participatory live coding: the instructor types in the code while the participants follow along. We have had good experiences with this form of teaching. There will also be plenty of short 5-minute exercises.

Content

  1. (1/2 h) Why? Workflow? Setting the scene - packages/ecosystem - sharing - visibility - literate programming - reproducibility
  2. (2 h) Getting started with Jupyter Notebook
       • Running cells, data types
       • Arithmetic
       • Control (if)
       • For-loops (and list comprehension)
       • Functions (+ docstrings)
       • Objects and methods
       • Shift-tab and help
       • Shortcuts
       • Comments (Markdown) and LaTeX-ify
       • Cell execution order and state (pitfalls of arbitrary execution order)
       • Dictionaries and JSON (if time permits)
  3. (2 1/4 h) Pandas
       • Intro and key data structures
       • Iterating - lists and dictionaries
       • Extracting columns/rows - indexing and selecting
       • Handling missing data
       • Simple statistics (mean, count, median, min, max, std, corr)
       • File handling (CSV, Stata, SAS)
       • Concatenate, join and merge
       • Split-apply-combine methodologies (groupby)
  4. (1 h) Visualization with seaborn
       • Bar plots and histograms
       • Basic terminology
       • Scatter plots
  5. (1/4 h) Outlook, final questions, wrap-up etc.
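
To give a feel for the Pandas part of the day, the sketch below touches a few of the topics listed above: selecting rows, handling missing data, a simple statistic, and split-apply-combine with groupby. It is only an illustration and is not taken from the course material; the toy table and its column names (site, species, weight) are invented for this example.

```python
import pandas as pd

# Hypothetical toy data: values and column names are invented for illustration.
df = pd.DataFrame(
    {
        "site": ["A", "A", "B", "B", "B"],
        "species": ["fox", "hare", "fox", "hare", "hare"],
        "weight": [5.0, 3.0, None, 4.0, 2.0],  # None becomes a missing value (NaN)
    }
)

# Indexing and selecting: keep only the rows where species is "hare".
hares = df[df["species"] == "hare"]

# Handling missing data: count the gaps, then drop rows without a weight.
n_missing = int(df["weight"].isna().sum())
complete = df.dropna(subset=["weight"])

# Simple statistics: the mean of a column (missing values are skipped by default).
mean_weight = df["weight"].mean()

# Split-apply-combine: mean weight per site.
mean_by_site = complete.groupby("site")["weight"].mean()

print(hares)
print("missing weights:", n_missing)
print("mean weight:", mean_weight)
print(mean_by_site)
```

In a notebook, each of these steps would typically go in its own cell so that the intermediate tables can be inspected along the way.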

Literature

  • Python primer

Python Data Science Handbook, Jake VanderPlas

A Primer on Scientific Programming with Python, Hans Petter Langtangen

  • Notebooks and data carpentry

Inspiration: Interesting Jupyter Notebooks

Inspiration: JupyterCon

Data Carpentry: Data Analysis and Visualization in Python for Ecologists

  • Data Viz

Fundamentals of Data Visualization, Claus O. Wilke

The Visual Display of Quantitative Information, Edward R. Tufte, Graphics Press, 1983

The Principle of Proportional Ink, Carl Bergstrom and Jevin West

An Admin’s Guide to Data Visualization, Caskey L. Dickson

  • Seaborn

seaborn: statistical data visualization

Data visualization with Seaborn