An introduction to Python and Data Science

Monday (Nov. 25th 2019)

Time         Module                                             Instructor
9:00-12:00   An introduction to Python and Data Science pt. 1   Tobias L. Jensen & Thomas Arildsen
12:00-13:00  LUNCH
13:00-16:00  An introduction to Python and Data Science pt. 2   Tobias L. Jensen & Thomas Arildsen

Description

On day one we will build a working knowledge of simple processing of tabular data and of generating data visualizations using the programming language Python. The course requires no knowledge of the Python programming language, but basic programming proficiency is required (you have programmed before). We will first cover basic programming in the Python language and how to work with the Jupyter Notebook tool. This basic part will then be extended with data processing and visualization based on a dedicated data analytics tool named pandas. The day is organized with lectures and small exercises to be solved individually or in small groups.
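To give an impression of the level of the basic Python part, here is a minimal sketch that touches a few of the topics (functions with doc-strings, control with if, for-loops, list comprehension). The function name and the numbers are made up for illustration and are not taken from the course material.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Made-up example data (illustrative only)
temperatures = [18.5, 21.0, 19.5, 23.0]

# A for-loop combined with an if-statement
warm_days = []
for t in temperatures:
    if t > 20:
        warm_days.append(t)

# The same selection written as a list comprehension
warm_days_lc = [t for t in temperatures if t > 20]

average = mean(temperatures)
print(warm_days, warm_days_lc, average)
```

In a Jupyter Notebook, this would typically be split over several cells so that each step can be run and inspected on its own.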

Method

The day is based on participatory live coding, where the instructor types in the code and the participants follow along. We have had good experience with this teaching format. There will also be plenty of small 5-minute exercises.

Material

Material and files are available from here

Content

  1. (½ h) Why? Workflow? Setting the scene - packages/ecosystem - sharing - visibility - literate programming - reproducibility
  2. Getting started with Jupyter Notebook
    1. (1½ h) Python and Jupyter basics
      1. Running cells, data types
      2. Arithmetic
      3. Control (if)
      4. For-loops (and list comprehension)
      5. Functions (+ doc-strings)
      6. Objects and methods
      7. Shift-tab and help
      8. Shortcuts
      9. Comments (Markdown) and LaTeX-ify
      10. Cell execution order and state (pitfalls of arbitrary execution order)
  3. (2 h) Pandas
    1. Intro and key data structures
    2. Iterating - lists and dictionaries
    3. Parsing CSV files
    4. Populating and writing to a file (CSV, STATA, SAS)
    5. Extracting columns/rows - indexing and selecting
    6. Handling missing data
    7. Simple statistics (mean, count, median, min, max, std, corr)
    8. String manipulations
    9. Regex
    10. Split-apply-combine methodologies (if time permits)
  4. (1¾ h) Visualization with seaborn
    1. Bar plots and histograms
    2. Basic terminology
    3. Scatter plots
    4. Box plots
    5. Working with JSON and XML
  5. (¼ h) Outlook, final questions, wrap-up etc.
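As a rough illustration of the pandas part of the outline above (parsing CSV, selecting columns, handling missing data, simple statistics, split-apply-combine), the following sketch may be useful. The column names and values are invented for this example, and the CSV is read from an in-memory string rather than a file so the snippet is self-contained; it is not the course's own material.

```python
import io

import pandas as pd

# Invented CSV content; the rat at site A has a missing weight
csv_text = """site,species,weight
A,mouse,20.0
A,rat,
B,mouse,24.0
B,rat,110.0
"""

# Parse the CSV into a DataFrame (a file path would work the same way)
df = pd.read_csv(io.StringIO(csv_text))

# Select a single column and count its missing values
weights = df["weight"]
n_missing = int(weights.isna().sum())

# Simple statistics skip missing values by default
mean_weight = weights.mean()

# Handle missing data by dropping incomplete rows
clean = df.dropna(subset=["weight"])

# Split-apply-combine: mean weight per species
mean_by_species = clean.groupby("species")["weight"].mean()
print(mean_by_species)
```

Dropping rows is only one way to deal with missing values; filling them in (for example with `fillna`) is an alternative, and which one is appropriate depends on the data.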

Literature:

  • Python primer

Python Data Science Handbook, Jake VanderPlas

A Primer on Scientific Programming with Python, Hans Petter Langtangen

  • Notebooks and data carpentry

Inspiration: Interesting Jupyter Notebooks

Inspiration: JupyterCon

Data Carpentry: Data Analysis and Visualization in Python for Ecologists

  • Data Viz

Fundamentals of Data Visualization, Wilke

Edward R. Tufte, The Visual Display of Quantitative Information, Graphics Press, 1983

The Principle of Proportional Ink, Carl Bergstrom and Jevin West

An Admin’s Guide to Data Visualization, Caskey L. Dickson

  • Seaborn

seaborn: statistical data visualization

Data visualization with Seaborn