Analyzing data with Python using NumPy and Pandas

Objectives

  • Analyzing data is critical for many people, even if their core job is something else. In this course you'll get the tools to take data provided in a well-known format (e.g. Excel or CSV files) and analyze it.
  • In recent years Python and the SciPy libraries have become the de facto standard for data analysis, thanks to their price, the breadth of available solutions, and the general usability of the language. Python has long since overtaken both MATLAB and R in popularity among practitioners.
  • In this course we'll learn about two of the most fundamental libraries, NumPy and Pandas, as well as several related libraries that will help you answer questions about your data.

Audience

  • People with lots of data who would like to make sense of that data and make decisions based on the results.

Course Format

  • The course runs 32-40 academic hours (usually 4-5 full days, or twice as many half-days).
  • Approximately 40% of the course is hands-on lab work.

Prerequisites

  • Basic background in Python: understanding the basic data types (int, float, string, list, dictionary); being able to use if-statements, loops, and functions; being able to install and use third-party libraries.
  • Taking the Python Beginner course and getting some practice beyond its exercises will cover the prerequisites.

Syllabus

Development environments

  • VS Code vs PyCharm vs Jupyter notebook
  • JupyterLab and Jupyter Notebook

NumPy

  • NumPy arrays - vectors and matrices
  • Data types
  • Operations on arrays without writing loops
  • Working with external data (Excel, CSV, images)
  • Selecting data with boolean indexing
  • Sorting, searching, filtering and retrieving data
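
As a taste of the NumPy topics above, here is a minimal sketch of loop-free arithmetic, boolean indexing, and sorting. The temperature values are made up for illustration and are not course material.

```python
import numpy as np

# Hypothetical sample data: daily temperatures in Celsius.
celsius = np.array([12.5, 18.0, 21.5, 9.0, 25.0])

# Vectorized arithmetic: the formula is applied to every element, no loop needed.
fahrenheit = celsius * 9 / 5 + 32

# Boolean indexing: the comparison yields a mask that selects matching elements.
warm = celsius[celsius > 15]

# Sorting and searching.
ordered = np.sort(celsius)               # returns a sorted copy
hottest_index = int(np.argmax(celsius))  # position of the largest value

print(fahrenheit)
print(warm)
print(ordered)
print(hottest_index)
```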

Pandas

  • Series, an extension of NumPy arrays
  • DataFrame representing a table of data
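
The two core Pandas objects can be sketched roughly as follows. The fruit names and numbers are invented for this example.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array; here it wraps a NumPy array.
prices = pd.Series(
    np.array([1.5, 0.25, 3.0]),
    index=["apple", "banana", "cherry"],
)

# A DataFrame is a table in which every column is a Series sharing one index.
fruits = pd.DataFrame({"price": prices, "quantity": [10, 40, 5]})

# Column arithmetic is vectorized, just like in NumPy.
fruits["total"] = fruits["price"] * fruits["quantity"]

print(fruits)
```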

Working with Pandas

  • Loading and saving data (Excel, CSV)
  • Understanding the basics about the data
  • Filtering data by rows and columns
  • Working with strings
  • Filtering data while loading it, to make it possible to handle huge data files
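
One possible way to filter while loading is to read only the needed columns and process the file in chunks. The sketch below assumes a small in-memory CSV (with invented names and columns) standing in for a large file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path to a large file.
csv_data = io.StringIO(
    "name,city,age,notes\n"
    "Anna,Budapest,34,x\n"
    "Ben,Vienna,17,y\n"
    "Cleo,budapest,52,z\n"
    "Dan,Prague,41,w\n"
)

kept = []
# usecols skips unneeded columns; chunksize yields a few rows at a time,
# so the whole file never has to fit in memory at once.
for chunk in pd.read_csv(csv_data, usecols=["name", "city", "age"], chunksize=2):
    # String methods are available through .str; rows are filtered with a mask.
    is_budapest = chunk["city"].str.lower() == "budapest"
    kept.append(chunk[is_budapest & (chunk["age"] >= 18)])

adults_in_budapest = pd.concat(kept, ignore_index=True)
print(adults_in_budapest)
```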

Indexes

  • Indexing and multi-level indexing
  • Stacking, unstacking, and melting
  • Pivot tables
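
The reshaping operations listed above could look something like this. The sales table is a hypothetical example, not data used in the course.

```python
import pandas as pd

# Hypothetical sales figures in "long" format: one row per observation.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 80, 90],
})

# Multi-level index built from two columns.
indexed = sales.set_index(["region", "quarter"])
north_q2 = indexed.loc[("North", "Q2"), "revenue"]

# unstack moves the inner index level into the columns (long to wide).
wide = indexed["revenue"].unstack("quarter")

# melt goes the other way (wide to long).
long_again = wide.reset_index().melt(
    id_vars="region", var_name="quarter", value_name="revenue"
)

# pivot_table reshapes and aggregates in one step.
totals = sales.pivot_table(index="region", values="revenue", aggfunc="sum")

print(wide)
print(totals)
```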

Aggregate functionality

  • Grouping
  • Sorting
  • Joining
  • Combining data frames
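
A rough sketch of how grouping, joining, sorting, and combining fit together, using two small invented tables of orders and customers:

```python
import pandas as pd

# Hypothetical tables sharing the key column "customer_id".
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [50, 20, 30, 70],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Anna", "Ben", "Cleo"],
})

# Grouping: total amount per customer.
per_customer = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Joining: attach the customer name through the shared key column.
joined = per_customer.merge(customers, on="customer_id", how="left")

# Sorting: largest total first.
ranked = joined.sort_values("amount", ascending=False, ignore_index=True)

# Combining: stack two frames with the same columns on top of each other.
more_orders = pd.DataFrame({"customer_id": [2], "amount": [15]})
all_orders = pd.concat([orders, more_orders], ignore_index=True)

print(ranked)
```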

Other

  • Categorizing data
  • Working with date/time data
  • Visualization of data with Pandas and other tools
  • Pandas and reducing memory usage
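
Categorical columns, date handling, and memory usage are related in practice. The sketch below assumes a made-up log table with a repetitive text column; the exact savings depend on the data and the Pandas version.

```python
import pandas as pd

# Hypothetical log with a low-cardinality text column and date strings.
log = pd.DataFrame({
    "status": ["ok", "error", "ok", "ok"] * 1000,
    "when": ["2024-01-15", "2024-02-20", "2024-02-21", "2024-03-05"] * 1000,
})

before = log["status"].memory_usage(deep=True)
# A column with few distinct values usually takes less memory as a category.
log["status"] = log["status"].astype("category")
after = log["status"].memory_usage(deep=True)

# Parse the strings into datetimes, then use the .dt accessor.
log["when"] = pd.to_datetime(log["when"])
per_month = log.groupby(log["when"].dt.month).size()

print(before, after)
print(per_month)
```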

Let's talk

If you would like to bring this course to your organization, let's talk about it! You can reach me via email at gabor@szabgab.com or you can go ahead and schedule a chat:

Contact me