Course code: PYTH3

Python Language III – Data Analysis (Pandas)

This course is intended for anyone looking for a flexible data analysis tool, and for anyone who plans to use the Python programming language to manipulate, analyze, and visualize data or to apply data science in practice.

     Date        Duration  Course price  Handbook price            Course language  Location
GTK  9/23/2019   5         26 500 CZK    included in course price  Czech            GOPAS Praha_GTT
     11/25/2019  5         26 500 CZK    included in course price  Czech            GOPAS Praha_GTT
GTK  9/23/2019   5         26 500 CZK    included in course price  Czech            GOPAS Brno_GTT
GTK  11/25/2019  5         26 500 CZK    included in course price  Czech            GOPAS Brno_GTT
     11/25/2019  5         925,00 EUR    included in course price  Slovak           GOPAS Bratislava_GTT

Affiliate   Duration  Catalogue price  Handbook price            ITB
Praha       5         26 500 CZK       included in course price  50
Brno        5         26 500 CZK       included in course price  50
Bratislava  5         925,00 EUR       included in course price  50

What we teach you:

  • Participants will learn to use the pandas library and the supporting libraries needed for data analysis and visualization. The training guides participants through examples based on real-world data sets and real-world data processing projects. The examples and procedures shown apply to Linux / UNIX, Windows, and OS X.

Who the course is for:

  • This course is intended for anyone looking for a flexible data analysis tool, and for anyone who plans to use the Python programming language to manipulate, analyze, and visualize data or to apply data science in practice.

Required skills:

  • Knowledge of Python at the level of the PYTH1 course

Teaching methods:

  • Expert instruction with practical samples and examples.

Teaching materials:

  • PowerPoint handouts and module printouts.

Course syllabus:

A Tour of pandas

  • Pandas and why it is important
  • Pandas and IPython Notebooks
  • Referencing pandas in the application
  • Primary pandas objects
  • The pandas Series object
  • The pandas DataFrame object
  • Loading data from files and the Web
  • Loading CSV data from files
  • Loading data from the Web
  • Simplicity of visualization of pandas data

Installing pandas

  • Getting Anaconda
  • Installing Anaconda
  • Installing Anaconda on Linux
  • Installing Anaconda on Mac OS X
  • Installing Anaconda on Windows
  • Ensuring pandas is up to date
  • Running a small pandas sample in IPython
  • Starting the IPython Notebook server
  • Installing and running IPython Notebooks
  • Using Wakari for pandas

NumPy for pandas

  • Installing and importing NumPy
  • Benefits and characteristics of NumPy arrays
  • Creating NumPy arrays and performing basic array operations
  • Selecting array elements
  • Logical operations on arrays
  • Slicing arrays
  • Reshaping arrays
  • Combining arrays
  • Splitting arrays
  • Useful numerical methods of NumPy arrays
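
As a rough illustration of the NumPy topics listed above, here is a minimal sketch. It is not taken from the course materials, and the array contents and variable names are invented for the example.

```python
import numpy as np

# Hypothetical sample data, invented for illustration.
a = np.arange(12)                # one-dimensional array holding 0..11
m = a.reshape(3, 4)              # reshaping into 3 rows x 4 columns

evens = a[a % 2 == 0]            # logical (Boolean) selection of elements
first_two_cols = m[:, :2]        # slicing: all rows, first two columns
stacked = np.vstack([m, m])      # combining two arrays row-wise
left, right = np.hsplit(m, 2)    # splitting into two 3x2 halves
col_sums = m.sum(axis=0)         # a numerical method applied per column

print(col_sums)
```

Most of these operations return new arrays or views rather than changing `a`, which is one reason they combine well in longer expressions.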

The pandas Series Object

  • The Series object
  • Importing pandas
  • Creating Series
  • Size, shape, uniqueness, and counts of values
  • Peeking at data with heads, tails, and take
  • Looking up values in Series
  • Alignment via index labels
  • Arithmetic operations
  • The special case of Not-A-Number (NaN)
  • Boolean selection
  • Reindexing a Series
  • Modifying a Series in-place
  • Slicing a Series

The pandas DataFrame Object

  • Creating DataFrame from scratch
  • Example data
  • S&P 500
  • Monthly stock historical prices
  • Selecting columns of a DataFrame
  • Selecting rows and values of a DataFrame using the index
  • Slicing using the [] operator
  • Selecting rows by index label and location: .loc[] and .iloc[]
  • Selecting rows by index label and/or location: .ix[]
  • Scalar lookup by label or location using .at[] and .iat[]
  • Selecting rows of a DataFrame by Boolean selection
  • Modifying the structure and content of DataFrame
  • Renaming columns
  • Adding and inserting columns
  • Replacing the contents of a column
  • Deleting columns in a DataFrame
  • Adding rows to a DataFrame
  • Appending rows with .append()
  • Concatenating DataFrame objects with pd.concat()
  • Adding rows (and columns) via setting with enlargement
  • Removing rows from a DataFrame
  • Removing rows using .drop()
  • Removing rows using Boolean selection
  • Removing rows using a slice
  • Changing scalar values in a DataFrame
  • Arithmetic on a DataFrame
  • Resetting and reindexing
  • Hierarchical indexing
  • Summarized data and descriptive statistics
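
The DataFrame selection and modification topics above can be sketched roughly as follows. This is an illustrative example rather than course material; the ticker labels and numbers are made up.

```python
import pandas as pd

# Hypothetical data invented for illustration.
df = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "volume": [100, 200, 300]},
    index=["AAA", "BBB", "CCC"],
)

row_by_label = df.loc["BBB"]          # row selected by index label
row_by_position = df.iloc[0]          # row selected by integer location
scalar = df.at["CCC", "price"]        # scalar lookup by label
scalar_pos = df.iat[0, 1]             # scalar lookup by position
expensive = df[df["price"] > 15]      # Boolean selection of rows

df["value"] = df["price"] * df["volume"]   # adding a derived column

# Adding a row by concatenating a second DataFrame.
more = pd.DataFrame(
    {"price": [40.0], "volume": [400], "value": [16000.0]},
    index=["DDD"],
)
combined = pd.concat([df, more])
trimmed = combined.drop("AAA")        # removing a row by label

print(trimmed)
```

The sketch sticks to `.loc[]`, `.iloc[]` and `pd.concat()` because `.ix[]` and `DataFrame.append()` were removed in pandas 1.0 and 2.0 respectively, so they only work on older installations.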

Accessing Data

  • Setting up the IPython notebook
  • CSV and Text/Tabular format
  • The sample CSV data set
  • Reading a CSV file into a DataFrame
  • Specifying the index column when reading a CSV file
  • Data type inference and specification
  • Specifying column names
  • Specifying specific columns to load
  • Saving DataFrame to a CSV file
  • General field-delimited data
  • Handling noise rows in field-delimited data
  • Reading and writing data in an Excel format
  • Reading and writing JSON files
  • Reading HTML data from the Web
  • Reading and writing HDF5 format files
  • Accessing data on the web and in the cloud
  • Reading and writing from/to SQL databases
  • Reading data from remote data services
  • Reading stock data from Yahoo! and Google Finance
  • Retrieving data from Yahoo! Finance Options
  • Reading economic data from the Federal Reserve Bank of St. Louis
  • Accessing Kenneth French's data
  • Reading from the World Bank

Tidying Up Your Data

  • What is tidying your data?
  • Setting up the IPython notebook
  • Working with missing data
  • Determining NaN values in Series and DataFrame objects
  • Selecting out or dropping missing data
  • How pandas handles NaN values in mathematical operations
  • Filling in missing data
  • Forward and backward filling of missing values
  • Filling using index labels
  • Interpolation of missing values
  • Handling duplicate data
  • Transforming Data
  • Mapping
  • Replacing values
  • Applying functions to transform data

Combining and Reshaping Data

  • Setting up the IPython notebook
  • Concatenating data
  • Merging and joining data
  • An overview of merges
  • Specifying the join semantics of a merge operation
  • Pivoting
  • Stacking and unstacking
  • Stacking using nonhierarchical indexes
  • Unstacking using hierarchical indexes
  • Melting
  • Performance benefits of stacked data
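
A short sketch of the merge and reshape operations named above, using small invented tables. The column names (`key`, `id`, `month`, `sales`) are assumptions chosen only for the example.

```python
import pandas as pd

# Two hypothetical tables sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

inner = left.merge(right, on="key")               # keys present in both
outer = left.merge(right, on="key", how="outer")  # keys present in either

# A hypothetical wide table with one column per month.
wide = pd.DataFrame({"id": [1, 2], "jan": [10, 30], "feb": [20, 40]})

# Melting: wide to long format.
long = wide.melt(id_vars="id", var_name="month", value_name="sales")

# Pivoting: long back to wide format.
back = long.pivot(index="id", columns="month", values="sales")

# Stacking moves column labels into the index; unstacking reverses it.
stacked = wide.set_index("id").stack()
unstacked = stacked.unstack()

print(long)
```

The `how` argument is what the syllabus calls the join semantics of a merge: `"inner"`, `"outer"`, `"left"` and `"right"` behave like the corresponding SQL joins.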

Grouping and Aggregating Data

  • Setting up the IPython notebook
  • The split, apply, and combine (SAC) pattern
  • Split
  • Data for the examples
  • Grouping by a single column's values
  • Accessing the results of grouping
  • Grouping using index levels
  • Apply
  • Applying aggregation functions to groups
  • The transformation of group data
  • An overview of transformation
  • Practical examples of transformation
  • Filtering groups
  • Discretization and Binning
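
The split, apply, and combine pattern listed above can be sketched as follows. The `sensor` and `value` columns and their contents are invented for illustration.

```python
import pandas as pd

# Hypothetical measurements from two sensors.
df = pd.DataFrame(
    {
        "sensor": ["a", "a", "b", "b", "b"],
        "value": [1.0, 3.0, 2.0, 4.0, 6.0],
    }
)

grouped = df.groupby("sensor")                 # split by a column's values

means = grouped["value"].mean()                # apply an aggregation
stats = grouped["value"].agg(["min", "max", "count"])

# Transformation: subtract each group's mean from its own rows.
centered = df["value"] - grouped["value"].transform("mean")

# Filtering: keep only groups with at least three rows.
big = grouped.filter(lambda g: len(g) >= 3)

# Discretization: bin values into labelled intervals (0, 2], (2, 4], (4, 6].
bins = pd.cut(df["value"], bins=[0, 2, 4, 6], labels=["low", "mid", "high"])

print(stats)
```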

Time-series Data

  • Setting up the IPython notebook
  • Representation of dates, time, and intervals
  • The datetime, date, and time objects
  • Timestamp objects
  • Timedelta
  • Introducing time-series data
  • DatetimeIndex
  • Creating time-series data with specific frequencies
  • Calculating new dates using offsets
  • Date offsets
  • Anchored offsets
  • Representing durations of time using Period objects
  • The Period object
  • PeriodIndex
  • Handling holidays using calendars
  • Normalizing timestamps using time zones
  • Manipulating time-series data
  • Shifting and lagging
  • Frequency conversion
  • Up and down resampling
  • Time-series moving-window operations
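
For the time-series topics above, a minimal sketch on an invented daily series could look like this. The start date and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical daily series with a DatetimeIndex.
idx = pd.date_range("2019-09-23", periods=6, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

lagged = ts.shift(1)                          # shifting (lagging) by one day
next_week = idx[0] + pd.DateOffset(weeks=1)   # new date from an offset
duration = idx[-1] - idx[0]                   # a Timedelta between two stamps
two_day = ts.resample("2D").sum()             # downsampling to 2-day bins
rolling = ts.rolling(window=3).mean()         # moving-window average
period = pd.Period("2019-09", freq="M")       # a Period covering one month

print(two_day)
```

Resampling to a coarser frequency needs an aggregation such as `sum()` or `mean()`, while resampling to a finer one usually needs a fill or interpolation step.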

Visualization

  • Setting up the IPython notebook
  • Plotting basics with pandas
  • Creating time-series charts with .plot()
  • Adorning and styling your time-series plot
  • Adding a title and changing axes labels
  • Specifying the legend content and position
  • Specifying line colors, styles, thickness, and markers
  • Specifying tick mark locations and tick labels
  • Formatting axes tick date labels using formatters
  • Common plots used in statistical analyses
  • Bar plots
  • Histograms
  • Box and whisker charts
  • Area plots
  • Scatter plots
  • Density plot
  • The scatter plot matrix
  • Heatmaps
  • Multiple plots in a single chart

Applications to Finance

  • Setting up the IPython notebook
  • Obtaining and organizing stock data from Yahoo!
  • Plotting time-series prices
  • Plotting volume-series data
  • Calculating the simple daily percentage change
  • Calculating simple daily cumulative returns
  • Resampling data from daily to monthly returns
  • Analyzing distribution of returns
  • Performing a moving-average calculation
  • The comparison of average daily returns across stocks
  • The correlation of stocks based on the daily percentage change of the closing price
  • Volatility calculation
  • Determining risk relative to expected returns
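
Several of the calculations listed above can be sketched on a tiny invented price table. The tickers `AAA` and `BBB` and all prices are hypothetical; the course itself works with downloaded stock data.

```python
import pandas as pd

# Hypothetical closing prices for two made-up tickers.
close = pd.DataFrame(
    {
        "AAA": [100.0, 110.0, 121.0, 108.9],
        "BBB": [50.0, 49.0, 51.45, 51.45],
    },
    index=pd.date_range("2019-09-23", periods=4, freq="D"),
)

daily_pct = close.pct_change()                        # simple daily percentage change
cumulative = (1 + daily_pct).cumprod()                # simple daily cumulative returns
moving_avg = close["AAA"].rolling(window=2).mean()    # moving-average calculation
volatility = daily_pct.std()                          # std. deviation of daily returns
correlation = daily_pct.corr()                        # correlation of daily changes

print(cumulative)
```

The first row of `daily_pct` is NaN because there is no previous day to compare against; the cumulative product skips it and continues from the second row.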

Printed presentations of the material covered

Price:
included in course price
Prices do not include VAT.