Python Pandas - IO Tools
The Pandas I/O API is a set of top-level reader functions, accessed like pd.read_csv(), that generally return a Pandas object.
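The two workhorse functions for reading text files (or flat files) are read_csv() and read_table(). Both use the same parsing code to intelligently convert tabular data into a DataFrame object. read_csv() separates fields with a comma by default, while read_table() uses a tab. The signature of read_csv() begins as follows −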
pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, ...)
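As a minimal sketch of the two readers side by side, the snippet below assumes the sample data used in this section is held in an in-memory string (csv_text, a name introduced here for illustration) instead of temp.csv. It checks that read_table() with an explicit comma separator gives the same DataFrame as read_csv(), then uses usecols and the delimiter alias from the signature above.

```python
import io

import pandas as pd

# Same rows as temp.csv, kept in memory so no file is needed (illustrative).
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# read_csv splits on commas by default.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table splits on tabs by default, so the comma is passed via sep.
df_table = pd.read_table(io.StringIO(csv_text), sep=",")
print(df_csv.equals(df_table))  # True

# usecols loads only the listed columns (kept in file order).
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Salary"])
print(subset.columns.tolist())  # ['Name', 'Salary']

# delimiter is an alias for sep, e.g. for semicolon-separated data.
semi = pd.read_csv(io.StringIO("a;b\n1;2\n"), delimiter=";")
print(semi.columns.tolist())  # ['a', 'b']
```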
Here is how the CSV file data looks −
S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
Save this data as temp.csv and conduct operations on it.
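If you prefer to create the file from Python rather than a text editor, one minimal sketch is to write the rows with the built-in open(). It assumes the current working directory is writable, and it overwrites any existing temp.csv.

```python
# The sample rows from above, one record per line.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Write them to temp.csv in the current working directory.
with open("temp.csv", "w", newline="") as f:
    f.write(csv_text)
```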
read_csv
read_csv reads data from a CSV file and creates a DataFrame object.
import pandas as pd

df = pd.read_csv("temp.csv")
print(df)
Its output is as follows −
   S.No    Name  Age       City  Salary
0     1     Tom   28    Toronto   20000
1     2     Lee   32   HongKong    3000
2     3  Steven   43   Bay Area    8300
3     4     Ram   38  Hyderabad    3900
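To check what read_csv() produced without printing the whole frame, you can inspect its shape, column labels and index. The sketch below reads the sample data from an in-memory string (an assumption, so it runs without temp.csv). Note that pandas has added a default integer index 0 to 3, and S.No is just an ordinary column.

```python
import io

import pandas as pd

# In-memory copy of temp.csv (illustrative).
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)             # (4, 5)
print(df.columns.tolist())  # ['S.No', 'Name', 'Age', 'City', 'Salary']
print(df.index.tolist())    # [0, 1, 2, 3]
```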
custom index
Use index_col to make a column of the CSV file the index of the DataFrame, in place of the default 0-based integer index.
import pandas as pd

# Use the S.No column as the row index
df = pd.read_csv("temp.csv", index_col=['S.No'])
print(df)
Its output is as follows −
        Name  Age       City  Salary
S.No
1        Tom   28    Toronto   20000
2        Lee   32   HongKong    3000
3     Steven   43   Bay Area    8300
4        Ram   38  Hyderabad    3900
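Once S.No is the index, rows can be looked up by their serial number with .loc. A minimal sketch, again assuming an in-memory copy of the sample data. Here index_col is given as a single column name rather than a list, which read_csv() also accepts.

```python
import io

import pandas as pd

# In-memory copy of temp.csv (illustrative).
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="S.No")
print(df.index.name)        # S.No
print(df.loc[2, "Name"])    # Lee (looked up by S.No, not by position)
print(df.columns.tolist())  # ['Name', 'Age', 'City', 'Salary']
```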
Converters
The dtype of individual columns can be passed as a dict that maps column names to types.
import numpy as np
import pandas as pd

# Read Salary as float64 instead of the inferred integer type
df = pd.read_csv("temp.csv", dtype={'Salary': np.float64})
print(df.dtypes)
Its output is as follows −
S.No        int64
Name       object
Age         int64
City       object
Salary    float64
dtype: object
By default, the Salary column would be read as int64, but the result shows float64 because we have explicitly cast the type with dtype.
Thus, the Salary values now look like floats −
   S.No    Name  Age       City   Salary
0     1     Tom   28    Toronto  20000.0
1     2     Lee   32   HongKong   3000.0
2     3  Steven   43   Bay Area   8300.0
3     4     Ram   38  Hyderabad   3900.0
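The section heading refers to converters, which is a separate read_csv() parameter: instead of a target dtype, it maps a column name to a function applied to each raw value in that column. A minimal sketch, assuming an in-memory copy of the sample data; str.upper is only an illustrative choice of function.

```python
import io

import numpy as np
import pandas as pd

# In-memory copy of temp.csv (illustrative).
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Without options, Salary is inferred as an integer column.
default = pd.read_csv(io.StringIO(csv_text))
print(default["Salary"].dtype)  # int64

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Salary": np.float64},    # cast Salary to float
    converters={"Name": str.upper},  # apply a function to each Name value
)
print(df["Salary"].dtype)   # float64
print(df["Name"].tolist())  # ['TOM', 'LEE', 'STEVEN', 'RAM']
```

In short, dtype only changes how a column is stored, while converters lets you transform the values themselves as they are parsed.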
header_names
Specify the column names using the names argument.
import pandas as pd

# Supply our own column labels
df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'])
print(df)
Its output is as follows −
      a       b    c          d       e
0  S.No    Name  Age       City  Salary
1     1     Tom   28    Toronto   20000
2     2     Lee   32   HongKong    3000
3     3  Steven   43   Bay Area    8300
4     4     Ram   38  Hyderabad    3900
Observe that the custom names are used as the column labels, but the header line of the file has not been eliminated: because passing names implies header=None, it is read as the first data row. Now, we use the header argument to remove it.
If the header is on a line other than the first, pass its line number (counting from 0) to header. The preceding lines are skipped.
import pandas as pd

# header=0 replaces the file's header line with the custom names
df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'], header=0)
print(df)
Its output is as follows −
   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32   HongKong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900
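When the header is not on the first line, for example because a file starts with a title line, pass that line's 0-based number to header. A minimal sketch with a hypothetical title line, "Employee report", prepended to an in-memory copy of the sample data:

```python
import io

import pandas as pd

# In-memory copy of temp.csv (illustrative).
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Hypothetical file: a title on line 0, the real header on line 1.
report_text = "Employee report\n" + csv_text

# header=1 takes line 1 as the column labels and discards line 0.
df = pd.read_csv(io.StringIO(report_text), header=1)
print(df.columns.tolist())  # ['S.No', 'Name', 'Age', 'City', 'Salary']
print(len(df))              # 4
```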
skiprows
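skiprows skips the specified number of lines at the start of the file. The header line is counted too, so skiprows=2 drops the header and Tom's row, and the Lee line becomes the new header.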
import pandas as pd

# Skip the first two lines of the file (header included)
df = pd.read_csv("temp.csv", skiprows=2)
print(df)
Its output is as follows −
   2     Lee  32   HongKong  3000
0  3  Steven  43   Bay Area  8300
1  4     Ram  38  Hyderabad  3900
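skiprows also accepts a list of 0-based line numbers, or a function that receives each line number and returns True for lines to skip. This lets you drop data lines while keeping the header on line 0. A minimal sketch, assuming an in-memory copy of the sample data:

```python
import io

import pandas as pd

# In-memory copy of temp.csv (illustrative).
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Skip lines 1 and 2 (Tom and Lee); the header on line 0 is kept.
df_list = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])
print(df_list["Name"].tolist())  # ['Steven', 'Ram']

# Skip every odd-numbered line (Tom on line 1, Steven on line 3).
df_func = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i % 2 == 1)
print(df_func["Name"].tolist())  # ['Lee', 'Ram']
```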