pandas

August 19, 2021
1 min read

Every time I use pandas I hate it, and I forget the usage right after learning it. Writing this down so I don’t forget.

Pandas

A library for handling tabular data. Its DataFrame is said to borrow a lot from R's data frame. It is built on top of numpy, which is where much of its performance comes from.

DataFrame

  • An object that contains the entire data table. Think of it as a wrapper for all data.
  • Each Series (column) within a DataFrame can have its own data type.

Series

import pandas as pd
obj = pd.Series(data=data, index=index)
obj.index # -> the index labels
obj.values # -> only the values, as a numpy array
  • An object corresponding to a single column of a DataFrame.
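  • A wrapper around a numpy array, but indexing works differently.
    • Unlike a numpy array, which is indexed only by integer position, a Series can also be indexed by labels such as strings.
  • Passing a list as data auto-generates an integer index (0, 1, 2, ...).
  • Passing a dict as data uses the dict keys as the index.
  • An explicit index parameter takes priority: with a list it labels the values, and with a dict it selects and orders the keys (keys missing from the dict become NaN).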
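
A minimal sketch of the indexing rules above. The sample values and variable names are made up for illustration.

```python
import pandas as pd

# A list gets a default integer index: 0, 1, 2
from_list = pd.Series(data=[10, 20, 30])

# A dict uses its keys as the index
from_dict = pd.Series(data={"a": 1, "b": 2, "c": 3})

# An explicit index labels the values of a list
labeled = pd.Series(data=[10, 20, 30], index=["x", "y", "z"])

# With a dict, an explicit index selects and orders keys;
# "d" is not in the dict, so its value becomes NaN
selected = pd.Series(data={"a": 1, "b": 2, "c": 3}, index=["c", "a", "d"])

print(labeled["y"])   # lookup by string label
print(selected)
```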

read_csv()

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

pd.read_csv(data, sep=r'\s+', header=None)
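  • data: a local file path or a web URL; both work
  • sep: the separator, which can be a regular expression
    • \s: a single whitespace character
    • +: one or more of the preceding
  • header: None means the file has no header row

I think that's about all I've used. Look up the docs as needed.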
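
A small sketch of the call above, reading from an in-memory string instead of a file so it runs anywhere. The sample text and the `raw` name are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical whitespace-separated text with no header row;
# note the uneven number of spaces between fields
raw = "1  2.5   foo\n2 3.5 bar\n"

# r'\s+' is a regex: split on one or more whitespace characters.
# header=None tells pandas the first line is data, not column names.
df_data = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

print(df_data.shape)
print(list(df_data.columns))  # default integer column labels
```

The raw-string prefix (r'...') keeps Python from treating the backslash in \s as an escape sequence.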

head(n)

Returns only the first n rows (n defaults to 5).
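
A quick sketch with a made-up 10-row table:

```python
import pandas as pd

# Hypothetical data: 10 rows, 2 columns
df_data = pd.DataFrame({"a": range(10), "b": range(10, 20)})

top3 = df_data.head(3)     # first 3 rows
default = df_data.head()   # no argument: first 5 rows

print(top3)
```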

columns

A list-like attribute holding the column names. Assign a list to it to rename the columns.

df_data.columns = ['a', 'b']
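
For example, a table built without names gets integer column labels, which can then be replaced. The data here is made up for illustration.

```python
import pandas as pd

# Hypothetical data with default column labels 0 and 1
df_data = pd.DataFrame([[1, 2], [3, 4]])

# The list length must match the number of columns,
# otherwise pandas raises a ValueError
df_data.columns = ["a", "b"]

print(list(df_data.columns))
print(df_data["b"].tolist())
```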

values

Returns the underlying data as a numpy array, without the index or column labels.
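
A minimal sketch with made-up data. The to_numpy() method is not mentioned in the original notes; it is shown here only as the method form of the same conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical data
df_data = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

arr = df_data.values        # 2-D numpy array, one row per DataFrame row
same = df_data.to_numpy()   # method form of the same conversion

print(type(arr))
print(arr.shape)
```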

iloc, loc

loc accesses rows and columns by label (for example, by column name). iloc accesses them by integer position, like numpy indexing. I prefer iloc.
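
A short side-by-side sketch. The table and its string row labels are made up for illustration.

```python
import pandas as pd

# Hypothetical data with string row labels
df_data = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]},
    index=["x", "y", "z"],
)

by_label = df_data.loc["y", "b"]   # row label "y", column label "b"
by_pos = df_data.iloc[1, 1]        # second row, second column

# Slicing differs: loc includes the end label, iloc excludes the end position
label_slice = df_data.loc["x":"y"]   # rows "x" and "y"
pos_slice = df_data.iloc[0:1]        # row 0 only, like numpy

print(by_label, by_pos)
```

The slicing difference is the main thing to watch when switching between the two.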
