pandas

Every time I use pandas I hate it, and I forget the usage right after learning it. Writing this down so I don’t forget.

Pandas

A library for handling tabular data. Its design is said to borrow a lot from R's data frames. It is built on top of numpy, which is where much of its performance comes from.

A DataFrame is an object that holds the entire data table. Think of it as a wrapper around all of the data.
Each column of a DataFrame is a Series, and different columns can have different data types.
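To make the "different data types" point concrete, here is a minimal sketch. The column names and values are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different type.
df = pd.DataFrame({
    "name": ["kim", "lee", "park"],   # strings
    "age": [31, 25, 40],              # integers
    "score": [88.5, 92.0, 79.25],     # floats
})

# Selecting one column gives back a Series with its own dtype.
age_column = df["age"]
print(type(age_column))
print(df.dtypes)
```

Printing df.dtypes is a quick way to check what type each column ended up with after loading.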

from pandas import Series
obj = Series(data=data, index=index)
obj.index   # -> the index labels
obj.values  # -> only the values, as a numpy array

A Series is an object corresponding to a single column of a DataFrame.
It is a wrapper around a numpy array, but it differs in indexing.
- Unlike numpy, which only indexes by integer position, a Series can also be indexed by labels such as strings.
Passing a list as data auto-indexes with integers starting from 0.
Passing a dict as data uses the dict keys as the index.
The index parameter takes top priority for indexing. When data is a dict, only the keys listed in index are kept, and labels missing from the dict become NaN.
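The indexing rules above can be checked with a small sketch. The values and labels here are arbitrary examples I picked, nothing more.

```python
import pandas as pd

# A list gets integer labels 0, 1, 2, ...
from_list = pd.Series(data=[10, 20, 30])

# A dict uses its keys as the index.
from_dict = pd.Series(data={"a": 1, "b": 2, "c": 3})

# An explicit index wins: "b" is dropped, and "z" (not in the dict) becomes NaN.
with_index = pd.Series(data={"a": 1, "b": 2, "c": 3}, index=["c", "a", "z"])

print(list(from_list.index))   # [0, 1, 2]
print(list(from_dict.index))   # ['a', 'b', 'c']
print(from_dict["b"])          # 2
print(with_index)
```

Note that the NaN turns the last Series into a float Series, so the integers come back as 3.0 and 1.0.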

import pandas as pd
pd.read_csv(data, sep=r'\s+', header=None)

data: a local file path or a web URL both work
sep: specifies the separator (a regular expression is allowed)
- \s: a single whitespace character
- +: one or more of the preceding
I think this is about all I have used. Look up the docs as needed.
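Here is a small sketch of reading whitespace-separated data. To keep it self-contained it reads from an in-memory string through io.StringIO instead of a real file or URL, and the contents are made up.

```python
import io

import pandas as pd

# Hypothetical data: columns separated by a varying number of spaces.
raw = io.StringIO("1  2.5   x\n3 4.0 y\n")

# sep=r'\s+' splits on runs of whitespace; header=None means the first
# line is data, so the columns get integer names.
df_data = pd.read_csv(raw, sep=r"\s+", header=None)

print(df_data.shape)           # (2, 3)
print(list(df_data.columns))   # [0, 1, 2]
```

Writing the pattern as a raw string (r'\s+') keeps Python from treating the backslash as an escape sequence.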

head(n) returns only the first n rows of the data.
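A quick sketch, assuming this note is about DataFrame.head. The DataFrame below is a made-up example.

```python
import pandas as pd

# Hypothetical table with 10 rows.
df_data = pd.DataFrame({"a": range(10), "b": range(10, 20)})

top3 = df_data.head(3)        # first 3 rows only
print(top3)
print(len(df_data.head()))    # with no argument, head() returns 5 rows
```

If the goal is to avoid reading the whole file in the first place, read_csv also has an nrows parameter that limits how many rows are parsed.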

columns is list-like; assign a list to it to set the column names.

df_data.columns = ['a', 'b']
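For context, a runnable version of that assignment might look like the sketch below. The DataFrame contents are placeholders.

```python
import pandas as pd

# Built without column names, so the columns default to 0 and 1.
df_data = pd.DataFrame([[1, 2], [3, 4]])
print(list(df_data.columns))   # [0, 1]

# Assign one name per column.
df_data.columns = ['a', 'b']
print(list(df_data.columns))   # ['a', 'b']

# The list length has to match the number of columns.
try:
    df_data.columns = ['only_one']
except ValueError as err:
    print("length mismatch:", err)
```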

values returns the pandas data as a numpy array.
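A short sketch of getting a numpy array out, assuming the note is about the values attribute. The data is made up.

```python
import numpy as np
import pandas as pd

df_data = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

arr = df_data.values          # plain numpy array, labels are dropped
print(type(arr))
print(arr.shape)              # (2, 2)

# to_numpy() does the same job and is the spelling the pandas docs recommend.
arr2 = df_data.to_numpy()
print(np.array_equal(arr, arr2))   # True
```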

loc accesses data by label (index labels and column names). iloc accesses data by integer position, like numpy. I prefer iloc.
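The difference is easiest to see side by side. A minimal sketch with made-up labels:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]},
    index=["x", "y", "z"],
)

# Same cell, two ways of addressing it.
print(df.loc["y", "b"])    # by label    -> 5
print(df.iloc[1, 1])       # by position -> 5

# Slicing differs: loc includes the end label, iloc excludes the end position.
by_label = df.loc["x":"y", "a"]    # rows x and y
by_position = df.iloc[0:2, 0]      # rows 0 and 1
print(by_label.tolist())           # [1, 2]
print(by_position.tolist())        # [1, 2]
```

The inclusive end on loc slices is the part that is easy to forget when switching between the two.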