Gotchas of Pandas

Slides of the talk "Gotchas of Pandas" at PyCon Israel 2017

prabhant

June 16, 2017

Transcript

  1. In [4]: s = pd.Series([1, 2, 3, 4, 5, 6], index=list('abcdef'))
    s

    Out[4]:
    a    1
    b    2
    c    3
    d    4
    e    5
    f    6
    dtype: int64
  2. In [6]: #3
    s2 = s.reindex(['a', 'b', 'c', 'h', 'e', 'r'])
    s2

    Out[6]:
    a    1.0
    b    2.0
    c    3.0
    h    NaN
    e    5.0
    r    NaN
    dtype: float64
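The promotion on slide 2 can be reproduced in a few lines. As a minimal sketch, and assuming that 0 is an acceptable placeholder for missing labels (an assumption made here for illustration, not something the talk proposes), the `fill_value` argument of `reindex` avoids the NaN and so keeps the integer dtype:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=list('abcdef'))
new_labels = ['a', 'b', 'c', 'h', 'e', 'r']

# 'h' and 'r' are not in s, so reindex inserts NaN for them and the
# integer values are promoted to float64.
s2 = s.reindex(new_labels)
print(s2.dtype)

# Assumption for illustration: 0 is a meaningful stand-in for missing data.
# No NaN is inserted, so the integer dtype survives.
s3 = s.reindex(new_labels, fill_value=0)
print(s3.dtype)
```

Whether a placeholder is appropriate depends on the data; a 0 that means "missing" is easy to confuse with a real 0 later on.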
  3. Lack of NA value support in NumPy

    Then why not make it like R?
  4. NumPy has way more data types than R

    pandas replaces every NA value with NaN, which then changes the data type to either float or object.
  5. In [13]: Tab = pd.DataFrame(['Promotion dtype for storing NAs', 'No Change', 'No Change',
                        'Cast to Float64', 'Cast to object'],
                       index=['Typeclass', 'floating', 'object', 'integer', 'boolean'])
    Tab

    Out[13]:
                                             0
    Typeclass  Promotion dtype for storing NAs
    floating                         No Change
    object                           No Change
    integer                    Cast to Float64
    boolean                     Cast to object
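The promotion table on slide 5 can be checked directly. The following is a small sketch rather than code from the talk: the `examples` dictionary and the extra label `'z'` are made up for illustration, and reindexing is used only as a convenient way to force a missing value into each Series.

```python
import pandas as pd

# One small Series per typeclass from the table, all indexed by 'a' and 'b'.
examples = {
    'floating': pd.Series([1.5, 2.5], index=['a', 'b']),
    'object': pd.Series(['x', 'y'], index=['a', 'b'], dtype=object),
    'integer': pd.Series([1, 2], index=['a', 'b']),
    'boolean': pd.Series([True, False], index=['a', 'b']),
}

# Reindexing with the unknown label 'z' forces pandas to store a missing value.
results = {}
for name, ser in examples.items():
    promoted = ser.reindex(['a', 'b', 'z'])
    results[name] = (str(ser.dtype), str(promoted.dtype))
    print(f"{name:9s} {ser.dtype} -> {promoted.dtype}")
```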
  6. In [14]: df = pd.DataFrame(np.random.randn(6, 4),
                      columns=['one', 'two', 'three', 'four'],
                      index=list('abcdef'))
    df

    Out[14]:
            one       two     three      four
    a -0.491977 -0.148958  0.302686 -0.126440
    b -0.481760  0.657904 -0.105482 -1.085279
    c  1.815573 -0.588777 -1.677981 -0.981797
    d -0.061162 -1.439965  0.146472 -0.955991
    e -1.525030  1.510161  0.002977 -1.066108
    f -1.620642 -0.827015 -1.296184  0.230389
  7. In [15]: df.ix[['b', 'c', 'e']]

    Out[15]:
            one       two     three      four
    b -0.481760  0.657904 -0.105482 -1.085279
    c  1.815573 -0.588777 -1.677981 -0.981797
    e -1.525030  1.510161  0.002977 -1.066108
  8. In [16]: df.reindex(['b', 'c', 'e'])

    Out[16]:
            one       two     three      four
    b -0.481760  0.657904 -0.105482 -1.085279
    c  1.815573 -0.588777 -1.677981 -0.981797
    e -1.525030  1.510161  0.002977 -1.066108
  9. In [17]: df.ix[[1, 2, 4]]

    Out[17]:
            one       two     three      four
    b -0.481760  0.657904 -0.105482 -1.085279
    c  1.815573 -0.588777 -1.677981 -0.981797
    e -1.525030  1.510161  0.002977 -1.066108
  10. In [18]: df.reindex([1, 2, 4])

    Out[18]:
       one  two  three  four
    1  NaN  NaN    NaN   NaN
    2  NaN  NaN    NaN   NaN
    4  NaN  NaN    NaN   NaN
  11. Reindex, being strictly label-based indexing, can't perform this function: it treats the integers 1, 2 and 4 as labels, finds none of them in the index, and returns rows of NaN.
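The contrast in slides 7 to 11 can be sketched without `.ix`, which was deprecated in pandas 0.20 and removed in 1.0; `.loc` (labels) and `.iloc` (positions) are its replacements. The frame below uses fixed values instead of the random data from the slides, purely so the comparison is reproducible.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame used in the slides.
df = pd.DataFrame(
    np.arange(24, dtype=float).reshape(6, 4),
    columns=['one', 'two', 'three', 'four'],
    index=list('abcdef'),
)

by_label = df.loc[['b', 'c', 'e']]       # label-based selection
by_position = df.iloc[[1, 2, 4]]         # position-based selection
reindexed = df.reindex(['b', 'c', 'e'])  # label-based, tolerates missing labels

# reindex treats integers as labels; none exist here, so every row is NaN.
missing = df.reindex([1, 2, 4])

print(by_label.equals(by_position))
print(missing.isna().all().all())
```

Unlike `.loc`, `reindex` does not raise on unknown labels, which is why the mistake on slide 10 passes silently.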
  12. In [19]: series = pd.Series([1, 2, 3, 4, 5, 6])
    series

    Out[19]:
    0    1
    1    2
    2    3
    3    4
    4    5
    5    6
    dtype: int64
  13. The dtype changes to Python object because reindex_like silently inserts NaNs, and the dtype changes accordingly.
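The code that produced the result on slide 13 is not part of the transcript, so the following is only an assumed illustration of the effect. The `flags` and `counts` Series are hypothetical; they show that the resulting dtype depends on the starting dtype, with booleans becoming object and integers becoming float64.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5, 6])

# Hypothetical shorter Series: labels 3, 4 and 5 are missing from both.
flags = pd.Series([True, False, True])
counts = pd.Series([10, 20, 30])

# reindex_like aligns to series.index and fills the missing labels with NaN.
flags_aligned = flags.reindex_like(series)
counts_aligned = counts.reindex_like(series)

print(flags.dtype, '->', flags_aligned.dtype)
print(counts.dtype, '->', counts_aligned.dtype)
```

Checking `.dtype` before and after an alignment step is a cheap way to catch this kind of silent change.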
  14. Some other tips

    Avoid multithreading while using pandas, as it may delete some frames, or dataframe.copy might skip some frames while handling very large data sets.
    Avoid using pandas together with BS4 (Beautiful Soup 4).
  15. Links

    http://pandas.pydata.org/pandas-docs/stable/gotchas.html
    http://docs.python-guide.org/en/latest/writing/gotchas/#late-binding-closures
    https://gist.github.com/manojpandey/41b90cba1fd62095e247d1b2448ef85b
    http://pandas.pydata.org/pandas-docs/version/0.19.2/gotchas.html