AI with Pandas
Pandas revolutionized data analysis when it was introduced in 2008. To this day, it is widely used for exploratory data analysis in Python. What if we could do more with it, maybe add a tinge of AI and see what miracles unfold…
Imagine being able to talk to your data like it’s your best friend. That’s what Pandas AI does! It can detect patterns, outliers, and missing values, analyse complex data and provide you with easy-to-understand summaries.
Imagine your stakeholder asks you for a particular metric and its derivation. For this you will need to figure out the required dataset first, then prepare a complex query to get the metric of your choice, and then present it to your stakeholder. Too much effort, I say. Worry not, Pandas AI is here to the rescue.
Setting up is easy: just install it with pip (pip install pandasai) and start using it.
PS: you need an API key for the LLM backend you wish to use (OpenAI, StarCoder, Google PaLM, etc.).
Once that is set up, you can ask questions in plain-language prompts and get answers without writing the analysis code yourself. You can also ask the LLM to plot graphs.
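Here is a minimal sketch of what that setup might look like, assuming an early 0.x release of pandasai. The `PandasAI` class, the `pandasai.llm.openai` import path, the `api_token` parameter and the `run` method come from those releases and may differ in newer versions. The `OPENAI_API_KEY` environment variable name and the two helper functions are my own illustrative choices, not something taken from the notebook.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the LLM API key from an environment variable.

    The variable name is an assumption; use whatever your backend expects.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before running")
    return key


def build_pandas_ai(api_key):
    """Create a PandasAI instance backed by OpenAI.

    Assumes an early 0.x release of pandasai; newer releases may expose
    a different API.
    """
    # Imported lazily so the helper above works without pandasai installed.
    from pandasai import PandasAI
    from pandasai.llm.openai import OpenAI

    llm = OpenAI(api_token=api_key)
    return PandasAI(llm)


# Usage (needs pandasai installed, a valid key, and a DataFrame named df):
# pandas_ai = build_pandas_ai(load_api_key())
# print(pandas_ai.run(df, prompt="Which country has the highest gdp?"))
```

Keeping the key in an environment variable rather than in the notebook itself means you can share the notebook without leaking the key.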
This library also shines for data science tasks, with shortcuts such as clean_data, impute_missing_values, generate_features, plot_histogram and so on.
# pandas_ai is the PandasAI instance created during setup; df is a pandas DataFrame
# Clean data
pandas_ai.clean_data(df)
# Impute missing values
pandas_ai.impute_missing_values(df)
# Generate features
pandas_ai.generate_features(df)
# Plot histogram
pandas_ai.plot_histogram(df, column="gdp")
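Since these shortcuts hand the work over to an LLM, it can help to have a plain pandas baseline to compare against. The sketch below is not how Pandas AI works internally. It is a hand-written median imputation on a made-up `gdp` column, and the function names are hypothetical.

```python
import pandas as pd


def missing_report(df):
    """Count missing values per column."""
    return df.isna().sum()


def impute_median(df, columns):
    """Return a copy with NaNs in the given numeric columns filled with the column median."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


# Made-up sample data, purely for illustration.
df = pd.DataFrame({"country": ["A", "B", "C", "D"], "gdp": [1.0, None, 3.0, 5.0]})

print(missing_report(df))             # one missing value in "gdp"
baseline = impute_median(df, ["gdp"])
print(baseline)                       # the gap is filled with the median, 3.0
```

If the LLM-driven shortcut returns something very different from a baseline like this, that is a cue to look at what it actually did.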
This all sounds super cool, but as with all new technologies, it has a few caveats.
1. It doesn’t reproduce the same answer on every run.
2. With custom datasets, it doesn’t always return the correct answer.
3. The OpenAI LLM worked pretty well, but I wasn’t able to get good results with Hugging Face StarCoder.
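Caveats 1 and 2 suggest not trusting a single run. One way to guard against that, sketched below, is to ask the same question several times and only accept the answer when most runs agree. The `ask` callable is a hypothetical stand-in for whatever function sends your prompt to the LLM. The helper name and the agreement threshold are assumptions as well.

```python
from collections import Counter


def majority_answer(ask, prompt, runs=5, min_agreement=0.6):
    """Call ask(prompt) `runs` times and pick the most common answer.

    Returns (answer, share), where share is the fraction of runs that gave
    that answer. The answer is None when share is below min_agreement.
    """
    answers = [str(ask(prompt)).strip() for _ in range(runs)]
    answer, count = Counter(answers).most_common(1)[0]
    share = count / runs
    return (answer if share >= min_agreement else None), share


# Demo with a fake, deliberately flaky backend (no LLM involved).
_fake_replies = iter(["42", "42", "41", "42", "42"])
answer, share = majority_answer(lambda p: next(_fake_replies), "What is the average gdp?")
print(answer, share)  # prints: 42 0.8
```

Agreement across runs doesn’t prove the answer is correct, so for anything important it is still worth checking against a hand-written pandas computation.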
Overall, I would say it’s a pretty slick library to work with. It’s not production ready, but damn, with these new technologies coming up, it’s an exciting time to be in the data and ML space.
So does AI work, or is it all hype? I’ll let the image below answer that.
Ref notebook: https://colab.research.google.com/drive/1tPw6cvslZ6QNnj3BNotzmB2k0DEhYKAa?authuser=0#scrollTo=XRBYANVdG51M