Data analysis with Pandas using Jupyter in VS Code
Objective
Install the VS Code extensions for Python and Jupyter, and analyse sales data using Pandas.
About Pandas, Jupyter and Anaconda
Pandas
Pandas is a Python library for data manipulation and analysis. It offers data structures and operations for manipulating numerical tables and time series.
Jupyter notebook
The Jupyter Notebook is a web application for creating and sharing computational documents. It offers a simple, streamlined, document-centric experience.
Anaconda
Anaconda is a distribution of Python and R for package management and deployment. The distribution includes data-science packages for easy installation and management.
Set up the development environment
- Download and install Anaconda Individual Edition.
- Download and install Visual Studio Code.
- Open the Anaconda Prompt and create a new environment by running the following command, replacing sales with the name you want.
  conda create --name sales
- To activate the environment you created, run the following command, replacing sales with the name you used.
  conda activate sales
- Once the environment is activated, the prefix of your prompt changes from (base) to your environment name. This indicates that the environment is active.
- To install pandas, run
  conda install pandas
- To install ipykernel, run
  conda install ipykernel
  ipykernel, also known as the IPython kernel, is the execution backend for Jupyter and an essential component.
- Now open VS Code and install the Jupyter extension from the Extensions Marketplace. To open the Extensions view, either press Ctrl+Shift+X or click the Extensions icon on the left side of VS Code, then search for Jupyter. Click the first result that appears and install it.
- Create a new Jupyter notebook, either by creating a new .ipynb file in your workspace or by running the following command from the Command Palette (Ctrl+Shift+P), and save it.
  Jupyter: Create New Jupyter Notebook
- Click Select Kernel at the top right and choose the kernel that has the same name as the conda environment you created.
- Create a file named "sales.csv" in the same location as the Jupyter notebook and copy the data displayed below into it so that you can work on it with pandas. The following image shows how the data in the CSV file should look.
- Try printing 'hello world' in Python to verify that the notebook runs properly.
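The check in the last step can be extended slightly to confirm that the notebook is really using the conda environment created above. The following is only a minimal sketch, assuming pandas was installed into the selected kernel's environment. The interpreter path it prints depends on your machine.

```python
import sys

import pandas as pd

# Confirm that the notebook cell executes at all.
print("hello world")

# Show which Python interpreter the selected kernel uses. For a conda
# environment, the path normally contains the environment's name.
print(sys.executable)

# Confirm that pandas can be imported from this kernel and report its version.
print(pd.__version__)
```

If the import fails, the notebook is probably attached to a different kernel than the environment in which pandas was installed.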
Code explanation
Below is the entire code for the task mentioned. Let's go through it line by line.
- The pandas library is imported.
- read_csv() is used to read .csv files such as sales.csv, and the result is stored in the data variable.
- The data variable is printed to view the contents of the file.
- The maximum sales_amount is found using max(). This value is compared, using the equality operator, with every value in the sales_amount column. The comparison yields one boolean value per row, which makes it easy to retrieve the row that satisfies the condition.
- Since we need only item and date, we restrict the result to those two columns and print the output.
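The steps above can be sketched roughly as follows. This is an illustrative reconstruction, not necessarily the exact code of the tutorial. The column names item, date and sales_amount come from the description, but the rows are invented sample data, and the variable names max_sales, is_max and top_sale are hypothetical. The sketch reads from an in-memory string so that it runs without the file. In the notebook you would call pd.read_csv("sales.csv") instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales.csv: the column names follow the text
# (item, date, sales_amount), but these rows are invented for illustration.
csv_text = """item,date,sales_amount
pen,2021-01-05,120
notebook,2021-01-06,340
stapler,2021-01-07,215
"""

# In the notebook this would be pd.read_csv("sales.csv"); StringIO lets the
# sketch run without the file on disk.
data = pd.read_csv(io.StringIO(csv_text))
print(data)

# Largest value in the sales_amount column.
max_sales = data["sales_amount"].max()

# Comparing the column with that value gives one boolean per row.
is_max = data["sales_amount"] == max_sales

# Keep the rows where the mask is True, and only the item and date columns.
top_sale = data.loc[is_max, ["item", "date"]]
print(top_sale)
```

One consequence of filtering with a boolean mask is that every row tied for the maximum is returned. By contrast, data["sales_amount"].idxmax() would give only the label of the first such row.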