In this exercise, we will read and analyse a CSV file containing sales data for a grocery store.
Copy the data below into a file named orders.csv; we will then read it with Spark.
Now, open the terminal and run spark-shell to launch the interactive Scala REPL for running Spark programs.
Let's get a sneak peek of the dataset we just read, using the show command, which takes an optional parameter specifying the number of rows to display.
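A minimal sketch of this step, assuming the orders.csv file from above sits in the working directory. The sample rows written here are hypothetical stand-ins for the dataset given in the text, and the column names (order_id, product, quantity, price) are assumptions, not the actual schema:

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.SparkSession

// Hypothetical sample rows -- stand-ins for the dataset shown in the text.
val sample = "order_id,product,quantity,price\n1001,Apples,2,3.50\n1002,Bread,1,2.25\n"
Files.write(Paths.get("orders.csv"), sample.getBytes)

// spark-shell predefines `spark`; build a session only when running outside the REPL.
val spark = SparkSession.builder().appName("orders").master("local[*]").getOrCreate()

// Default CSV read: no header handling, no schema inference.
val df = spark.read.csv("orders.csv")

// show(3) prints up to 3 rows as a table; note the generated _c0, _c1, ...
// column names, and the header row appearing as an ordinary data row.
df.show(3)
```

Running this in the spark-shell makes the problem described next easy to see: the printed table has columns _c0 through _c3, and its first row is the CSV header itself.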
By default, Spark assigns generic column names such as _c0 and _c1, and even treats the header row of the CSV as data. To fix this, let's set the header read option when loading the CSV.
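The corrected read can be sketched as follows, again with hypothetical sample rows standing in for the actual dataset. Setting the header option to true tells Spark to take column names from the first row instead of treating it as data:

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.SparkSession

// Hypothetical sample rows -- stand-ins for the dataset shown in the text.
val sample = "order_id,product,quantity,price\n1001,Apples,2,3.50\n1002,Bread,1,2.25\n"
Files.write(Paths.get("orders.csv"), sample.getBytes)

// spark-shell predefines `spark`; build a session only when running outside the REPL.
val spark = SparkSession.builder().appName("orders").master("local[*]").getOrCreate()

// header=true: use the first CSV row as column names rather than as data.
val df = spark.read.option("header", "true").csv("orders.csv")
df.show(3)
```

With this option, show prints the real column names and only the data rows. An optional further refinement is `option("inferSchema", "true")`, which asks Spark to guess column types instead of reading everything as strings.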
Finally, based on the given dataset, we will calculate the total number of orders the store received.
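One way to sketch this, assuming each CSV row represents a single order (as the hypothetical sample below does), is to call count on the DataFrame:

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.SparkSession

// Hypothetical sample rows -- stand-ins for the dataset shown in the text.
val sample = "order_id,product,quantity,price\n1001,Apples,2,3.50\n1002,Bread,1,2.25\n"
Files.write(Paths.get("orders.csv"), sample.getBytes)

// spark-shell predefines `spark`; build a session only when running outside the REPL.
val spark = SparkSession.builder().appName("orders").master("local[*]").getOrCreate()

val df = spark.read.option("header", "true").csv("orders.csv")

// count() returns the number of data rows, i.e. the total orders here.
val totalOrders = df.count()
println(s"Total orders received: $totalOrders")
```

If the real dataset can contain multiple rows per order, counting distinct order IDs would be the safer aggregate, e.g. `df.select("order_id").distinct().count()` (order_id being an assumed column name).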