Data Analytics using Microsoft Excel: Sales Analysis
Hello! This is my first portfolio as a data analyst using Excel tools. I hope you like it and can provide feedback in the comments section.
So in this post, i will analyze the dataset from Kaggle. The dataset contains sales data from three branches over the last three months and comprises a total of 1000 rows. I would like to assess how the dataset is performing based on certain Key Performance Indicators.
Here are some things i want to do with this data:
- Data preparation, to determine if the data is clean and to identify columns that need cleaning, such as spaces, capitalization, and typos.
- Perform exploratory data analysis to identify patterns in the data
- Conduct data analysis using pivot tables.
- Analyze top-selling products based on the city.
- Then, visualize the data using charts and others.
DATA PREPARATION
This dataset is a collection of sales data from 3 branches in 2019. There are various columns in this dataset such as invoice id, branch, city, customer type, gender, product line, unit price, quantity, tax, total, date, time, payment, cogs (cost of goods sold), gross margin percentage, gross income, and rating. Now, let’s investigate this dataset further below.
The data presented appears to be quite clean, as Kaggle data is typically ready-to-use. Since the data I have obtained doesn’t require additional cleaning, I will proceed to explore it using statistical methods.
STATISTIC DESCRIPTIVE
The objective of this analysis is to provide insights into central tendencies, such as mean, median, and mode, as well as measures of data variability, including range, variance, and dispersion. Additionally, it aims to assess the shape of the data distribution, considering factors like skewness and kurtosis. With this information, we can improve our understanding of the data distribution, identify any directional skewness tendencies, and pinpoint outliers or anomalies that may exist in the dataset.
Based on the analysis above, we can conclude the following:
- Unit Price
— The average price per unit is around $55.67 with a standard error of about $0.83.
— The median price per unit is $55.23 and the mode is $83.77.
— The standard deviation is around $26.49, which indicates data are more spread out.
—The variance of price per unit is about $701.96.
— The range of price per unit is about $89.
— The distribution of price per unit is approximately symmetric because the skewness is 0.007.
— Price per unit has negative kurtosis which is a light-tailed distribution or platykurtic.
— The total unit price is about $55,672.13. - Quantity
— The average quantity is around 5.51 with a standard error of 0.09.
— The median quantity is 5 and the mode is 10.
— The standard deviation quantity is around 2.92, which means data are clustered around the mean.
— The sample variance of quantity is about 8.54.
— The range of quantity is about 9, with a minimum value is 1 and a maximum value is 10.
— The quantity has an approximately symmetric distribution because the skewness is 0.01.
— The total quantity is 5,510. - Total sales
— The average total sales is around $322.96, with a standard error of 7.77.
— The median of total sales is about $253.84 and the mode is about $829.08.
— The standard deviation of total sales is $245.88, which indicates data are more spread out.
— The variance of total sales is around $60,459.59, providing a deeper insight into the data’s variation.
— The range of total sales is around $1,031.97, with a minimum value of $10.67 and a maximum value of $1,042.65.
— The distribution of total sales is slightly skewed to the right, indicated as positive skewness.
— Total sales have negative kurtosis which is a light-tailed distribution or platykurtic.
— The total sales from all data is approximately $322,966.74.
This descriptive analysis provides numerous insights and enables the identification of data characteristics, particularly in the fields of unit price, quantity, and total sales. This information can assist in understanding the data thoroughly and making data-driven decisions. The analysis of the dataset was done using a pivot table.
TOP PRODUCT SALES
The results of this analysis provide an overview of total product sales in this dataset. The highest sales are in “Food and Beverages” products, totaling around $56,000.00, and the lowest is in “Health and Beauty” products, with a total of approximately $49,000.00.
Next, explore the sales by branch. In the Mandalay branch, the highest sales are in “Health & Beauty” and “Sports & Travel”, totaling around $20,000.00, while the lowest is in “Food and Beverages” with a total of around $15,000.00.
In Naypyitaw, “Food and Beverages” has the highest sales, reaching around $24,000.00, and the lowest sales are in the “Home and Lifestyle” category with a total of about $14,000.00.
At Yangon, the highest sales are in “Home and Lifestyle” with a total of approximately $23,000.00, while the lowest is in the “Health and Beauty” category with a total of around $13,000.00.
The result of this analysis shows that the highest count of products is in “Home and Lifestyle” at the Yangon branch and in “Food and Beverages” at Naypyitaw, totaling around 65–66. Meanwhile, the lowest counts are in the products “Home & Lifestyle” and “Sports & Travel” at Naypyitaw, with a total of about 45.
TOTAL SALES ALL PRODUCT
The analysis result indicates that the highest sales are in the Naypyitaw branch, totaling around $111,000.0. Meanwhile, both the Mandalay and Yangon branches have nearly the same total sales, approximately $106,000.0 each.
ANALYZE BY MONTH
The result of this analysis provides an overview of the total sales count by month in each branch. The highest one is in the Yangon branch in March with a total count of around 127. In the Naypyitaw branch, the highest sales count was in January with a total of around 122. Meanwhile, in the Mandalay, both January and March have nearly the same count, approximately 111 and 112, respectively.
The results of this analysis show the sales pattern per month over three months in each branch. In the Naypyitaw branch, the highest sales occurred in January, totaling around $41,000.00, while the lowest was in February with a total of about $33,000.00.
At the Mandalay branch, the peak sales were also in January, totaling around $37,000.00. Meanwhile, both February and March had nearly the same total sales, approximately $34,000.00.
In the Yangon branch, the highest sales were also in January, reaching around $39,000.00. Meanwhile, the lowest sales occurred in February at the Yangon branch, with total sales of approximately $30,000.00.
Overall, these analysis results provide an overview of sales over the last three months. The sales pattern appears stable, and the differences are not significant.
CUSTOMER TYPE
The result of this analysis indicates that more female customers join as members compared to male customers, with approximately 261 female customers and 240 male customers. Meanwhile, for normal customers, there are 259 male customers and 240 female customers.
The result of this analysis indicates that 35% of customers use e-wallets as their payment method, 34% of customers use cash payment, and the lowest percentage, at 31%, is for customers using credit cards.
In conclusion, these are the findings from my analysis. I apologize for any errors in the preceding data analysis. I welcome your insights and suggestions to further enhance my skills as a data analyst.
Thank you!!