Project Details
Hello There! Below are the projects I am working on or have completed to gain experience with SQL and Python. These projects mainly focus on data analytics, data cleaning, and visualizations.
Umpire Accuracy Exploration
As an avid baseball fan (Go Dodgers!), I wanted to explore some statistical data on umpires behind the plate. I used UmpScoreCard data from Kaggle and explored it to see whether any recent MLB rule changes have affected the overall accuracy of umpire calls. It was a great way to improve my querying skills while reviewing some baseball data.
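A minimal sketch of the kind of yearly query this exploration involves, written as SQL and run through Python's built-in sqlite3 module. The table name `games` and the columns `game_date`, `umpire`, and `overall_accuracy` are hypothetical stand-ins, not the actual UmpScoreCard column names, and the sample rows are made-up values for illustration only.

```python
import sqlite3

# Hypothetical schema: one row per game with the plate umpire's call accuracy (%).
# Column names are assumptions, not the real Kaggle/UmpScoreCard headers.
SCHEMA = """
CREATE TABLE games (
    game_date TEXT,          -- ISO date such as 2022-04-07
    umpire TEXT,
    overall_accuracy REAL    -- percent of ball/strike calls judged correct
);
"""

# Season-level metrics; MIN/MAX surface the lowest and highest accuracy games.
YEARLY_METRICS = """
SELECT CAST(strftime('%Y', game_date) AS INTEGER) AS season,
       COUNT(*) AS games,
       ROUND(AVG(overall_accuracy), 2) AS avg_accuracy,
       MIN(overall_accuracy) AS lowest_accuracy,
       MAX(overall_accuracy) AS highest_accuracy
FROM games
GROUP BY season
ORDER BY season;
"""


def yearly_metrics(rows):
    """Load (game_date, umpire, overall_accuracy) rows in memory and aggregate by season."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO games VALUES (?, ?, ?)", rows)
    result = conn.execute(YEARLY_METRICS).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = [
        ("2022-04-07", "Umpire A", 93.0),
        ("2022-05-01", "Umpire B", 95.0),
        ("2023-04-02", "Umpire A", 94.0),
        ("2023-06-10", "Umpire C", 96.0),
    ]
    for row in yearly_metrics(sample):
        print(row)
```

Comparing `avg_accuracy` for seasons before and after a rule change is one simple way to frame the question; on its own it does not show that the rule change caused any difference.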
High and Low Accuracy Games
Yearly Metrics
Does Home Field Also Impact Umpires? (Theory)
Conclusion
Coffee Review Project
As an avid coffee drinker, I was excited to see a data set on Kaggle with reviews associated with coffee types and brands. I noticed the data set was scraped from the Coffee Review site and started reviewing the details. I learned about specific coffee characteristics and what determines the roast level of coffee. I also read into how each coffee is rated and the scales being used. I thought I knew my fair share, but after this I realized I knew very little about coffee. I wanted to look into a topic I already enjoyed and see what I could add to the data set.
Most of the time went into cleaning the data and updating details to make it more manageable, such as handling nulls, fixing formatting, and filling in missing data. After completing the cleanup, I created a view to start choosing my next batch of coffee to try! I think I might go with Bird Rock Coffee Roasters.
Changes made to the data set
- Converted review_date to a date to remove the 00:00:00 time portion, keeping the standard mm/dd/yyyy format.
- Standardized NA values: found 'NA', 'Na/', '/', and 'NA/NA', and used NA throughout to indicate no value given.
- Added a Coffee Type column based on the URL and description, since the data included espresso, bottled, whole bean, and cold brew selections.
- Updated null values in the coffee characteristics. The review scales run from 1 to 10, so 0 is used to indicate no value given.
- Moved the slug column to the end of the sheet and renamed it review_url to make the data easier to read.
- Created a view to use in an Excel sheet.
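The cleanup steps above could be expressed in SQL roughly as follows. This is a sketch assuming a hypothetical `coffee_reviews` table: only `review_date` and `slug` come from the list, while `origin`, `aroma`, `body`, `description`, and the keyword rules for `coffee_type` (including the 'Whole Bean' default) are assumptions for illustration. It runs against an in-memory SQLite database through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical raw table. review_date and slug come from the change list;
# the other column names are assumptions.
SETUP = """
CREATE TABLE coffee_reviews (
    name TEXT, roaster TEXT, origin TEXT, review_date TEXT,
    aroma INTEGER, body INTEGER, description TEXT, slug TEXT
);
"""

CLEANUP = """
-- Drop the trailing 00:00:00 time part, keeping mm/dd/yyyy (text before the first space).
UPDATE coffee_reviews
SET review_date = substr(review_date, 1, instr(review_date || ' ', ' ') - 1);

-- Collapse the NA variants into a single NA.
UPDATE coffee_reviews SET origin = 'NA' WHERE origin IN ('Na/', '/', 'NA/NA');

-- Characteristics are scored 1 to 10, so 0 marks no value given.
UPDATE coffee_reviews SET aroma = COALESCE(aroma, 0), body = COALESCE(body, 0);

-- Derive a coffee type from the slug and description (keyword rules are assumptions).
ALTER TABLE coffee_reviews ADD COLUMN coffee_type TEXT;
UPDATE coffee_reviews SET coffee_type = CASE
    WHEN slug LIKE '%cold-brew%' OR description LIKE '%cold brew%' THEN 'Cold Brew'
    WHEN slug LIKE '%espresso%' OR description LIKE '%espresso%' THEN 'Espresso'
    WHEN slug LIKE '%bottled%' OR description LIKE '%bottled%' THEN 'Bottled'
    ELSE 'Whole Bean'  -- assumed default
END;

-- View for Excel: slug moved to the end and renamed review_url.
CREATE VIEW coffee_view AS
SELECT name, roaster, coffee_type, origin, review_date, aroma, body,
       slug AS review_url
FROM coffee_reviews;
"""


def clean_reviews(rows):
    """Load raw rows into an in-memory DB, apply the cleanup, and return the view's rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.executemany("INSERT INTO coffee_reviews VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.executescript(CLEANUP)
    result = conn.execute("SELECT * FROM coffee_view ORDER BY name").fetchall()
    conn.close()
    return result
```

Putting the reader-facing layout in a view keeps the cleaned table intact while giving Excel a tidy column order to import.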
California National Park Forecast
The initial question that started this project was, "Can I forecast how busy a national park might be next year?" I wanted to see if we could forecast visitation to national parks in California to get some idea of how busy each month might be.
I purchased my first America the Beautiful National Parks pass this year and enjoyed every moment I had with it. I live in San Diego, which has a national monument close by, and we also visited Joshua Tree and Yosemite twice this year. That raised the question: can I look at past yearly visits, forecast what we might expect next year, and put this in a dashboard?
I started on the NPS Stats site, which has a lot of information available to download. I used a large amount of that information to start the project, compiled it all into a more usable Excel sheet, and started working on a visualization.
The data sets I used and compiled into a more usable format can be found here: Annual Park Rankings & Current Year Monthly and Annual Summary Report.
The dashboard and findings can be found here: California Dashboard. My future plan for this project is to use the collected data to forecast visitation for all national parks in the US.
I will also update the project as more data becomes available and check whether the predictions were close to the actual numbers.
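One simple baseline for a monthly forecast is a seasonal average: predict each month of next year as the mean of that same month across past years. The sketch below assumes the compiled NPS data has been reduced to a hypothetical mapping of (year, month) to visit counts; it is an illustration, not the method behind the dashboard. The `mape` helper (mean absolute percentage error) shows one way to check whether predictions were close to the actual numbers once they come in.

```python
from collections import defaultdict
from statistics import mean


def seasonal_average_forecast(history):
    """Forecast next year's visits per month as the mean of that month in past years.

    history: hypothetical dict mapping (year, month) -> visit count.
    Returns a dict mapping month -> rounded forecast, ordered by month.
    """
    by_month = defaultdict(list)
    for (_, month), visits in history.items():
        by_month[month].append(visits)
    return {month: round(mean(values)) for month, values in sorted(by_month.items())}


def mape(forecast, actual):
    """Mean absolute percentage error (%) over months present in both dicts.

    Months with zero actual visits are skipped to avoid dividing by zero.
    """
    months = [m for m in forecast if m in actual and actual[m] != 0]
    if not months:
        raise ValueError("no overlapping months with nonzero actuals")
    return 100 * mean(abs(forecast[m] - actual[m]) / actual[m] for m in months)


if __name__ == "__main__":
    # Made-up numbers for illustration, not NPS data.
    history = {(2022, 1): 100, (2023, 1): 200, (2022, 7): 1000, (2023, 7): 1200}
    forecast = seasonal_average_forecast(history)
    print(forecast)  # {1: 150, 7: 1100}
    print(mape(forecast, {1: 150, 7: 1000}))
```

A seasonal average captures the summer peak but ignores any upward or downward trend across years, so it is best treated as a yardstick for comparing more elaborate forecasts.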
Movie Correlation
For this project I wanted to explore data visualization with Python, using the Movie Industry data set from Kaggle. I also wanted to start learning Python, so in this project we set up Jupyter Notebooks in VS Code, used a few libraries, and started working on the data. This was a great introduction to the language and showed how powerful it can be for future projects. I hope to start incorporating Python to create simple visualizations and find correlations in the data.
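Assuming the correlations of interest are Pearson's r (the usual default), each one measures how linearly two numeric columns move together, on a scale from -1 to 1. A dependency-free sketch is below; the column names `budget` and `gross` and their values are hypothetical. In a Jupyter notebook, pandas' `DataFrame.corr()` computes the same Pearson matrix by default.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of co-deviations; the 1/n factors cancel against the spreads below.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson r for every pair of columns in a dict of name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


if __name__ == "__main__":
    # Hypothetical columns with made-up values.
    movies = {
        "budget": [10, 20, 30, 40],
        "gross": [15, 35, 50, 80],
    }
    for pair, r in correlation_matrix(movies).items():
        print(pair, round(r, 3))
```

A strong r between two columns only says they rise and fall together; it does not say one causes the other.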
Data Cleaning
In this project we explore a housing data set with a focus on cleaning the data. The data has some formatting and data-quality issues, and we cleaned it to make it more manageable to use. We update the table, parse details such as the address, and replace null values with existing data points from within the set. The project followed a guide and was a learning opportunity to see what to expect when dealing with data cleanup.
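The two cleanup steps mentioned above, filling nulls from existing data points and parsing the address, could be sketched in SQL like this. The table `housing` and the columns `unique_id`, `parcel_id`, and `property_address` are hypothetical, as are the assumptions that rows sharing a `parcel_id` share an address and that addresses use a "street, city" layout.

```python
import sqlite3

# Hypothetical housing table; names are illustrative, not the original data set's.
SETUP = """
CREATE TABLE housing (
    unique_id INTEGER PRIMARY KEY,
    parcel_id TEXT,
    property_address TEXT
);
"""

CLEANUP = """
-- Fill a missing address from another row with the same parcel_id.
UPDATE housing
SET property_address = (
    SELECT h2.property_address
    FROM housing AS h2
    WHERE h2.parcel_id = housing.parcel_id
      AND h2.property_address IS NOT NULL
    LIMIT 1
)
WHERE property_address IS NULL;

-- Split street, city into two new columns at the first comma.
ALTER TABLE housing ADD COLUMN street TEXT;
ALTER TABLE housing ADD COLUMN city TEXT;
UPDATE housing
SET street = TRIM(substr(property_address, 1, instr(property_address, ',') - 1)),
    city = TRIM(substr(property_address, instr(property_address, ',') + 1))
WHERE instr(property_address, ',') > 0;
"""


def clean_housing(rows):
    """Load (unique_id, parcel_id, property_address) rows, clean them, return the result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.executemany("INSERT INTO housing VALUES (?, ?, ?)", rows)
    conn.executescript(CLEANUP)
    result = conn.execute(
        "SELECT unique_id, parcel_id, street, city FROM housing ORDER BY unique_id"
    ).fetchall()
    conn.close()
    return result
```

Running the fill before the split means a recovered address is parsed along with the rest; rows whose parcel has no known address stay NULL in both new columns.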