Hands-on Data Analysis is Datawhale's open-source project on data analysis. The project grew out of an earlier Datawhale data analysis course, in which I was a student and the textbook was Python for Data Analysis. That book explains pandas and NumPy operations clearly and in detail, but says much less about the logic of data analysis. As a result, many learners, myself included, finished it and still did not know what to do when faced with a real data analysis problem. This "I don't know how to use it" feeling is easy to understand: after learning something largely theoretical, there is a gap between what the theory teaches and how it is applied in practice. Bridging that gap usually takes your own experimentation and study of real-world material.
What was missing was a project-driven course: one that embeds the key concepts in a project, so that you learn by doing, with guidance along the way. After finishing such a course, you should be comfortable with pandas and have a general feel for the data analysis workflow. When we looked around, we found no existing data analysis project that fully met these criteria. So a group of Datawhale members came together to build an open-source course with these modest goals, so that everyone who uses it can get a better start on their data analysis journey.
The course is now at version 1.3. We have improved the learning flow and provided better-explained answers, and supporting materials will be released gradually. The course still starts from basic data operations and the data analysis workflow, with real-world examples introduced in each module. We plan to add new content later (for example, data mining algorithms). This is an open-source project that we will keep iterating on, and everyone is welcome to take part.
About the project's name, Hands-on Data Analysis: data analysis is the process of finding the truth in a pile of numbers. Knowing how to manipulate data is only half of the skill; the other half is experience, the judgment you carry in your head. So while learning, think and summarize often, and write real code with your own hands. As you work through this course, I hope you will reason more and keep asking why, and practice enough that theory and practice come together. If you do, you will get a great deal out of the course by the end.
Since this course was born in Datawhale, it works best alongside the other resources Datawhale provides. The code is provided as Jupyter notebooks that contain the tasks you need to complete, along with our hints and guidance. Combine this format with Datawhale's group learning, where you can discuss problems and pool information with others, and you will learn far more effectively. Datawhale has also open-sourced a pandas tutorial, Joyful-Pandas, which lays out the logic of pandas together with code demonstrations. For the pandas operations in this course, you can refer to Joyful-Pandas to get more out of your data analysis learning.
The course is currently divided into three units, roughly: Basic Data Operations, Data Cleaning and Reconstruction, and Modeling and Evaluation:
- Part 1: Given a dataset to analyze, we first learn how to load and view it, then learn some basic pandas operations, and finally start trying exploratory data analysis.
- Part 2: Once we can manipulate and understand the data with some confidence, we move on to data cleaning and reconstruction, turning the raw data into a usable form in preparation for feeding it into a model.
- Part 3: We decide which model to build based on the task requirements, and build it with the popular sklearn library. To know whether a model is good or bad we need to evaluate it; after evaluation, we work on optimizing the model.
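The three parts above can be sketched end to end in a few lines of pandas and scikit-learn. This is a minimal illustration, not the course's own notebook code: the tiny inline CSV and its column names are hypothetical toy data, and the choices of median imputation, one-hot encoding, and `LogisticRegression` are assumptions made only to show the order of the steps.

```python
import io

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for a real CSV file (column names are illustrative).
RAW_CSV = """age,fare,sex,label
22,7.25,male,0
38,71.28,female,1
26,7.92,female,1
35,53.10,female,1
35,8.05,male,0
,8.46,male,0
54,51.86,male,0
2,21.08,male,0
27,11.13,female,1
14,30.07,female,1
4,16.70,female,1
58,26.55,female,1
"""


def load_and_inspect(text: str) -> pd.DataFrame:
    """Part 1: load the data and take a first look at it."""
    df = pd.read_csv(io.StringIO(text))
    print(df.shape)           # number of rows and columns
    print(df.head(3))         # first few rows
    print(df.isnull().sum())  # missing values per column
    return df


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Part 2: fill missing values and encode the categorical column."""
    df = df.copy()
    # Assumed strategy: replace missing ages with the median age.
    df["age"] = df["age"].fillna(df["age"].median())
    # One-hot encode 'sex' so the model only sees numeric columns.
    return pd.get_dummies(df, columns=["sex"], dtype=int)


def model_and_evaluate(df: pd.DataFrame, seed: int = 0) -> float:
    """Part 3: fit a simple classifier and score it on held-out rows."""
    X = df.drop(columns="label")
    y = df["label"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    data = clean(load_and_inspect(RAW_CSV))
    print("accuracy:", model_and_evaluate(data))
```

Running the script prints the table's shape, its first rows, the missing-value counts, and a held-out accuracy. With only twelve rows the accuracy number says nothing meaningful; the point is the order of the steps: inspect before cleaning, and clean before modeling.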
| Chapter | Section |
|---|---|
| Chapter 1 | Data loading and preliminary observations |
| | Pandas basics |
| | Exploratory data analysis |
| Chapter 2 | Data cleaning and feature processing |
| | Data reconstruction 1 |
| | Data reconstruction 2 |
| | Data visualization |
| Chapter 3 | Data modeling |
| | Model evaluation |
Our code is in Jupyter form, and each part of the course comes in two versions: Course and Answers. While studying, work through the course notebook: complete all the material, look up information on your own, write the code yourself, and think about the questions and what you learn from them. Afterwards, discuss with your study partners and share information and insights. Treat the answers as a reference only: data analysis is open-ended by nature, so the answers are open too, and we hope you will arrive at your own understanding and solutions. If you need a reference, the answers we wrote are in the Answers section.
(Course part: you need to write the code yourself according to the instructions.)