Doing Data Science

The goal of the Doing Data Science course is to give students the skills they need to set up, manage, and conduct data science projects on their own.

Students learn the processes that describe how to approach and implement data science projects. They get to know the steps of CRISP-DM [1], the Cross-Industry Standard Process for Data Mining, a standardized process that describes and codifies common approaches used by data mining experts. It is the most widely used analytical model in the industry.
Moreover, students see various cases of how this process is applied in different areas, such as business and the humanities.

This class consists of different parts:

  • Part I: How to Approach Data Science Projects
  • Part II: Showcase Examples for Data Science Projects
  • Tutorial: Introduction to the Data Science Tool KNIME
  • Group Work: Students work in an interdisciplinary group on a data science project


KNIME Analytics Platform is open-source software for building data science workflows. Students can create visual workflows with an intuitive drag-and-drop graphical interface, without the need for coding.

In addition, further showcases are presented by guest speakers from within the Research Network Data Science.

The project work is organized in four steps. In the first step, the students indicate their topic preferences; we try to accommodate these requests and assign each student to an interdisciplinary group of 3-4 members. In the second step, the students address the “Project and Data Understanding” part of the CRISP-DM process and have a first meeting with their supervisor. In the third step, the groups work on the “Data Preparation and Modeling” phase and have a second meeting with their supervisor. The last step of the project work consists of the “Data Modeling and Evaluation” part.

There are three exams in total: a written midterm exam, a written final exam, and an oral exam in the form of poster pitch videos and discussions of the investigated project topics.

On the day of the oral exam, all participants watch the poster pitches and take part in a Q&A session.
Afterwards, everyone is invited to join Wonder*, an online gathering platform, where each group stands in a “corner” and participants can move around and join a group to ask questions or give feedback.

Students using Wonder for the discussion on the posters.

 

The success of the Doing Data Science course was evaluated with a questionnaire. To find out whether students understood the questions in the way we expected, we conducted a pre-study with a few students who filled out the questionnaire and shared their reactions to the questions. We then distributed the final questionnaire and evaluated the answers. We learned that the course was organized in an effective way: teaching of basic knowledge, presentation of exemplary projects, and group work. The students indicated that they enjoyed working in groups with colleagues from different backgrounds. Moreover, they liked the interaction during classes: recaps of past lectures, summaries at the end, and breakout rooms. An important aspect for the students is that the lecturer points out how the concepts they learn can be applied in real life. Lastly, the students find it useful to have a course with recorded or prerecorded lectures, Q&A sessions with the professor, and homework or assignments.

* The organization is due to the current COVID-19 regulations

[1] R. Wirth and J. Hipp. CRISP-DM: Towards a standard process model for data mining. Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, 2000.

 




Text Mining for Non-Computer Scientists

Students from the humanities, social sciences, linguistics, and law are interested in newspaper articles that debate a particular topic of public discourse. There is a wide variety of topics the students can choose from, for example migration, poverty, power relations, corruption, or financial crises. How does the language in newspaper articles change before and after drastic events such as the Arab Spring, 9/11, or the Bitcoin hype?

  • Techniques of Text Mining
  • Analysis and organization of corpora
  • Basics of Natural Language Processing
  • Clustering of Text
  • Tools of Visualization of Text Content
  • Understanding Control Structures in Python programming
  • Python spaCy Package


Students are introduced to lexicometric analysis, which is closely related to the methods of text mining and can be applied to critical discourse analysis. Lexicometric measures help the students to identify terms that are over- or underrepresented in the newspaper articles. Students work on a case study that focuses on text mining tools and macro-level analysis rather than interpretation.
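To make the idea of over- and underrepresented terms more concrete, the following minimal Python sketch compares the relative frequency of each term in two small corpora. It is an illustration only, not the course's own method: the log-ratio score, the additive smoothing value, the function names `tokenize` and `log_ratio`, and the two toy "articles" are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def log_ratio(target_texts, reference_texts, smoothing=0.5):
    """Return {term: log2 ratio of relative frequencies} for two corpora.

    Positive values: the term is overrepresented in the target corpus.
    Negative values: the term is underrepresented there.
    The smoothing constant (an assumption) avoids division by zero for
    terms that occur in only one of the two corpora.
    """
    target = Counter(t for text in target_texts for t in tokenize(text))
    reference = Counter(t for text in reference_texts for t in tokenize(text))
    vocab = set(target) | set(reference)
    target_total = sum(target.values()) + smoothing * len(vocab)
    reference_total = sum(reference.values()) + smoothing * len(vocab)
    scores = {}
    for term in vocab:
        p_target = (target[term] + smoothing) / target_total
        p_reference = (reference[term] + smoothing) / reference_total
        scores[term] = math.log2(p_target / p_reference)
    return scores


# Invented toy stand-ins for articles published before and after an event.
before = ["The market is calm and the banks report steady growth."]
after = ["The crisis hits the banks and the market reports a crisis of trust."]

scores = log_ratio(after, before)
ranked = sorted(scores, key=scores.get, reverse=True)
print("Most overrepresented after the event:", ranked[:3])
print("Most underrepresented after the event:", ranked[-3:])
```

With real newspaper corpora, the two lists would contain many articles each, and a statistical measure such as log-likelihood might be preferred over the plain log-ratio used here.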

We place special emphasis on KNIME Analytics Platform, a free and open-source platform for data analytics, reporting, and integration. Students can create visual workflows with an intuitive drag-and-drop graphical interface, without the need for coding. They learn common methods of text mining with KNIME and apply them to custom-compiled corpora. With a final introduction to Python, we want to show that it can be beneficial to learn a scripting language.
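As a rough impression of what a first step in Python can look like, the sketch below uses two basic control structures, a `for` loop and an `if` condition, to count how many headlines per year mention a keyword. The mini-corpus, the keyword, and the variable names are invented for illustration and are not taken from the course material.

```python
# Hypothetical mini-corpus: (year, headline) pairs invented for illustration.
articles = [
    (2010, "Protests spread across the region"),
    (2010, "Markets remain stable"),
    (2011, "Protests lead to government change"),
    (2012, "New government faces protests over reforms"),
]

keyword = "protests"
mentions_per_year = {}

for year, headline in articles:        # loop over every article
    if keyword in headline.lower():    # condition: keyword occurs in the text
        mentions_per_year[year] = mentions_per_year.get(year, 0) + 1

for year in sorted(mentions_per_year):
    print(year, mentions_per_year[year])
```

A few lines of script like this can be adapted quickly, for example by changing the keyword or by reading the articles from files, which is one reason a scripting language can complement a visual workflow tool.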

The lecture is held for the first time in the winter term 2021/22. The success of our lecture is evaluated using a questionnaire.