DS406 Text Mining with R

Lecturer:	Dr. Gregor Wiedemann (Universität Hamburg)
Course description:	DS406
Language:	English
Recommended for this semester or higher:	1
ECTS-Credits:	6
Course can be taken as part of the following programs/modules:	Data Science in Business and Economics; Economics and Finance; European Management; General Management; International Business; International Economics; Economics; Management and Economics
Prerequisites:	Good programming skills in R
Course Type:	Lecture (2 weekly lecture hours), block course
Date:	Block course: Monday, April 6, 2020, 9 am s.t. to 5 pm; Tuesday, April 7, 2020, 9 am s.t. to 5 pm; Wednesday, April 8, 2020, 9 am s.t. to 4 pm. All sessions take place in the PC Lab, ground floor, Nauklerstr. 47.
Registration:	Registration in ILIAS is required. Registration opens on Monday, March 2, 2020 (originally planned for March 16) and closes on March 29, 2020, at 11:55 pm. Students in the M.Sc. Data Science in Business and Economics have priority; remaining places are open to students from all programs. If the number of registrations exceeds the remaining places, participants will be selected at random.
Downloads:	ILIAS
Method of Assessment:	Assignment. Successful participation requires handing in a written assignment after the end of the course, in which the methods taught in the course are applied to a self-scraped web corpus. Data collection and analysis should be carried out comprehensively to answer a (small) self-chosen research question. The assignment may also include a critical reflection on the analysis conducted and on how it might be supplemented with other methods or datasets to answer the research question. Assignment deadline: May 31, 2020, 8 pm s.t., upload in ILIAS (more details in the first lecture).
Content:	The course provides an overview of text mining in connection with data acquisition (basics of web scraping), text preprocessing and methodological integration, using the statistical programming language R. In sessions alternating between lectures and tutorials, we teach theoretical and methodological foundations, introduce exemplary studies and get hands-on with programming to carry out different analyses. We will cover a range of text mining methods, from basic lexicometric measures such as word frequencies, key term extraction and co-occurrence analysis to more complex machine learning approaches such as topic models.
Objectives:	Students will know how (1) to scrape textual data from websites to create a corpus, (2) to apply fundamental text preprocessing techniques and how these affect outcomes, (3) to perform basic quantitative text analysis, and (4) to perform topic modeling on large text corpora.
Literature:	Grimmer, J. & Stewart, B. (2013). Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis 21 (3), 267–297. doi:10.1093/pan/mps028
	Ignatow, G. & Mihalcea, R. F. (2017). An Introduction to Text Mining: Research Design, Data Collection, and Analysis. SAGE.
	Lemke, M. & Wiedemann, G. (Eds.). (2016). Text Mining in den Sozialwissenschaften. Grundlagen und Anwendungen zwischen qualitativer und quantitativer Diskursanalyse. Wiesbaden: Springer VS.
	Welbers, K., van Atteveldt, W. & Benoit, K. (2017). Text Analysis in R. Communication Methods and Measures 11 (4), 245–265. doi:10.1080/19312458.2017.1387238