Universitat Internacional de Catalunya

MODULE 2: Programming Languages for the Data Scientist
5
13944
1
First semester
OB
Main language of instruction: Catalan

Other languages of instruction: English, Spanish

Teaching staff


Faculty:

Josep Arrufat (SQL) jarrufat@uic.es

Albert Climent (Python) albert.climent@pervasive-tech.com

David Roche (R) droche@uic.es

Introduction

In the event that the health authorities announce a new period of confinement due to the evolution of the health crisis caused by COVID-19, the teaching staff will promptly communicate how this may affect the teaching methodologies and activities, as well as the assessment.


A data scientist's primary tools rest on the ability to program in different languages and at different levels. In addition, given the process of change and transformation that companies are currently undergoing, knowledge of the main data science languages is considered a skill in itself. This subject presents the main programming languages needed to complete any master's degree in data science, R and Python, along with the SQL database language.

Pre-course requirements

Basic computer skills and the ability to read and understand English.

Objectives

The objective of this subject is to learn the main programming languages for the data scientist: Python, R and SQL.

For each of them the objectives are:

1. Understand the application of different languages 

2. Know how to select the appropriate language for different situations

3. Know the use and practical application of the various languages

4. Know how to write code that solves simple and complex problems in each of the languages covered.

Competences/Learning outcomes of the degree programme

- Search for data (institutions and libraries): database access and selective Internet browsing.

- Introduce the student to the use of computer tools to include graphic samples.

- Establish criteria for making reasoned decisions.

- Recognize and solve problems in the field of professional performance.

- Analyze the variables that intervene in the management of the knowledge area of the program.

- Recognize and solve problems related to the management of the knowledge area of the program.

- Reflect on the forms of communication necessary for good management.

- Manage bibliographic and documentary resources.

 

Learning outcomes of the subject

After taking and passing this subject, students will have acquired the following abilities:

1. Be able to understand the application of different languages

2. Be able to select the appropriate language for different situations

3. Be able to use and apply in a practical way the different programming languages of the subject

4. Know how to write code that solves simple and complex problems in each of the languages covered.

Syllabus

1. R language

1.1. Introduction to the R language

1.2. Variables and basic aspects of R

1.3. Loops and flow control in R

1.4. Code structure and functions

1.5. Visualization with R

2. Python language

2.1. Introduction to the Python language

2.2. Introduction to Docker and Git

2.3. Python basics

2.4. The work environment: Notebooks

2.5. Working with data: Pandas

2.6. Python case study

3. The SQL language

3.1. Theoretical Foundations of SQL

3.2. Work environment and the PostgreSQL database management system

3.3. First steps with SQL

3.4. Practical Advanced Aspects with SQL

3.5. Theoretical foundations of databases (Relational Algebra)

Teaching and learning activities

In person



The learning technique of this subject is “learning by doing”, so practical cases will be applied to each theoretical concept developed in the different sessions and for the different languages. The objective is always to bring students closer to the reality of their profession, where they will have to apply the theoretical and practical knowledge learned throughout the course. Most sessions are structured as follows:

1. Presentation of the theoretical summary by the teaching staff

2. Example application by the teaching staff

3. Presentation of problems and solutions by the students

4. Joint problem solving

5. Case study, simulated or with real data

6. Practical work to do at home, intended to help assimilate the concepts learned in the session

Evaluation systems and criteria

In person



The mark for this subject is obtained by weighting equally all the assignments submitted throughout the course. The final mark is the continuous evaluation mark.

Bibliography and resources

- R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics. J. D. Long and Paul Teetor. 2019

- SQL Cookbook: Query Solutions and Techniques for All SQL Users. Anthony Molinaro. 2020

- An Introduction to Statistical Learning: with Applications in R. Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. Springer Publishing Company, Incorporated.

- The Python Language Reference, https://docs.python.org/3/reference/