Universitat Internacional de Catalunya

MODULE 5: Big Data Technologies and Architectures

5
13948
1
Second semester
OB
Main language of instruction: Catalan

Other languages of instruction: English, Spanish

Teaching staff

Introduction

In the event that the health authorities announce a new period of confinement due to the evolution of the health crisis caused by COVID-19, the teaching staff will promptly communicate how this may affect the teaching methodologies and activities, as well as the assessment.


In the Advanced Analytics and Big Data Science program, the underlying digital technology plays a primary role, complementing the knowledge students are expected to acquire, which centers on estimator modeling. The aim is not to provide in-depth knowledge of the technology itself, but to give students sufficient command of it to lead, competently, the technological decisions that adopting it involves. Technologies such as the cloud, edge computing, GPUs, and distributed processing and storage are intrinsic to Big Data, which seeks to provide a technological solution for the field of Advanced Analytics.
For this reason, the module aims to lay the foundations and help students understand the implications of the technologies they will work with in their future careers.

Pre-course requirements

Essential basic computer skills

Objectives

  • Understand the digital technologies involved in Advanced Analytics
  • Understand the premises that these technologies entail
  • Associate the different phases of a project with technological infrastructure solutions
  • Have the knowledge to build automated pipelines
  • Assess the cost of technological resources

Learning outcomes of the subject

  • The student will be able to understand and apply the technologies underlying the practice of Advanced Analytics.
  • The student will be able to understand the implications of technology for the deployment of predictive models developed in laboratory and production environments.
  • The student will be able to associate business problems with an architecture solution based on the type of data, the models to be used, the availability of new information and the inference requirements.

Syllabus

Big Data and Cloud Architecture
- Introduction to Big Data and Cloud
- Data centers
- Agile Analytics and Cloud
- Phases of the analytics methodology
- 2020 Data and AI Landscape
Databases (SQL, NoSQL, document, key-value and graph): theory, practice and application cases
- Not Only SQL (NoSQL)
- MongoDB
- Neo4j
- Hands-on labs with Python and MongoDB
Cloud resources (servers, microservices, queues, databases, ML, graphics and other services): theory, practice and application cases
- Introduction to cloud services
- Virtualized servers
- The concept of microservices
- Queues
- Databases in the cloud
- Storage and data lakes
- Hands-on labs with storage, databases, microservices and queues
Distributed processing (Hadoop and Spark), open source and cloud tools: theory, practice and application cases
- MapReduce
- Hadoop
- Spark
- Hands-on labs with Hadoop and Spark using Python
Batch, real-time and stream processing: theory, practice and application cases
- Types of processing: real-time, batch and stream
- Spark Streaming
- Hands-on labs with Spark Streaming
ML tools: theory, practice and application cases
- Spark MLlib
- ML and AutoML practice in the cloud

Teaching and learning activities

In person



  • Presentations covering concepts and theory
  • For each topic, students will work through labs, tutorials and individual self-learning exercises, experimenting with the technology in question with the support of the student community and the teacher.
  • A dozen real application cases will be proposed, in which students will work together to find a technological architecture solution through group analysis of specific customer cases.

Evaluation systems and criteria

In person



  • Design of an architecture solution for a specific customer case
  • Individual labs: a dozen self-learning labs will be proposed, some compulsory and others optional but highly recommended, combining architecture with other knowledge acquired during the master's program

Bibliography and resources

Several readings of papers and articles related to the topics discussed will be proposed, combined with other topics from the master's program.

- G. Linden, B. Smith and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," in IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan.-Feb. 2003, doi: 10.1109/MIC.2003.1167344.
- Amazon Web Services, "Overview of Amazon Web Services," AWS, Aug. 2020.
- J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," in Communications of the ACM, 2008.
- M. Turck, "2020 Data and AI Landscape," FirstMark.
- G. Liu, T. Nguyen, G. Zhao, W. Zha, J. Yang, J. Cao, M. Wu, P. Zhao and W. Chen, "Repeat Buyer Prediction for E-Commerce," 2016, pp. 155-164, doi: 10.1145/2939672.2939674.