Ingestions

There are two types of ingestion: batch and real-time.

Batch Ingestion

Batch ingestion is a data processing technique in which data is collected and processed in discrete, predefined groups or batches, rather than continuously in real time as a stream.

PySpark scripts are scheduled to ingest the data in batches into Druid data sources.
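
In practice, each batch script prepares the flattened data and then submits an ingestion task to Druid. Below is a minimal sketch of submitting a native batch ingestion spec to the Druid Overlord API; the Overlord URL, data source name, input directory, timestamp column, and dimensions are all hypothetical placeholders, not values from the actual service.

import requests

# Hypothetical values for illustration; the real ones come from the
# service configuration.
DRUID_OVERLORD = "http://localhost:8081"
DATASOURCE = "ml-project-status"

# Minimal native batch (index_parallel) spec: read newline-delimited
# JSON files from a local directory and index two example columns.
spec = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "local", "baseDir": "/tmp/ml-data", "filter": "*.json"},
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": DATASOURCE,
            "timestampSpec": {"column": "createdAt", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["program_id", "status"]},
            "granularitySpec": {"segmentGranularity": "day", "queryGranularity": "none"},
        },
        "tuningConfig": {"type": "index_parallel"},
    },
}

# Submit the task; Druid responds with a task ID that can be polled
# for completion status.
resp = requests.post(f"{DRUID_OVERLORD}/druid/indexer/v1/task", json=spec)
resp.raise_for_status()
print("Submitted ingestion task:", resp.json()["task"])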

Batch Scripts in ML Analytics

In Manage Learn batch ingestion, there are two types of ingestion:

  1. Delete and re-ingest: Since Druid does not support update operations, the entire data source is deleted and the complete data is re-ingested from Mongo, so that the data source reflects the latest data. The data in these data sources is used for program dashboard CSV reports.

  2. Append: Apart from delete and re-ingest, there are also aggregated data sources to which new data is appended. In this approach, all of Mongo is queried and the newly aggregated data is appended to the data source.

Note: The scripts for aggregated data sources include duplicate-run protection. When a scheduled job fails, it has to be re-run manually. Delete-and-re-ingest data sources are safe under duplicate runs, because all existing data is deleted before everything is ingested again; append-style ingestion, however, could end up with duplicated data if a run is repeated. To avoid this, run details are logged in Mongo and already-completed runs are skipped, as sketched below.
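
A minimal sketch of such a guard, assuming a hypothetical job_runs collection and field names (the real logging schema lives in the service):

from datetime import datetime, timezone
from pymongo import MongoClient

# Hypothetical connection, database, and collection names.
client = MongoClient("mongodb://localhost:27017")
job_runs = client["ml_analytics"]["job_runs"]

def run_once(job_name, run_date, job_fn):
    """Run job_fn only if no successful run is already logged for this date."""
    if job_runs.find_one({"job": job_name, "run_date": run_date, "status": "success"}):
        print(f"{job_name} already ran for {run_date}; skipping duplicate append.")
        return
    job_fn()
    job_runs.insert_one({
        "job": job_name,
        "run_date": run_date,
        "status": "success",
        "logged_at": datetime.now(timezone.utc),
    })

# Example: guard an append-style aggregation job for today's date.
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
run_once("pyspark_prj_status", today, lambda: print("running aggregation..."))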

Project (Daily)

  1. py_gather_program.py - This script gathers all program_ids from MongoDB and stores them in a text file (a sketch of this step follows the list).

  2. pyspark_project_deletion_batch.py - This script deletes all segments in the Druid project data source.

  3. pyspark_project_batch.py - This script takes a program ID as input, fetches the data from MongoDB, flattens it, and puts it into Druid.

  4. pyspark_prj_status.py - This script creates an aggregated data source.

  5. pyspark_prj_status_prglevel.py - This script creates an aggregated data source at the program level, providing a high-level view with all status information.
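
A minimal sketch of the program-ID gathering step (item 1), assuming hypothetical database, collection, and field names:

from pymongo import MongoClient

# Hypothetical connection and collection names for illustration.
client = MongoClient("mongodb://localhost:27017")
projects = client["ml_project"]["projects"]

# Collect the distinct program IDs referenced by projects and write
# them one per line so the downstream batch scripts can iterate over them.
with open("/tmp/program_ids.txt", "w") as fh:
    for program_id in projects.distinct("programId"):
        fh.write(f"{program_id}\n")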

Observation (Daily)

  1. pyspark_observation_status_batch.py - This script fetches the data for observation submissions with all statuses from MongoDB, flattens it, and puts it into Druid (a sketch of the flattening step follows the list).

  2. pyspark_obs_status_batch.py - This script creates an aggregated data source for observation status level information.

  3. pyspark_obs_domain_criteria_batch.py - This script creates an aggregated data source for observation to domain_criteria level information.

  4. pyspark_obs_domain_batch.py - This script creates an aggregated data source for observation to domain level information.
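
The "flatten" step turns nested Mongo documents into the flat rows Druid expects. The actual scripts do this with PySpark; here is a minimal pure-Python sketch of the idea, using a made-up submission document:

def flatten(doc, parent_key="", sep="_"):
    """Recursively flatten a nested document into a single-level dict."""
    flat = {}
    for key, value in doc.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep=sep))
        else:
            flat[new_key] = value
    return flat

# Example: a nested observation submission becomes one flat row.
submission = {
    "_id": "abc123",
    "status": "completed",
    "entityInformation": {"name": "School A", "district": "North"},
}
print(flatten(submission))
# {'_id': 'abc123', 'status': 'completed',
#  'entityInformation_name': 'School A', 'entityInformation_district': 'North'}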

Survey (Daily)

  1. pyspark_survey_status.py - This script fetches survey submission data with status information from MongoDB, flattens it, and puts it into Druid.

  2. pyspark_sur_status.py - This script creates an aggregated data source for survey status-level information.

These jobs run daily at midnight.

To run Batch Scripts in ML Analytics, follow these steps:

  1. Log in to the ML Analytics server using your credentials.

  2. After logging in, switch to the data-pipeline user account using the following command.

sudo su data-pipeline

  3. Once you are logged in as the data-pipeline user, navigate to the directory where the batch scripts are located. In this case, the directory is /opt/sparkjobs/ml-analytics-service/. Use the cd command to change to that directory.

cd /opt/sparkjobs/ml-analytics-service/

  4. Now, execute the run.sh script to run all the batch scripts:

./run.sh

The run.sh script typically runs the individual batch scripts one by one, or a specific batch-processing workflow. Running it triggers all of the batch scripts listed above, each performing its task as described in this documentation.

Real-time Ingestion

Real-time ingestion with Kafka refers to the process of continuously collecting and processing data in real time using Apache Kafka.
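
A minimal sketch of the consume-process-publish pattern these scripts follow, using the kafka-python client; the broker address, topic names, and message fields are hypothetical placeholders:

import json
from kafka import KafkaConsumer, KafkaProducer

# Hypothetical broker and topic names for illustration only.
BROKER = "localhost:9092"
IN_TOPIC = "ml.observation.raw"
OUT_TOPIC = "ml.observation.processed"

consumer = KafkaConsumer(
    IN_TOPIC,
    bootstrap_servers=BROKER,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Consume each submission, derive question-level records, and publish
# them to the downstream topic.
for message in consumer:
    submission = message.value
    for answer in submission.get("answers", []):
        record = {
            "submission_id": submission.get("_id"),
            "question_id": answer.get("qid"),
            "response": answer.get("value"),
        }
        producer.send(OUT_TOPIC, record)
    producer.flush()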

Real-time scripts in ML Analytics

Observation

  1. py_observation_streaming.py - This script serves as a Kafka stream processor. It consumes data from a specified Kafka topic containing observations. The script processes the incoming observations, which likely include question-level information, and then publishes the processed data to another Kafka topic.

  2. py_observation_evidence_streaming.py - This script serves as a Kafka stream processor. It consumes data from a specified Kafka topic containing observations. The script processes the incoming observations, extracting evidence-level information, and then publishes the processed data to another Kafka topic dedicated to evidence-level information.

Survey

  1. py_survey_streaming.py - This script serves as a Kafka stream processor. It consumes data from a specified Kafka topic containing surveys, processes each incoming survey, and then publishes the processed data to another Kafka topic.

  2. py_survey_evidence_streaming.py - This script serves as a Kafka stream processor. It consumes data from a specified Kafka topic containing surveys, extracts evidence-level information, and then publishes the processed data to another Kafka topic dedicated to evidence-level information.

To run real-time scripts

Open a terminal or SSH into the server where the real-time scripts need to be executed.

Start a new tmux session with tmux new -s my_session, and inside that session run the following command:

python {{ script_name}}.py worker -l info

Detach from the tmux session (press Ctrl+b, then d) so the script keeps running in the background.