Data Science with Oracle Miner and R Enterprise

Posted By: naag

English | 2023 | ASIN: None | 592 pages | Epub | 10.31 MB

Data science involves a wide array of technologies and statistical algorithms, which makes it difficult to automate every aspect of it. However, some areas of data science can be automated using scripts and workflows.
At a high level, we can classify data science automation into the following categories:
Repetitive tasks automation: Repetitive tasks are those that have to be done every time a model is built. Data extraction, data cleaning, and basic data transformations, such as imputing null values and algorithm-specific transformations, fall into this category. They have to be repeated every time, even for the same set of data. Automating these tasks takes some of the burden off data scientists so they can concentrate on solving business problems.
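As a rough illustration of the kind of repetitive step described above, the following sketch fills missing values with the column mean. It is a minimal example using only the Python standard library; the function name `impute_missing` and the sample columns `units` and `price` are hypothetical and do not come from the book.

```python
from statistics import mean

def impute_missing(rows, columns):
    """Return a copy of rows with None replaced by the column mean (hypothetical helper)."""
    cleaned = [dict(row) for row in rows]  # copy so the input rows stay untouched
    for col in columns:
        observed = [r[col] for r in cleaned if r[col] is not None]
        fill = mean(observed) if observed else None  # leave as None if nothing was observed
        for r in cleaned:
            if r[col] is None:
                r[col] = fill
    return cleaned

# Illustrative data only
rows = [
    {"units": 10, "price": 2.0},
    {"units": None, "price": 4.0},
    {"units": 30, "price": None},
]
cleaned = impute_missing(rows, ["units", "price"])
```

Wrapping the step in a function like this means the same cleaning logic can be rerun unchanged each time the data is refreshed.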
Automated statistician: This is an area of data science automation in which statistical routines and machine learning are automated. The system selects and executes the best algorithm for the provided data set, hiding the intricacies and mathematical complexity of the algorithms from the user. The user only needs to provide the automated statistician with data. It analyzes the data, creates different mathematical models, and returns the result based on the model that best explains the data. This area is still at a nascent stage and is an active area of research.
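The core idea of trying several models and keeping the one that best explains the data can be sketched in a few lines. This is a toy illustration, not how any real automated statistician system is implemented: it assumes two hypothetical candidates (a constant model and a least-squares line) and scores them by mean squared error on the last few held-out points.

```python
from statistics import mean

def fit_constant(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    level = mean(ys)
    return lambda x: level

def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares line through the training points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def select_model(xs, ys, candidates, holdout=2):
    """Fit each candidate on the first points and score it on the held-out tail."""
    train_x, train_y = xs[:-holdout], ys[:-holdout]
    test_x, test_y = xs[-holdout:], ys[-holdout:]
    scores = {}
    for name, fit in candidates.items():
        model = fit(train_x, train_y)
        scores[name] = mean([(model(x) - y) ** 2 for x, y in zip(test_x, test_y)])
    best = min(scores, key=scores.get)  # lowest held-out error wins
    return best, scores

# Illustrative data only: a roughly linear trend
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
best, scores = select_model(xs, ys, {"constant": fit_constant, "line": fit_line})
```

The user supplies only `xs` and `ys`; the choice between candidates happens inside `select_model`, which is the part such a system hides.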
Problem-specific automation: This involves automating a data science process for the problem at hand, which relieves the user of carrying out the same activity for a specific problem several times. For example, an organization’s data scientist develops a model for predicting the organization’s future sales. Depending on the organization’s requirements, this activity has to be carried out at regular intervals, say monthly or quarterly. Each time, a wide array of tasks has to be performed, such as extracting the data, preprocessing it, creating statistical models, and writing the results back to the business operation’s database. Automating this whole process supports agile business decisions, better customer focus, and operationally efficient, well-managed resources.
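The sales example above can be sketched as a single function that chains extraction, preprocessing, modeling, and writing the result back to a database. Everything here is an assumption made for illustration: the `sales` and `forecasts` tables, the in-memory SQLite database standing in for the business database, and a three-period moving average standing in for a real statistical model.

```python
import sqlite3
from statistics import mean

def extract(conn):
    """Pull the raw sales history (hypothetical 'sales' table)."""
    return conn.execute("SELECT month, amount FROM sales ORDER BY month").fetchall()

def preprocess(rows):
    """Drop rows with a missing amount and keep only the values."""
    return [amount for _, amount in rows if amount is not None]

def forecast(amounts, window=3):
    """Stand-in model: average of the most recent `window` observations."""
    return mean(amounts[-window:])

def run_pipeline(conn, period):
    """Run every step and store the prediction for the given period."""
    amounts = preprocess(extract(conn))
    prediction = forecast(amounts)
    conn.execute(
        "INSERT INTO forecasts (period, predicted) VALUES (?, ?)",
        (period, prediction),
    )
    conn.commit()
    return prediction

# Illustrative setup: an in-memory database with made-up figures
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
conn.execute("CREATE TABLE forecasts (period TEXT, predicted REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2023-01", 100.0), ("2023-02", None), ("2023-03", 120.0), ("2023-04", 140.0)],
)
prediction = run_pipeline(conn, "2023-05")
```

A scheduler could then call `run_pipeline` monthly or quarterly, so no step has to be repeated by hand.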