Clement Twumasi

Introduction to Advanced Data Analysis in the R Programming Language (First Lesson/Class)

Prior knowledge of programming and statistics is not required. This is only an introduction, the first class, and we will continue every Saturday with more advanced tasks. The main goal is to develop an interest in mathematical programming among students from Africa. However, anyone who is interested is welcome to join and learn. This class will be a good introduction to data analysis with advanced statistical software like R. The content of this YouTube video (from Saturday's class dated October 17, 2020) is summarized below:


1) A brief introduction to mathematical programming, and a discussion of the advantages of R over other programming languages (Python, SAS, etc.) and other statistical software such as SPSS and STATA.


2) An explanation of how R works, setting up working directories, and importing data into R (note that you can import data saved as .csv, .txt, .xlsx, SPSS, STATA, SAS, etc.). Writing code as a script, saving scripts, and loading previously saved scripts for later use. You will be using RStudio for everything; after the first class, however, I will be using Jupyter Notebook to run R while you use RStudio. The many benefits of Jupyter Notebook as an IDE for running R, Python and other programming languages will be explained in our second meeting.


3) How to use built-in functions, and how to create your own function(s) for any complicated task, depending on what you want.


4) We shall analyse our first real data set, for which we will create our own novel function to compute summary statistics (see the Resource section to download the CSV data, named School_data.csv).


5) We will learn how to assign names to categorical variables coded with dummy numbers (e.g. 0, 1, 2, etc.). The novel summary statistics function we will be creating has never been created before. First, it must be automated so that, for any data you use, it can do the following: i) determine the type of any/all variable(s) in the data (whether categorical/qualitative or numerical/quantitative); ii) if a variable is numerical (like age, BMI, height, etc.), compute and return the mean, median, standard deviation, standard error, skewness and kurtosis; iii) else, if a variable is categorical (like gender, marital status, etc.), return the percentages for all categories/levels of the variable, rounded to 1 decimal place. You will be given an assignment/task to add more complexity to the summary statistics function we shall create in class, so that it can also include descriptive plots.
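To give a feel for what item 5 describes, here is a minimal sketch in base R. It is not the function built in class. The names `moment_stats` and `summarise_data` and the toy data are made up for illustration, and the skewness and kurtosis use the simple moment-based definitions (non-excess kurtosis), which is an assumption since the post does not say which definitions are used.

```r
# Hypothetical helper: moment-based skewness and (non-excess) kurtosis.
# Base R has no built-in function for these, so they are computed by hand.
moment_stats <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  m2 <- sum((x - m)^2) / n  # second central moment
  m3 <- sum((x - m)^3) / n  # third central moment
  m4 <- sum((x - m)^4) / n  # fourth central moment
  c(skewness = m3 / m2^1.5, kurtosis = m4 / m2^2)
}

# Hypothetical name: an illustration only, not the function from the class.
summarise_data <- function(data) {
  lapply(data, function(column) {
    if (is.numeric(column)) {
      # Numerical variable: return the six statistics listed in item 5
      x <- column[!is.na(column)]
      c(mean = mean(x),
        median = median(x),
        sd = sd(x),
        se = sd(x) / sqrt(length(x)),
        moment_stats(x))
    } else {
      # Anything else (factor, character, logical) is treated as categorical:
      # percentage of each level, rounded to 1 decimal place
      round(100 * prop.table(table(column)), 1)
    }
  })
}

# Made-up toy data, only to show the function running
toy <- data.frame(
  age = c(12, 14, 13, 15, 16),
  gender = factor(c("Female", "Male", "Female", "Female", "Male"))
)
result <- summarise_data(toy)
print(result)
```

One design point worth noticing: a categorical variable that is still stored as dummy numbers (0, 1, 2) looks numeric to `is.numeric()`, so it needs to be converted to a factor first, otherwise the sketch would compute a mean for it instead of percentages.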

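Items 2 and 5 mention importing a CSV file and giving names to dummy-coded categories. A rough sketch of both steps follows. The file, the column names and the meaning of the 0/1 codes are all invented here so that the example runs anywhere; they are not taken from School_data.csv.

```r
# Hypothetical stand-in for a downloaded file: write a tiny CSV to a temporary
# folder so the example is self-contained. Column names are made up.
csv_path <- file.path(tempdir(), "example_data.csv")
write.csv(
  data.frame(age = c(12, 14, 13), gender = c(0, 1, 0)),
  csv_path,
  row.names = FALSE
)

# In practice you would first point R at your own folder, e.g.
# setwd("path/to/your/folder"), and then call read.csv("your_file.csv").
getwd()  # shows the current working directory

# Import the CSV file into a data frame
school <- read.csv(csv_path)

# Attach readable labels to a dummy-coded variable
# (the meaning of 0 and 1 is assumed for this example)
school$gender <- factor(school$gender,
                        levels = c(0, 1),
                        labels = c("Female", "Male"))

str(school)           # gender is now a factor with named levels
table(school$gender)  # counts per named category
```

For the other formats listed in item 2, add-on packages such as readxl (Excel files) and haven (SPSS, STATA and SAS files) are commonly used; they need to be installed separately.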


