100 Questions that will help you Smash your Data Analysis and Data Science Job Interviews


Here is a compiled list of 100 common data analysis and data science questions, with answers, to guide you in your data career and in building your knowledge. Data analysis and data science skills are in high demand, and as that demand increases year after year, so does the need for people to be equipped with the right skills and knowledge for job interviews and the other screening processes used to fill these roles.

These questions were compiled to help members of the Data Scientists Network (DSN) Port Harcourt community put their data skills to work. They are aimed at both freshers and experienced candidates who want to improve their knowledge and prepare for their dream jobs.

The 100 questions are structured as objective (multiple-choice) questions, with the answers in bold to help readers spot them. Readers can also copy out these questions and restructure them for personal practice:

  1. Which of the following is a characteristic of raw data? A) Data is ready for analysis B) Original version of the data C) Easy to use for data analysis D) None of the mentioned

  2. Which of the following is not a step in data analysis? A) Obtain the data B) Clean the data C) EDA D) None of the mentioned

  3. Which of the following programming languages is used for data analysis? A) Tkinter B) R C) HTML D) None of the mentioned

  4. The main goal of data science is _____? A) To raise AI talents B) To solve mathematical problems C) To get insight D) To build machines

  5. Data science is the same as coding? True or False

  6. Which of the following skills is not needed for data science? A) Coding B) Statistics C) Domain knowledge D) None of the mentioned

  7. In the field of data science, we can say that presenting is not the same as exploring? True or False

  8. Which of the following does not signify web data? A) HTML B) JSON C) XLSX D) XML

  9. Which of the following is not a visualization tool for data analysis? A) Tableau B) YPlot C) Power BI D) None of the mentioned

  10. Which of these is not a major challenge of data science and analysis? A) Insufficient data B) Poor tools C) Poor quality of data D) Irrelevant features

  11. Which of these is not a major programming language used in data science? A) R B) Python C) MS Excel D) SQL

  12. In machine learning, what are the two dimensions of a confusion matrix? A) Actual and mixed B) Actual and predicted C) Precise and specific D) Prediction and confusion

  13. Deep learning involves? A) Systems that involve labelled datasets B) Using artificial natural networks C) Using Sklearn for machine learning D) Using artificial neural networks

  14. The following are applications of supervised machine learning in modern businesses except: A) Sentiment analysis B) Healthcare diagnosis C) Pattern detection D) Fraud detection

  15. Amazon is able to recommend products to its customers because? A) Customers need quality references B) An association algorithm identifies patterns C) It has a big platform that meets all customers' needs D) Numerous customers visit Amazon to buy and then inform others about its services

  16. The following are regression instances except? A) When variables are continuous in nature B) Estimating the sales of a product C) Estimating the amount of rainfall D) Estimating the gender of a person

  17. The algorithm that operates by constructing multiple decision trees during the training phase is? A) Support Vector Machine B) Decision Tree C) Random Forest D) Clustering

  18. The following algorithms can be used for categorical output except? A) Random Forest B) K-Means Clustering C) KNN D) Naive Bayes

  19. Given a database of customer data, you are asked to automatically discover market segments and group customers into them. Which machine learning approach will you consider? A) Classification B) Unsupervised learning algorithm C) Test validation D) Regression approach

  20. Which of these algorithms can be used for clustering problems? A) PCA B) KNN C) Random Forest D) Decision Tree

  21. You are running a company and want to develop a learning algorithm for software that examines individual customer accounts and decides whether each one has been hacked or compromised. How will you treat this problem? A) Treat as a classification problem B) Treat as both classification and regression C) Treat as a regression problem D) Treat as an unsupervised problem

  22. An important type of machine learning, in which an agent learns how to behave in an environment by performing actions and seeing the results, is referred to as? A) Deep learning B) Supervised learning C) Reinforcement learning D) Unsupervised learning

  23. You want to predict how many units of an item will sell over the next 3 months. What kind of problem is it? A) Classification and continuous problem B) Selection problem C) Decision tree problem D) Regression problem

  24. Given a dataset of patients diagnosed as having diabetes or not, what kind of machine learning algorithm can we use to develop a model for this? A) Principal component analysis B) Linear regression C) Logistic regression D) Confusion matrix

  25. Suppose you are to build a program that learns from given answers to filter your emails by marking them as spam or not spam. What is the task in this setting? A) This is not a machine learning problem B) Fitting the model answers to an algorithm C) Classifying emails as spam or not spam D) The number of emails correctly classified

  26. The following are performance metrics for classification problems except? A) Confusion matrix B) F1 score C) RMSE D) AUC

  27. The following are performance metrics for regression problems except? A) Mean absolute error B) Standard error C) Mean squared error D) R-squared
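
As a rough illustration of the metrics named in questions 26 and 27, the sketch below computes confusion-matrix counts, the F1 score, and two regression errors (MAE and RMSE) in plain Python. The labels and values are made up for the example, and the helper names (`confusion_counts`, `f1_score`, `mae`, `rmse`) are hypothetical, not taken from any library.

```python
import math

def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn): the four cells of a binary confusion matrix."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fp, fn, tn

def f1_score(actual, predicted, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(actual, predicted, positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mae(actual, predicted):
    """Mean absolute error, a regression metric."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, a regression metric."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Made-up classification labels (1 = positive class, 0 = negative class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
print(f1_score(y_true, y_pred))          # 0.75

# Made-up regression targets and predictions.
sales_true = [10.0, 12.0, 15.0]
sales_pred = [11.0, 12.0, 13.0]
print(mae(sales_true, sales_pred))       # 1.0
print(rmse(sales_true, sales_pred))
```

The confusion matrix and F1 score only make sense for categorical predictions, while MAE and RMSE measure the distance between continuous values, which is why the two families of metrics are not interchangeable.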

  28. In building a machine learning model, which of the following is the most important step after getting your data? A) Model evaluation B) Model training C) Data pre-processing D) Data ingestion

  29. Some of the most important applications of classification algorithms are as follows except? A) Speech recognition B) Forecasting oil prices C) Handwriting recognition D) Biometric identification

  30. Which of these statements is true about the performance of a machine learning model in relation to its data features? A) The use of relevant features can decrease the accuracy of your model B) Performing feature selection before data modeling will decrease the model accuracy C) The performance of a machine learning model is directly proportional to the data features D) Data features cause over-fitting in the model

  31. In a DataFrame, each variable can be seen as a? A) Tuple B) Column C) Row D) Entity

  32. Which of the following statements about a DataFrame is not correct? A) It is the core pandas data structure B) Different value types can exist within a single column C) Different columns can contain different data types D) Values within a single column are of the same data type

  33. Which of these methods subsets a DataFrame using row and column numbers? A) .loc[ ] B) df[ ] C) .iloc[ ] D) None of the mentioned

  34. The following will give the first 5 observations of the DataFrame df except? A) df.head() B) df.head(6) C) df.head(5) D) print(df.head())

  35. Which method returns summary statistics for numeric columns? A) df.summary() B) df.count() C) df.describe() D) df.stats()

  36. Which attribute returns a tuple of the number of rows followed by the number of columns? A) df.columns B) df.('rows', 'columns') C) df.columns() D) df.shape

  37. Which of these will extract the data values in the form of a 2D NumPy array? A) df.extract() B) df.np() C) df.values D) None of the mentioned

  38. Which of these will return the column names? A) df.columns B) df.column_names() C) df(columns) D) None of the mentioned

  39. How do you subset multiple columns, column1 and column2, of df? A) df[(columns) B) df.subset(column1, column2) C) df[['column1', 'column2']] D) df[column1, column2]

  40. How do you add a new column, column3, to the DataFrame df by adding column1 and column2? A) column3 = df[column1 + column2] B) df[column1 + column2] C) df['column3'] = df['column1'] + df['column2'] D) None of the mentioned

  41. To drop rows with duplicate values in column1, we use? A) df.drop_duplicates(subset='column1') B) df.drop(column1) C) df.drop.duplicates.column1 D) None of the mentioned

  42. How do you count the unique values in column1 of the DataFrame df? A) df.value.counts.column1 B) df['column1'].value_counts() C) df.values['column1'].count() D) None of the mentioned

  43. How do you set column1 as the index column of df? A) df=column1.set_index B) df=index_column() C) df.set_index('column1') D) None of the mentioned

  44. The correct way of importing matplotlib is? A) import matplotlib as plt B) Import Matplotlib.pyplot C) import matplotlib.pyplot as plt D) None of the mentioned

  45. Which of these will give the number of missing values in each column of df? A) df.count_missing_values() B) df.isna().sum() C) df.sum(missing_values) D) None of the mentioned

  46. Which pandas function loads a CSV file into a DataFrame? A) pandas=load(csv) B) pd.load_csv() C) pd.read_csv() D) None of the mentioned

  47. You can write a DataFrame to a CSV file using? A) df.to_csv() B) df.write_to() C) df.write_to_csv() D) None of the mentioned

  48. Which method allows variables to be grouped, similar to the groupby() method? A) .sum() B) .pivot_table() C) .avg() D) None of the mentioned

  49. What is the function of plt.show()? A) To display class values B) To show null points C) To display the plot D) None of the mentioned

  50. Which of these takes a value as an argument and replaces each missing value with it? A) df.ffillna() B) df.fill_na() C) df.replace() D) None of the mentioned
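
To make the pandas questions above easier to try out, here is a minimal sketch that exercises several of the calls they mention on a tiny DataFrame. The data is made up, and the column names simply mirror the `column1` and `column2` placeholders used in the questions. Note that `fillna` is the standard pandas method for filling missing values with a given value, even though it is not spelled that way in the options for question 50.

```python
import pandas as pd

# Made-up data for illustration; column names follow the questions above.
df = pd.DataFrame({
    "column1": [1, 2, 2, 4],
    "column2": [10.0, None, 30.0, 40.0],
})

print(df.head())        # first 5 rows by default (question 34)
print(df.shape)         # (number of rows, number of columns) (question 36)
print(df.describe())    # summary statistics for numeric columns (question 35)
print(list(df.columns)) # column names (question 38)

first_cell = df.iloc[0, 0]                      # select by row/column position (question 33)
subset = df[["column1", "column2"]]             # subset multiple columns (question 39)
missing_per_column = df.isna().sum()            # missing values per column (question 45)
counts = df["column1"].value_counts()           # occurrences of each unique value (question 42)
deduped = df.drop_duplicates(subset="column1")  # keep the first row per column1 value (question 41)

filled = df.fillna(0)                           # replace each missing value with 0
filled["column3"] = filled["column1"] + filled["column2"]  # new column (question 40)
indexed = filled.set_index("column1")           # returns a new DataFrame (question 43)
print(indexed)
```

Most of these methods return a new object rather than modifying `df` in place, which is why the results are assigned to new names here.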

  51. Important characteristics of structured data are? A) Generality B) Dimensionality C) Resolution D) All of the above

  52. What are some examples of data quality problems? A) Noise and outliers B) Duplicate data C) Missing values D) All of the above

  53. In standardization, the features will be rescaled to have? A) Mean 0 and variance 0 B) Mean 0 and variance 1 C) Mean 1 and variance 0 D) Mean 1 and variance 1

  54. Which one is a feature extraction example? A) Constructing a bag-of-words model B) Imputation of missing values C) Principal component analysis D) All of the above

  55. Why do we need feature transformation? A) Converting non-numeric features into numeric ones B) Resizing inputs to a fixed size C) Both A and B D) None

  56. The correct order for pre-processing the data should be? A) Imputation -> feature scaling -> training B) Feature scaling -> imputation -> training C) Feature scaling -> label encoding -> training D) None

  57. Some of the imputation methods are? A) Imputation with mean/median B) Imputing with random numbers C) Imputing with one D) All of the above

  58. What is the dummy variable trap? A) Multicollinearity among the dummy variables B) One variable predicts the value of another C) Both A and B D) None of the above

  59. Which of the following is/are feature scaling techniques? A) Standardization B) Normalization C) Min-max scaling D) All of the above

  60. How do you handle missing values in a dataset? A) Dropping the missing rows or columns B) Imputation with the mean/median/mode value C) Taking missing values into a new row or column D) All of the above
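
The pre-processing ideas in questions 53 to 60 can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the feature values are made up, missing entries are represented as `None`, and the helper names (`impute_mean`, `standardize`, `min_max_scale`) are hypothetical rather than library functions. Imputation runs first and scaling second, since the scaling statistics cannot be computed while values are missing.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale to mean 0 and variance 1 (using the population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [2.0, None, 4.0, 6.0]           # made-up feature with one missing value

imputed = impute_mean(raw)            # step 1: imputation
standardized = standardize(imputed)   # step 2: feature scaling (standardization)
normalized = min_max_scale(imputed)   # alternative scaling: min-max

print(imputed)       # [2.0, 4.0, 4.0, 6.0]
print(standardized)
print(normalized)    # [0.0, 0.5, 0.5, 1.0]
```

Both scaling helpers would divide by zero on a constant feature, so a real pipeline would need to handle that case.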

  61. Pandas stands for _______? A) Panel Data B) Panel Dashboard C) Panel Data Analyst D) Panel Data Analysis

  62. The key pandas data structure is called? A) DataFrame B) KeyFrame C) Statistics D) Econometrics

  63. Pandas is an open-source _______ library? A) Java B) Python C) jQuery D) JavaScript

  64. NumPy stands for? A) Numerical Python B) Number In Python C) Numbering Python D) None of the above

  65. NumPy was developed by? A) Jim Hugunin B) Wes McKinney C) Travis Oliphant D) Guido van Rossum

  66. Which of the following NumPy operations is or are correct? A) Operations related to linear algebra B) Mathematical and logical operations on arrays C) Fourier transforms and routines for shape manipulation D) All of the above

  67. NumPy is often used along with packages like? A) Node.js B) SciPy C) Matplotlib D) Both B and C

  68. Which of the following is contained in the NumPy library? A) Fourier transform B) n-dimensional array object C) Tools for integrating C/C++ and Fortran code D) All of the mentioned

  69. Which of the following attributes should be used when checking the input and output type combinations? A) .types B) .class C) .type D) None of the above

  70. Which of the following functions stacks 1D arrays as columns into a 2D array? A) column_stack B) com_stack C) row_stack D) All of the above
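
For the NumPy questions, a short sketch may help. It assumes question 69 refers to the `types` attribute of NumPy universal functions (ufuncs) such as `np.add`, which is an interpretation rather than something the question states. The arrays are made up for the example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Stack two 1D arrays as the columns of a 2D array (question 70).
stacked = np.column_stack((a, b))
print(stacked.shape)   # (3, 2)
print(stacked)

# Element-wise mathematical and logical operations on arrays (question 66).
print(a + b)           # [5 7 9]
print(a < 3)           # [ True  True False]

# A basic linear algebra operation: the dot product of two vectors.
print(a @ b)           # 32

# Each ufunc lists the input->output type combinations it supports.
print(np.add.types[:3])
```

The exact contents of `np.add.types` depend on the NumPy version and platform, so the last line is printed without an expected value.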

  71. What is the result of the following: int(3.99)? A) 3.99 B) 3 C) 3.9 D) 3.0

  72. What is the result of the following operation: 11//2? A) 5.5 B) 5 C) 5.6 D) 5.0

  73. What is the result of the following: "hello Mike".find("Mike")? A) 6,7,8 B) 5 C) 6 D) 4,4

  74. Consider the following tuple: say_what = ('say', 'what', 'you', 'will'). What is the result of say_what[-1]? A) 'will' B) 'say' C) 'what' D) 'you'

  75. Consider the following tuple: A = (1, 2, 3, 4, 5). What is the result of this line of code: A[1:4]? A) (2, 3, 4, 5) B) (2, 3, 4) C) (3, 4, 5) D) (1, 2, 3, 4)

  76. Consider the following tuple: A = (1, 2, 3, 4, 5). What is the result of len(A)? A) 4 B) 6 C) 5 D) 6

  77. Consider the following list: B = [1, 2, [3, 'a'], [4, 'b']]. What is the result of B[3][1]? A) [4, 'b'] B) "c" C) "b" D) [a, b]

  78. Dict = {"A": 1, "B": "2", "C": [3, 3, 3], "D": (4, 4, 4), 'E': 5, 'F': 6}. What is the result of the following operation: Dict["D"]? A) 1 B) [3, 3, 3] C) (4, 4, 4) D) 4

  79. Consider the following set: {"A", "A"}. What will be the result when the set is created? A) {"A"} B) {"A", "A"} C) ("A", "A") D) {}

  80. What is the result of the following: type(set([1, 2, 3]))? A) set B) list C) str D) dict

  81. What method do you use to add an element to a set? A) append B) extend C) add D) merge

  82. What is the result of the following operation: {'a', 'b'} & {'a'}? A) {'a', 'b'} B) {'a', 'b', 'a'} C) {'a'} D) {}

  83. Consider the nested tuple A = ((1), [2, 3], [4]). What is the result of the following operation: A[2]? A) [4] B) [2, 3] C) 1 D) [1, 2, 3, 4]

  84. Consider the tuple A = ((11, 12), [21, 22]), which contains a tuple and a list. What is the result of the following: A[0][1]? A) 21 B) 11 C) 12 D) 22

  85. Consider the following list: A = ["hard rock", 10, 1.2]. What will list A contain after the following command is run: del(A[1])? A) [10, 1.2] B) ["hard rock", 1.2] C) ["hard rock", 10] D) [10]

  86. What is the result of this logical expression: True or False? A) True B) False C) Both D) None

  87. Why do we use exception handlers? A) To write a file B) To read a file C) To catch errors within a program D) To terminate a program
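
The Python basics in questions 71 to 87 are easy to check in an interpreter. The sketch below runs a representative selection of them. The expressions come from the questions themselves, apart from the variable names `nested`, `letters`, and `items` and the string passed to `int()` in the exception example, which are made up for illustration.

```python
print(int(3.99))                   # 3: int() truncates toward zero
print(11 // 2)                     # 5: floor division returns an int here
print("hello Mike".find("Mike"))   # 6: index where the match starts

A = (1, 2, 3, 4, 5)
print(A[1:4])                      # (2, 3, 4): the stop index is excluded
print(A[-1])                       # 5: negative indices count from the end
print(len(A))                      # 5

nested = [1, 2, [3, "a"], [4, "b"]]
print(nested[3][1])                # b: index the inner list, then its element

letters = {"A", "A"}               # duplicates collapse in a set
print(letters)                     # {'A'}
letters.add("B")                   # add() inserts a single element
print({"a", "b"} & {"a"})          # {'a'}: & is set intersection

items = ["hard rock", 10, 1.2]
del items[1]                       # removes the element at index 1
print(items)                       # ['hard rock', 1.2]

print(True or False)               # True

# An exception handler catches the error so the program can continue.
try:
    int("not a number")
except ValueError as err:
    print("caught:", err)
```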

  88. Which of the following is not a data science skill? A) Storytelling B) Critical thinking C) Philosophy D) Statistics

  89. Which of these best describes what a data scientist does? A) Converts raw data into usable data B) Drives decisions that benefit the business C) Uses data to generate insights D) Uses insights to drive business decisions

  90. Artificial intelligence is a subset of machine learning? True or False

  91. Which of the following statements is true? A) Data science is the same as artificial intelligence B) Deep learning is an umbrella of data science C) Data science uses AI, machine learning and deep learning D) All of the mentioned

  92. The following are ways of generating data except: A) Visualization B) Web scraping C) Experiments D) Surveys

  93. Big data involves the 3Vs: velocity, volume and variety. Which of the following statements is correct? A) Data science relates to the 3Vs of big data B) Data science does not relate to big data C) Data science can only relate to the volume of big data D) Data science can only relate to the variety of big data

  94. Statistics and data science have ____ in common? A) Machine learning B) Analysis C) Deep learning D) Coding

  95. What is the ultimate purpose of analytics? A) To evangelize data science B) To facilitate meetings between sales and marketing C) To communicate findings to the people concerned D) To generate reports

  96. Which of the following is performed by a data scientist? A) Defining the question B) Creating reproducible code C) Challenging results D) All of the mentioned

  97. What is a training set in a machine learning model? A) It is used to test the accuracy B) It is 30% of the dataset C) It is labelled data used in the model D) It is used to verify the dataset

  98. What is the output of the following lines of code?

a=1
def do(x):
  return (x+a)
print(do(1))

A) 2 B) 3 C) 1 D) 0


99. What is the output of the following:

for x in ['A', 'B', 'C']:
    print(x+'A')

A) xA
   xB
   xC

B) AA
   BA
   CA

C) A, B, C

D) AA, BA, CA


100. What is the output of the following few lines of code:

x=0
while(x<2):
    print(x)
    x=x+1

A) 0
   1
B) 0
   1
   2
C) 1, 2, 3
D) 0


I believe these 100 questions will help empower your data science and data analysis career. Good luck on your career journey.

About the Author

Gospel Orok is a data specialist and an AI enthusiast with working experience in the e-commerce and engineering industries. He is also an advocate for specialised skills in data analysis, data science and data engineering for delivering high-impact solutions. He is currently one of the community leads of Data Scientists Network (DSN). He is open to collaborations with organisations and individuals to build an AI ecosystem that develops high human capacity impact.

Feel free to join the DSN community and also get in touch with me on LinkedIn and Twitter: Orok Gospel