1.
In the early 1960s, most applications used which programming language?
a) Java
b) COBOL
c) C++
d) Python
➡️ Answer: b) COBOL
2.
What was the main drawback of magnetic tapes used in the 1960s?
a) They were too expensive
b) They only stored text
c) They required sequential access and were unreliable
d) They could not store large data
➡️ Answer: c)
3.
On average, how much of the scanned tape data was actually required?
a) 50%
b) 30%
c) 5% or less
d) 80%
➡️ Answer: c)
4.
Which technology replaced magnetic tapes in the 1970s?
a) Cloud storage
b) DASD (Disk storage)
c) USB drives
d) Optical disks
➡️ Answer: b)
5.
What new software system emerged with DASD?
a) ERP
b) Compiler
c) DBMS
d) Web server
➡️ Answer: c) DBMS
6.
Which decade saw the rise of PCs and 4GL technology?
a) 1960s
b) 1970s
c) 1980s
d) 1990s
➡️ Answer: c)
7.
Extract programs primarily did what?
a) Deleted outdated data
b) Scanned and copied selected data to another file
c) Backed up entire databases
d) Analyzed the stored data
➡️ Answer: b)
8.
A network of uncontrolled extract programs was called:
a) Cloud architecture
b) Data warehouse
c) Spider web or legacy architecture
d) Multitier architecture
➡️ Answer: c)
9.
What led to massive data redundancy in the mid-1960s?
a) Lack of trained programmers
b) Growth of master files and magnetic tapes
c) High cost of hardware
d) Poor CPU performance
➡️ Answer: b)
10.
What key advantage did DASD provide over magnetic tape?
a) Cheaper storage
b) Direct (non-sequential) access
c) Higher security
d) Automatic backup
➡️ Answer: b)
✅ Short Questions (with answers)
1.
Why were magnetic tapes considered inefficient for data processing?
Because they required sequential
access, were slow to scan, unreliable, and only a small portion of scanned data
was actually needed.
2.
What problems arose from the massive use of master files?
Data redundancy, data coherency
issues, complex maintenance, complex program development, and additional
hardware requirements.
3.
What is DASD and why was it revolutionary?
DASD (Direct Access Storage Device)
allowed fast, non-sequential data access in milliseconds, unlike magnetic
tapes.
4.
What role did DBMS play in data management?
DBMS handled data storage, indexing,
and access on DASD, reducing the problems caused by master files.
5.
How did PCs and 4GLs change data usage in the 1980s?
They empowered end users to develop
MIS/DSS applications and access data directly without relying solely on data
centers.
6.
What is an extract program?
A simple program that scans data, selects records based on criteria, and copies them to another file or database (see the sketch after this list).
7.
What is meant by a “spider web” or “legacy system” architecture?
An uncontrolled system formed by many extract programs calling other extracts, leading to complexity and inefficiency.
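The extract program in Q6 can be pictured with a rough Python sketch; the file names, columns, and selection rule below are made-up illustrations, not anything from the lecture.

import csv

def extract(source_path, target_path, predicate):
    """Scan source rows sequentially and copy the selected ones to a target file."""
    with open(source_path, newline="") as src, open(target_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:            # like scanning a tape: every record is read
            if predicate(row):        # ...but only a small fraction is actually needed
                writer.writerow(row)

# Hypothetical usage: pull only one region's customers into a departmental file.
# extract("customers.csv", "north_region.csv", lambda r: r.get("region") == "North")

Uncontrolled chains of such extracts feeding further extracts are what Q7 calls the spider web or legacy architecture.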
1.
According to the passage, which resource has grown faster and cheaper over
time?
a) CPUs
b) Network speed
c) Storage space
d) Memory cache
➡️ Answer: c) Storage space
2.
What happens to organizational data according to the passage?
a) It grows linearly
b) It doubles every year
c) It decreases over time
d) It remains constant
➡️ Answer: b)
3.
Which example of data size corresponds to 1 TB?
a) A small novel
b) A pickup van filled with paper
c) 50,000 trees worth of printed paper
d) All words ever spoken
➡️ Answer: c)
4.
Which company was mentioned with a data warehouse size of 24 TB?
a) France Telecom
b) WalMart
c) CERN
d) SLAC
➡️ Answer: b)
5.
A company with 1 TB of data may not have a data warehouse because:
a) Size alone does not define a DWH
b) It is too expensive
c) Data cannot be stored beyond 500 GB
d) OLTP systems are faster
➡️ Answer: a)
6.
Which statement about historical data is emphasized?
a) It is rarely useful
b) It cannot predict the future
c) It is the best predictor of the future
d) It must always be deleted
➡️ Answer: c)
7.
Operational systems in banks are designed around:
a) Events
b) Customer-centric views
c) Functional lines of business
d) Machine learning models
➡️ Answer: c)
8.
Why does a business need a data warehouse?
a) It replaces OLTP systems
b) It integrates data across the entire organization
c) It reduces data size
d) It avoids data entry
➡️ Answer: b)
9.
Which category of user typically uses a data warehouse?
a) Programmers
b) Clerks
c) Knowledge workers / Executives
d) Database administrators
➡️ Answer: c)
10.
A data warehouse allows what kind of query access?
a) Fixed and predefined
b) Ad-hoc and unpredictable
c) Only SQL stored procedures
d) Only OLTP queries
➡️ Answer: b)
✅ Short Questions and Answers
1.
Why is storage more important than CPU according to the passage?
Because storage capacity is
increasing and becoming cheaper at a much faster rate compared to CPU
performance.
2.
How fast does organizational data typically grow?
It doubles every year, showing
exponential growth.
3.
Why is size alone not enough to define a data warehouse?
Because a DWH is a concept involving
integration, history, and ad-hoc access—not just large storage.
4.
Why is historical data important for a DWH?
Because history provides insight
into future trends and helps in accurate decision-making.
5.
What problem exists in operational systems of banks?
They are organized by separate lines
of business and do not provide a unified customer view.
6.
What type of decisions does a DWH support?
Intelligent, analytical, and
strategic decisions based on integrated and historical data.
7.
What kind of queries do DWH users generally generate?
Complex, ad-hoc queries that are not
known in advance.
8.
Who are knowledge workers?
Executives, analysts, and managers
who use data for decision-making, not for clerical tasks.
9.
What is meant by “intelligent enterprise”?
An organization that uses integrated
data and analytical tools to make smarter business decisions.
10.
What is one limitation of the standard definition of a DWH?
It says the DWH should be complete,
but in reality, it can never be fully complete.
Lecture No 1,2,3
A. MCQs from Data Marts (20 Questions)
1. Data marts are created from:
a) ERP system
b) Data warehouse
c) OLTP system
d) Backup files
➡️
Answer: b
2. Data marts are mainly created to serve:
a) Entire organization
b) Hardware vendors
c) Different departments
d) Only customers
➡️
Answer: c
3. A data mart is usually:
a) Smaller than a data warehouse
b) Larger than a DWH
c) Same size as a DWH
d) Not related to DWH
➡️
Answer: a
4. Data marts provide data for:
a) Department-specific needs
b) Only marketing team
c) Only OLTP operations
d) Only executives
➡️
Answer: a
5. A data mart contains:
a) Detailed real-time data
b) Summarized departmental data
c) Raw transactional files
d) Network logs
➡️
Answer: b
6. Data marts get their data through:
a) ETL from DWH
b) Website logs
c) Audio files
d) Random uploads
➡️
Answer: a
7. A data mart is closest to:
a) Mini data warehouse
b) Backup system
c) Cache memory
d) OLTP database
➡️
Answer: a
8. Data marts help departments by providing:
a) Faster access to relevant data
b) Unlimited storage
c) System passwords
d) Network security
➡️
Answer: a
9. Which is TRUE about data marts?
a) They do not contain historical data
b) They serve a specific business area
c) They replace the data warehouse
d) They contain unstructured data
➡️
Answer: b
10. Data marts reduce:
a) OLTP load
b) Department-level query time
c) Internet usage
d) Hardware errors
➡️
Answer: b
11. The main purpose of data marts is to:
a) Store emails
b) Support targeted analysis
c) Perform backups
d) Replace MIS systems
➡️
Answer: b
12. Data marts improve:
a) Network latency
b) Departmental decision-making
c) Mobile app performance
d) CPU speed
➡️
Answer: b
13. Data marts extract data from DWH using:
a) ETL
b) HTML
c) Debuggers
d) Antivirus
➡️
Answer: a
14. Data marts are part of the:
a) Transaction system
b) Data warehousing architecture
c) Operating system
d) Internet browser
➡️
Answer: b
15. Department-level reports are usually generated from:
a) Data marts
b) Data lakes
c) Antivirus logs
d) Mobile apps
➡️
Answer: a
16. Data marts help avoid:
a) Full DWH scans
b) Hardware failures
c) Power outages
d) Data duplication
➡️
Answer: a
17. Data marts are built to improve:
a) Localized analytics speed
b) Software installation
c) WiFi speed
d) Employee attendance
➡️
Answer: a
18. Data flowing from DWH to data marts is:
a) Cleansed and summarized
b) Encrypted audio
c) Raw video
d) Browser history
➡️
Answer: a
19. Data marts mainly focus on:
a) Subject areas
b) Whole enterprise
c) Only IT department
d) Security logs
➡️
Answer: a
20. A common use of data mart is:
a) Marketing analysis
b) Web designing
c) Antivirus scanning
d) Cloud hosting
➡️
Answer: a
B. MCQs from Denormalization (20 Questions)
21. De-normalization usually speeds up:
a) Data deletion
b) Data retrieval
c) Software installation
d) Index rebuilding
➡️
Answer: b
22. Denormalization is done to reduce:
a) Joins
b) Columns
c) Tables
d) Memory
➡️
Answer: a
23. Denormalization often introduces:
a) Update anomalies
b) Speed of CPU
c) Better normalization
d) Smaller storage
➡️
Answer: a
24. A major benefit of denormalization is:
a) Faster read operations
b) Fewer users
c) Lower disk cost
d) Perfect consistency
➡️
Answer: a
25. Denormalization increases:
a) Data redundancy
b) Normal forms
c) Encryption
d) Memory speed
➡️
Answer: a
26. Denormalization is often used in:
a) Data warehouses
b) Programming languages
c) Disk formatting
d) Operating systems
➡️
Answer: a
27. Which process does NOT improve with denormalization?
a) Update speed
b) Retrieval speed
c) Aggregation
d) Query performance
➡️
Answer: a
28. Denormalization can help resolve:
a) Aggregates
b) Index corruption
c) Backups
d) RAM shortages
➡️
Answer: a
29. Denormalization minimizes:
a) Joins
b) Indexes
c) Partitions
d) SQL users
➡️
Answer: a
30. Denormalization may slow down:
a) Updates
b) Select queries
c) Reports
d) Dashboards
➡️
Answer: a
31. Denormalization is usually performed after:
a) Normalization
b) Backup
c) Compression
d) Formatting
➡️
Answer: a
32. Denormalization adds:
a) Redundant data
b) Encryption keys
c) Passwords
d) Primary keys
➡️
Answer: a
33. Denormalization is useful when:
a) Read operations are heavy
b) Writes are dominant
c) Data is very small
d) Tables have one row
➡️
Answer: a
34. Denormalization may cause:
a) Inconsistency
b) Faster updates
c) Smaller tables
d) No redundancy
➡️
Answer: a
35. Denormalization helps reduce:
a) Complex joins
b) RAM usage
c) CPU usage
d) Disk failures
➡️
Answer: a
36. A drawback of denormalization is:
a) Data duplication
b) Slow SELECT
c) Loss of schema
d) Fewer tables
➡️
Answer: a
37. De-normalization is NOT suitable when:
a) There are frequent updates
b) There are heavy reads
c) There are large aggregates
d) There are complex joins
➡️
Answer: a
38. Data redundancy in denormalization leads to:
a) Update anomalies
b) More CPU cores
c) Fewer records
d) Less storage
➡️
Answer: a
39. Denormalization is preferred in systems that need:
a) Fast reporting
b) Fast updates
c) Minimal storage
d) Many small tables
➡️
Answer: a
40. Denormalization aligns best with:
a) Data warehousing needs
b) Real-time OLTP backups
c) Antivirus software
d) Operating system logs
➡️
Answer: a
MCQs from Bus vs Train Analogy (OLTP vs DWH)
1. In the bus vs train analogy, buses represent:
a) Data warehouse
b) OLTP systems
c) ETL process
d) Batch processing
➡️
Answer: b
2. In the analogy, trains represent:
a) OLTP systems
b) Real-time systems
c) Data warehouse batch operations
d) ERD modeling
➡️
Answer: c
3. Why can't bus and train modes be interchanged?
a) They use different fuels
b) They have different optimization goals
c) They travel different cities
d) They are not computerized
➡️
Answer: b
4. Trains run only twice a day because:
a) They cannot operate at high speed
b) They are optimized for bulk loads
c) They require more drivers
d) They have limited tracks
➡️
Answer: b
5. The analogy explains the difference between:
a) ERP and CRM
b) OLTP and DWH workloads
c) SQL and NoSQL
d) Data mart and data lake
➡️
Answer: b
MCQs from “Historical Data & OLTP Purging”
6. OLTP systems usually do NOT keep:
a) Real-time data
b) Current customer data
c) Long-term historical data
d) Daily transactions
➡️
Answer: c
7. Why does a DWH store historical data?
a) For faster transactions
b) For future trend analysis
c) To replace ERP
d) To reduce storage cost
➡️
Answer: b
8. OLTP systems typically purge old customer data after:
a) 1 day
b) 1 week
c) 1 year
d) 10 years
➡️
Answer: c
9. Historical data helps identify:
a) SQL errors
b) Why a customer left
c) Hardware failures
d) Index corruption
➡️
Answer: b
10. DWH combines what type of data?
a) Only real-time data
b) Only archived data
c) Operational + historical data
d) Only external data
➡️
Answer: c
MCQs from “How Much History?”
11. Amount of history kept depends mainly on:
a) Manager’s choice
b) Industry type
c) Laptop speed
d) User interface
➡️
Answer: b
12. Telecom companies typically store how much history?
a) 2 years
b) 18 months
c) 65 weeks
d) 5 years
➡️
Answer: b
13. Retailers store at least 65 weeks of data to analyze:
a) Employee salaries
b) Mobile usage
c) Seasonal buying patterns
d) ATM withdrawals
➡️
Answer: c
14. Insurance companies store:
a) 1 year of data
b) 65 weeks of data
c) 7 years of data
d) 18 months
➡️
Answer: c
15. Why is older historical data less valuable?
a) It becomes corrupt
b) Users cannot read it
c) It has less predictive power
d) It takes too much RAM
➡️
Answer: c
MCQs from "Batch vs Real-Time Updates"
16. DWH traditionally updates data:
a) In real-time
b) Hourly only
c) In periodic batches
d) Every 5 seconds
➡️
Answer: c
17. ATM systems require:
a) Daily updates
b) Real-time updates
c) Weekly updates
d) No updates
➡️
Answer: b
18. DWH does NOT need real-time data because:
a) Users prefer old data
b) It supports long-term decisions
c) It cannot handle new data
d) It is used only for backups
➡️
Answer: b
19. DWH update frequency mainly depends on:
a) Management politics
b) Volume and importance of data
c) Employee training
d) Network type
➡️
Answer: b
20. Traditional DWH is updated:
a) Every minute
b) As needed by strategy
c) Only yearly
d) Never
➡️
Answer: b
MCQs from “Deviation from Purist Approach”
21. A purist is someone who wants:
a) Fastest performance
b) Everything exactly by the book
c) Real-time BI
d) Cloud systems
➡️
Answer: b
22. As DWH grows, traditional characteristics such as non-volatility are:
a) Removed
b) Strengthened
c) Less strictly followed
d) Made mandatory
➡️
Answer: c
23. Why is non-volatility sometimes compromised?
a) To save money
b) Users demand fresher data
c) DBMS cannot handle old data
d) ETL becomes obsolete
➡️
Answer: b
24. Removing old data and adding fresh data breaks which rule?
a) Integrated
b) Subject-oriented
c) Time-variant
d) Non-volatile
➡️
Answer: d
25. Adding data daily instead of monthly deviates from:
a) OLTP rules
b) Traditional DWH design
c) SQL standards
d) Hardware requirements
➡️
Answer: b
26. The boundary between DWH and OLTP is blurring because of:
a) Cheaper laptops
b) Demand for near real-time analytics
c) New programming languages
d) Better power supply
➡️
Answer: b
27. Shadow tables are used for:
a) Data security
b) Faster batch updates
c) File compression
d) Removing redundancy
➡️
Answer: b
28. Real-time DWH updates occur using:
a) CDs
b) Shadow tables/log files
c) Printed reports
d) Backup tapes
➡️
Answer: b
29. Batch transformation rules are applied:
a) Before loading into DWH
b) After deleting data
c) Only during indexing
d) During software installation
➡️
Answer: a
30. Real-time DWH updates happen:
a) Per transaction
b) Once a year
c) Only when server reboots
d) During backups
➡️
Answer: a
✅ IMPORTANT SHORT QUESTIONS
1. What does the bus vs train analogy explain?
Difference between OLTP (frequent, small operations) and DWH (bulk, periodic loads).
2. Why does DWH store historical data?
To analyze trends, customer behavior, and predict future actions.
3. Why do OLTP systems purge old data?
To maintain performance and reduce storage cost.
4. Why is historical data valuable?
It helps understand why customers behave or leave.
5. How much history should a DWH store?
Depends on industry, economic value, and storage cost.
6. Why do telecom companies keep only 18 months of data?
Because call detail records are huge in volume.
7. Why do retailers keep 65 weeks of data?
For seasonal and yearly pattern comparison.
8. What is the purist approach?
Strictly following classical DWH rules.
9. Why are DWH rules deviating today?
Users demand fresh and near real-time data.
10. What breaks the non-volatility rule?
Removing old data and adding fresh data frequently.
11. How are real-time updates performed?
Using log files or shadow tables, applied per transaction (see the sketch below).
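A minimal sketch of the per-transaction log idea from Q11, assuming a made-up account-balance table and log format; the lecture does not prescribe any particular structure.

# Apply per-transaction change-log entries to a warehouse copy as they arrive,
# the idea behind log-file / shadow-table loading. All names are illustrative.
warehouse_balance = {"ACC-1": 500, "ACC-2": 900}      # current DWH snapshot

change_log = [                                        # one entry per OLTP transaction
    {"account": "ACC-1", "delta": -200},
    {"account": "ACC-3", "delta": 150},
]

def apply_log(target, log):
    """Apply each logged transaction to the warehouse copy."""
    for entry in log:
        # a transformation rule would normally run here before loading
        target[entry["account"]] = target.get(entry["account"], 0) + entry["delta"]

apply_log(warehouse_balance, change_log)
print(warehouse_balance)   # {'ACC-1': 300, 'ACC-2': 900, 'ACC-3': 150}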
Lecture No 4
A. Conceptual MCQs
1. The bus vs. train analogy in data warehousing highlights that:
A. OLTP and DWH can be interchanged
B. OLTP and DWH serve different purposes
C. Both support continuous frequent transactions
D. DWH cannot store history
Answer: B
2. In the bus–train analogy, the bus represents:
A. Data warehouse
B. OLAP
C. OLTP
D. Meta-data
Answer: C
3. DWH stores historical data mainly to:
A. Support daily business operations
B. Understand long-term patterns and customer behavior
C. Ensure real-time updates
D. Reduce storage cost
Answer: B
4. Operational systems (OLTP) usually purge old customer data after:
A. 3 months
B. 6 months
C. 1 year
D. Never
Answer: C
5. Why does a data warehouse keep even the data of old or left customers?
A. For daily withdrawals
B. For ATM transactions
C. For analyzing customer behavior
D. To reduce storage
Answer: C
6. Telecommunication companies typically store historical data for:
A. 6 months
B. 18 months
C. 3 years
D. 7 years
Answer: B
7. Retailers require at least 65 weeks of data to analyze:
A. ATM withdrawals
B. Tax audits
C. Seasonal patterns
D. Employee performance
Answer: C
8. Insurance companies keep how much historical data for actuary analysis?
A. 1 year
B. 65 weeks
C. 18 months
D. 7 years
Answer: D
9. DWH is NOT a complete repository of data because:
A. It cannot store history
B. Old data loses economic value
C. OLTP purges it
D. DWH updates are real-time
Answer: B
10. Traditional DWH updates occur:
A. In real-time
B. Only after user approval
C. In periodic batch mode
D. After every transaction
Answer: C
11. OLTP requires real-time updates because:
A. Hardware is faster
B. Historical analysis is needed
C. Otherwise financial fraud may occur
D. Users demand monthly reports
Answer: C
12. A purist is someone who:
A. Follows modern techniques
B. Accepts new approaches
C. Follows strict traditional “by-the-book” rules
D. Creates flexible systems
Answer: C
13. One deviation from traditional DWH architecture is:
A. No historical data
B. Real-time or near real-time loading
C. No ETL
D. No OLAP
Answer: B
14. Daily or hourly loading uses:
A. Only OLTP tables
B. Shadow tables or log files
C. Manual entries
D. Flat files only
Answer: B
B. Typical Queries MCQs
15. OLTP queries usually return:
A. Thousands of rows
B. Millions of rows
C. Few rows
D. No rows
Answer: C
16. DWH queries generally return:
A. 1–10 rows
B. Only PK rows
C. Thousands or millions of rows
D. No rows
Answer: C
17. OLTP queries typically use:
A. Primary index
B. Foreign index
C. Primary key
D. No indexing
Answer: C
18. DWH queries usually use:
A. Primary key
B. Composite key only
C. Primary index
D. No index
Answer: C
19. Selectivity in OLTP queries is usually:
A. Low
B. Medium
C. High
D. Variable
Answer: C
20. Selectivity in DWH queries is usually:
A. Low
B. Medium
C. High
D. None
Answer: A
21. OLTP tables are generally:
A. Denormalized
B. Lightly normalized
C. Fully normalized
D. No normalization
Answer: C
22. DWH tables are generally:
A. Fully normalized
B. Denormalized or lightly normalized
C. Unstructured
D. Raw only
Answer: B
C. Response Time MCQs
23. OLTP response time is in:
A. Minutes
B. Seconds to milliseconds
C. Hours
D. Days
Answer: B
24. OLAP/DWH complex queries run in:
A. Seconds to minutes
B. Only seconds
C. Microseconds
D. Weeks
Answer: A
25. Data mining queries may take:
A. Seconds
B. Minutes
C. Hours
D. Days
Answer: C
D. Architecture MCQs
26. Data marts are created to:
A. Replace OLTP
B. Store local backups
C. Serve departmental needs
D. Convert OLAP to OLTP
Answer: C
27. The step before OLAP in DWH architecture is:
A. Data mining
B. Creating data cubes
C. ETL
D. Extract → Transform → Load
Answer: B
28. The biggest difficulty in DWH design is:
A. Too many developers
B. Changing business requirements
C. No hardware available
D. No data available
Answer: B
29. Performance problems in VLDB occur because:
A. Hardware is cheap
B. Algorithms behave non-linearly
C. Data is always small
D. OLTP is used
Answer: B
30. In VLDB, difference between O(n log n) and O(n²) becomes significant when:
A. Rows < 100
B. Rows < 1000
C. Rows in millions or billions
D. Never
Answer: C
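To make Q29–30 concrete, a small back-of-the-envelope comparison of O(n log n) and O(n^2) growth; the row counts below are illustrative only.

import math

for n in (1_000, 1_000_000, 1_000_000_000):
    n_log_n = n * math.log2(n)
    n_sq = n ** 2
    print(f"n={n:>13,}  n*log2(n)={n_log_n:.2e}  n^2={n_sq:.2e}  ratio={n_sq / n_log_n:.1e}")

# For a thousand rows the gap is about 100x; for a billion rows it is about 3e7,
# which is why algorithms that behave acceptably in OLTP can break down in a VLDB.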
E. SDLC vs CLDS MCQs
31. SDLC begins with:
A. Design
B. Programming
C. Requirements gathering
D. Testing
Answer: C
32. CLDS begins with:
A. Requirements
B. Design
C. Data
D. Implementation
Answer: C
33. SDLC is:
A. Data-driven
B. Requirement-driven
C. Index-driven
D. Algorithm-driven
Answer: B
34. CLDS is:
A. Requirement-driven
B. Data-driven
C. Table-driven
D. No-driven
Answer: B
35. CLDS ends with:
A. Testing
B. Design
C. Understanding requirements
D. ETL
Answer: C
36. Reason SDLC fails for DWH is:
A. Hardware is slow
B. Requirements change rapidly
C. OLAP is slow
D. Data is small
Answer: B
F. Implementation Steps MCQs
37. Metadata repository is created in:
A. Phase I
B. Phase II
C. Phase III
D. No phase
Answer: A
38. Data cleansing occurs in:
A. Phase I
B. Phase II
C. Phase III
D. Only after DWH finish
Answer: B
39. OLAP is implemented in which phase?
A. Phase I
B. Phase II
C. Phase III
D. None
Answer: B
40. Deployment and system management occur in:
A. Phase I
B. Phase II
C. Phase III
D. None
Answer: C
41. Hardware selection should be done after:
A. Building reports
B. Business requirement analysis
C. Cube creation
D. Mining
Answer: B
42. The DWH must be designed to support:
A. One application
B. Only OLTP
C. Multiple applications and workloads
D. Only mining
Answer: C
43. The key driver of warehouse design should be:
A. Hardware
B. Software
C. Business needs
D. Storage cost only
Answer: C
44. Hardware cost is:
A. Increasing
B. Decreasing
C. Same
D. Irrelevant
Answer: B
45. Software complexity is mainly caused by:
A. Algorithms
B. Storage
C. UPS power
D. GUI
Answer: A
46. ETL includes:
A. Extract, Transfer, Localize
B. Extract, Transform, Load
C. Execute, Test, Load
D. Extract, Test, Link
Answer: B
47. When users get used to DWH, they demand:
A. Less data
B. More fresh/up-to-date data
C. No reports
D. Smaller databases
Answer: B
48. Shadow tables are used for:
A. Reporting
B. Batch or real-time updates
C. Data mining
D. Indexing
Answer: B
49. OLAP queries support:
A. Drill-down and roll-up
B. Transaction updates
C. Deleting customer data
D. ATM withdrawals
Answer: A
50. Large list selection queries typically take:
A. Seconds only
B. Minutes
C. Days
D. Microseconds
Answer: B
Lecture No 5
1. Which subject area is often the first data warehouse an organization builds?
A) Telecom DWH B) Financial DWH C) HR D) Insurance
Answer: B
2. Why is financial DWH appealing to start with?
A) Largest data volume B) Easy to get management attention C) No anomalies D) No integration needed
Answer: B
3. A drawback of financial data warehouses is:
A) Inability to match balances to the last rupee B) Excessive call detail records C) Long insurance cycles D) Lack of dates
Answer: A
4. Telecommunications DWH is dominated by:
A) Low volume data B) Call level detail volume C) HR records D) Financial summaries
Answer: B
5. Which is NOT a way to handle call level detail?
A) Store only a few months B) Store selective detail C) Store all call detail forever cheaply without concern D) Summarize/aggregate
Answer: C
6. Insurance DWH typically stores:
A) Only 1 month of data B) Very old data for actuarial analysis C) No dates D) Only real-time transactions
Answer: B
7. Insurance environments are unique because they have:
A) Few dates B) Many dates of many kinds C) No historical needs D) Short business cycles
Answer: B
8. Retailers typically keep how much history for season comparison?
A) 1 week B) 65 weeks C) 2 years D) 7 years
Answer: B
9. Telecoms commonly keep approximately how much history?
A) 18 months B) 7 years C) 65 weeks D) 3 months
Answer: A
10. Which application can justify DWH ROI by detecting unusual purchase patterns?
A) Inventory management B) Fraud detection C) Data backup D) OS installation
Answer: B
11. Profitability analysis in DWH helps to identify:
A) Which servers to buy B) Which customers are profitable or not C) How to purge OLTP data D) Network topology
Answer: B
12. Direct mail marketing via DWH primarily improves:
A) Hardware speed B) Targeting accuracy and cost savings C) Data encryption D) OLTP throughput
Answer: B
13. Customer retention modeling uses:
A) Live transaction only B) Historical behavior patterns C) Only demographics D) Only product catalogs
Answer: B
14. Which DWH type is dominated by a single major subject area?
A) Financial B) Human Resources C) Telecom D) Insurance
Answer: B
15. One reason financial DWH may not reconcile to operational systems is:
A) Different accounting periods across systems B) Data never changes C) No ETL process D) No users
Answer: A
16. Insurance claims often take:
A) Seconds B) Days C) Several years to settle D) Never settled
Answer: C
17. Which DWH application uses clustering & classification?
A) Direct mail B) Fraud detection C) Profitability analysis D) All of the above
Answer: D
18. Which statement is true about historical data value?
A) Older data always more valuable B) Value usually decreases further back in time C) All history is equally valuable D) Historical data is irrelevant
Answer: B
19. A telecom DWH that cannot work at aggregate level must:
A) Store call-level detail B) Purge everything C) Use only summaries D) Use OLTP instead
Answer: A
20. Profitability models often predict customer value over:
A) 1 day B) Next 3–5 years C) 1 hour D) Next minute
Answer: B
21. Which DWH type needs many different date fields?
A) Retail B) Telecom C) Insurance D) HR
Answer: C
22. A primary benefit of DWH for fraud detection is:
A) Faster OLTP updates B) Ability to spot deviations from normal patterns C) Less storage use D) Eliminate need for ETL
Answer: B
23. Direct mail targeting uses which DWH data?
A) Call detail records and behavior patterns B) Hardware logs C) Backup tapes D) Operating system patches
Answer: A
24. Which DWH subject often has the smallest data volume?
A) Financial B) Telecom C) Retail D) Web logs
Answer: A
25. A reason telecoms might store call detail for only a few months is:
A) Legal restrictions B) Call detail huge volume and storage cost C) No analytics need D) Hardware incompatibility
Answer: B
26. Long operational business cycles are a characteristic of:
A) Banks B) Insurance companies C) Retail shops D) Fast-food outlets
Answer: B
27. Which DWH application helps restructure pricing strategies?
A) Fraud detection B) Profitability analysis C) ETL D) OLTP
Answer: B
28. Yield management as a DWH application is mainly for:
A) Predicting crop yield B) Optimizing pricing and inventory C) Network routing D) Employee scheduling
Answer: B
29. Which is true about a human resources data warehouse?
A) Dominated by call detail B) Only one major subject area C) No historical needs D) Requires 7-year history always
Answer: B
30. Which DWH application can save marketing expense by small-target lists?
A) OLTP B) Direct mail/database marketing C) OS updates D) System backups
Answer: B
✅ 5 Important Short Questions (with brief answers)
Q: Why is financial DWH often chosen first?
A: Finance is central (the nerve center), touches all areas, is usually smaller in volume, and makes it easy to get management support.
Q: Why are telecom DWHs challenging?
A: They are dominated by enormous call-level detail volume; summaries may not suffice for many analyses.
Q: What special needs do insurance DWHs have?
A: Very long history retention for actuarial analysis, many date types, and long business cycles (claims may take years).
Q: How does DWH support fraud detection?
A: By comparing current behavior to historical patterns and flagging deviations (e.g., unusual locations or purchase types).
Q: Why is historical data retention length industry-dependent?
A: It depends on data volume, storage cost, and the economic value of older data (e.g., retailers need ~65 weeks, insurance ~7 years, telecom ~18 months).
Lecture No 6
1. What are the two main goals of normalization?
A) Increase redundancy & speed B) Eliminate redundant data & ensure meaningful dependencies C) Improve UI & UX D) Add more indexes
Answer: B
2. Normalization is best described as:
A) Combining tables B) Decomposing a table into smaller tables C) Backing up data D) Indexing data
Answer: B
3. Which normal form requires atomic values and no repeating groups?
A) 2NF B) 3NF C) 1NF D) 4NF
Answer: C
4. A relation is in 2NF if:
A) It is in 1NF and every non-key column is fully dependent on the entire PK. B) It is only in 1NF. C) It has no primary key. D) It is denormalized.
Answer: A
5. 3NF eliminates which dependency type?
A) Partial dependency B) Transitive dependency C) Multivalued dependency D) None
Answer: B
6. A transitive dependency means:
A) A -> B and B -> C so A -> C (via B) B) Composite key only C) Multi-valued attributes D) No dependency
Answer: A
7. Lossless decomposition means:
A) Data is lost after decomposition B) Original table can be reconstructed by natural join C) Data is encrypted D) Tables are merged
Answer: B
8. Which of these is TRUE about normalization guidelines?
A) They are strict laws B) They are guidelines C) They forbid any denormalization D) They only apply to NoSQL
Answer: B
9. In the example, which composite key was used to identify Marks?
A) (SID, Course) B) (Campus, Degree) C) SID alone D) Course alone
Answer: A
10. If a non-key column depends on part of a composite key, the table is violating:
A) 1NF B) 2NF C) 3NF D) 4NF
Answer: B
11. Which new table was created to move SID, Degree and Campus into for 2NF?
A) PERFORMANCE B) REGISTRATION C) CAMPUS_DEGREE D) STUDENT_CAMPUS
Answer: B
12. REGISTRATION was in 2NF but not in 3NF because of:
A) Partial dependency B) Transitive dependency C) Multi-valued dependency D) No primary key
Answer: B
13. To achieve 3NF, which tables were created?
A) STUDENT_CAMPUS and CAMPUS_DEGREE B) PERFORMANCE only C) REGISTRATION2 D) NONE
Answer: A
14. Which normal form removes multi-valued dependencies?
A) 2NF B) 3NF C) 4NF D) 5NF
Answer: C
15. Which normal form is mostly of academic interest?
A) 3NF B) 4NF C) 5NF D) 1NF
Answer: C (5NF is more academic)
16. Normalization usually reduces:
A) Storage space & anomalies B) Query complexity always C) All hardware costs D) User load
Answer: A
17. Major downside of 3NF compared to 2NF is:
A) Less storage B) Worse performance and complexity C) No keys D) No joins
Answer: B
18. Which form is generally recommended to ensure at minimum?
A) 1NF B) 2NF C) 3NF D) 5NF
Answer: B
19. De-normalization is typically used to:
A) Increase redundancy for performance B) Ensure lossless join C) Achieve 5NF D) Remove primary keys
Answer: A
20. An insertion anomaly example from the lecture:
A) Can add any course without student B) Can't add a student’s campus until student registers for a course C) No anomalies exist D) Foreign key error
Answer: B
21. An update anomaly example:
A) Changing campus requires updating multiple rows B) Deleting a course has no effect C) Adding degrees is easy D) No redundant data
Answer: A
22. A deletion anomaly example:
A) Deleting a student may remove degree info for that campus B) Deleting a row never removes needed info C) It increases performance D) It removes duplicate rows only
Answer: A
23. Functional dependency SID → Campus means:
A) SID uniquely determines Campus B) Campus determines SID C) No relation D) Multi-valued attribute
Answer: A
24. In 3NF, non-key attributes must be dependent on:
A) Nothing B) Entire primary key C) Only other non-key attributes D) Primary key only (no transitive)
Answer: D
25. Example table PERFORMANCE was in which normal form?
A) 1NF B) 2NF C) 3NF D) Not normalized
Answer: C
26. Normalization by projection means:
A) Creating smaller tables using SELECT of columns B) Merging tables C) Dropping indexes D) Creating views only
Answer: A
27. Which of the following increases slightly when moving to 3NF in the example?
A) Query speed always B) Storage requirement (about 7%) C) Number of users D) Network latency
Answer: B
28. Normalization guidelines are said to be cumulative, meaning:
A) 3NF implies 2NF and 1NF are already satisfied B) 1NF implies 3NF C) They are unrelated D) 2NF forbids 1NF
Answer: A
29. Why might designers deliberately NOT fully normalize (purist approach ignored)?
A) To improve performance and simplicity B) To reduce redundancy always C) To satisfy academic rules D) Because normalization is illegal
Answer: A
30. A correct condition for lossless decomposition is:
A) Reconstructed table equals original with no extra or missing info B) Decomposed tables are empty C) Data is encrypted D) Foreign keys removed
Answer: A
✅ 10 Short (Important) Questions — quick answers
Q: What is normalization?
A: The process of organizing data by splitting tables to eliminate redundancy and ensure meaningful dependencies.
Q: Name two goals of normalization.
A: Eliminate redundant data; ensure data dependencies make sense.
Q: Give one example of an insertion anomaly.
A: Cannot add a student’s campus record until the student registers for a course.
Q: Define 1NF in one line.
A: All attributes must be atomic (single-valued), with no repeating groups.
Q: What extra condition does 2NF add beyond 1NF?
A: No partial dependency — every non-key attribute must depend on the entire primary key.
Q: What extra condition does 3NF add beyond 2NF?
A: No transitive dependencies — non-key attributes must depend only on the primary key.
Q: What is a transitive dependency?
A: A → B and B → C implies A → C (a non-key attribute determines another non-key attribute).
Q: Why is lossless decomposition important?
A: So the original table can be exactly reconstructed without loss or phantom data (see the sketch after this list).
Q: When might you denormalize intentionally?
A: When performance (read/query speed) and simplicity outweigh redundancy costs — common in DSS/DWH.
Q: Which normal form is usually enough in practice and why?
A: 2NF is generally recommended — it removes major redundancy while keeping the design simpler; 3NF may hurt performance.
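To tie the tables named in the MCQs (PERFORMANCE, REGISTRATION, STUDENT_CAMPUS, CAMPUS_DEGREE) to the projection and lossless-join ideas above, here is a minimal Python sketch; the sample rows and helper functions are illustrative assumptions, not lecture code.

# 1NF relation from the student example: PK is (SID, Course); SID -> Campus -> Degree.
STUDENT = [
    {"SID": 1, "Campus": "Lahore",  "Degree": "MCS", "Course": "CS614", "Marks": 80},
    {"SID": 1, "Campus": "Lahore",  "Degree": "MCS", "Course": "CS619", "Marks": 70},
    {"SID": 2, "Campus": "Karachi", "Degree": "BCS", "Course": "CS614", "Marks": 65},
]

def project(rows, cols):
    """Normalization by projection: keep only `cols`, dropping duplicate rows."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(cols, key)))
    return out

# 2NF: remove the partial dependency SID -> (Degree, Campus).
PERFORMANCE  = project(STUDENT, ["SID", "Course", "Marks"])
REGISTRATION = project(STUDENT, ["SID", "Degree", "Campus"])

# 3NF: remove the transitive dependency SID -> Campus -> Degree.
STUDENT_CAMPUS = project(REGISTRATION, ["SID", "Campus"])
CAMPUS_DEGREE  = project(REGISTRATION, ["Campus", "Degree"])

def natural_join(left, right):
    """Natural join on the column names the two relations share."""
    common = set(left[0]) & set(right[0])
    return [dict(l, **r) for l in left for r in right
            if all(l[c] == r[c] for c in common)]

def canon(rows):
    return sorted(sorted(r.items()) for r in rows)

# Lossless check: re-joining the projections reproduces the original rows exactly.
rebuilt = natural_join(natural_join(PERFORMANCE, STUDENT_CAMPUS), CAMPUS_DEGREE)
assert canon(rebuilt) == canon(STUDENT)
print("Lossless decomposition verified:", len(rebuilt), "rows")

The assert only holds because the projections follow the functional dependencies; projecting on arbitrary column sets would in general produce spurious rows on re-join.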
Lecture No 7
MCQs (1–30)
1. What is denormalization?
A. Randomly duplicating data
B. Selectively transforming normalized relations to improve performance
C. Always removing primary keys
D. Encrypting data
Answer: B
2. Which phrase best describes denormalization from the lecture?
A. Chaos and disorder
B. Controlled crash for performance without loss of information
C. Mandatory step after normalization
D. Replacing OLTP systems
Answer: B
3. The primary aim of denormalization in DSS is to:
A. Reduce storage cost only
B. Reduce query processing time and joins
C. Increase number of foreign keys
D. Remove all indexes
Answer: B
4. Denormalization should be applied:
A. Indiscriminately everywhere
B. Carefully and consciously with cost-benefit analysis
C. Only to OLTP systems
D. Only by DBAs with no business input
Answer: B
5. Denormalization can take forms such as:
A. Combining tables, splitting tables, adding data
B. Deleting all indexes
C. Encrypting the entire database
D. Removing backups
Answer: A
6. Which statement is true about dimensional modeling and denormalization?
A. They are identical concepts
B. Dimensional modeling sometimes involves collapsing tables but is distinct from general denormalization
C. Dimensional modeling forbids denormalization
D. Denormalization means no dimensions
Answer: B
7. An early study (Inmon) suggested a partially normalized DSS may be:
A. Slower than fully normalized designs
B. An order of magnitude faster for some workloads
C. Always worse than normalized designs
D. Unrelated to performance
Answer: B
8. Denormalization improves performance primarily by:
A. Increasing joins
B. Reducing number of tables and joins
C. Increasing index counts always
D. Making data volatile
Answer: B
9. Which is NOT a direct benefit of denormalization mentioned?
A. Minimize joins and foreign keys
B. Resolve aggregates faster
C. Eliminate need for ETL
D. Reduce number of rows retrieved from primary table
Answer: C
10. The golden rule of denormalization given in the lecture is:
A. Always denormalize everything
B. When in doubt, don’t denormalize
C. Denormalize only after backups
D. Denormalize only in OLTP
Answer: B
11. Which guideline is recommended before denormalizing?
A. Skip cost analysis
B. Do cost-benefit analysis (frequency, storage, join time)
C. Double storage without analysis
D. Remove triggers immediately
Answer: B
12. Redundant data provides a performance benefit at query time but leads to a liability at:
A. Insert time only
B. Update time (maintenance overhead)
C. No overhead ever
D. Hardware procurement stage
Answer: B
13. Which technique is a safe and common denormalization approach for one-to-one relations?
A. Collapsing the two tables into one
B. Splitting every column into separate table
C. Deleting foreign keys
D. Converting to flat files only
Answer: A
14. Collapsing one-to-one tables typically results in:
A. More foreign keys
B. Reduced storage and fewer indexes/foreign keys
C. Increased update anomalies always
D. Loss of business meaning
Answer: B
15. Which of these is NOT listed as an area where denormalization applies?
A. Star schemas abundance
B. Fast time-series access
C. Heavy update, few queries environment
D. Fast aggregate results
Answer: C
16. Derived attributes used in denormalization refer to:
A. Raw transactional rows only
B. Summaries like totals, balances and aggregates
C. Indexes only
D. SQL stored procedures
Answer: B
17. Pre-joining is a denormalization technique meaning:
A. Joining tables at query time always
B. Physically storing join results to avoid runtime joins
C. Removing joins entirely from schema
D. Using no foreign keys
Answer: B
18. Splitting tables horizontally/vertically is used to:
A. Increase number of joins needed
B. Improve performance by partitioning or separating seldom-used columns/rows
C. Remove primary keys
D. Force full table scans
Answer: B
19. Adding redundant reference columns helps to:
A. Increase update complexity without benefit
B. Reduce joins and index usage for frequent queries
C. Prevent any queries from running
D. Always reduce storage
Answer: B
20. Which condition makes denormalization especially attractive?
A. Many updates and few queries
B. Few updates but many join queries
C. No users at all
D. Strict transactional consistency required constantly
Answer: B
21. One recommended maintenance tool to keep redundant data consistent is:
A. Triggers to maintain duplicated copies
B. Removing all constraints
C. Manual file edits only
D. Disabling backups
Answer: A
22. When evaluating denormalization, you should weigh frequency of use against:
A. Number of developers only
B. Cost of additional storage and join acquisition time
C. Color of UI
D. Number of fonts in reports
Answer: B
23. Which hierarchical example was given as a good candidate for denormalization?
A. CPU cores B. Geography (Province → Division → District → City → Zone) C. Usernames D. File permissions
Answer: B
24. Denormalization usually speeds up:
A. Data modification (updates) B. Data retrieval C. Schema normalization D. Hardware costs
Answer: B
25. A risk of heavy denormalization for both online and batch systems is:
A. Improved update performance B. Adverse effect on modification performance C. Guaranteed correctness D. Elimination of testing
Answer: B
26. One of the five principal denormalization strategies is:
A. Encryption B. Collapsing tables C. Removing all keys D. Using only XML
Answer: B
27. Collapsing tables is least harmful when the relationship is:
A. One-to-many B. One-to-one C. Many-to-many (always) D. No relation
Answer: B
28. Which of the following is a reason to avoid denormalization?
A. When update costs outweigh query benefits B. When queries are slow C. When aggregation is needed D. When star schema exists
Answer: A
29. The denormalization decision should be driven by:
A. Hunches only B. Measured analysis (query patterns, storage, maintenance) C. Developer preferences only D. Server OS
Answer: B
30. A correct summary sentence from the lecture:
A. Denormalize everything and forget normalization
B. Strike a balance between normalized and denormalized forms for query patterns and domain needs
C. Normalization has no place in DSS
D. Always use flat files for DSS
Answer: B
🔎 Short Questions (10) — Important & concise answers
Q: Define denormalization in one line.
A: Selectively transforming normalized relations into physical structures that reduce joins and improve query performance without losing information.
Q: Why is denormalization used in DSS?
A: Because DSS queries are read-heavy and join-intensive; denormalization brings related data closer, speeding up query processing.
Q: Name two main performance benefits of denormalization.
A: Fewer joins (reduced join cost) and fewer rows/indexes to scan (faster retrieval).
Q: Give four general denormalization strategies.
A: Collapsing tables, pre-joining, splitting tables (horizontal/vertical), adding redundant reference columns; also derived attributes/summaries (pre-joining is sketched after this list).
Q: When is collapsing tables safe?
A: When two tables have a one-to-one relationship and their attributes are frequently used together.
Q: What is a core guideline before denormalizing?
A: Perform a cost-benefit analysis considering frequency of use, additional storage, and join times.
Q: Why are triggers mentioned in denormalization guidelines?
A: Triggers can maintain consistency of duplicated data copies by updating all copies on modification.
Q: Which kinds of DWH parts are good candidates for denormalization?
A: Star schema facts/dimensions, time-series access, aggregate-heavy areas, complex hierarchical dimensions (e.g., geography), and parts with few updates but many joins.
Q: What’s the “golden rule” about denormalization?
A: When in doubt, don’t denormalize.
Q: How does denormalization relate to dimensional modeling?
A: They overlap (dimensional models often collapse tables), but dimensional modeling is a distinct design approach with its own rules.
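A minimal Python sketch of pre-joining, one of the strategies listed above; the sales-person and sales tables and their rows are made up for illustration.

MASTER = [  # one row per sales person
    {"sp_id": 1, "sales_person": "Ali"},
    {"sp_id": 2, "sales_person": "Sara"},
]
DETAIL = [  # one row per sale (many detail rows per master row)
    {"sale_id": 10, "sp_id": 1, "amount": 500},
    {"sale_id": 11, "sp_id": 1, "amount": 300},
    {"sale_id": 12, "sp_id": 2, "amount": 700},
]

# Pre-join: copy the master attributes into every detail row (redundant storage,
# faster reads). In a warehouse this would be done during ETL, not at query time.
master_by_id = {m["sp_id"]: m for m in MASTER}
PREJOINED = [dict(d, **master_by_id[d["sp_id"]]) for d in DETAIL]

# A query on the pre-joined table needs no runtime join:
total_by_person = {}
for row in PREJOINED:
    total_by_person[row["sales_person"]] = total_by_person.get(row["sales_person"], 0) + row["amount"]
print(total_by_person)   # {'Ali': 800, 'Sara': 700}

# The trade-off: an update to a master attribute must now touch every copy,
# which is why triggers or disciplined ETL are needed to keep the copies consistent.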
Lecture No 8
1.
Splitting a table into multiple tables based on common column values is called:
A. Vertical splitting
B. Horizontal splitting
C. Pre-joining
D. Redundancy
Answer: B
2.
The main goal of horizontal splitting is to:
A. Reduce primary keys
B. Spread rows across hardware for parallelism
C. Remove redundant attributes
D. Add derived values
Answer: B
3.
Horizontal splitting helps reduce:
A. Headers
B. I/O overhead
C. CPU clock speed
D. Memory swapping
Answer: B
4.
A benefit of horizontal splitting is:
A. More joins
B. Increased anomalies
C. Enhanced security
D. Larger tables
Answer: C
5.
Vertical splitting divides a table based on:
A. Rows
B. Columns
C. Keys
D. Partitions
Answer: B
6.
In vertical splitting, the primary key is:
A. Removed
B. Repeated in split tables
C. Ignored
D. Auto-generated
Answer: B
7.
Vertical splitting is useful when:
A. Keys are duplicated
B. Columns are rarely accessed
C. Too many joins exist
D. Data is numeric only
Answer: B
8.
Vertical splitting reduces:
A. Redundancy
B. Number of joins
C. Header size
D. Columns in the primary key
Answer: C
9.
Users of vertically split tables see them as:
A. Multiple tables
B. A single view
C. Hidden partitions
D. Index files
Answer: B
10.
Pre-joining is used to:
A. Normalize data
B. Avoid runtime joins
C. Reduce storage
D. Remove redundancy
Answer: B
11.
Pre-joining is ideal when tables have:
A. One-to-one relation
B. One-to-many relation
C. No relation
D. Many-to-many relation
Answer: B
12.
A drawback of pre-joining is:
A. No performance gain
B. High update anomalies
C. Repetition of master data
D. Complex queries
Answer: C
13.
Pre-joining is commonly used in:
A. Banking
B. Market basket analysis
C. Finance
D. Security systems
Answer: B
14.
In pre-joining, master data is moved into:
A. Another master table
B. Detail table
C. Temporary table
D. Index table
Answer: B
15.
Adding redundant columns helps reduce:
A. Referential integrity
B. Runtime joins
C. Storage cost
D. Partition size
Answer: B
16.
Redundant columns are usually:
A. Frequently used in joins
B. Unnecessary columns
C. Derived fields
D. Part of primary key
Answer: A
17.
Adding redundant columns increases:
A. Storage
B. Speed of joins
C. Table count
D. Index traversal
Answer: A
18.
Adding redundant columns is very similar to:
A. Vertical splitting
B. Pre-joining
C. Horizontal splitting
D. Null suppression
Answer: B
19.
A problem with redundant columns is:
A. Slow retrieval
B. Update overhead
C. More partitions
D. Loss of RI
Answer: B
20.
Derived attributes are added to:
A. Speed up calculations
B. Increase joins
C. Reduce redundancy
D. Enforce normalization
Answer: A
21.
Derived attributes are suitable when the value:
A. Changes frequently
B. Requires daily updates
C. Stays fairly constant
D. Is unknown
Answer: C
22.
A derived attribute becomes:
A. Inaccurate
B. Non-repeatable
C. Absolute and consistent
D. Hard to update
Answer: C
23.
Adding derived attributes reduces:
A. Query processing time
B. Security
C. Storage overhead
D. Disk usage
Answer: A
24.
An example of a derived attribute is:
A. Roll number
B. Age
C. Name
D. Gender
Answer: B
25.
Derived attributes are efficient when:
A. Ratio of detail to derived rows is 1:1
B. Rows are stored yearly
C. Ratio of detail rows to derived rows is 10:1
D. Tables are normalized
Answer: C
26.
Horizontal splitting helps in:
A. Campus-based queries
B. Yearly summaries
C. Both A and B
D. None
Answer: C
27.
Horizontally split tables result in:
A. More page faults
B. More rows per block
C. Larger blocks
D. Fewer partitions
Answer: B
28.
A benefit of horizontal splitting is:
A. Graceful degradation
B. Sudden failure
C. No indexes
D. No redundancy
Answer: A
29.
Vertical splitting is most useful when tables have:
A. Wide headers
B. Many identical rows
C. Too many partitions
D. Numeric fields
Answer: A
30.
Pre-joining avoids massive joins in:
A. Small tables
B. Very large tables
C. Temporary tables
D. Index tables
Answer: B
31.
Redundant columns preserve:
A. Referential integrity
B. Unique constraints
C. Normal form
D. Primary key size
Answer: A
32.
Splitting tables is especially useful in:
A. OLTP systems
B. Distributed DSS
C. ER modeling
D. Indexing
Answer: B
33.
Horizontal splitting can be based on:
A. Primary key
B. Campus location
C. All attributes
D. Null values
Answer: B
34.
Vertical splitting requires:
A. Foreign key removal
B. Duplicate primary keys in all tables
C. Combining rows
D. Complex joins
Answer: B
35.
Pre-joining increases:
A. Table size
B. CPU usage
C. Data inconsistency
D. Header size
Answer: A
36.
Derived attributes improve:
A. Update cost
B. Query speed
C. Compression
D. Normalization
Answer: B
37.
Redundant columns are added when:
A. Joins are rare
B. Joins are frequent
C. Attributes are null
D. Data is static
Answer: B
38.
A disadvantage of redundancy is:
A. Faster joins
B. More update anomalies
C. Smaller tables
D. Reduced storage
Answer: B
39.
Splitting a table into years is an example of:
A. Vertical splitting
B. Horizontal splitting
C. Merging
D. Pre-joining
Answer: B
40.
In pre-joining, a master table’s data is repeated for:
A. Every record
B. Each detail record
C. Each partition
D. Once only
Answer: B
1.
What is horizontal splitting?
Horizontal splitting divides a table
into multiple tables based on row values, such as splitting data by campus or
year.
2.
Why do we use horizontal splitting in a data warehouse?
To improve performance by reducing
I/O, supporting parallel processing, and making queries scan fewer rows.
3.
What is vertical splitting?
Vertical splitting divides a table
by columns, placing frequently used columns in one table and rarely used ones
in another, while repeating the primary key.
4.
When is vertical splitting useful?
When certain columns are rarely
accessed or the table has a wide header, causing unnecessary I/O.
5.
What is pre-joining?
Pre-joining physically combines
master and detail tables to avoid expensive run-time joins in large tables.
6.
What is the drawback of pre-joining?
It increases storage because master
table data is repeated in every detail row.
7.
Why are redundant columns added?
To eliminate frequent joins by
storing frequently-referenced attributes directly in the detail table.
8.
What is the disadvantage of adding redundant columns?
It increases update overhead and
storage but still maintains RI constraints.
9.
What are derived attributes?
Attributes calculated once and
stored, such as Age or Grade Points, to reduce calculation at query time.
10.
When is it beneficial to store derived attributes?
When derived values change rarely and queries frequently require them, making storage cheaper than repeated computation (splitting and derived attributes are sketched below).
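A minimal Python sketch of horizontal splitting, vertical splitting, and a stored derived attribute as described in the short questions above; the student table, its columns, and the rows are illustrative assumptions.

from datetime import date

STUDENT = [
    {"sid": 1, "campus": "Lahore",  "dob": date(2000, 5, 1),  "photo_blob": "...", "marks": 80},
    {"sid": 2, "campus": "Karachi", "dob": date(1999, 3, 2),  "photo_blob": "...", "marks": 65},
    {"sid": 3, "campus": "Lahore",  "dob": date(2001, 8, 15), "photo_blob": "...", "marks": 90},
]

# Horizontal splitting: same columns, rows divided by a value such as campus,
# so campus-specific queries scan fewer rows (and can run in parallel).
by_campus = {}
for r in STUDENT:
    by_campus.setdefault(r["campus"], []).append(r)

# Vertical splitting: frequently used columns in one narrow table, rarely used
# wide columns (e.g., a photo blob) in another; the primary key is repeated.
STUDENT_FREQ = [{"sid": r["sid"], "campus": r["campus"], "marks": r["marks"]} for r in STUDENT]
STUDENT_RARE = [{"sid": r["sid"], "photo_blob": r["photo_blob"]} for r in STUDENT]

# Derived attribute: compute age once at load time and store it, so queries
# do not repeat the calculation (acceptable when the value changes slowly).
for r in STUDENT_FREQ:
    dob = next(s["dob"] for s in STUDENT if s["sid"] == r["sid"])
    r["age"] = (date.today() - dob).days // 365

print(len(by_campus["Lahore"]), STUDENT_FREQ[0])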
Lecture No 9
1.
Denormalization effects on performance are generally:
A. Always positive
B. Always negative
C. Unpredictable
D. Neutral
Answer: C
2.
Before denormalizing, a model should be normalized to:
A. 1NF
B. 2NF
C. 3NF
D. BCNF
Answer: C
3.
One major underestimated factor in denormalization decisions is:
A. Storage
B. Maintenance
C. CPU cost
D. Security
Answer: B
4.
Pre-joining generally affects:
A. Storage
B. Performance
C. Ease of use
D. All of the above
Answer: D
5.
In the health-care example, the master-to-detail ratio assumed is:
A. 1:1
B. 1:2
C. 1:3
D. 1:10
Answer: B
6.
After pre-joining, the detail table header becomes:
A. 10 bytes
B. 40 bytes
C. 60 bytes
D. 90 bytes
Answer: D
7.
Pre-joining increases storage by approximately:
A. 5%
B. 10%
C. 12.5%
D. 30%
Answer: C
8.
Which query becomes expensive after pre-joining?
A. Simple count
B. Count distinct
C. Average
D. Sum
Answer: B
9.
Sorting cost is generally:
A. O(1)
B. O(n)
C. O(log n)
D. O(n log n)
Answer: D
10.
Denormalized detail table rows are:
A. Fewer
B. Same
C. Twice as many
D. Half as many
Answer: C
11.
Overall I/O degradation after pre-joining is approximately:
A. 2 times
B. 3 times
C. 5 times
D. 10 times
Answer: C
12.
Best solution for performance after pre-joining is to also keep:
A. Logs
B. Backups
C. Normalized master table
D. Indexes
Answer: C
13.
Adding redundant columns leads to __________ table scans.
A. Faster
B. Slower
C. No effect
D. Smaller
Answer: B
14.
Copying Sales_Person to detail table increases table scan time by:
A. 5%
B. 10%
C. 16%
D. 50%
Answer: C
15.
Redundant columns without discipline can turn a fact table into:
A. Index structure
B. Small table
C. Flat file
D. Archive
Answer: C
16.
Increasing table width decreases:
A. Rows per block
B. Performance
C. Efficiency
D. All of these
Answer: D
17.
Maintenance cost increases when:
A. Keys change
B. Indexes are removed
C. Table is compressed
D. Views are added
Answer: A
18.
Hash partitioning distributes data:
A. Sequentially
B. Evenly/randomly
C. Based on ranges
D. Based on clusters
Answer: B
19.
Hash-based splitting is:
A. Easily reversible
B. Sometimes reversible
C. Not reversible
D. Always reversible
Answer: C
20.
Range partitioning may create:
A. Even distribution
B. Hot spots
C. Random distribution
D. Hash buckets
Answer: B
21.
Round-robin splitting:
A. Supports partition elimination
B. Creates skew
C. Evenly distributes data
D. Requires hashing
Answer: C
22.
Expression partitioning groups data by:
A. Random number
B. Hash key
C. Logical expression
D. Primary key
Answer: C
23.
Expression partitioning can create hot spots similar to:
A. Hashing
B. Round robin
C. Range partitioning
D. Replication
Answer: C
24.
Horizontal splitting may fail due to:
A. Column width
B. Irreversible splitting
C. Too many indexes
D. Lack of primary key
Answer: B
25.
Vertical splitting reduces table width to improve:
A. Storage
B. Scan speed
C. Backup time
D. Metadata size
Answer: B
26.
Vertical splitting divides a table based on:
A. Key values
B. Date
C. Frequency of access
D. Hashing
Answer: C
27.
In vertical splitting, the frequently accessed portion is:
A. Larger
B. Smaller
C. Wider
D. Irreversible
Answer: B
28.
Performance improvement with vertical splitting is around:
A. 50%
B. 200%
C. 500%
D. 1000%
Answer: C
29.
Joins in vertical splitting affect:
A. Fast queries
B. Infrequent queries
C. Indexing only
D. None
Answer: B
30.
Moving a 5-byte column into the frequent partition slows scans by:
A. 10%
B. 15%
C. 20%
D. 25%
Answer: D
31.
Vertical splitting requires adding:
A. Foreign keys
B. Join key
C. Surrogate key
D. Cluster keys
Answer: B
32.
Join key overhead becomes significant with:
A. Few records
B. Millions of records
C. Billion-row tables
D. Small tables
Answer: C
33.
Importance of partition elimination is high in:
A. OLTP
B. Data Warehouse
C. File systems
D. Backup systems
Answer: B
34.
Round-robin is mainly used for:
A. Permanent tables
B. Temporary tables
C. Fact tables
D. Dimension tables
Answer: B
35.
Hot spots occur when:
A. Data equally distributed
B. Recent data accessed heavily
C. No queries run
D. Table compressed
Answer: B
36.
Hashing requires:
A. Primary key range
B. Well-selected partitioning key
C. Surrogate key
D. No key
Answer: B
37.
Partitioning improves performance through:
A. More CPU
B. Divide & conquer
C. Extra memory
D. Index removal
Answer: B
38.
Pre-joining is best when:
A. Queries rarely join tables
B. Joins are frequent on large tables
C. Only small data exists
D. Data is static
Answer: B
39.
Loss of information occurs when redundant column added without:
A. 1:M relationship
B. Surrogate key
C. Index
D. Logging
Answer: A
40.
Main motivation for denormalization is:
A. Aesthetics
B. Performance
C. Portability
D. Security
Answer: B
1.
What are the four major trade-offs of denormalization?
Storage, Performance, Ease-of-use,
and Maintenance.
2.
Why must a logical model be fully normalized before applying denormalization?
So that the baseline (pure model) is
correct and all denormalization effects can be clearly documented and
controlled.
3.
What is the main storage impact of pre-joining master and detail tables?
It increases storage because master
data is repeated for every detail row.
4.
Why does performance sometimes decrease after pre-joining?
Because queries require count
distinct, sorting, and scanning a larger, wider table.
5.
What causes a 5× performance degradation in some pre-join queries?
- Sorting due to count distinct
- 2× more rows
- 2.5× larger header size
6.
What is the primary risk of adding redundant columns?
Table becomes wider → fewer rows per
block → more I/O → lower performance.
7.
How does denormalization increase maintenance complexity?
Redundant data must be updated
across many large tables, and any key changes must propagate everywhere.
8.
Why are hash-based horizontal partitions difficult to reverse?
Hashing randomly distributes rows,
making original ordering impossible to reconstruct.
9.
What is the disadvantage of range-based horizontal splitting?
It often creates “hot spots” where recent partitions receive most of the load (see the partitioning sketch after this list).
10.
Why must columns be carefully chosen in vertical splitting?
Placing a frequently used column in the wrong split forces a join, destroying the performance benefits.
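A minimal Python sketch of the horizontal partitioning schemes referred to in Q8–Q9 (hash, range, round-robin, expression); the rows, keys, and partition count are made-up illustrations.

import itertools

NUM_PARTITIONS = 4

def hash_partition(row):
    # Even/random spread, but not reversible: you cannot tell from the data which
    # partition a row belongs to without re-hashing the key.
    return hash(row["txn_id"]) % NUM_PARTITIONS

def range_partition(row):
    # Partition by year: easy partition elimination, but recent years become
    # "hot spots" because most queries touch the newest partition.
    return row["year"] - 2020

_rr = itertools.count()
def round_robin_partition(row):
    # Even distribution with no partition elimination; typically for temporary tables.
    return next(_rr) % NUM_PARTITIONS

def expression_partition(row):
    # Partition by a logical expression over the data, e.g., order-size bands;
    # can create hot spots just like range partitioning.
    return 0 if row["amount"] < 1000 else 1

rows = [{"txn_id": i, "year": 2023, "amount": 250 * i} for i in range(1, 9)]
partitions = {}
for r in rows:
    partitions.setdefault(range_partition(r), []).append(r)
print({p: len(rs) for p, rs in partitions.items()})   # every row lands in one partition: a hot spot

Range partitioning supports partition elimination (a year-bounded query touches only one partition), but as the output shows it can concentrate all recent rows in a single hot partition.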
Lecture No 10
1. OLAP stands for:
A. Online Transaction Processing
B. On-Line Analytical Processing
C. Online Log Access Protocol
D. Off-Line Analytical Planning
Answer: B
2. OLAP is primarily a:
A. Physical database design technique
B. Framework for analytical processing
C. Transaction processor
D. Backup strategy
Answer: B
3. Typical OLAP implementations are:
A. Fully normalized
B. Highly or completely denormalized
C. Always 1NF only
D. Always in-memory only
Answer: B
4. A data warehouse without OLAP is:
A. Common and sufficient
B. Nearly unthinkable for decision support
C. Best practice for OLTP
D. Only for backups
Answer: B
5. OLAP supports analysis that is:
A. Predefined only
B. Ad-hoc, interactive, iterative
C. Single-step and fixed
D. Only for clerical tasks
Answer: B
6. Which of the following is NOT true?
A. OLAP generates information for non-routine tasks
B. OLAP is intended to be user-driven
C. OLAP requires the user to know SQL
D. OLAP must be intuitive to users
Answer: C
7. The thought-process example started analysis by:
A. Geography only
B. Time (quarterly sales)
C. Product only
D. Employee data
Answer: B
8. After finding a loss-making quarter, the decision maker drills down to:
A. Ignore the quarter
B. Monthly sales by region and product
C. Reboot the server
D. Contact IT only
Answer: B
9. Which OLAP operation narrows data to a more detailed level?
A. Roll Up
B. Drill Down
C. Slice
D. Pivot
Answer: B
10. Which OLAP operation aggregates to less detail?
A. Drill Down
B. Roll Up
C. Slice
D. Dice
Answer: B
11. Facts in OLAP represent:
A. Descriptive categories
B. Quantitative measures (e.g., sales $)
C. User accounts
D. Indexes
Answer: B
12. Dimensions in OLAP represent:
A. Aggregates only
B. Descriptive categories (time, geography, product)
C. Hardware specs
D. SQL queries
Answer: B
13. An example dimension hierarchy for time is:
A. Hour → Minute → Second
B. Week → Month → Quarter → Year
C. Day → Month → Week
D. Zone → City → Province
Answer: B
14. OLAP is best described as:
A. A classification of applications
B. A storage engine
C. A programming language
D. A network protocol
Answer: A
15. OLAP objectives include all EXCEPT:
A. Fast decision-making support
B. Iterative analysis
C. Ad-hoc queries
D. Continuous transactional updates
Answer: D
16. MOLAP refers to:
A. Relational OLAP only
B. Multidimensional OLAP (cube-based)
C. Machine-learning OLAP
D. Micro OLAP
Answer: B
17. Compared to OLTP, OLAP queries typically retrieve:
A. Smaller amount of data per transaction
B. Large aggregated data sets
C. Single rows only
D. Only real-time transactions
Answer: B
18. OLTP systems are typically:
A. Denormalized
B. Fully normalized
C. Multidimensional
D. Cube-based
Answer: B
19. The typical OLAP user is:
A. Clerk entering transactions
B. Knowledge worker / analyst / executive
C. Backup operator
D. Network admin
Answer: B
20. OLAP systems aim to answer most queries under:
A. 5 seconds (FAST in FASMI)
B. 1 minute
C. 10 minutes
D. 1 hour
Answer: A
21. In FASMI, the “S” stands for:
A. Secure
B. Shared (multi-user security & sharing)
C. Simple
D. Scalable
Answer: B
22. In FASMI, “M” stands for:
A. Memory
B. Multidimensional
C. Metrics
D. Modeling
Answer: B
23. Which FASMI letter requires the system to provide all relevant information no matter where it resides?
A. F (Fast)
B. A (Analysis)
C. S (Shared)
D. I (Information)
Answer: D
24. Why is it infeasible to pre-write all possible OLAP queries?
A. Programmers are too slow
B. The user does not know questions in advance; queries are ad-hoc and multidimensional aggregates are many
C. Databases cannot store queries
D. Queries do not help analysis
Answer: B
25. A multidimensional aggregate corresponds to:
A. A single row only
B. A possible OLAP query at a specific hierarchy level
C. A primary key
D. A user account
Answer: B
26. OLAP helps support the human thought process by enabling:
A. Static reports only
B. Iterative exploration (slice, dice, drill)
C. Only batch jobs
D. Only pre-scheduled alerts
Answer: B
27. Typical OLAP table types are:
A. Flat tables only
B. Multi-dimensional tables / cubes
C. System catalogs
D. Temporary logs
Answer: B
28. Which is a key difference in data “age” between OLTP and OLAP?
A. OLTP: Historical 5–10 years; OLAP: Current 60–90 days
B. OLTP: Current (60–90 days); OLAP: Historical (5–10 years)
C. Both same age
D. OLTP: Never current; OLAP: Real-time only
Answer: B
29. Which feature of OLAP ensures many users can access shared confidential data securely?
A. FAST
B. ANALYSIS
C. SHARED (security & sharing)
D. MULTIDIMENSIONAL
Answer: C
30. If queries take too long in OLAP, what happens to decision making?
A. No impact
B. Thought process is broken and users get distracted
C. Users gain more insight
D. System becomes transactional
Answer: B
✅ 10 Short Questions (concise answers)
Q: What is the primary role of OLAP in relation to the data warehouse?
A: OLAP performs analysis on data provided by the data warehouse — supporting ad-hoc, interactive, iterative decision-making.
Q: List three characteristics of analysis supported by OLAP.
A: Ad-hoc, interactive, iterative (drill down/roll up/drill across).
Q: Define a FACT and a DIMENSION.
A: FACT = numeric measure (e.g., sales $); DIMENSION = descriptive category for filtering/reporting (e.g., time, product, geography).
Q: Give two OLAP navigation operations.
A: Drill Down (more detail) and Roll Up (less detail); see the sketch after this list.
Q: Why can’t all OLAP queries be pre-written?
A: Queries are user-driven and exploratory; users don’t know their questions in advance, and the space of multidimensional aggregates is huge.
Q: What is the FASMI test used for?
A: To evaluate OLAP systems on Fast, Analysis, Shared, Multi-dimensional, and Information capabilities.
Q: What is a typical OLTP vs OLAP difference in table structure?
A: OLTP uses normalized flat tables; OLAP uses multidimensional tables/cubes or denormalized structures.
Q: Who is the typical OLAP user?
A: Knowledge workers — analysts, managers, executives (not clerks).
Q: What does “multidimensional” mean in OLAP?
A: Data is viewed across multiple dimensions (e.g., time × geography × product), often organized in hierarchies.
Q: Why must OLAP be fast (under ~5 seconds)?
A: To maintain the analyst’s thought process and support interactive, iterative exploration without breaking concentration.
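A minimal Python sketch of roll-up and drill-down over a tiny fact table with time, geography, and product dimensions; the rows are made up, and real OLAP engines operate on cubes rather than Python dicts, but the aggregation idea is the same.

from collections import defaultdict

SALES = [  # fact: sales amount; dimensions: quarter, month, region, product
    {"quarter": "Q1", "month": "Jan", "region": "North", "product": "Tea",    "amount": 100},
    {"quarter": "Q1", "month": "Feb", "region": "North", "product": "Coffee", "amount": 150},
    {"quarter": "Q1", "month": "Feb", "region": "South", "product": "Tea",    "amount": 120},
    {"quarter": "Q2", "month": "Apr", "region": "South", "product": "Coffee", "amount": 200},
]

def aggregate(rows, dims):
    """Group the fact rows by the given dimension levels and sum the measure."""
    out = defaultdict(int)
    for r in rows:
        out[tuple(r[d] for d in dims)] += r["amount"]
    return dict(out)

# Roll up: less detail (total sales per quarter).
print(aggregate(SALES, ["quarter"]))
# Drill down: more detail within one quarter (month x region).
print(aggregate([r for r in SALES if r["quarter"] == "Q1"], ["month", "region"]))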
