
Posts

Understanding Data File Formats: CSV, Avro, Parquet & ORC Explained

When building data pipelines and analytics platforms (Snowflake, Databricks, Athena, Hive), the file format you choose affects performance, cost, and reliability. This guide explains the four common formats (CSV, Avro, Parquet, and ORC), when to use each, and why concepts like schema and streaming matter.

Row-Based vs Columnar Formats (Quick)

| Type | How it stores data | Examples | Best for |
| --- | --- | --- | --- |
| Row-based | Stores each record (row) consecutively | CSV, Avro | Streaming, single-row operations, event pipelines |
| Columnar | Stores values of each column together | Parquet, ORC | Analytical queries, compression, large-scale reads |

File Format Comparison (Detailed)

Format What it is Schema Serializat...
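The row-based versus columnar distinction in the excerpt above can be illustrated with a minimal Python sketch. The sample records, the field names, and the helper `to_columnar` are hypothetical and only stand in for real file layouts; actual CSV, Avro, Parquet, and ORC files add encoding, compression, and metadata on top of this idea.

```python
# Hypothetical sample records (not taken from the original post).
rows = [
    {"id": 1, "city": "Pune", "amount": 120.0},
    {"id": 2, "city": "Oslo", "amount": 80.5},
    {"id": 3, "city": "Pune", "amount": 42.0},
]


def to_columnar(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# Row-oriented access: one whole record sits together, which suits
# appending or reading a single event.
second_record = rows[1]

# Column-oriented access: an aggregate touches only the one column
# it needs, which is why analytical scans tend to favor this layout.
total_amount = sum(columns["amount"])

print(second_record)
print(total_amount)
```

Reading `second_record` needs one lookup in the row layout, while `total_amount` reads a single list in the columnar layout and never touches `id` or `city`.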

What is a Stage in Snowflake?

A stage in Snowflake is a temporary or permanent storage location for data files before they are loaded into Snowflake tables or after they are unloaded. Think of it as a data landing zone or buffer area. Stages can store various file types such as CSV, JSON, Parquet, Avro, or ORC.

Types of Stages in Snowflake

Snowflake provides three main types of stages:

| Stage Type | Category | Description | Example |
| --- | --- | --- | --- |
| User Stage | Internal | Automatically created for each user. Accessible only by that user. | @~ |
| Table Stage | Internal | Automatically created for each table. Used to load/unload data specific to that table. | @%my_table |
| Named Stage | Internal / External | Explicitly created by users. Can point to internal Snowflake storage or external cloud storage.... |

How to choose the right visual for Data Visualization in Power BI?

Data visualization is a crucial aspect of any business or organization, as it presents complex data in a more understandable and visually appealing way. It enables decision-makers to quickly spot trends and patterns in the data, which can inform their decisions and strategy. One of the key tools for data visualization is Power BI, a software platform that allows users to create interactive dashboards and reports. When using Power BI, it is important to choose the right visual for your data, as this helps you communicate the insights you have gleaned from it effectively.

So, how do you choose the right visual for your data visualization in Power BI? Here are a few key tips:

1. Know your data: Before you start creating your data visualization, it is essential to understand the data you are working with. This includes understanding the structure of the data, the types of data you have, and any relationships or patterns in th...

SUMX Function in DAX - Power BI - With Example

DAX (Data Analysis Expressions) is a powerful language used to define calculations in Power BI, Excel, and other Microsoft tools. One of the useful functions in DAX is the SUMX function, which evaluates an expression that you specify for each row of a table and returns the sum of the results. To understand how the SUMX function works, let's consider a simple example. Suppose you have a table of sales data, with the following columns:

| Date | Customer | Product | Quantity | Price |
| --- | --- | --- | --- | --- |
| 1/1/2022 | John | A | 2 | $10 |
| 1/1/2022 | Mary | B | 3 | $20 |

...
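The row-by-row behavior of SUMX can be mimicked in a few lines of Python. This is only a sketch of the semantics, not DAX itself: the helper `sumx` and the variable names are hypothetical, and the data holds just the two rows visible in the excerpt (the original table continues). In DAX the comparable call would look roughly like `SUMX(Sales, Sales[Quantity] * Sales[Price])`, assuming the table is named `Sales`.

```python
# Only the two rows visible in the excerpt; the original table is longer.
sales = [
    {"Date": "1/1/2022", "Customer": "John", "Product": "A", "Quantity": 2, "Price": 10},
    {"Date": "1/1/2022", "Customer": "Mary", "Product": "B", "Quantity": 3, "Price": 20},
]


def sumx(table, expression):
    """Evaluate `expression` once per row, then add up the results."""
    return sum(expression(row) for row in table)


# Revenue per row is Quantity * Price; sumx totals those per-row values.
total_revenue = sumx(sales, lambda row: row["Quantity"] * row["Price"])

print(total_revenue)  # 2 * 10 + 3 * 20 = 80
```

The point of the sketch is the order of operations: the expression is computed for each row first, and only then are the per-row results summed, which differs from multiplying two column totals.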

What is Power BI & its components?

Power BI is a business intelligence service that allows users to visualize and analyze data from various sources, such as databases, Excel spreadsheets, and web services. It provides a range of features and tools for creating interactive dashboards, reports, and charts, and allows users to share their insights with others in their organization. Power BI is available as a cloud-based service or as a desktop application, and can connect to a wide variety of data sources, including on-premises databases, cloud-based data storage systems, and web-based data services. It also includes features for collaboration, such as the ability to publish dashboards and reports to the web and share them with others in an organization. Overall, Power BI is designed to help users make more informed decisions by providing a user-friendly platform for data visualization and analysis.

The main components of Power BI are:

1. Power BI Desktop: This is a Windows ...

DATA ANALYTICS LIFE CYCLE: WHAT IS IT? HOW TO APPROACH?

WHAT IS IT?

Data analytics is the process of examining and manipulating data in order to gain insights and draw conclusions. It involves collecting, cleaning, and organizing data, as well as applying statistical and machine learning techniques to analyze it. The data analytics life cycle is a structured approach that guides organizations through the various stages of the data analytics process.

APPROACH?

1. OBJECTIVE: First and foremost, understand the objective and the purpose. Why are you doing this? What outcome do you want from it? If these questions are not clear, the rest is in vain. The same holds in the SDLC (Software Development Life Cycle) model: if the requirement is not clear, you might develop or test the software incorrectly.

2. UNDERSTANDING THE DATA: Once you have understood the objective, understanding the data is crucial. Suppose your data contains rows and columns in Excel or on a server; check what each row comprises with resp...