Power BI duplicate column

Microsoft Excel: How to use Power Query to display a list of duplicate values or records

Whether duplicate records are good or bad depends on specific conventions that you determine, not Excel. For the most part, duplicate data is common because many records repeat the same values within the same column. 

On the other hand, a duplicate record—where all of the values are repeated—can spell trouble when reporting and analyzing the data set at large. It's easy to find duplicates; you can filter them out of a data set or format them using a conditional format rule. What you can't easily do in Excel is display only the duplicate records. The good news is that doing so is easy using Power Query, so in this article, I'll show you how to use Power Query to display a list of duplicates.

I'm using Microsoft 365, but Power Query is available as far back as Excel 2010, where it is an add-in. Excel Online doesn't fully support Power Query, but you can run queries. You can download the demonstration .xlsx file or work with your own data. This article assumes you have basic Excel skills, but even a beginner should be able to follow the instructions successfully.

What is Power Query?

Power Query lets you connect to external and local data and then transform that data so you can use it in Excel without changing the source data. It's easy to use, but unfortunately most users are unfamiliar with it. Although we're using a simple feature within Power Query, this article isn't a basic introduction to Power Query. Now, let's use Power Query to display duplicates.

Define duplicate

To work efficiently with duplicates, you need to define what a duplicate is within the context of your data and how you use it. Any value that occurs more than once within the same column is a duplicate. For instance, many records in a tracking data set might have the same delivery date or customer. These are duplicate values, and they are common. 

The term can also define a record where every value in the record is repeated in another record. In other words, the entire record is a duplicate. For instance, two records that contain the same delivery date, customer and invoice number might cause a problem; you wouldn't want to invoice a client twice for the same order. These are duplicate records, and you'll usually want to delete one of them. 

For our purposes in this article, we'll use Power Query to display a list of both duplicate types: values repeated in the same column and values repeated across all columns.

As you can see in Figure A, the demonstration sheet contains duplicate values. In such a small data set, duplicates aren't difficult to spot. Working with them is a different matter, especially if the data set is large. We also have one duplicate record.

Figure A

How to list duplicate values with Power Query

Let's use Power Query to see values repeated in the columns. To do so, click anywhere inside the data set, click the Data tab, and then do the following:

  1. In the Get & Transform Data group, click From Sheet. The resulting window shows the data in Power Query (Figure B).
  2. Select the column that you want to check for duplicates. In this case, the date column is already selected, so let's use it.
  3. On the Home tab (in Power Query, not Excel), click the Keep Rows dropdown in the Reduce Rows group.
  4. In the resulting dropdown list, choose Keep Duplicates.

Figure B

As you can see in Figure C, the data set repeats two dates at least once. To see duplicates in the other columns, select a column and repeat steps 3 and 4. For example, Figure D shows the duplicate value in the personnel column.

Figure C

Figure D

Now you know that at least two columns repeat at least one value. If you like, check for duplicates in each column; you'll find that every column repeats a value at least once.
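
For readers who prefer to see the logic, Keep Duplicates can be approximated in M as a group, filter, and join. This is only a sketch: the inline sample table and the column names Date and Personnel are assumptions standing in for the demonstration sheet, and the step Excel writes for you may differ in its details.

```powerquery
let
    // Hypothetical sample data standing in for the demonstration sheet
    Source = #table(
        {"Date", "Personnel"},
        {
            {#date(2021, 1, 4), "Ann"},
            {#date(2021, 1, 4), "Bob"},
            {#date(2021, 1, 5), "Cal"},
            {#date(2021, 1, 6), "Ann"}
        }
    ),
    // Column(s) to check for repeated values
    KeyColumns = {"Date"},
    // Count how many rows share each key value
    Counted = Table.Group(Source, KeyColumns, {{"Count", Table.RowCount, type number}}),
    // Keep only the key values that occur more than once
    RepeatedKeys = Table.SelectRows(Counted, each [Count] > 1),
    KeysOnly = Table.RemoveColumns(RepeatedKeys, {"Count"}),
    // An inner join back to the source keeps every row whose key is repeated
    Duplicates = Table.Join(Source, KeyColumns, KeysOnly, KeyColumns, JoinKind.Inner)
in
    Duplicates
```

With this sample, only the two rows dated January 4 share a date, so those are the rows the query should keep. Changing KeyColumns to {"Personnel"} checks the other column instead.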

What if you want to see if there's a duplicate record? Let's tackle that next.

How to list duplicate records with Power Query in Excel

To quickly recap, a duplicate record repeats values across all columns. To check the data set for duplicate records, select all of the columns in Power Query. To do so, click the first column header, hold down the Shift key, and click the last column header. Then, choose Keep Duplicates from the Keep Rows dropdown. Figure E shows the result. As you might have guessed already, the result is the same as the personnel query in the last section.

Figure E

Granted, this is a simple example, and the results were easy to predict. That won't always be true, especially in a large data set.
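
The whole-record check can be sketched in M by grouping on every column. As before, the sample table and its column names are hypothetical, and this variant returns each repeated record once with a count rather than listing every copy.

```powerquery
let
    // Hypothetical sample data; the third row repeats the first in every column
    Source = #table(
        {"Date", "Personnel", "Invoice"},
        {
            {#date(2021, 1, 4), "Ann", 101},
            {#date(2021, 1, 4), "Bob", 102},
            {#date(2021, 1, 4), "Ann", 101}
        }
    ),
    // Using every column as the key makes the test "the whole record is repeated"
    AllColumns = Table.ColumnNames(Source),
    Counted = Table.Group(Source, AllColumns, {{"Count", Table.RowCount, type number}}),
    // Each surviving row is one distinct record plus the number of times it occurs
    DuplicateRecords = Table.SelectRows(Counted, each [Count] > 1)
in
    DuplicateRecords
```

Because the key list comes from Table.ColumnNames, the query keeps working if columns are added to the source later.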

How to use the results of your Power Query search

Seeing the records might not be enough. Fortunately, you can return the results to Excel as a sheet. Simply click Close & Load in the Close group. Doing so will create a new sheet and save the resulting data set, as shown in Figure F. Once the data is in Excel, you can use it as you would any other data set.

Figure F

This is a simple use for Power Query. Take some time to become familiar with the different options so you can apply it to more complex tasks. 

Source: https://www.techrepublic.com/article/how-to-use-power-query-to-display-a-list-of-duplicate-values-or-records-in-excel/

Duplicate vs Reference Query in Power BI

In Power BI Desktop, there are multiple ways to copy a query in Power Query Editor, such as Copy, Duplicate, and Reference. However, these actions serve different purposes in Power Query and Power BI.

In this post, we explain in detail the main differences between Duplicate and Reference in Power Query, and when you should use each option to copy a query in Power BI.

Duplicate vs Reference Vs Copy in Power BI?

1) Reference in Power BI

As mentioned earlier, the Reference option is used to create a copy of the original query.

But what exactly happens when you copy a query using the Reference option?

  1. It copies the original query without any custom steps. The original query may have several applied steps, but none of them appear in the new Reference query.
  2. Only the Source step is available in the new Reference query, and it can’t be edited.
  3. The Reference query doesn’t require more processing because it just acts as a pointer to the original query in memory and does not create a new object in memory.
  4. The Reference query depends on the original query, so
    • Any changes in the original query will affect the Reference query.
    • New steps applied in the original query will also be reflected in the Reference query.

Example:

When you add, rename, or delete a column in the original query, the change is automatically reflected in the Reference query, but it is not added as a new step in the Reference query.

However, custom steps added to the Reference query are not applied to the original query and do not affect it.
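
In M terms, a Reference query is a query whose Source step simply names the original query. The sketch below simulates that inside one expression so it can be pasted into a blank query; the table, the column names, and the filter step are illustrative assumptions, not taken from the article's file.

```powerquery
let
    // "Original" stands in for a separate query; in the editor it would be its own entry
    Original = #table({"Product", "Amount"}, {{"A", 10}, {"B", 20}}),
    // A Reference query starts with a single Source step that names the original query
    Source = Original,
    // Steps added from here on belong to the Reference query only
    Filtered = Table.SelectRows(Source, each [Amount] > 10)
in
    Filtered
```

In a real file the Reference query would start with a step of the form Source = #"Original Query Name", which is why changes to the original flow through while the extra filter stays local.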

When should you use a Reference query in Power BI?

You should use a Reference query in Power Query Editor in Power BI in the following cases:

  1. If you need a copy of the original query without its custom steps, to which you can add different steps.
  2. If you need a copy that stays linked to the original query rather than isolated from it.

Note: heavy use of Reference queries may lead to circular references.

2) Duplicate in Power BI

The Duplicate option is also used to create a copy of the original query.

But what exactly happens when you copy a query using the Duplicate option?

  1. It copies the entire original query with all applied steps. Every step in the original query also appears in the Duplicate query.
  2. The Duplicate query requires more processing because it creates a new object in memory.
  3. The new Duplicate query is isolated from the original query, which means
    • Any changes in the original query will NOT affect the Duplicate query, and vice versa.
  4. Unlike the Reference query, you can change the query source in the Duplicate query without affecting the original query.

When should you use a Duplicate query in Power BI?

You should use a Duplicate query in Power Query Editor in Power BI in the following cases:

  1. If you need an exact copy of the main query with all applied steps.
  2. If you need a copy that is isolated from the original query.
  3. If you need to add additional steps with different configurations without affecting the original query.

3) Copy Table in Power BI

The Copy and Paste option can also be used to copy a query in Power Query Editor. It looks like Duplicate, but it is actually neither a Reference action nor a Duplicate action!

In practice, if you copy and paste a normal query that does not depend on other queries, it acts like Duplicate.

However, if you copy and paste a query that depends on other queries (such as a Reference query), it copies all of the queries it depends on as well.

The examples below clarify what happens when you use copy-paste instead of the Duplicate or Reference options.

Example (Copy-Paste a query with no query dependencies):

In this example, we copy the “Power Platform Geeks” query, which doesn’t depend on other queries. When we copy and paste it, the query is copied with all applied steps, and the copy is isolated from the original query.

Example (Copy-Paste a query with query dependencies):

In this example, the Query Dependencies view shows that the “Reference” query depends on “Power Platform Geeks”.

So when you copy and paste the “Reference” query, which depends on another query, it generates two queries: a copy of “Reference” and a copy of the query it depends on, “Power Platform Geeks”, with all applied steps.

Conclusion

In the end, Copy, Duplicate, and Reference are all options for copying a query in Power Query Editor. However, these actions have different purposes and uses.

So in this post we have tried to clarify the main differences between Duplicate, Reference, and Copy in Power Query in Power BI, and when you should use each option!

Download

Download the PBIX file used in the article from GitHub at Duplicate vs Reference vs Copy in Power BI.

Source: https://devoworx.net/duplicate-vs-reference-query-powerbi/
PowerQuery Table.Join duplicate column names

Is there a trick to alias them, as you would in SQL?

Yes, there is: Table.PrefixColumns.
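
The answer's original snippet did not survive here, so the following is a stand-in sketch rather than the answerer's code. The tables Orders and Customers and their columns are hypothetical; the point is only that Table.PrefixColumns renames every column to prefix.ColumnName, which removes the clash before Table.Join runs.

```powerquery
let
    // Hypothetical tables that share the column names "Id" and "Name"
    Orders = #table({"Id", "Name"}, {{1, "Order A"}, {2, "Order B"}}),
    Customers = #table({"Id", "Name"}, {{1, "Alice"}, {2, "Bob"}}),
    // Prefix every column so the two sides no longer collide, much like table aliases in SQL
    O = Table.PrefixColumns(Orders, "o"),
    C = Table.PrefixColumns(Customers, "c"),
    // Columns are now named o.Id, o.Name, c.Id and c.Name
    Joined = Table.Join(O, "o.Id", C, "c.Id", JoinKind.Inner)
in
    Joined
```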

Improved version of @Alexis Olson's answer.

Even an inner join allows identical column names only for the keys. All other columns must have unique names. If your data schema is not stable, you may find that errors crop up here and there, regardless of join kind. The nested join proposed by Alexis is good, but you still have to list column names. That's one more place to change.
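
For context, a nested join of the kind referred to above might look like the sketch below. It is not the code from the other answer; the tables and the new column name CustomerName are assumptions chosen to show where the column list has to be maintained.

```powerquery
let
    // Hypothetical tables that share the column names "Id" and "Name"
    Orders = #table({"Id", "Name"}, {{1, "Order A"}, {2, "Order B"}}),
    Customers = #table({"Id", "Name"}, {{1, "Alice"}, {2, "Bob"}}),
    // The right-hand table lands in a single nested column, so its names cannot clash
    Nested = Table.NestedJoin(Orders, {"Id"}, Customers, {"Id"}, "Customer", JoinKind.LeftOuter),
    // Expanding is where the column names must be listed, and where they can be renamed
    Expanded = Table.ExpandTableColumn(Nested, "Customer", {"Name"}, {"CustomerName"})
in
    Expanded
```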

If you code by hand (not using the GUI much), there is a better approach: convert tables to lists of records:

All fields remain accessible, for example:

But of course such nested record fields can't be used as keys or in another join. And they are not directly observable in the GUI.

Source: https://stackoverflow.com/questions/65570209/powerquery-table-join-duplicate-column-names
How to add or duplicate rows based on the values of a column? | Power Query | Excel Forum

Hi,

Agree that this was very clever and worked for me:

if [Pge1.Split] ="Yes" then "1-2" else null)

I went a little further (pasted from advanced view):

#"Add number of rows" = Table.AddColumn(#"Filtered Rows", "Custom", each if [Frequency]="Yes" then Text.Repeat("x",BlankCount) else null),
#"Split Column by Position" = Table.ExpandListColumn(Table.TransformColumns(#"Add number of rows", {{"Custom", Splitter.SplitTextByRepeatedLengths(1), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Custom"),

where BlankCount is a number from a named range in the spreadsheet. This number determines the number of repeats required, e.g. setting it to 3 creates a field value of "xxx".

The split is then by character (again creating rows), so each "x" creates a new row.
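
A different way to get the same effect, sketched here as an alternative rather than as the poster's method, is to build a list instead of a text string and expand it. The sample table, the Copy column, and the fixed RepeatCount are assumptions; in the post above the count comes from a named range.

```powerquery
let
    // Hypothetical data; RepeatCount plays the role of BlankCount in the post above
    Source = #table({"Item", "Frequency"}, {{"A", "Yes"}, {"B", "No"}}),
    RepeatCount = 3,
    // Build a list with one element per required row: RepeatCount for "Yes", one otherwise
    AddCopies = Table.AddColumn(
        Source,
        "Copy",
        each if [Frequency] = "Yes" then List.Numbers(1, RepeatCount) else {1}
    ),
    // Expanding a list column creates one row per list element
    Expanded = Table.ExpandListColumn(AddCopies, "Copy")
in
    Expanded
```

With this sample, item A should appear three times and item B once, and the Copy column numbers the repeats.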

Source: https://www.myonlinetraininghub.com/excel-forum/power-query/how-to-add-or-duplicate-rows-based-on-the-values-of-a-column

When you work with tables and queries in Power Query and Power BI, you get the option to copy them through two actions: Duplicate or Reference. A question that always comes up in my sessions and courses is what the actual difference between these two actions is. The explanation is simple but very important to understand, because when you know the difference, you will use each one properly. In this short blog post, I’ll explain what Reference is and how it differs from Duplicate. To learn more about Power BI, read the Power BI from Rookie to Rock Star book.

Duplicate

If you are looking to copy an entire query with all of its steps, then Duplicate is your friend. Let’s see this in action. As an example, let’s assume that we got the data from a web page that shows information about the best-selling movies. If you have done the movies example of my book previously, the website is BoxOfficeMojo. Here is the link to the page:

https://www.boxofficemojo.com/alltime/world/?pagenum=1&p=.htm

In Power Query, we chose Get Data From Web and selected this source.

Let’s say we apply some transformations to this table, for example removing the extra “^” character from the last column (the Year column), followed by some other transformations, so we end up with a number of steps for this query.

After doing all these transformations, you realize that this data covers only the first hundred best-selling movies, because that web page doesn’t list the remaining ones. To get the rest, you need to navigate to page 2, which has a different URL but the same data structure.

So what do you need to do? You have to apply all those steps to page 2 as well. Let’s keep this example static and basic (in complex scenarios with many pages, you may use functions and parameters to loop through all pages and combine them; if you are interested in that, read my post here). Let’s say you want to apply all the steps you did for page one to page two. To do that, you can leverage Duplicate.

Create a duplicate of Box Office Mojo (I called it Box Office Mojo Page 1).

When you create the Duplicate query, it is an exact copy of the first query, with all of its steps. The two queries are identical.

Duplicate copies a query with all of its applied steps as a new query; an exact copy.

After creating the copy, you can go to the Source step and change the URL.

Using Duplicate, you managed to copy a query with all of its steps and then make changes in the new query. Your original query is intact.

Duplicate is the option to choose when you want to copy a query but configure its steps differently.

Reference

Reference is another way of copying a query. The big difference is that when you reference a query, the new query has only one step: sourcing from the original query. A referenced query does not have the applied steps of the original query. Let’s see this option in action. Continuing the example above, let’s say we want to create a new query that combines the page 1 and page 2 results. However, we do NOT want to change any of the existing queries, because we want to use those as the source for other operations.

With a right-click on Box Office Mojo Page 1, I can create a Reference.

Reference creates a new query that is a copy of Box Office Mojo Page 1 but contains only one step.

The only step in the new query sources the data from the original query. What does that mean? It means that if you make changes in the original query, this new query will be impacted.

Reference will create a new query which has only one step: getting data from the original query.

Now we can use this query to append to Box Office Mojo Page 2.

The result is a query that contains both pages.
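
The reference-then-append pattern can be sketched in M as follows. To keep the example self-contained, the two page queries are simulated as inline tables with made-up rows; in the real file they would be separate queries that start from the web source.

```powerquery
let
    // Stand-ins for the two page queries; the rows are placeholders, not real data
    Page1 = #table({"Rank", "Title"}, {{1, "Movie A"}, {2, "Movie B"}}),
    Page2 = #table({"Rank", "Title"}, {{101, "Movie C"}, {102, "Movie D"}}),
    // The Reference query: a single step that points at Page 1
    Source = Page1,
    // Append Page 2 to the referenced rows
    AllPages = Table.Combine({Source, Page2})
in
    AllPages
```

Because Source only points at Page 1, any step later added to Page 1 would flow into the combined result without editing this query.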

To learn more about Append and how it differs from Merge, read my blog post here. In this example, we used the Reference option to create a copy of the original query and then continued with some extra steps. There are many other uses for Reference.

Reference is a good choice when you want to branch a query into different paths: one path that follows a number of steps and another that follows different steps, with both sharing some steps in the original query.

After doing the append in this example, it is a good idea to uncheck Enable Load on the Page 1 and Page 2 queries to save some memory in Power BI.

Query Dependency

Finding out which query depends on (or is referenced from) which other query can be a bit challenging when you have many queries. That is why we have the Query Dependencies option in the View tab of Power Query.

For our example above, the query dependency diagram shows All Pages sourcing from Page 1 and Page 2.

Now that you know there are two options when you copy a query, let’s have a closer look at their difference.

Isolation from the Original or Dependency to the Original

Duplicate creates a new copy with all the existing steps. The new copy is isolated from the original query. You can make changes in the original or the new query, and they will NOT affect each other. Reference, on the other hand, is a new copy with only one single step: getting data from the original query. If you make a change in the original query, the new query will be impacted. For example, if you remove a column from the original query, a new query created with Reference will not have that column either.

Limitation of the Reference

You cannot use referenced queries in all situations. As an example, if you have Query 1 and then create a reference to it as Query 2, you cannot use the result of Query 2 in Query 1. Doing so would create a circular reference: you would be combining a query with a reference to the query itself, which is impossible.

Some actions that invoke Reference or Duplicate

There are some actions in Power Query that trigger Reference or Duplicate. Let’s check those options:

Append Queries as New / or Merge Queries as New is a Reference action

These two actions create a reference to the original query and then Append or Merge it with other queries.

Add as New Query is a Duplicate action

Believe it or not, when you right-click on a column or cell and select Add as New Query, you are creating a duplicate of the original query.

This can be misleading sometimes, because you may expect the new query to source from the original and to change when the original changes. However, the truth is that this is a duplicate action, and after it your original query and the new copy are isolated from each other.

Copy and Paste is Neither Duplicate Nor Reference!

Another misconception is that Copy and Paste is the same as Duplicate. It is not, and it is not Reference either. When you do this action on a simple query (I mean a query that is not sourced from any other queries), you get a result similar to Duplicate.

But when you Copy and Paste a query that is sourced from other queries, the result is a copy of all the original queries. That is what happens with Copy and Paste on Box Office Mojo All Pages (which is sourced from Page 1 and Page 2).

Duplicate and Reference are two different actions, and they are also different from Copy and Paste of a query. Duplicate gives you an exact copy of the query with all steps, while Reference creates a new query that refers to the original query. Duplicate is a good option when you want the two copies to be isolated from each other; Reference is a good option when you create different branches from one original query. There are some actions in Power Query that trigger Duplicate or Reference, as listed in this blog post. I hope this post helps you understand the difference between these two actions clearly and use them wisely from now on.

Reza Rad

Trainer, Consultant, Mentor

Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering and more than 20 years’ experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for nine consecutive years (from 2011 till now) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of the Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He has written some books on MS SQL BI and is writing others. He was also an active member of online technical forums such as MSDN and Experts-Exchange, was a moderator of the MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of the very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content, and of Power BI Pro Architecture, published by Apress.
He is an International Speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. He is also a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution; he is a data enthusiast.

Source: https://radacad.com/reference-vs-duplicate-in-power-bi-power-query-back-to-basics
Power BI Remove Duplicate Records And Keep Most Recent

Learn how to remove duplicates and keep the last record in Power Query.

Today I was helping a customer with a problem that seemed quite simple on the surface.  She had a data table containing historical customer sales orders (each customer has many orders on different dates).  The objective was to filter this table in Power Query and load just one record for each customer: the one with the last order date.  To illustrate the problem more clearly, I have adapted the scenario using the Adventure Works database.

Adventure Works Example

The Sales table contains all the historical sales transactions by customer (identified by CustomerKey) and each transaction has an Order Date. The objective is to filter this table in Power Query so as to keep only the last entry for each customer (the last entry is the most recent order date).  At first glance, the solution seems simple.  In Power Query, you would think that you simply:

  • Sort the table by Order Date in descending order.
  • Select the customer key column and then remove duplicates.

But when you do this in Power Query, it does not work as expected. In the Sales table, each customer has many transactions with different order dates.

In Power Query, I sorted by OrderDate descending, then removed duplicates.

But the solution is not correct: the order dates for some of the customers are actually not the last orders. If you compare the original data sorted by OrderDate for each customer with the Power Query results, you can see that Power Query has returned the wrong order date for some customers.

Why Doesn’t it Work?

I can’t say that I have a deep technical understanding of the problem, but I do have a conceptual understanding.  When you select “sort column”, it is reasonable to expect that the entire table is sorted before proceeding to the next step.  In reality, it is only the data that is loaded in memory that is sorted.  The remaining data on disk is not included in the sort.  Power Query also uses a concept called “lazy evaluation”.  In short this means that if you add a step in the code, and that step is not technically needed to produce the final result, then that step is actually never executed (even though it is there in the instructions) – weird I know, but very efficient.

Table.Buffer to the Rescue

Before I share this solution, let me point out there are other ways to solve the problem, specifically using Group By. However, the purpose of this article is to broaden readers’ understanding of Power Query and introduce the Table.Buffer function.

I am pretty sure I learnt this tip from Imke Feldmann at The BIccountant (or possibly Chris Webb).  Both are absolute wizards at this stuff.  To solve the problem you will need to get in and make some manual changes to the M code.  To do this, first make sure you turn on the formula bar.  Go to the View menu and select “Formula Bar”.

When I clicked on the step that sorts the table (descending) by OrderDate, the formula bar showed the M code for that step.

To solve the problem, I need to force Power Query to load all the data into memory, forcing the sort to be completed now before proceeding.  All I did was wrap that line of code inside the Table.Buffer( ) function.

The rest of the steps remain the same. The Table.Buffer( ) function forces the entire set of data to be loaded into memory after sorting, and hence the next step of removing duplicates works correctly on the entire data set.
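
Since the screenshots of the M code are not reproduced here, the following sketch shows the shape of the technique rather than the author's exact steps. The small inline table is a hypothetical stand-in for the Adventure Works Sales table; the column names CustomerKey and OrderDate follow the article, and SalesAmount is an assumption.

```powerquery
let
    // Hypothetical sample standing in for the Adventure Works Sales table
    Sales = #table(
        {"CustomerKey", "OrderDate", "SalesAmount"},
        {
            {11000, #date(2013, 1, 5), 100},
            {11000, #date(2013, 6, 9), 250},
            {11001, #date(2013, 3, 2), 80},
            {11001, #date(2012, 11, 20), 40}
        }
    ),
    // Sort newest first, then buffer so the sorted order is fixed in memory
    Sorted = Table.Buffer(Table.Sort(Sales, {{"OrderDate", Order.Descending}})),
    // Keep the first row seen for each customer, which is intended to be the latest order
    LatestPerCustomer = Table.Distinct(Sorted, {"CustomerKey"})
in
    LatestPerCustomer
```

On a table this small the unbuffered version would probably give the same answer; the difference described in the article shows up on larger sources.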

These results are now correct.  Comparing the two outputs side by side, the OrderDate (Incorrect Solution) column is the result without using Table.Buffer( ) and the OrderDate (Correct Solution) column is the result of using Table.Buffer( ).  Several customers have different results.  The correct result can be manually validated against the raw data.
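
The Group By alternative mentioned earlier in the article could look roughly like this. It is a sketch under the same assumptions as before (hypothetical inline data, SalesAmount as an invented column) and does not rely on sort order at all.

```powerquery
let
    // Hypothetical sample standing in for the Adventure Works Sales table
    Sales = #table(
        {"CustomerKey", "OrderDate", "SalesAmount"},
        {
            {11000, #date(2013, 1, 5), 100},
            {11000, #date(2013, 6, 9), 250},
            {11001, #date(2013, 3, 2), 80},
            {11001, #date(2012, 11, 20), 40}
        }
    ),
    // One row per customer; Table.Max returns the whole record with the latest OrderDate
    Grouped = Table.Group(
        Sales,
        {"CustomerKey"},
        {{"Latest", each Table.Max(_, "OrderDate"), type record}}
    ),
    // Pull the wanted fields back out of the nested record
    Expanded = Table.ExpandRecordColumn(Grouped, "Latest", {"OrderDate", "SalesAmount"})
in
    Expanded
```

Here the latest order is chosen explicitly per group, so no buffering is needed to protect a sort.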

Here is the sample workbook and the source data that I used in this blog post.

Source: https://exceleratorbi.com.au/remove-duplicates-keep-last-record-power-query/

How to Duplicate Columns in Power Query

You can easily duplicate columns using the Power Query Editor in Power BI. This is helpful when you want to make temporary or permanent alterations to a copy of a column in the Power Query Editor without touching your source data. In this tutorial, we learn 3 steps to duplicate columns with Power Query.

Duplicate Columns Using The Power Query Editor

Suppose you have the source data as shown below. Here, the marked column is the one that we want to duplicate.

Step 1: Select the Column that you want to duplicate

After you load the data source into the Power Query Editor, find the column that you wish to duplicate and select it. In this example, we are going to duplicate the salary column.

Step 2: Duplicate the selected column

With the column selected, go ahead and make the following change on the ribbon.

Go to the Add Column tab > Duplicate Column option

Step 3: Save Data

After you duplicate the column you get the result immediately on the screen of the Power Query Editor. When done with your work, just click on the save icon on the top to save the changes for next time.
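
Behind the ribbon command, the step is a call to Table.DuplicateColumn. The sketch below uses a hypothetical two-column table and an assumed name for the copy; the follow-up transformation is only there to show that the copy can be changed independently.

```powerquery
let
    // Hypothetical source data with a Salary column to duplicate
    Source = #table({"Employee", "Salary"}, {{"Ann", 50000}, {"Bob", 62000}}),
    // Copy Salary into a new column; the original column is left untouched
    Duplicated = Table.DuplicateColumn(Source, "Salary", "Salary - Copy"),
    // The copy can now be altered on its own, here scaled to thousands
    Adjusted = Table.TransformColumns(Duplicated, {{"Salary - Copy", each _ / 1000, type number}})
in
    Adjusted
```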

In this way, you can easily duplicate columns with Power Query in Excel as well as Power BI.

Source: https://yodalearning.com/tutorials/how-do-duplicate-columns-power-query/

