
Where to Store Index DDL

Recently I was asked my opinion, via Twitter, on where to store the index DDL for a single database that had multiple clients with varied usage patterns.  You can see the whole thread here.

It’s a great question and there were some interesting suggestions made.  My approach to this scenario is kind of a hybrid of all the suggestions and comments.

I’ve had to deal with this kind of thing in the past, and what I found worked best was to create a single file for each client that contains the client-specific DDL. I wish I could take credit for this idea, but it wasn’t mine; it belonged to a co-worker. At first I resisted and thought it was a bad idea. I mean really, mixing DDL for more than one object in a single script just seemed wrong and went against every fiber of my OCD organizational self. But in the end, this is what worked best in our environment.

Our initial thought was to include the index DDL with the table, but use variables to name the index objects that were specific to each client. That way the index names would never collide, but it also defeated the whole purpose of having different indexing strategies for different clients. Thankfully we scrapped that idea before we implemented it.

We tried creating separate files for each table, with that table’s client-specific DDL in each file. That became a nightmare when it came time to deploy and maintain; we had to build logic into our pre- and post-deployment scripts to handle it.

Then we tried separating the index DDL files out by client, so we ended up with a bazillion index DDL files for each table. Okay, maybe not a bazillion, but it was a lot, and it was even more of a nightmare to maintain.

We settled on the approach I mentioned earlier: one DDL file per client that held all the DDL specific to that client, rather than to any particular object. We found it was much easier to maintain and deploy. We defaulted each of our client-specific DDL files to NOT be included in the build. When it came time to do a build/deploy for a specific client, we would set the option to include that client’s file in the build. We were not using continuous integration, so this may not work if that is what your shop is doing, or it may work with just a few tweaks to your process. It did work for our situation, and it worked well.
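To give you an idea of what those per-client files looked like, here’s a minimal sketch of one (the client, table, and index names are made up for illustration). Each index is wrapped in an existence check so the file can be re-run safely whenever that client’s build is deployed:

-- ClientA_Indexes.sql : all index DDL specific to Client A.
-- Excluded from the build by default; only included when building/deploying for Client A.
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = N'IX_Orders_OrderDate'
                 AND object_id = OBJECT_ID(N'dbo.Orders'))
BEGIN
    CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
        ON dbo.Orders (OrderDate)
        INCLUDE (CustomerID, TotalDue);
END;

IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = N'IX_OrderLines_ProductID'
                 AND object_id = OBJECT_ID(N'dbo.OrderLines'))
BEGIN
    CREATE NONCLUSTERED INDEX IX_OrderLines_ProductID
        ON dbo.OrderLines (ProductID);
END;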

I don’t think there is a single correct answer to this question.  Like almost everything else in our tech world, the answer is going to be, “it depends”.  Figure out what works in your environment and then go with it.  It might take a few trial and error runs to get it right, but you’ll figure out what works best over time with a little persistence.

I’d love to hear your thoughts on this.

My Build and Deploy Process (as Requested by Bill Fellows)

Recently I attended Redgate’s SQL in the City event in Seattle, WA. I was in Seattle for the annual PASS Summit; you can read about my Summit adventures here. While at Redgate’s event, I attended a session aimed at SQL Server Data Tools (SSDT) users. Redgate wanted to get a better handle on what pain points we had in SSDT with respect to source control. I use Team Foundation Server (TFS) as my source control product, and it ties in very nicely with SSDT.

After this discussion, Bill Fellows (B | T), asked if I would be willing to blog or speak about my own build and deploy process for databases. Well, given that I am so NOT a speaker type, the seed for this blog post was planted.

I will not be diving into technical detail on how to do all of these things (that would be a very large book); instead I’ll give an overview of the features of TFS, why I think they are important, and how I use them. Think of this as the 50,000-foot view of my build and deploy process.

Integration with SSDT (Visual Studio)

Since I don’t want to use umpteen gazillion tools for development, I want my source control to integrate seamlessly with Visual Studio. TFS does this better than any other product I’ve used, probably because they are both Microsoft products. This way I don’t have to use multiple IDEs to work on SSIS, SSRS, SSAS and database development projects. I have one tool with the same source control experience for all.

Design your Branches Accordingly

TFS uses the branch paradigm to split code out for different development efforts. I like this paradigm; it’s easy to visualize and it makes sense to me. Designing your branches is probably the most important part of the source control process. Think of it like your data model: if you get it wrong, you will pay dearly for it in the end. Think about how your organization is structured and how your code moves through the development process. What environments do you have: Development, QA, Staging, Hotfix, etc.? How does your code move through those environments? Is it strictly one-way, or can your code move in more than one direction?

Gated Check-ins

Because no matter how many times you tell your developers to do a build locally before checking in their changes, someone will inevitably forget. The last thing you want is bad code getting into your code base. Then you’re left with all your developers sitting around while changes are backed out or corrected, and we all know what happens when developers sit around with idle hands. It gives me nightmares just thinking about it.

Automated Builds

This is so important. You most likely have more than one developer working on the code, and you want to make sure all the changes they are making are not stomping all over each other and breaking things. Just because developers can get their code past the gated check-in doesn’t mean it won’t break something else. You should actually be doing this for all your environments, not just development. In a large shop I recently worked in, we scheduled our automated builds twice per day. The first was at 3 a.m., which allowed enough time for corrections before staff came in if a build failed. The second was at lunch time, which gave us a “sneak peek” at the big picture before the nightly processes kicked off. While TFS does provide some default build templates, many of us have such custom applications and database projects that you may have to learn how to write XAML; I did.

Build Notifications

This is one of my favorite “tattle-tale” features of TFS. You can set up notifications to find out when things are checked in successfully, when check-ins fail, when builds fail, all kinds of things. Use this feature. I can’t stress this enough: USE THIS FEATURE!

Power Tools

While TFS has some great features, some of them are a bit hard to navigate or use. This is where Power Tools comes in. It’s freely available for download from MSDN, and it puts some great features just a click away, instead of making you write cumbersome custom code to get what you want: who has what checked out in a branch, wildcard searching, copying a query, cloning builds, and so on. The list is quite extensive.

Default Settings

All of these things don’t do a lot of good unless you change the default settings for source control in SSDT. One of the biggest bang-for-your-buck settings is to automatically get the latest code when you open a solution. By default this is not enabled (silly, I know, but it’s not). The other setting is to check out objects automatically when they are edited. These two settings will make your source control life much easier.

Wrapping it up

I’m not going to lie: getting all of this set up in TFS is no small effort. But this is your livelihood; treat it as such. Do your research into how your company’s processes currently work, and then compare them to how you want them to work. Once you have all that, you can come up with a build and deploy process that works for you.

Good luck!

Managing Security – TSQL2sday #63

A big thank you goes out to Kenneth Fisher ( b | t ) for hosting this month’s TSQL2sday party. Security is a big deal. How many times in the last six months have you opened the paper (I’m dating myself, I know; no one reads an actual newspaper anymore, it’s all online now) and seen a story about another security breach, with more records compromised or flat-out stolen? Too many. While securing your data is probably the key to keeping your current employment status, there’s also a piece of security that is quite often overlooked and could be the reason for a resume-generating event: recovering from a failed server when you don’t use any of the HA features that are now available.


The scenario:
Your production server has failed and you don’t use any of those new fancy HA features like Always On Availability Groups, Log Shipping or even Database Mirroring. Your server hosts a standalone instance for the HR/Payroll department. Payroll must be processed in the next two hours or your company will be out of compliance with Federal Regulations and face heavy fines, not to mention all the really mad employees who won’t get their paychecks on time. I don’t know about you, but I do NOT want to be responsible for every employee not getting a paycheck, including myself.

You have a good backup plan in place: you take full, differential, and log backups on a schedule that meets the minimum required data-loss SLA and send those backups to a remote SAN data store. Your sysadmin stands up a new standalone server for you in 30 minutes. You install and configure SQL Server in about 60 minutes (those pesky service packs and cumulative updates can take quite a bit of time). Now you are left with 30 minutes to get your databases restored and functioning. No sweat! Easy as 1..2..3, right? Wrong!
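For what it’s worth, the restore itself is the easy part as long as the backup chain is intact. A minimal sketch of the sequence (the database name and file paths are invented for the example) looks something like this:

-- Most recent full backup first, left in a restoring state.
RESTORE DATABASE Payroll
    FROM DISK = N'\\SANStore\Backups\Payroll_Full.bak'
    WITH NORECOVERY, REPLACE;

-- Then the most recent differential.
RESTORE DATABASE Payroll
    FROM DISK = N'\\SANStore\Backups\Payroll_Diff.bak'
    WITH NORECOVERY;

-- Then every log backup taken after that differential, in order.
RESTORE LOG Payroll
    FROM DISK = N'\\SANStore\Backups\Payroll_Log_01.trn'
    WITH NORECOVERY;

-- Finally, bring the database online.
RESTORE DATABASE Payroll WITH RECOVERY;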

You restore your database only to discover that none of your logins exist on the brand-new server. No problem, just recreate the logins and give them brand-new passwords (SQL authentication), and all will be right with the world. You give your HR/Payroll department the okay to proceed and you catch your breath with 20 minutes to spare. The phone rings 5 minutes later: it’s HR/Payroll and it’s not working; they are getting invalid login errors. You have that momentary flashback to when you helped with the application install 4 years ago: the vendor hard-coded the password into their application code, so you can’t just change it or give it a new password. That’s when you remember that you created a job to script the logins with their passwords on a weekly basis and saved the results to a file on that same remote SAN data store as the backups. Hallelujah! You find your script on the remote SAN data store, clean up the logins you created, then execute the script with the original logins and their passwords. HR/Payroll is back up and running with 4 minutes to spare.

Paychecks for everyone!

While some of this may seem far-fetched, it’s based on an actual incident very early in my career. I may have embellished a little, but you get the point. You need to make sure you can recreate any login on your server at any time in the event of a disaster or failure. If you can’t, you may just be looking for a new job.

To this day I still script the logins on all my servers on a weekly basis and store that file in a secure location on a remote server. I’ve never had to use one of those scripts since that original incident, but it’s nice to know I can recreate the logins if I ever need to. Can you?
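If you’re wondering what that weekly script might look like, here’s a minimal sketch for SQL-authenticated logins (the filtering is just illustrative). It generates CREATE LOGIN statements that preserve each login’s original password hash and SID, so restored databases map straight back to their users with no orphans:

-- Build CREATE LOGIN statements that keep the original password hash and SID.
SELECT 'CREATE LOGIN ' + QUOTENAME(sl.name)
     + ' WITH PASSWORD = '
     + CONVERT(varchar(MAX), CAST(LOGINPROPERTY(sl.name, 'PasswordHash') AS varbinary(256)), 1)
     + ' HASHED, SID = ' + CONVERT(varchar(MAX), sl.sid, 1)
     + ', CHECK_POLICY = OFF;'        -- avoids password-policy errors when replaying old hashes; adjust to taste
FROM sys.sql_logins AS sl
WHERE sl.name NOT LIKE '##%';         -- skip the internal certificate-mapped logins

Save the output to a file on that remote data store and you have your insurance policy.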

Policy-Based Management, How Sweet It Is

I have just recently changed employers, and in doing so I’ve changed my role from Data Architect to DBA/BI Developer. I had been doing heads-down development work for the last 4-plus years and not a lot of admin work. Given that, I missed out on something that I think is just fantastic: Policy-Based Management (PBM), which was a new feature in SQL Server 2008. I spent a lot of time when I was a DBA figuring out how to monitor my servers; I used to use a combination of home-grown scripts and the SQLH2 utility. But PBM makes a DBA’s life so much easier, especially when your shop can’t afford all the cool monitoring tools that are available today.

Basically, PBM allows a DBA to manage multiple instances of SQL Server on a consistent basis. You can use one server to create all your policies on and then export/import them to other servers. While this is still better than the hodgepodge of scripts you had before, it’s a bit cumbersome. Enter the Enterprise Policy Management Framework (EPM) from CodePlex. It extends PBM via your Central Management Server and provides some very cool SSRS reports. Because EPM is driven by PowerShell, it can be used on SQL Server versions from 2000 to 2012.

I am so excited to be going back to a dual role of DBA/developer, and Policy-Based Management and Enterprise Policy Management Framework make it that much sweeter.

Paying it Forward

My company has recently put forth an effort to hire fresh college graduates and interns. I think this is a great thing. A long time ago, in a galaxy far, far away, I was a fresh graduate with no real-world experience, and my first employer took a big chance in hiring me. I was really lucky with my first employer: I had a couple of fabulous mentors who taught me not just technical skills, but life skills that I still use today. Employers are getting in on the ground floor, and it’s in their best interest to invest resources in what will hopefully be a long, successful, and rewarding career. “Paying it forward,” as it were.

As a data professional, there are some essentials that will make your life so much easier down the road. The first one is Standards and Best Practices. Establish these early and they will become habit. Now, I could drone on forever about particular Standards and Best Practices, but that’s not my point. My point is that if you start off your career by adhering to standards and best practices, it won’t come as a shock to you later when they are forced upon you. You may not like the particulars, but it won’t feel so icky the first time you have to use them if you are already used to following something. In working with our new employees, I always take the time to point them to our standards and explain why we have them. They are always appreciative of this, especially the explanation.

Another essential is code review. This is a big bang for your buck; it’s a learning experience for everyone involved. We use an Agile methodology in our development life cycle, so our teams are fairly small: they usually consist of two to three general developers, a technical architect, and a data architect or DBA-type person. Following the Agile methodology, anybody should be able to work on anything in the product/iteration backlog. The general developers can usually write T-SQL, but most times it won’t be nearly as efficient (or, dare I say, as pretty) as if the data architect/DBA-type person wrote it. Code review allows the team to become familiar with a task they are not directly involved with and provides valuable feedback to the person who wrote it. They may not be as familiar with the subtleties of the business rules, or they may have no idea that three nested select statements in the predicate will most likely cause a table scan on that 30-million-row table, three times! All they know is that their query returns, and it returns the correct answer.
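To make that last point a little more concrete, here’s a simplified illustration of the general problem (invented table and column names; the story above was about subqueries in the predicate, but the same thing happens in the select list). The first version touches the big Orders table once per subquery; the second asks the same questions in a single pass:

-- Three correlated subqueries: the large Orders table can get scanned three separate times.
SELECT c.CustomerID,
       (SELECT COUNT(*)         FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID) AS OrderCount,
       (SELECT SUM(o.TotalDue)  FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID) AS TotalSpend,
       (SELECT MAX(o.OrderDate) FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID) AS LastOrder
FROM dbo.Customers AS c;

-- Reviewed rewrite: join to Orders once and aggregate; returns the same results with one pass over the big table.
SELECT c.CustomerID,
       COUNT(o.OrderID)  AS OrderCount,
       SUM(o.TotalDue)   AS TotalSpend,
       MAX(o.OrderDate)  AS LastOrder
FROM dbo.Customers AS c
LEFT JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID;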

Take the time to “pay it forward”, it can only benefit you in the long run and the person you “paid” will be that much better for it.