Uploading PDF's to SharePoint Online
The problem...
I have something like 1200 PDfs, and the list is growing, of deliverables that we own the copywrite to that we want to put in a central place called SharePoint Online. This library will then get the benefit of being searchable, have associated metadata, and should free up time from a bunch of people for having to search for this information manually.
The solution...
This took a little while to get to, and some of the iterative steps that transpired between inception and solution are long since gone. That's OK, that's how the process worked. I had to laugh at myself a bit when I started searching for some solutions to the problems I ran into, and found my own posts on the topic...
Anyhow, on to the general program idea.
I don't know where all the content we were going to reference is, but our amazing team of office administrators do. They wound up collecting shortcuts (like, Windows document shortcuts) in a folder that referenced the files they were wanting to use. The shortcuts would be named a standard way (project number, deliverable type, and an index counter). This same identifier would go into a spreadsheet - along with the associated metadata, collected by our equally amazing records management person.
That left me needing to collect data from a spreadsheet (hello ImportExcel), and read data from a Shortcut .lnk file (Get-Shortcut
from PoshFunctions.)
So - I'd read the content of the excel file, look for a corresponding shortcut file. If the shortcut target existed, watermark the PDF (using itext), then pack all this up and upload it to SharePoint Online using PnP Powershell.
Easy.
Tools / Resources used
- ImportExcel PowerShell module - https://github.com/dfinke/ImportExcel
- PoshFunctions PowerShell module - https://github.com/riedyw/PoshFunctions
- SharePoint PnP PowerShell module - (https://pnp.github.io/powershell
- iText7 dotNET library - https://itextpdf.com/products/itext-7
- Watermarking example - https://kb.itextpdf.com/home/it7kb/examples/watermark-examples
Notes from the travels along the way
Most of this was straightforward. Outside of some variable definitions, my main function is pretty straightforward:
1function Load-FinalReports() {
2 [CmdletBinding(SupportsShouldProcess = $true)]
3 param(
4 # How many records to load
5 [Parameter()]
6 [Int]
7 $HowManyToLoad = 100,
8
9 # Validate only. Do not post to SPO.
10 [Parameter()]
11 [Switch]
12 $ValidateOnly
13 )
14
15 $Metadata =
16 $ShortcutsPath =
17 $SPOUploadList = join-path $PSScriptRoot "ReportsAddedToSPO.xml"
18 $URL =
19 $Library = "Final Reports"
20
21 # Load content from sheet
22 $SheetData = Get-FinalReportSheetContent -XLSFilename $Metadata -SheetName "Ready to Add"
23
24 # Get shortcut target information
25 $RecordsToUpload, $RecordsWithErrors = Validate-SheetLinks -SheetInfo $SheetData -ShortcutsPath $ShortcutsPath
26
27 # Show errors. These are Marilyn's problems.
28 write-warning "Records with errors:"
29 $RecordsWithErrors | select-object ID, ErrorType | Out-String | Write-Output
30
31 if (!$ValidateOnly) {
32 # import the data we think we've uploaded to SPO
33 $LoadedToSPO = [System.Collections.ArrayList]@()
34 if (test-path $SPOUploadList) {
35 $LoadedToSPO = [System.Collections.ArrayList]@(import-clixml -Path $SPOUploadList)
36 }
37
38 # Remove the data we've already uploaded from our list of content to upload
39 $RemoveFromDataToLoad = [System.Collections.ArrayList]@()
40 foreach ($Record in $RecordsToUpload) {
41 if ($Record.ID -in $LoadedToSPO.ID) {
42 write-verbose "$($Record.ID) has already been uplaoded?"
43 $RemoveFromDataToLoad.Add($Record) | out-null
44 }
45 }
46 foreach ($Record in $RemoveFromDataToLoad) {
47 $RecordsToUpload.Remove($Record)
48 }
49
50 # Do the uploading.
51 Connect-PnPOnline -ClientId $_CLIENT_ID -Url $URL -Tenant $_TENANT -CertificatePath $_CERTIFICATE
52
53 $Counter = 0
54 foreach ($Record in $RecordsToUpload) {
55 $Counter++
56 if ($Counter -gt $HowManyToLoad) {
57 break
58 }
59 Write-Progress -Activity "Loading to SharePoint" -Status "Record $($Counter) of $($RecordsToUpload.Count)" -PercentComplete (100 * $Counter / $RecordsToUpload.Count)
60 $LoadState = Add-SPORecord -RecordToLoad $Record -Library $Library
61 if ($LoadState) {
62 $LoadedToSPO.Add($Record) | out-null
63 }
64 }
65 Write-Progress -Activity "Loading to SharePoint" -Status "Complete" -Completed
66 $LoadedToSPO | Export-Clixml -Path $SPOUploadList -WhatIf:$false
67 }
68}
69
70Load-FinalReports -HowManyToLoad 5000
A few parameters exist: -HowManyToLoad
should be pretty clear - it limits how many records I am importing at a given time. This only applies when the records are getting loaded to SharePoint Online, and not when collecting data from the XLS or shortcuts.
The function Get-FinalReportSheetContent
does what it says on the tin, except in the function, I collect a hash of the XLS file and save it. On subsequent runs, I'll compare the saved hash against the current file hash, and if it's mismatched, I reload data from the XLS file. If it's the same, I instead load the content from an existing XML file (that I write to disk at the end of the function). This saves some time during import in the event that the data remains unchanged.
Validate-SheetLinks
also does what it says on the tin, but it returns two arrays - successful data and error data. Successful data is the information passed in from Get-FinalReportSheetContent
with added properties (the target path for the file to be uploaded), while the error data generates a list of IDs that either have a problem with the shortcut, or the target file is not a PDF. Both these error conditions require manual intervention.
This exercise taught me that I can return multiple values from a function in PowerShell.
We do some checks and manipulation of the array of data to be uploaded (if we think the file has already been uploaded, we'll skip it), then connect to SharePoint Online using PnP, and an AzureAD Application, and start uploading content.
This is where stuff gets spicy.
Taxonomy from a Term Store
We decided early on to put several terms into the site's Term Store instead of an individual's library taxonomy terms. This would (should?) enable some common data between different libraries within the same site.
This poses a small challenge when adding content. The formatting of a value for the term store is Term Group|Termset|Term
- so we have our group is "Project Data", Termset is "Author" and the Term is each individual author. The metadata in the spreadsheet has this information as comma separated, so a small function was written to handle this conversion:
1function Fix-AuthorsForTermStore($TextString) {
2 $Authors = $null
3 if ($TextString.Count -gt 0) {
4 $Authors = $TextString -split ','
5 $Authors = $Authors.Trim()
6 $Authors = $Authors | ForEach-Object { if ($_.Length -gt 0) { "Project Data|Authors|$($_)" } }
7 } else {
8 $Authors = $null
9 }
10 return $Authors
11}
Also to note, that if a term was missing from the term store, the record would still get created in SharePoint Online, but the insertion would generate a warning. I'm sure there's a way to capture these errors, but I opted for the manual method (read the console output).
Improperly formatted metadata
In the early stages, we assumed that some of the metadata would be a single line. This turned out not to be true and, during the record insertion, would cause all the metadata to fail. The file would load, but all the field (except for the uploaded file) would be blank. This surfaced in a few places (single line / multiline), but also in an early bug in the Fix-AuthorsForTermStore
function where the term would sometime return blank instead of null.
Watermarking
Initial Setup
This turned out taking a lot more time and energy than expected, but the output turned out to be right on point. I'm using the iText7 C# libraries to do this work for me, which needs a bit of work get running in PowerShell (see Nuget in PowerShell).
So.. I bring in all the libraries using something like this:
1Install-Package -Name itext7 -ProviderName nuget -RequiredVersion 7.1.17 -Scope CurrentUser -Destination ./lib -SkipDependencies
2Install-Package -Name Portable.BouncyCastle -ProviderName nuget -RequiredVersion 1.8.9.0 -Destination ./lib -SkipDependencies
3Install-Package -Name Common.Logging -ProviderName nuget -RequiredVersion 3.4.1.0 -Destination ./lib -SkipDependencies
4Install-Package -Name Common.Logging.Core -ProviderName nuget -RequiredVersion 3.4.1.0 -Destination ./lib -SkipDependencies
I'm staying in the 7.1 branch of iText7, since this allows me to use the .NET 4.0 compiled DLL's. 7.2+ is compiled for 6.5 and above (I think).
The result here is that all the libraries I need are in the .\lib
folder. Next, I need to import them. The old iText libraries were a few DLLs, but in newer versions all the pieces are split into multiple parts. Importing all the DLL's can be accelerated with something like:
1$DLLFiles = get-childitem -path (join-path $PSScriptRoot "lib") -Filter "*.dll" -Recurse | Where-Object { $_.DirectoryName -like "*net40*" -or $_.DirectoryName -like "*net45*" }
2foreach ($DLLFile in $DLLFiles ) {
3 try {
4 Add-Type -Path $DLLFile.FullName
5 # Write-Host $DLLFile.FullName
6 } catch {
7 # write-warning $DLLFile.FullName
8 }
9}
Making it easier to use in PowerShell
So, that brings all the libraries into the environment, but I would then need to reference the fully qualified name of the library - like iText.Kernel.Pdf.PdfReader
. This was a bit tedious, but a Reddit post helped solve this. By using the ..using
operator (which I didn't know existed), I am able to reference PdfReader
without the full namespace:
1using namespace iText.IO.Font;
2using namespace iText.IO.Font.Constants;
3using namespace iText.Kernel.Colors;
4using namespace iText.Kernel.Font;
5using namespace iText.Kernel.Pdf;
6using namespace iText.Kernel.Pdf.Canvas;
7using namespace iText.Kernel.Pdf.Extgstate;
8using namespace iText.Layout;
9using namespace iText.Layout.Element;
10using namespace iText.Layout.Properties;
11using namespace iText.Commons;
This has to exist at the top of my script, but otherwise does the trick. As an alternative, you can add custom accelerators:
1$Accelerators = [Psobject].Assembly.GetType('System.Management.Automation.TypeAccelerators')
2$Accelerators::Add("Rectangle", "iText.Kernel.Geom.Rectangle")
3$Accelerators::Add("BadPasswordException", "iText.Kernel.Crypto.BadPasswordException")
Applying the watermark
The iText site has some great examples of watermarking, found here: https://kb.itextpdf.com/home/it7kb/examples/watermark-examples. This was the foundation of the script that I wrote, with some modifications.
I wanted the watermark to be corner to corner, instead of printed horizontally along the page. I also wanted the font to be "as big as possible". Reading https://kb.itextpdf.com/home/it7kb/faq/how-to-choose-the-optimal-size-for-a-font was a bit unclear on find the font size, but wound up being computed like so:
1$fontSize = 1000 * [math]::Sqrt([math]::pow($pageSize.GetWidth(),2) +[math]::pow($pageSize.GetHeight(),2))*.8 / $font.GetWidth($textString)
2[Paragraph]$Paragraph = [Paragraph]::new($textString)
3$Paragraph.SetFont($Font).SetFontSize($fontSize)
4"Font size is $($fontSize)"
Simply put, I'm finding how long the angled line is Pythagorean theorem, multiplying by 1000 then by 0.8 (so it's 80% maximum widdth), then dividing it by $font.GetWidth($textstring)
(which has already been defined with a font face).
Dealing with passwords
Some of the PDFs may have had passwords on them; turns ou the iText reader object had a method to just.. ignore passwords:
1[PdfReader]$PDFInput = [PdfReader]::new($inputFile)
2$PDFInput.SetUnethicalReading($true)
Which is really quite nice, and makes having to handle the password (which may be different...) really nice. At the tail end, I do want to re-password protect the file with the original restrictions:
1$enc = [system.Text.Encoding]::UTF8
2
3[WriterProperties]$writerProperties = [WriterProperties]::new()
4$writerProperties.SetStandardEncryption($null, $enc.GetBytes("supersecurepassword"), [EncryptionConstants]::ALLOW_PRINTING -bor [EncryptionConstants]::ALLOW_MODIFY_ANNOTATIONS -bor [EncryptionConstants]::ALLOW_SCREENREADERS, [EncryptionConstants]::ENCRYPTION_AES_128)
5
6[Pdfwriter]$PDFOutput = [PdfWriter]::new($outputFile, $writerProperties)
Note how all the encryption options are bitwise OR'd together.
All wrapped up
The function, and required setup, can be found at the end of this post!
Final Notes
During testing, I accidentally unset the textString
parameter. This generated an error somewhere around line 68 - on the canvasWatermark3
:
1Exception calling "ShowTextAligned" with "7" argument(s): "Object reference not set to an instance of an object."
2At Watermark PDF\test.ps1:68 char:4
3+ $canvasWatermark3.ShowTextAligned($Paragraph, $x, $y, $i, ...
This was a bit of red herring - the actual issue is that $Paragraph
is undefined, not $canvasWatermark3
.
Final script
All said and done, the final script takes the following shape. It isn't perfect (some variables in weird places, some stuff not a variable that should be, etc), but it did the trick.
1using namespace iText.IO.Font;
2using namespace iText.IO.Font.Constants;
3using namespace iText.Kernel.Colors;
4using namespace iText.Kernel.Font;
5using namespace iText.Kernel.Pdf;
6using namespace iText.Kernel.Pdf.Canvas;
7using namespace iText.Kernel.Pdf.Extgstate;
8using namespace iText.Layout;
9using namespace iText.Layout.Element;
10using namespace iText.Layout.Properties;
11using namespace iText.Commons;
12
13#Requires -Modules ImportExcel, PoshFunctions
14
15$_CLIENT_ID =
16$_TENANT =
17$_CERTIFICATE = join-path $PSScriptRoot "Certificate for connecting to SPO.pfx"
18
19# Load library files
20$DLLFiles = get-childitem -path (join-path $PSScriptRoot "lib") -Filter "*.dll" -Recurse | Where-Object { $_.DirectoryName -like "*net40*" -or $_.DirectoryName -like "*net45*" }
21foreach ($DLLFile in $DLLFiles ) {
22 try {
23 Add-Type -Path $DLLFile.FullName
24 # Write-Host $DLLFile.FullName
25 } catch {
26 # write-warning $DLLFile.FullName
27 }
28}
29
30# Type accellerators so I don't have to wreite the fully qualified name each time.
31$Accelerators = [Psobject].Assembly.GetType('System.Management.Automation.TypeAccelerators')
32$Accelerators::Add("Rectangle", "iText.Kernel.Geom.Rectangle")
33$Accelerators::Add("BadPasswordException", "iText.Kernel.Crypto.BadPasswordException")
34
35function Set-Watermark($inputFile, $outputFile, $textString = "For Internal Use Only") {
36 $Opacity = 0.25
37
38 $enc = [system.Text.Encoding]::UTF8
39
40 [WriterProperties]$writerProperties = [WriterProperties]::new()
41 $writerProperties.SetStandardEncryption($null, $enc.GetBytes("LibrarY07"), [EncryptionConstants]::ALLOW_PRINTING -bor [EncryptionConstants]::ALLOW_MODIFY_ANNOTATIONS -bor [EncryptionConstants]::ALLOW_SCREENREADERS, [EncryptionConstants]::ENCRYPTION_AES_128)
42
43 [PdfReader]$PDFInput = [PdfReader]::new($inputFile)
44 $PDFInput.SetUnethicalReading($true)
45
46 try {
47 [Pdfwriter]$PDFOutput = [PdfWriter]::new($outputFile, $writerProperties)
48
49 [PdfDocument]$pdfDoc = [PdfDocument]::new($PDFInput, $PDFOutput)
50
51 [PdfFont]$Font = [PdfFontFactory]::CreateFont([FontProgramFactory]::CreateFont([StandardFonts]::HELVETICA))
52
53 [PdfExtGState]$gs1 = [PdfExtGState]::new()
54 $gs1.SetFillOpacity($Opacity)
55
56 for ($i = 1; $i -le $pdfDoc.GetNumberOfPages(); $i++) {
57 [PdfPage]$pdfPage = $pdfDoc.GetPage($i);
58 [Rectangle]$pageSize = $pdfPage.GetPageSizeWithRotation();
59
60 # When "true": in case the page has a rotation, then new content will be automatically rotated in the
61 # opposite direction. On the rotated page this would look as if new content ignores page rotation.
62 $pdfPage.SetIgnorePageRotationForContent($true);
63
64 $x = ($pageSize.GetLeft() + $pageSize.GetRight()) / 2;
65 $y = ($pageSize.GetTop() + $pageSize.GetBottom()) / 2;
66 [PdfCanvas]$over = [PdfCanvas]::new($pdfDoc.GetPage($i));
67 $over.SaveState();
68 $over.SetExtGState($gs1);
69
70 $fontSize = 1000 * [math]::Sqrt([math]::pow($pageSize.GetWidth(), 2) + [math]::pow($pageSize.GetHeight(), 2)) * .8 / $font.GetWidth($textString)
71 [Paragraph]$Paragraph = [Paragraph]::new($textString)
72 $Paragraph.SetFont($Font).SetFontSize($fontSize)
73 "Font size is $($fontSize)"
74
75 [Canvas]$canvasWatermark3 = [Canvas]::new($Over, $PageSize)
76 $canvasWatermark3.ShowTextAligned($Paragraph, $x, $y, $i, [TextAlignment]::CENTER, [VerticalAlignment]::TOP, -45);
77 $canvasWatermark3.Close()
78
79 $over.RestoreState();
80 }
81 $pdfDoc.Close()
82 } catch [BadPasswordException] {
83 Write-Warning "Incorrect password specified for file."
84 } catch [System.IO.IOException] {
85 write-warning "Could not open the output file for writing. Is it open in Adobe?"
86
87 } finally {
88
89 $PDFInput.Close()
90 try { $PDFOutput.Close() } catch {}
91 try { $pdfDoc.Close() } catch {}
92 }
93}
94
95function Fix-AuthorsForTermStore($TextString) {
96 $Authors = $null
97 if ($TextString.Count -gt 0) {
98 $Authors = $TextString -split ','
99 $Authors = $Authors.Trim()
100 $Authors = $Authors | ForEach-Object { if ($_.Length -gt 0) { "Project Data|Authors|$($_)" } }
101 } else {
102 $Authors = $null
103 }
104 return $Authors
105}
106
107function Get-FinalReportSheetContent {
108 [CmdletBinding()]
109 param (
110 # XLS File
111 [Parameter(Mandatory)]
112 [String]
113 $XLSFilename,
114
115 # Sheet Name
116 [Parameter(Mandatory)]
117 [String]
118 $SheetName,
119
120 # Force
121 [Parameter()]
122 [Switch]
123 $Force
124 )
125
126 process {
127 $XLSFile = get-item $XLSFilename
128
129 $CacheFileMD5 = join-path $PSScriptRoot "$($XLSFile.BaseName).cacheMD5.txt"
130 $CacheFile = join-path $PSScriptRoot "$($XLSFile.BaseName).cache.xml"
131
132 if ($Force) {
133 remove-item $CacheFileMD5
134 remove-item $CacheFile
135 }
136
137 # Get the MD5 of the current file
138 $CurrentFileMD5 = (Get-FileHash -Path $XLSFilename).Hash
139 $CacheMD5 = get-content -Path $CacheFileMD5 -ErrorAction SilentlyContinue
140
141 $XLSContent = [System.Collections.ArrayList]@()
142
143 # Check that the MD5's match.
144 if ($CurrentFileMD5 -eq $CacheMD5) {
145 $XLSContent = Import-Clixml $CacheFile
146 } else {
147 # MD5 do not match. Remove the old cache file
148 # This may not exist already. That's OK.
149 Remove-Item $CacheFile -ErrorAction SilentlyContinue
150
151 $Counter = 0
152 $XLSData = import-excel $XLSFile.FullName -WorksheetName $SheetName
153 $FilteredData = $XLSData | Where-Object { $null -ne $_."ID" }
154
155 foreach ($Record in $FilteredData) {
156 $Counter++
157 Write-Progress -Activity "Importing records from XLS" -Status "Record $($Counter) of $($FilteredData.Count)" -PercentComplete (100 * $Counter / $FilteredData.Count)
158 $UniqueIDs = [System.Collections.ArrayList]@(($Record.'ID' -split ',').Trim())
159 foreach ($uniqueID in $uniqueIDs) {
160 $RecordToLoad = $Record | ConvertTo-Json | ConvertFrom-Json
161 $XLSContent.Add($RecordToLoad) | out-null
162 }
163 }
164 Write-Progress -Activity "Importing records from XLS" -Status "Ready" -Completed
165
166 # Save the cache file
167 $XLSContent | Export-Clixml -Path $CacheFile
168 $CurrentFileMD5 | Set-Content -Path $CacheFileMD5 -Force
169 }
170
171 return $XLSContent
172 }
173}
174
175function Validate-SheetLinks {
176 [CmdletBinding()]
177 param (
178 # Sheet Information
179 [Parameter(Mandatory)]
180 [Object[]]
181 $SheetInfo,
182
183 # Shortcuts Path - Folder containing shortcuts
184 [Parameter(Mandatory)]
185 [String]
186 $ShortcutsPath
187 )
188
189 process {
190 $ShortcutsWithProblems = [System.Collections.ArrayList]@()
191 $FilesReadyToUpload = [System.Collections.ArrayList]@()
192 $Counter = 0
193 foreach ($Record in $SheetInfo) {
194 $Counter++
195 Write-Progress -Activity "Validating Shortcuts" -Status "Record $($Counter) of $($SheetInfo.Count)" -PercentComplete (100 * $Counter / $SheetInfo.Count)
196 $ShortCutFile = Join-Path $ShortcutsPath "$($Record.ID).lnk"
197
198 # Test if the file exists
199 $FoundFile = test-path $ShortCutFile
200 if (!$FoundFile) {
201 $Record | Add-Member -MemberType NoteProperty -Name "ErrorType" -Value "Shortcut target does not exist"
202 $ShortcutsWithProblems.Add($Record) | out-null
203 continue
204 }
205
206 # Test if the file is a PDF
207 $FileObject = get-item ((Get-Shortcut -Path $ShortCutFile).TargetPath)
208 if ($FileObject.Extension -ne ".pdf") {
209 $Record | Add-Member -MemberType NoteProperty -Name "ErrorType" -Value "Shortcut target is not PDF"
210 $ShortcutsWithProblems.Add($Record) | out-null
211 continue
212 }
213
214 $Record | Add-Member -MemberType NoteProperty -Name TargetFilePath -Value ((Get-Shortcut -Path $ShortCutFile).TargetPath)
215 $FilesReadyToUpload.Add($Record) | Out-Null
216 }
217 Write-Progress -Activity "Validating Shortcuts" -Status "Done" -Completed
218
219 return $FilesReadyToUpload, $ShortcutsWithProblems
220 }
221}
222
223function Add-SPORecord {
224 [CmdletBinding(SupportsShouldProcess = $true)]
225 param (
226 # Record
227 [Parameter(Mandatory)]
228 [Object]
229 $RecordToLoad,
230
231 # Library
232 [Parameter(Mandatory)]
233 [String]
234 $Library
235 )
236
237 process {
238 $AuthorsArray = Fix-AuthorsForTermStore ($RecordToLoad."Authors")
239 $ReviewersArray = Fix-AuthorsForTermStore ($RecordToLoad."Reviewers")
240
241 if ($PSCmdlet.ShouldProcess($RecordToLoad.ID, "Add")) {
242 $ValuesHashMap = @{
243 Project = $RecordToLoad."Project";
244 Job = $RecordToLoad."Job";
245 Year = $RecordToLoad."Year";
246 _ExtendedDescription = $RecordToLoad."Report Title";
247 Client = $RecordToLoad.Client;
248 Date = $RecordToLoad."Month/Day";
249 PDF = (Get-Culture).TextInfo.ToTitleCase($RecordToLoad.'PDF');
250 ReportType = $RecordToLoad."Report Type";
251 Notes = $RecordToLoad.Note;
252 Notice = $RecordToLoad.Notice;
253 ExecSummary = $RecordToLoad."Exec Summary"
254 }
255 if ($null -ne $AuthorsArray) {
256 $ValuesHashMap."Authors" = @($AuthorsArray)
257 }
258 if ($null -ne $ReviewersArray) {
259 $ValuesHashMap."Reviewers" = @($ReviewersArray)
260 }
261
262 WRite-verbose $RecordToLoad.TargetFilePath
263 write-host "ID: $($RecordToLoad.ID)"
264 $ValuesHashMap | Out-String | write-verbose
265
266 $SourceFileObject = get-item $RecordToLoad.TargetFilePath
267 $WatermarkedFilePath = join-path ($env:TEMP) $SourceFileObject.Name
268 Set-Watermark -inputFile $RecordToLoad.TargetFilePath -outputFile $WatermarkedFilePath
269
270 $AddFileResult = Add-PnPFile -Path $WatermarkedFilePath -Folder $Library -Values $ValuesHashMap
271
272 Remove-item $WatermarkedFilePath
273
274 return $AddFileResult
275 }
276 return $null
277 }
278}
279
280function Load-FinalReports() {
281 [CmdletBinding(SupportsShouldProcess = $true)] # In function header
282 param(
283 # How many records to load
284 [Parameter()]
285 [Int]
286 $HowManyToLoad = 100,
287
288 # Validate only. Do not post to SPO.
289 [Parameter()]
290 [Switch]
291 $ValidateOnly
292 )
293
294 $Metadata = Join-Path $PSScriptRoot '.\list.xlsx'
295 $ShortcutsPath =
296 $SPOUploadList = join-path $PSScriptRoot "ReportsAddedToSPO.xml"
297 $URL =
298 $Library = "Final Reports"
299
300 # Load content from sheet
301 $SheetData = Get-FinalReportSheetContent -XLSFilename $Metadata -SheetName "Ready to Add"
302
303 # Get shortcut target information
304 $RecordsToUpload, $RecordsWithErrors = Validate-SheetLinks -SheetInfo $SheetData -ShortcutsPath $ShortcutsPath
305
306 # Show errors. These are Marilyn's problems.
307 write-warning "Records with errors:"
308 $RecordsWithErrors | select-object ID, ErrorType | Out-String | Write-Output
309
310 if (!$ValidateOnly) {
311 # import the data we think we've uploaded to SPO
312 $LoadedToSPO = [System.Collections.ArrayList]@()
313 if (test-path $SPOUploadList) {
314 $LoadedToSPO = [System.Collections.ArrayList]@(import-clixml -Path $SPOUploadList)
315 }
316
317 # Remove the data we've already uploaded from our list of content to upload
318 $RemoveFromDataToLoad = [System.Collections.ArrayList]@()
319 foreach ($Record in $RecordsToUpload) {
320 if ($Record.ID -in $LoadedToSPO.ID) {
321 write-verbose "$($Record.ID) has already been uplaoded?"
322 $RemoveFromDataToLoad.Add($Record) | out-null
323 }
324 }
325 foreach ($Record in $RemoveFromDataToLoad) {
326 $RecordsToUpload.Remove($Record)
327 }
328
329 # Do the uploading.
330 Connect-PnPOnline -ClientId $_CLIENT_ID -Url $URL -Tenant $_TENANT -CertificatePath $_CERTIFICATE
331 $Counter = 0
332 foreach ($Record in $RecordsToUpload) {
333 $Counter++
334 if ($Counter -gt $HowManyToLoad) {
335 break
336 }
337 Write-Progress -Activity "Loading to SharePoint" -Status "Record $($Counter) of $($RecordsToUpload.Count)" -PercentComplete (100 * $Counter / $RecordsToUpload.Count)
338 $LoadState = Add-SPORecord -RecordToLoad $Record -Library $Library
339 if ($LoadState) {
340 $LoadedToSPO.Add($Record) | out-null
341 }
342 }
343 Write-Progress -Activity "Loading to SharePoint" -Status "Complete" -Completed
344 $LoadedToSPO | Export-Clixml -Path $SPOUploadList -WhatIf:$false
345 }
346}
347
348Load-FinalReports -HowManyToLoad 5000