Pittsburgh PLI/DOMI Violations Report

This dataset originally housed Department of Permits, Licenses, and Inspections (PLI) violations (2015-2020) and has since been expanded to include violations logged by other units (including DOMI) from 2020-06-01 to the present. These data are used to manage and track all updates to casefiles by city employees. They can be used to understand when citations, investigations, and court proceedings are issued, the nature and location of each violation, and the status of a casefile at any point in time. Because the data contain addresses and parcel numbers, users can also display the records on geospatial maps.

Collection/Interpretation

It is important to understand the distinction between violations and casefiles, and how updates to a casefile are represented in the dataset. A casefile refers to one or more violations. When an initial investigation is conducted, each of these violations is recorded separately. The investigation will result in a new status for all of the violations (“VIOLATIONS FOUND”). The subject of the investigation will be informed of this outcome and must address the problem(s). There will be a follow-up inspection at this point, and depending on the results, further steps will be taken (follow-up investigations, criminal complaints issued, court proceedings, etc.).

Each investigation of each violation in a casefile is represented as a unique row in the dataset. As explained above, there will be a minimum of two updates for each violation (the initial and follow-up investigations). Though all violations in a casefile are investigated at the same time, each violation gets its own row for each investigation. Thus, for a property with three violations there will be a minimum of six rows (both investigations for each violation). It is possible to track the entire case history by observing all rows for each casefile.
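The row arithmetic above can be illustrated with a minimal sketch. The sample rows, casefile number, code sections, and dates are all made up for illustration, and the helper names `case_history` and `rows_per_violation` are hypothetical. Only the three field names come from the dataset description, and real records carry additional fields.

```python
from collections import defaultdict

# Hypothetical sample: one casefile with three violations, each investigated
# twice (initial and follow-up). All values below are made up.
CASEFILE = "CF-PLI-2021-000001"
SECTIONS = ["302.4", "304.7", "308.1"]
DATES = ["2021-03-01", "2021-04-15"]

rows = [
    {
        "casefile_number": CASEFILE,
        "violation_code_section": section,
        "investigation_date": date,
    }
    for section in SECTIONS
    for date in DATES
]


def case_history(rows, casefile_number):
    """Return every row for one casefile, ordered by date, then code section."""
    matching = [r for r in rows if r["casefile_number"] == casefile_number]
    return sorted(
        matching,
        key=lambda r: (r["investigation_date"], r["violation_code_section"]),
    )


def rows_per_violation(rows, casefile_number):
    """Count how many rows (investigations) exist for each violation in a casefile."""
    counts = defaultdict(int)
    for row in rows:
        if row["casefile_number"] == casefile_number:
            counts[row["violation_code_section"]] += 1
    return dict(counts)


history = case_history(rows, CASEFILE)
print(len(history))                        # three violations x two investigations
print(rows_per_violation(rows, CASEFILE))  # two rows per violation
```

Sorting on the date string works here only because the sample dates are in ISO format (YYYY-MM-DD); if the published dates use another format, they would need to be parsed first.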

Each violation is cited according to the violation_code_section field.

The casefile_number is the only unique identifier for each casefile (the entire group of violations). By using the casefile_number and violation_code_section fields in combination, one can track the history of each violation for a given casefile. Combining these two fields with investigation_date yields a unique identifier for each record.
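As a rough sketch of how these keys might be used, the snippet below builds the three-field record key, flags any keys that occur more than once, and groups investigation dates by violation. The sample rows are invented, the helper names are hypothetical, and the dates are assumed to be ISO-formatted strings so that they sort chronologically.

```python
from collections import Counter, defaultdict

# Invented sample rows; the last row deliberately repeats an earlier key.
rows = [
    {"casefile_number": "CF-PLI-2021-000001", "violation_code_section": "302.4", "investigation_date": "2021-04-15"},
    {"casefile_number": "CF-PLI-2021-000001", "violation_code_section": "302.4", "investigation_date": "2021-03-01"},
    {"casefile_number": "CF-PLI-2021-000001", "violation_code_section": "304.7", "investigation_date": "2021-03-01"},
    {"casefile_number": "CF-PLI-2021-000001", "violation_code_section": "304.7", "investigation_date": "2021-03-01"},
]


def record_key(row):
    """Composite key described in the text: casefile, code section, date."""
    return (
        row["casefile_number"],
        row["violation_code_section"],
        row["investigation_date"],
    )


def duplicate_record_keys(rows):
    """Return composite keys that appear on more than one row."""
    counts = Counter(record_key(r) for r in rows)
    return [key for key, n in counts.items() if n > 1]


def violation_histories(rows):
    """Map (casefile_number, violation_code_section) to its sorted investigation dates."""
    histories = defaultdict(list)
    for row in rows:
        key = (row["casefile_number"], row["violation_code_section"])
        histories[key].append(row["investigation_date"])
    return {key: sorted(dates) for key, dates in histories.items()}


print(duplicate_record_keys(rows))
print(violation_histories(rows)[("CF-PLI-2021-000001", "302.4")])
```

Running a duplicate check like this before relying on the composite key is a cheap way to confirm that the uniqueness described above holds for a given download.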

DOMI (the Department of Mobility and Infrastructure), PLI, and Environmental Services all use this system to log violations (though publication of Environmental Services violations is still pending). In most cases, the department involved in the casefile can be extracted from the casefile_number field (beginning with the 4th character). For instance, a casefile_number like CF-PLI-2021-025422 represents a violation reported by PLI. The remaining casefile IDs start with "O-"; these are PLI violation codes from an old ticketing system.
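One possible way to extract the department is sketched below. The function name `department_of` is hypothetical, and the sketch assumes the department code is the text between the first and second hyphens of a "CF-" identifier, which matches the CF-PLI-2021-025422 example. The "CF-DOMI-..." and "O-..." values in the usage lines are made up for illustration. Identifiers that fit neither pattern return None, since the description only says the rule holds in most cases.

```python
def department_of(casefile_number):
    """Best-effort department lookup from a casefile_number.

    Returns the department code as a string, or None when the
    identifier does not match either documented pattern.
    """
    if casefile_number.startswith("O-"):
        # IDs from the old ticketing system are described as PLI violations.
        return "PLI"
    if casefile_number.startswith("CF-"):
        # The department code starts at the 4th character and is assumed
        # to run up to the next hyphen.
        parts = casefile_number.split("-")
        if len(parts) >= 2 and parts[1]:
            return parts[1]
    return None


print(department_of("CF-PLI-2021-025422"))   # example from the description
print(department_of("CF-DOMI-2022-000123"))  # made-up identifier
print(department_of("O-12345"))              # made-up legacy identifier
print(department_of("UNKNOWN"))              # unrecognized format
```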

Preprocessing/Formatting

All string values (which make up most fields) were converted to uppercase. The data are entered manually and often contain non-uniform formatting. Several approaches to cleaning the data exist, including leaving the cleaning to users after they access the data here. In this case, text field values were transformed to uppercase so that the data are uniformly formatted. Future improvements to this ETL pipeline may approach this problem with a more sophisticated technique.
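One practical consequence is that search terms should be uppercased before they are compared with the published values. The sketch below assumes a hypothetical `address` field name and made-up rows. Trimming surrounding whitespace from the query is an extra step added here for convenience, not something the pipeline is documented to do.

```python
def normalize(text):
    """Uppercase a user-supplied search term and trim surrounding whitespace."""
    return text.strip().upper()


def find_rows(rows, field, query):
    """Return rows whose value in `field` equals the normalized query."""
    target = normalize(query)
    return [row for row in rows if row.get(field) == target]


# Made-up rows; "address" is a hypothetical field name.
rows = [
    {"casefile_number": "CF-PLI-2021-000001", "address": "123 MAIN ST"},
    {"casefile_number": "CF-PLI-2021-000002", "address": "456 OAK AVE"},
]

matches = find_rows(rows, "address", "  123 Main St ")
print([row["casefile_number"] for row in matches])
```

Exact matching will still miss variants such as "STREET" versus "ST", which is the kind of non-uniform entry the paragraph above mentions.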

Data and Resources

Field Value
accessLevel public
catalog_@context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
identifier d660edf8-9157-45ad-a282-50822badfaae
modified 2023-05-14T07:45:14.554373
publisher City of Pittsburgh
resource-type Dataset
source_datajson_identifier true
source_hash 9545547d992daaa3fe3add690e67cb81ce8cc912
source_schema_version 1.1
Groups
  • AmeriGEOSS
  • National Provider
  • North America
Tags
  • AmeriGEO
  • AmeriGEOSS
  • CKAN
  • GEO
  • GEOSS
  • National
  • North America
  • United States
  • _etl
  • building-code-violation
  • business
  • code-violation
  • commercial
  • inspections
  • licenses
  • permits
  • pli
  • property-maintenance-code
  • residential
  • uniform-property-maintenance-code
  • violations
isopen False
license_id notspecified
license_title License not specified
maintainer Western Pennsylvania Regional Data Center
maintainer_email ip.analytics@pittsburghpa.gov
metadata_created 2025-09-24T18:34:41.561731
metadata_modified 2025-09-24T18:34:41.561743
num_resources 5
num_tags 21
title Pittsburgh PLI/DOMI Violations Report