Datasets
Datasets are arranged by type. Click on a dataset name to expand its description (including date range) as well as to download a .csv of current data and a .pdf codebook that documents included variables. For information on how to cite this data please visit our how to cite page.
All U.S. datasets include two sets of codes: U.S. Policy Agendas codes (PAP) and the international Comparative Agendas Project codes (CAP). For analysis across projects and countries, we recommend the CAP codes.
Media
-
Congressional Quarterly Almanac
This dataset contains information from all articles in the main chapters of the CQ Almanac. Each CQ Almanac article typically covers one legislative initiative; when an article contains information about several different public laws or bills, it is divided so that each record in our dataset contains information about one legislative initiative. Each record is coded according to our policy content scheme. Several other variables concerning each legislative initiative (e.g., bill numbers, Public Law number if applicable, committees involved, and primary sponsors) are also included. Identification variables link our records to the original CQ source material as well as to our Public Laws dataset. A note of caution: article length has varied over the span of this dataset.
The codebook for congressional committee codes is available here.
14444 observations spanning the years 1948 to 2015
download dataset
download codebook
-
New York Times Front Page
This dataset, by Amber Boydstun, includes every New York Times front page story from 1996 to 2006, coded according to the Comparative Agendas Project subtopic scheme. An abbreviated form of the dataset is available at the link below. The full dataset with all variables is available here. More information about data collection, variables, and topic coding can be found in the data codebook.
Please cite this dataset as:
Boydstun, Amber. 2013. Making the News: Politics, the Media, and Agenda Setting. The University of Chicago Press.
31034 observations spanning the years 1996 to 2006
download dataset
download codebook
-
New York Times Index
This dataset is a systematic random sample of the New York Times Index. The sample includes the first entry on every odd-numbered page of the Index. Each entry is coded by CAP and U.S. Policy Agendas major topics and includes other variables such as the length, date and location of the story and whether it addressed government actions.
56910 observations spanning the years 1946 to 2016
download dataset
download codebook
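The sampling rule described for the New York Times Index (the first entry on every odd-numbered page) can be sketched as follows. This is only an illustration: the page-to-entries structure and the placeholder entries are hypothetical and do not reflect the format of the Index or of the dataset.

```python
# Illustrative sketch only: the page/entry structure below is hypothetical.

def sample_index(pages: dict[int, list[str]]) -> list[str]:
    """Return the first entry on every odd-numbered page, in page order."""
    return [
        entries[0]
        for page, entries in sorted(pages.items())
        if page % 2 == 1 and entries  # odd page with at least one entry
    ]


# Invented example: four pages of placeholder entries.
toy_index = {
    1: ["entry A", "entry B"],
    2: ["entry C"],
    3: ["entry D", "entry E"],
    4: ["entry F"],
}
sampled = sample_index(toy_index)
```

Because one entry is drawn per odd-numbered page, the number of sampled entries in a year tracks the number of pages in that year's Index rather than the number of entries.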
-
New York Times Index Weights
This dataset provides information on the number of pages in the New York Times Index and an estimate of the number of articles per page for each of the years included in our Index dataset. These weights address the occasional newspaper format changes that systematically alter the number of articles on each page and the variation in the size of the New York Times and its Index over time.
This dataset is not available in the trends tool.
download dataset
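One way such weights might be applied is to scale sampled topic counts by an estimate of the total number of entries in a given year's Index (pages multiplied by entries per page). The sketch below assumes that approach; the function names, topic labels, and figures are hypothetical and are not taken from the weights file.

```python
# Hypothetical illustration: the numbers and names below are made up,
# not drawn from the actual weights file or its variables.

def estimated_total_entries(pages: int, entries_per_page: float) -> float:
    """Estimate the number of entries in one year's Index."""
    return pages * entries_per_page


def weighted_topic_counts(sample_counts: dict[str, int], pages: int,
                          entries_per_page: float) -> dict[str, float]:
    """Scale sampled topic counts up to the estimated size of the Index."""
    sampled = sum(sample_counts.values())
    if sampled == 0:
        return {topic: 0.0 for topic in sample_counts}
    total = estimated_total_entries(pages, entries_per_page)
    scale = total / sampled
    return {topic: n * scale for topic, n in sample_counts.items()}


# Invented values: 500 sampled entries from a 1,000-page Index
# averaging 40 entries per page.
counts = {"Health": 100, "Defense": 150, "Other": 250}
weighted = weighted_topic_counts(counts, pages=1000, entries_per_page=40.0)
```

Scaling in this way keeps topic shares within a year unchanged while making counts comparable across years in which the size or layout of the Index differed.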
-
Policy Frames Codebook
Just as the Comparative Agendas Codebook provides issue categorizations that allow apples-to-apples comparisons across policy agendas, the Policy Frames Codebook provides frame categorizations that allow apples-to-apples comparisons across issues.
This dataset is not available in the trends tool.
download dataset
download codebook
-
TV News Policy Agenda Data
This dataset, by Joe Uscinski, includes over 65,000 TV News stories from the Vanderbilt archive (1968-2010), coded according to an adjusted version of the Policy Agendas major topic coding scheme. More information (sampling, topic codebook, citation, and author contact) can be found in the data codebook below. A dataset tabulated by quarter is also available on request.
This dataset is not available in the trends tool.
download dataset
download codebook
Parliamentary & Legislative
-
Congressional Hearings
This dataset contains information summarizing each U.S. Congressional hearing. Using the Congressional Information Service (CIS) Abstracts, we code each hearing by our system of policy content codes. Other variables, including committee and subcommittee, are also available. Identification variables link our records to the original CIS source material. Note: Research making use of the congressional hearings dataset should bear in mind that the hearings for the last year available on our website are incomplete. This is due to the CIS archival system. Each CIS-year update of the hearings dataset includes hearings from multiple previous years. As a result, please be mindful of which version of the dataset you use (indicated by the year and version number suffix at the end of the file name). For previous versions of the dataset beginning in 2018, please email policyagendas@gmail.com.
The codebook for congressional committee codes is available here.
Please cite this data as:
Jones, Bryan D., Frank R. Baumgartner, Sean M. Theriault, Derek A. Epp, Shruti Khandekar, Daniel Little. 2025. Policy Agendas Project: Congressional Hearings.
103077 observations spanning the years 1946 to 2021
download dataset
download codebook
-
Congressional Research Service Reports
This dataset includes reports from the Congressional Research Service (CRS) between 1997 and 2021. CRS provides expert information to members of Congress across a broad range of policy issues. Members of Congress may privately request a report from CRS on a given policy topic.
Please cite this data as:
Fagan, E.J., Bryan D. Jones, Frank R. Baumgartner, Sean M. Theriault, Derek A. Epp, Shruti Khandekar, Daniel Little. 2025. Policy Agendas Project: Congressional Research Service Reports.
16744 observations spanning the years 1997 to 2021
download dataset
download codebook
-
Public Law Titles
This dataset contains information about titles within public laws. Beginning in the 1960s, Congress often divided longer laws into subsections that addressed discrete topic areas. Congress called these subsections “titles.” A single public law may contain no titles, or several titles that span multiple topic areas.
Each title is coded by our policy content scheme and other variables. Identification variables allow linkage to the CQ Almanac dataset and the Public Laws dataset. The dataset directly links users to the full text (starting with the 104th Congress) and bill summary (starting with the 93rd Congress) information found on THOMAS and other public domain websites. The codebook for congressional committee codes is available here.
Please cite this data as:
Jones, Bryan D., Frank R. Baumgartner, Sean M. Theriault, Derek A. Epp, Shruti Khandekar, Daniel Little. 2025. Policy Agendas Project: Public Law Titles.
36366 observations spanning the years 1948 to 2022
download dataset
download codebook
-
Public Laws
This dataset contains information about public laws. Each record is coded by our policy content scheme and other variables. Identification variables allow linkage to the CQ Almanac dataset. The dataset directly links users to the full text (starting with the 104th Congress) and bill summary (starting with the 93rd Congress) information found on THOMAS and other public domain websites. The codebook for congressional committee codes is available here.
Please cite this data as:
Jones, Bryan D., Frank R. Baumgartner, Sean M. Theriault, Derek A. Epp, Shruti Khandekar, Daniel Little. 2025. Policy Agendas Project: Public Laws.
21968 observations spanning the years 1948 to 2022
download dataset
download codebook
-
Roll Call Votes
The Congressional Roll Call Voting dataset codes every congressional roll call vote using the Policy Agendas Project content coding system. In addition, this dataset standardizes information from multiple sources into an easy-to-use format. As of September 2022, we have streamlined the variables that we collect and offer for download in the RC dataset by relying entirely on voteview.com, the "gold standard" of roll call voting databases used in political research. Our data contain a handful of additional variables beyond what is used by the team at voteview.com, such as US PAP and CAP policy codes along with a detailed description of every vote. We are deeply grateful to the team at voteview.com for letting us use their data.
Lewis, Jeffrey B., Keith Poole, Howard Rosenthal, Adam Boche, Aaron Rudkin, and Luke Sonnet (2022). Voteview: Congressional Roll-Call Votes Database. https://voteview.com/
59724 observations spanning the years 1947 to 2024
download dataset
download codebook
Prime Minister & Executive
-
Executive Orders
This dataset contains information about each executive order issued by the President of the United States. Each record is coded according to our policy content scheme and other variables, including the president's party, whether the order was issued during a time of divided government, and whether the order was issued at the beginning or end of a presidential term.
Please cite this data as:
Jones, Bryan D., Frank R. Baumgartner, Sean M. Theriault, Derek A. Epp, Rebecca Eissler, Shruti Khandekar, Daniel Little. 2025. Policy Agendas Project: Executive Orders.
4459 observations spanning the years 1945 to 2021
download dataset
download codebook
-
Presidential Veto Rhetoric
This dataset, begun by Sam Kernell and extended by Jonathan Lewallen, includes 1618 veto threats made by the President of the United States from 1985 to 2016, coded according to the Policy Agendas Project subtopic scheme. More information about data collection, variables, and topic coding can be found in the data codebook.
Please cite this data as:
Kernell, Sam, and Jonathan Lewallen.
This dataset is not available in the trends tool.
download dataset
download codebook
-
State of the Union Speeches
This dataset contains information on each quasi-statement in the Presidential State of the Union Speeches. Each quasi-statement is coded according to our system of policy content categories and other variables. Users can directly link to full text versions of the speech for further analysis. Note: The main file does not include President Carter's outgoing State of the Union. If you would also like that data, download this file.
Please cite this data as:
Jones, Bryan D., Frank R. Baumgartner, Sean M. Theriault, Derek A. Epp, Rebecca Eissler, Shruti Khandekar, Daniel Little. 2025. Policy Agendas Project: State of the Union Speeches.
25109 observations spanning the years 1946 to 2025
download dataset
download codebook
Political Parties
-
Democratic Party Platform
This dataset, compiled by Christina Wolbrecht at the University of Notre Dame, contains information on each quasi-statement in the Democratic party platform. Each quasi-statement is coded according to our system of policy content categories and other variables.
Please cite this data as:
Wolbrecht, Christina, Brooke Shannon, E.J. Fagan, Bryan D. Jones, Frank R. Baumgartner, Sean M. Theriault, Derek A. Epp, Shruti Khandekar, Daniel Little. 2025. Policy Agendas Project: Democratic Party Platform.
18680 observations spanning the years 1948 to 2024
download dataset
download codebook
-
Republican Party Platform
This dataset, compiled by Christina Wolbrecht at the University of Notre Dame, contains information on each quasi-statement in the Republican party platform. Each quasi-statement is coded according to our system of policy content categories and other variables.
Please cite this data as:
Wolbrecht, Christina, Brooke Shannon, E.J. Fagan, Bryan D. Jones, Frank R. Baumgartner, Sean M. Theriault, Derek A. Epp, Shruti Khandekar, Daniel Little. 2025. Policy Agendas Project: Republican Party Platform.
19177 observations spanning the years 1948 to 2024
download dataset
download codebook
Judiciary
-
Supreme Court Cases
The Supreme Court dataset contains information on each case on the Court's docket, and is the only publicly available dataset to examine the Court's agenda from a policy perspective. Cases are coded according to policy content and include additional variables such as the Court's ruling in cases in which one was issued. The accompanying codebook addresses Court-specific coding issues and serves as a reference guide for those unfamiliar with the Court's terminology and procedures.
Please cite this data as:
Bird, Christine, Michelle Whyman, Bryan D. Jones, Frank R. Baumgartner, Sean M. Theriault, Derek A. Epp, Shruti Khandekar, Daniel Little. 2025. Policy Agendas Project: Supreme Court Cases.
10129 observations spanning the years 1901 to 2068
download dataset
download codebook
Budget
-
Budget Authority (Adjusted)
This dataset provides annual, inflation-adjusted data on U.S. Budget Authority. Using Office of Management and Budget Functions and Subfunctions, we have revised the data to be consistent across time. We use the most recent OMB deflator to generate inflation-adjusted variables. For more general information about working with budget data, there is a page with resources. In addition to the abridged codebook available below, the comprehensive codebook is available here.
Please cite this data as:
Jones, Bryan D., and Chris Koski. 2017. Policy Agendas Project: Budget Authority.
8165 observations spanning the years 1947 to 2017
download dataset
download codebook
-
Budget Authority-Policy Crosswalk
This file compares the U.S. Policy Agendas Project topic codes with the OMB codes used in the Budget Authority dataset to assess how well they correspond. A "1" represents nearly complete correspondence, while a "5" represents significant divergence.
Please cite this data as:
Jones, Bryan D., and Chris Koski. 2017. Policy Agendas Project: Budget Crosswalk.
This dataset is not available in the trends tool.
download dataset
-
Budget Outlays
This dataset, compiled by Bryan D. Jones, Frank R. Baumgartner and John Lovett, provides two 'synthetic' series of annual, long-term budget outlays. There is no single series reporting expenditures (outlays) for the US Federal Government since the founding of the Republic. However, two separate data series are available for US Federal Expenditures, compiled by the Treasury Department and Office of Management and Budget. The Treasury Series runs from 1791 to 1970, and the OMB series covers 1940 to the present. From these data sources, two synthetic budget series are constructed by merging data from the US Treasury with data from OMB. The series labeled Treasury Synthetic uses Treasury data from 1791 through 1970, OMB afterward. OMB Synthetic uses Treasury numbers until 1940, OMB afterward. For a complete description of these data sources, methods used to construct the series, and variable descriptions, please see the corresponding codebook.
Please cite this data as:
Jones, Bryan D., Frank R. Baumgartner, and John Lovett. 2015. Policy Agendas Project: Budget Outlays.
This dataset is not available in the trends tool.
download dataset
download codebook
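The splicing rule described for the two synthetic outlay series can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure documented in the codebook: the `splice` helper and all values are invented, and "Treasury numbers until 1940" is read here as Treasury through 1939 with OMB from 1940 onward.

```python
# Hypothetical sketch of the splicing rule; the values are invented and
# the cutoff years reflect one reading of the dataset description.

def splice(treasury: dict[int, float], omb: dict[int, float],
           last_treasury_year: int) -> dict[int, float]:
    """Use Treasury values through last_treasury_year, OMB values afterward."""
    series = {y: v for y, v in treasury.items() if y <= last_treasury_year}
    series.update({y: v for y, v in omb.items() if y > last_treasury_year})
    return dict(sorted(series.items()))


# Invented outlay figures for a few overlapping years.
treasury = {1939: 9.0, 1940: 9.5, 1970: 196.0}
omb = {1940: 9.4, 1970: 195.6, 1971: 210.2}

# "Treasury Synthetic": Treasury through 1970, OMB afterward.
treasury_synthetic = splice(treasury, omb, last_treasury_year=1970)

# "OMB Synthetic": Treasury before 1940 (assumed), OMB from 1940 onward.
omb_synthetic = splice(treasury, omb, last_treasury_year=1939)
```

The two series differ only in the overlap period (1940 to 1970), where the Treasury and OMB figures may not agree.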
-
Tax Expenditures
The Tax Expenditure dataset is based on the Congressional Joint Committee on Taxation’s annual five-year estimates of federal tax expenditures, informally referred to as ‘Bluebooks’, and is compiled annually by Christopher Faricy. Contact cgfaricy@syr.edu with any questions.
When using this data, please include the following citation: “The data used here was originally collected by Christopher Faricy. Christopher Faricy does not bear any responsibility for the analyses reported here.”
If you are using only the social welfare data, please see the citation below.
Faricy, Christopher G. Welfare for the Wealthy: Parties, Social Spending, and Inequality in the United States. Cambridge University Press, 2015.
640 observations spanning the years 1979 to 2016
download dataset
download codebook
Public Opinion & Interest Groups
-
Encyclopedia of Associations
Since 1956, Gale Research, later Thomson/Gale, has published a printed volume entitled the Encyclopedia of Associations. The database on which the book is based also serves as a web-based research tool available through libraries and entitled Associations Unlimited. While not originally designed with the idea of dynamic analysis in mind, the accumulated volumes of the EA in fact allow a researcher considerable opportunity for analyzing trends over time. The Policy Agendas Project (PAP) has used the annual volumes of the EA to compile a time-series database of all associations, coded both by the EA subject categories and by the major topics of the PAP. Forty-two editions of the EA have been published from 1956 to 2005. We have compiled a simple list of each group and coded it into the PAP topic classification system. Complete data are available in 5-year intervals from 1970-2005 as well as estimated annual counts for the full period. A description of coverage and important details concerning the lag between reported copyright years and the information they represent is included in the full dataset codebook. Note that as of March 2014, we have implemented a 4-year lag in the annual dataset, with the previous Year variable now listed as CopyrightYear. Below is a link to the annual imputed counts dataset used in the trends analysis tool (with corresponding codebook). The full 1970-2005 dataset (with corresponding codebook) and a recently published article about the dataset are also available on request. Please email policyagendas@gmail.com.
972 observations spanning the years 1966 to 2001
download dataset
download codebook
-
Gallup's Most Important Problem
This dataset contains responses to Gallup's Most Important Problem question aggregated at the annual level and coded by major topic. Years with missing observations (1953/1955) are those in which there were no corresponding MIP data available. Contact us for quarterly MIP data if needed.
For those interested in MIP data at the individual level, Colton Heffington, Brandon Beomseob Park and Laron K. Williams have coded the data according to a number of coding schemes, including CAP. That data is available here.
Please cite this data as:
Jones, Bryan D., Frank R. Baumgartner, Sean M. Theriault, Derek A. Epp, Shruti Khandekar, Daniel Little. 2025. Policy Agendas Project: Most Important Problem.
1575 observations spanning the years 1947 to 2023
download dataset
download codebook
-
Policy Moods
The policy-specific moods dataset, compiled by James A. Stimson and K. Elizabeth Coggins, was created to supplement the traditional Global Mood measure in an effort to provide scholars with as many policy-specific mood measures as possible. The global mood database, which consists of nearly 400 survey questions and almost 8,000 administrations across 70 years, was disaggregated to generate longitudinal measures of public opinion in specific policy domains. By matching each survey item with a policy code from the Policy Agendas Project coding scheme, it was possible to estimate 61 unique series as well as five additional series relating to abortion and gay rights spanning 1940 to 2015. More information about survey items, administrations and time periods can be found in the corresponding data codebook. To access the custom mood application, please visit this page. This application allows users to generate custom mood series from the policy moods data. Users can select any combination of survey questions found in the Policy Agendas Project’s Policy Moods dataset, and create customized series on demand.
Please cite this data as:
Stimson, James A., and K. Elizabeth Coggins. 2023.
3143 observations spanning the years 1940 to 2018
download dataset
download codebook
download master topics codebook