Data Structures

Vision Zero

To prepare the Vision Zero dataset, we must combine an input file of traffic injuries with an input file of traffic fatalities. Each row of an input file records the number of people injured (or, in the fatalities file, killed) at one intersection in one month.

But in months when no injury or fatality occurs at an intersection, the input files do not contain a "zero" value for that intersection. We need those zeroes for our comparison because we want to measure the change over time and by intersection.
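A small worked example shows why the zeroes matter. Suppose, hypothetically, that an intersection recorded 2 injuries in January and 1 in July of some year and none in the other months, so only those two months appear in the input. This Perl sketch (the counts are made up for illustration) compares the monthly average computed from the recorded months alone with the average over all twelve months once the missing months are filled with zeroes:

```perl
use strict;
use warnings;

## Hypothetical injury counts for one intersection in one year.
## Only months with at least one injury appear in the input.
my %recorded = ( 1 => 2 , 7 => 1 ) ;

my $total = 0 ;
$total += $_ for values %recorded ;

## averaging over the recorded months alone: 3 / 2 = 1.5
my $mean_recorded = $total / scalar( keys %recorded ) ;

## filling in zeroes for the ten missing months: 3 / 12 = 0.25
my @all_months = map { $recorded{$_} // 0 } ( 1 .. 12 ) ;
my $mean_all   = $total / scalar( @all_months ) ;

printf "recorded months only: %.2f\n" , $mean_recorded ;
printf "all twelve months:    %.2f\n" , $mean_all ;
```

Leaving out the zero months would overstate this intersection's monthly injury rate sixfold (1.50 versus 0.25).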

To identify those "zeroes," we first need a list of intersections. We will have to compile that list from the input files themselves.

This method of compiling the list excludes intersections that did not have a single injury or fatality over the nine years of data, but those are not the intersections of interest. We are most concerned about the intersections where injuries and fatalities have occurred.

One advantage of compiling the intersection list from the input files is that we can build the list as we read in the data.

We just store the counts from both input files in the same hash, using the intersection identifier ("node ID") as the first hash key. That list of hash keys then serves as our list of intersections.
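A minimal sketch of the idea, using a few made-up rows in place of the two input files: because both kinds of counts go into the same hash under the node ID, the hash's first-level keys end up being the union of the intersections that appear in either file.

```perl
use strict;
use warnings;

## Hypothetical rows of ( node ID , year , month , count ),
## made up for illustration in place of the two input files.
my @injury_rows   = ( [ 101 , 2009 , 1 , 2 ] , [ 102 , 2009 , 3 , 1 ] ) ;
my @fatality_rows = ( [ 101 , 2009 , 1 , 1 ] , [ 103 , 2010 , 5 , 1 ] ) ;

my %nycdot ;

## store both kinds of counts in the same hash, node ID first
foreach my $row (@injury_rows) {
    my ( $node , $year , $month , $count ) = @{$row} ;
    $nycdot{$node}{$year}{$month}{"Injuries"} = $count ;
}
foreach my $row (@fatality_rows) {
    my ( $node , $year , $month , $count ) = @{$row} ;
    $nycdot{$node}{$year}{$month}{"Fatalities"} = $count ;
}

## the first-level keys are the union of intersections from both inputs;
## node 101 appears in both but is listed only once
my @intersections = sort { $a <=> $b } keys %nycdot ;
print "@intersections\n" ;   ## prints "101 102 103"
```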

For convenience, we create a fetch_csv subroutine that reads one input file into the hash, and we call it once for each file:

## fetch injuries and fatalities
my %nycdot ;
%nycdot = fetch_csv( $injur_file, \%nycdot );
%nycdot = fetch_csv( $fatal_file, \%nycdot );
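Since fetch_csv receives a reference to %nycdot, another way to structure it would be to write into the caller's hash through that reference, so that nothing needs to be returned and reassigned. The sketch below illustrates that alternative with a hypothetical add_count subroutine; it is an assumption for illustration, not the author's fetch_csv.

```perl
use strict;
use warnings;

## Hypothetical helper: writes one count into the caller's hash through
## the reference, so nothing needs to be returned or copied back.
sub add_count {
    my ( $href , $node , $year , $month , $type , $count ) = @_ ;
    $href->{$node}{$year}{$month}{$type} = $count ;
    return ;
}

my %nycdot ;
add_count( \%nycdot , 101 , 2009 , 1 , "Injuries"   , 2 ) ;
add_count( \%nycdot , 101 , 2009 , 1 , "Fatalities" , 1 ) ;
add_count( \%nycdot , 102 , 2009 , 3 , "Injuries"   , 1 ) ;

## %nycdot itself now holds all three counts
print join( "," , sort keys %nycdot ) , "\n" ;   ## prints "101,102"
```

Under that design, each call would reduce to fetch_csv( $injur_file, \%nycdot ) without the assignment, and the top level of the hash would not be rebuilt on every call.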

As fetch_csv reads each file, it stores the data in the same hash, using the intersection identifier as the first hash key:

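## %ch holds the current row, keyed by column name;
## the name of the fourth column tells us which file we are reading
if ($colnames[3] eq "Injuries") {
    $nycdot{$ch{"NODEID"}}{$year}{$month}{"Injuries"} = $ch{"Injuries"} ;
} elsif ($colnames[3] eq "Fatalities") {
    $nycdot{$ch{"NODEID"}}{$year}{$month}{"Fatalities"} = $ch{"Fatalities"} ;
}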
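The core of fetch_csv is shown above. One way the rest of it might look is sketched below, under stated assumptions: the year and month columns are assumed to be named YR and MN, the count is assumed to be the fourth column (matching $colnames[3]), and fields are split on plain commas, so quoted fields containing commas are not handled. For real CSV input, a module such as Text::CSV would be safer.

```perl
use strict;
use warnings;

## Hypothetical sketch of fetch_csv. The column names YR and MN and the
## plain comma splitting (no quoted fields) are assumptions.
sub fetch_csv {
    my ( $file , $href ) = @_ ;

    ## start from the counts gathered so far
    ## (a shallow copy: the nested hashes are shared, not duplicated)
    my %nycdot = %{ $href } ;

    open( my $fh , '<' , $file ) or die "could not read $file: $!" ;
    chomp( my $header = <$fh> ) ;
    my @colnames = split( /,/ , $header ) ;

    while ( my $line = <$fh> ) {
        chomp $line ;

        ## map each column name to its value in this row
        my %ch ;
        @ch{@colnames} = split( /,/ , $line ) ;
        my ( $year , $month ) = ( $ch{"YR"} , $ch{"MN"} ) ;

        if ( $colnames[3] eq "Injuries" ) {
            $nycdot{$ch{"NODEID"}}{$year}{$month}{"Injuries"} = $ch{"Injuries"} ;
        } elsif ( $colnames[3] eq "Fatalities" ) {
            $nycdot{$ch{"NODEID"}}{$year}{$month}{"Fatalities"} = $ch{"Fatalities"} ;
        }
    }
    close $fh ;

    return %nycdot ;
}
```

Because Perl's open treats a reference to a string as an in-memory file, the sketch can be tried without any files on disk by passing \$csv_text, where $csv_text holds a file's contents, in place of a file name.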

The year and month serve as the second and third hash keys, and the fourth key records whether the count is of injuries or fatalities. The combination of hash keys allows us to store monthly counts for each intersection.
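The following sketch, using made-up counts for a hypothetical node 101, shows what the nested keys look like and why missing months need special handling: a month with no injuries or fatalities simply has no key. It also shows a Perl behavior worth knowing, autovivification: merely reading $nycdot{101}{2009}{2}{"Injuries"} creates an empty hash for the missing month.

```perl
use strict;
use warnings;

## Hypothetical sample data: node 101 has counts for January and March 2009 only
my %nycdot ;
$nycdot{101}{2009}{1}{"Injuries"}   = 2 ;
$nycdot{101}{2009}{1}{"Fatalities"} = 1 ;
$nycdot{101}{2009}{3}{"Injuries"}   = 4 ;

## the keys under a node and year are the months that have data
my @months = sort { $a <=> $b } keys %{ $nycdot{101}{2009} } ;
print "months with data: @months\n" ;        ## prints "months with data: 1 3"

## February never appeared in the input, so it has no key at all
my $feb_before = exists $nycdot{101}{2009}{2} ? 1 : 0 ;
print "February stored: $feb_before\n" ;     ## prints "February stored: 0"

## reading a missing count returns undef, but Perl autovivifies the
## intermediate level: an empty hash for February now exists
my $feb_injuries = $nycdot{101}{2009}{2}{"Injuries"} ;
my $feb_after    = exists $nycdot{101}{2009}{2} ? 1 : 0 ;
print defined $feb_injuries ? "defined\n" : "undef\n" ;   ## prints "undef"
print "February stored: $feb_after\n" ;      ## prints "February stored: 1"
```

This is harmless in the assembly loop later in this section, because the list of nodes comes from keys %nycdot before any lookups happen; the lookups only add empty year and month hashes beneath existing nodes.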

But when we assemble the data, we will need zeroes for the months in which an injury or fatality did not occur at that intersection. For that purpose, it's helpful to create a subroutine:

## if value not defined, then return zero, otherwise return the value
sub make_zero {
    my $val = ( ! defined $_[0] ) ? 0 : $_[0] ;
    return $val ;
}
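On Perl 5.10 or later, the same logic can be written with the defined-or operator //, which returns its left operand if it is defined and its right operand otherwise. A minimal equivalent version:

```perl
use strict;
use warnings;

## Equivalent to make_zero above (requires Perl 5.10 or later):
## return the argument if it is defined, otherwise zero.
sub make_zero {
    return $_[0] // 0 ;
}

print make_zero(undef) , "\n" ;   ## prints 0
print make_zero(0) , "\n" ;       ## prints 0
print make_zero(3) , "\n" ;       ## prints 3
```

Unlike ||, which would also replace defined but false values such as an empty string, // tests only for definedness, which is exactly what make_zero checks.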

When assembling the dataset, we use that subroutine either to retrieve the monthly count from the hash or, if there is none, to return a zero:

## write it all out
open( OTFILE , '>' , $otfile ) || die "could not overwrite $otfile: $!" ;
print OTFILE $otheader . "\n" ;

foreach my $node (sort {$a <=> $b} keys %nycdot) {
    foreach my $yr (2009..2017) {
        foreach my $mo (1..12) {

            ## create output array
            my @otarray = ( $node , $yr , $mo ) ;

            ## retrieve counts, make zero if not defined
            my $fatalities = make_zero( $nycdot{$node}{$yr}{$mo}{"Fatalities"} );
            my $injuries   = make_zero( $nycdot{$node}{$yr}{$mo}{"Injuries"} );

            ## add them together
            my $casualties = $fatalities + $injuries ;

            ## push them all onto the output array
            push( @otarray , $casualties , $fatalities , $injuries ) ;

            ## prepare output string
            my $otline = join( "," , @otarray ) ;

            ## print to output file
            print OTFILE $otline . "\n" ;
        }
    }
}
close OTFILE ;
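To check that the loop produces a complete grid, here is a self-contained sketch of the same assembly logic, run on two made-up intersections. The subroutine name assemble_rows and its year-range parameters are hypothetical; it returns the CSV lines instead of printing them, and uses // in place of make_zero. Every intersection should get exactly 9 × 12 = 108 rows for 2009 through 2017, whether or not anything happened in a given month.

```perl
use strict;
use warnings;

## Hypothetical sketch: build zero-filled rows for every node, year, and
## month, returning them as CSV lines instead of printing to a file.
sub assemble_rows {
    my ( $href , $first_yr , $last_yr ) = @_ ;
    my @lines ;
    foreach my $node ( sort { $a <=> $b } keys %{$href} ) {
        foreach my $yr ( $first_yr .. $last_yr ) {
            foreach my $mo ( 1 .. 12 ) {
                ## missing counts become zero
                my $fatalities = $href->{$node}{$yr}{$mo}{"Fatalities"} // 0 ;
                my $injuries   = $href->{$node}{$yr}{$mo}{"Injuries"}   // 0 ;
                my $casualties = $fatalities + $injuries ;
                push( @lines , join( "," , $node , $yr , $mo ,
                                     $casualties , $fatalities , $injuries ) ) ;
            }
        }
    }
    return @lines ;
}

## hypothetical sample data: two intersections, two months with counts
my %nycdot ;
$nycdot{101}{2009}{1}{"Injuries"}   = 2 ;
$nycdot{101}{2009}{1}{"Fatalities"} = 1 ;
$nycdot{102}{2017}{12}{"Injuries"}  = 4 ;

my @rows = assemble_rows( \%nycdot , 2009 , 2017 ) ;
print scalar(@rows) , " rows\n" ;   ## 2 nodes x 9 years x 12 months = 216
print "$rows[0]\n" ;                ## prints "101,2009,1,3,1,2"
print "$rows[1]\n" ;                ## prints "101,2009,2,0,0,0"
```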

More details on how I assembled the Vision Zero dataset can be found in my Perl script. For further reading, I recommend the documentation and examples that Tom Christiansen provides in his "Perl Data Structures Cookbook."

And in our discussion of patterns in text, we will write a Perl script that identifies the information we wish to store and then edits it.