An analysis of Chicago crime data using Apache Spark. The data can be pulled from the Chicago Crime Data Link. A link to the analysis code accompanies every result (please refer to the uncommented code).

Tools used:

- Scala
- Spark Core APIs
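
The analysis uses the Spark Core APIs rather than a CSV reader, so each line of the export has to be split into fields by hand. Below is a minimal sketch of one way to do that. It is an assumption, not the repository's code. It assumes the file is comma-separated and that some fields, such as a location pair, are wrapped in double quotes and may contain commas. A plain `split(",")` would shift the columns on those lines.

```scala
// Hypothetical helper, not from the repository: splits one CSV line into fields,
// keeping commas that appear inside double-quoted fields (an assumed file layout).
object CsvLine {
  def split(line: String): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (c == '"') {
        // Inside a quoted field, a doubled quote ("") stands for one literal quote.
        if (inQuotes && i + 1 < line.length && line.charAt(i + 1) == '"') {
          current += '"'
          i += 1
        } else inQuotes = !inQuotes
      } else if (c == ',' && !inQuotes) {
        // A comma outside quotes ends the current field.
        fields += current.toString
        current.clear()
      } else current += c
      i += 1
    }
    fields += current.toString // the last field has no trailing comma
    fields.result()
  }
}
```

With Spark, the function could be applied per line, for example `sc.textFile(path).map(CsvLine.split)`, after dropping the header row. `CsvLine` and `path` are illustrative names.
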
There were 594681 cases with no community number, so they are excluded from the analysis.

Top 10 communities by number of crimes:

Community Code | Community Name | Number of Crimes |
---|---|---|
25 | Austin | 368568 |
43 | South Shore | 186096 |
23 | Humboldt Park | 181174 |
67 | West Englewood | 169988 |
24 | West Town | 168683 |
28 | Near West Side | 164947 |
29 | North Lawndale | 163847 |
71 | Auburn Gresham | 161269 |
68 | Englewood | 152333 |
49 | Roseland | 149107 |

The 10 communities with the fewest crimes:

Community Code | Community Name | Number of Crimes |
---|---|---|
47 | Burnside | 8525 |
12 | Forest Glen | 10414 |
74 | Mount Greenwood | 12113 |
55 | Hegewisch | 12433 |
36 | Oakland | 12672 |
18 | Montclare | 13842 |
13 | North Park | 18322 |
57 | Archer Heights | 19661 |
37 | Fuller Park | 19693 |
72 | Beverly | 20673 |
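
The two community tables come from counting cases per community code after dropping the rows with no community number. Below is a minimal sketch of that logic using plain Scala collections. `CommunityCounts` and its methods are hypothetical names, and the input is assumed to be the raw community-number column as strings.

```scala
// Hypothetical sketch: the community-number column is assumed to hold integers or blanks.
object CommunityCounts {
  /** Counts cases per community code, skipping records with a blank code. */
  def countByCommunity(codes: Seq[String]): Map[Int, Long] =
    codes
      .map(_.trim)
      .filter(_.nonEmpty) // drop cases with no community number
      .groupBy(_.toInt)
      .map { case (code, hits) => code -> hits.size.toLong }

  /** The n communities with the most cases, highest count first (ties by code). */
  def mostCrimes(counts: Map[Int, Long], n: Int): Seq[(Int, Long)] =
    counts.toSeq.sortBy { case (code, count) => (-count, code) }.take(n)

  /** The n communities with the fewest cases, lowest count first (ties by code). */
  def fewestCrimes(counts: Map[Int, Long], n: Int): Seq[(Int, Long)] =
    counts.toSeq.sortBy { case (code, count) => (count, code) }.take(n)
}
```

On an RDD, the `groupBy` step would typically become `map(code => (code.toInt, 1L)).reduceByKey(_ + _)`. `reduceByKey` combines counts within each partition before shuffling, instead of shuffling every record.
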

The three most common crime types, as a share of all cases:

Crime Type | Percentage (%) |
---|---|
THEFT | 20.91 |
BATTERY | 18.25 |
CRIMINAL DAMAGE | 11.47 |
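
Each percentage in the crime type table is that type's count divided by the total number of cases, times 100. Below is a sketch that assumes the input is the primary crime type column as strings. `CrimeShare` is a hypothetical name.

```scala
// Hypothetical sketch: percentage share per crime type, rounded to two decimals.
object CrimeShare {
  def shares(types: Seq[String]): Seq[(String, Double)] = {
    val total = types.size
    types
      .groupBy(identity)
      .toSeq
      .map { case (t, hits) => (t, hits.size) }
      .sortBy { case (t, n) => (-n, t) } // most frequent first, ties by name
      .map { case (t, n) => (t, math.round(n * 10000.0 / total) / 100.0) }
  }
}
```
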

Months with the most crimes:

Month Number | Number of cases | Percentage (%) |
---|---|---|
7 | 605183 | 9.3 |
8 | 598999 | 9.2 |
5 | 578254 | 8.88 |

Months with the fewest crimes:

Month Number | Number of cases | Percentage (%) |
---|---|---|
2 | 446463 | 6.86 |
12 | 478061 | 7.35 |
1 | 507500 | 7.8 |

These results show a pattern: criminals seem to prefer warmer months such as July, August and May over colder months such as December, January and February. (February's total is also lowered by its having fewer days.)
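
The month tables need the month number pulled out of each case's timestamp. The sketch below assumes timestamps look like `07/04/2015 08:30:00 PM`, so check this against the actual export before relying on it. `MonthCounts` is a hypothetical name.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

object MonthCounts {
  // Assumed timestamp layout, e.g. "07/04/2015 08:30:00 PM" (12-hour clock with AM/PM).
  private val fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss a", Locale.US)

  /** Month number, 1 = January. */
  def month(ts: String): Int = LocalDateTime.parse(ts.trim, fmt).getMonthValue

  /** (month number, case count, percentage of all cases), busiest month first. */
  def byMonth(timestamps: Seq[String]): Seq[(Int, Int, Double)] = {
    val total = timestamps.size.toDouble
    timestamps
      .groupBy(ts => month(ts))
      .map { case (m, hits) => (m, hits.size, math.round(hits.size * 10000.0 / total) / 100.0) }
      .toSeq
      .sortBy { case (m, n, _) => (-n, m) }
  }
}
```

Months differ in length, so dividing each count by the number of days in the month would give a fairer per-day comparison.
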

Hours of the day with the most crimes:

Time | Number of cases | Percentage (%) |
---|---|---|
08 PM | 371513 | 5.71 |
07 PM | 370697 | 5.7 |
12 PM | 369403 | 5.68 |
09 PM | 363716 | 5.59 |
12 AM | 361560 | 5.56 |

Hours of the day with the fewest crimes:

Time | Number of cases | Percentage (%) |
---|---|---|
05 AM | 86597 | 1.33 |
06 AM | 102302 | 1.57 |
04 AM | 104799 | 1.61 |
03 AM | 139686 | 2.15 |
07 AM | 147395 | 2.26 |
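
So, according to the data, crime is lowest in the early morning (3 AM to 7 AM) and highest in the evening (7 PM to 9 PM), at noon and at midnight. While your morning walk is likely to be pleasant and safe, be careful when you leave your office in the evening or head home after a drink at the local bar.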
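
The hour tables group cases by hour on a 12-hour clock, labelled like `08 PM`. Below is a sketch using `java.time`, again assuming the `MM/dd/yyyy hh:mm:ss a` timestamp layout. `HourCounts` is a hypothetical name.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

object HourCounts {
  // Assumed input layout, e.g. "07/04/2015 08:30:00 PM".
  private val in = DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss a", Locale.US)
  // Output label in the style of the tables above, e.g. "08 PM" or "12 AM".
  private val out = DateTimeFormatter.ofPattern("hh a", Locale.US)

  def hourLabel(ts: String): String = LocalDateTime.parse(ts.trim, in).format(out)

  /** Number of cases per hour label. */
  def countByHour(timestamps: Seq[String]): Map[String, Int] =
    timestamps.groupBy(ts => hourLabel(ts)).map { case (h, hits) => h -> hits.size }
}
```
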

Crime Type | Number of cases | Percentage (%) |
---|---|---|
THEFT | 5237 | 35.03 |
OTHER OFFENSE | 2568 | 17.17 |
CRIMINAL TRESPASS | 1811 | 12.11 |
DECEPTIVE PRACTICE | 1386 | 9.27 |
BATTERY | 931 | 6.23 |
NARCOTICS | 853 | 5.7 |
CRIMINAL DAMAGE | 812 | 5.43 |
MOTOR VEHICLE THEFT | 455 | 3.04 |

Crime Type | Number of cases | Percentage (%) |
---|---|---|
CRIMINAL DAMAGE | 1 | 100.0 |

Cases in which an arrest was made:

Arrest Made | Number of cases | Percentage (%) |
---|---|---|
No | 4519824 | 71.9 |
Yes | 1766253 | 28.1 |

Going by the numbers, an arrest was made in only 28.1% of cases, fewer than three in ten.
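
As a quick check on the arrest table, the percentages follow directly from the two counts: 1766253 / (4519824 + 1766253) = 1766253 / 6286077 ≈ 0.281. The sketch below reproduces that arithmetic and tallies the arrest flags. It assumes the arrest column stores `true`/`false` strings. `ArrestShare` is a hypothetical name.

```scala
// Hypothetical sketch: the arrest column is assumed to hold "true"/"false" strings.
object ArrestShare {
  /** Percentage of `part` in `total`, rounded to one decimal place. */
  def percent(part: Long, total: Long): Double =
    if (total == 0) 0.0 else math.round(part * 1000.0 / total) / 10.0

  /** Number of cases with (true) and without (false) an arrest. */
  def tally(flags: Seq[String]): Map[Boolean, Long] =
    flags.groupBy(_.trim.toLowerCase == "true").map { case (k, v) => k -> v.size.toLong }
}
```

Applied to the counts in the table, `percent(1766253L, 6286077L)` gives 28.1 and `percent(4519824L, 6286077L)` gives 71.9. On an RDD, `countByValue()` on the flag column would produce the same tally.
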

Open an issue if you have more questions you think this data can answer, and raise a PR if you find any bugs or improvements in the code.