Getting Started w/ Python, Spark, and Databricks

Latest revision as of 19:23, 2 August 2019

Loading Data

Viewing Contents of a Parquet Folder

%fs ls /mnt/training/crime-data-2016

Reading Parquet

df = spark.read.parquet("/mnt/training/crime-data-2016/Crime-Data-Boston-2016.parquet")

Viewing Results

Text Based (Generic)

df.show()

Databricks Native Viewer

display(df)

Manipulating Dataframes

Select

df.select("firstName", "lastName")

Filter

df.filter("firstName = 'Brian'").filter("lastName = 'Popp'")

Combining Dataframes

Union

homicidesDF = homicidesNewYorkDF.union(homicidesBostonDF)