Getting Started w/ Python, Spark, and Databricks
Latest revision as of 19:23, 2 August 2019
Loading Data
Viewing Contents of a Parquet Folder
%fs ls /mnt/training/crime-data-2016
Reading Parquet
df = spark.read.parquet("/mnt/training/crime-data-2016/Crime-Data-Boston-2016.parquet")
Viewing Results
Text Based (Generic)
df.show()
Databricks Native Viewer
display(df)
Manipulating Dataframes
Select
df.select("firstName", "lastName")
Filter
df.filter("firstName = 'Brian'").filter("lastName = 'Popp'")
Combining Dataframes
Union
homicidesDF = homicidesNewYorkDF.union(homicidesBostonDF)