Building Atlas
git clone https://git-wip-us.apache.org/repos/asf/atlas.git atlas
cd atlas
If you are using Atlas version 0.8.x or below:
export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean install
If you are on a later level, including master as of 27 Oct 2017, use:
export MAVEN_OPTS="-Xmx1024m" && mvn clean install
Atlas has since moved to Java 8, where the -XX:MaxPermSize option is no longer valid.
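The version rule above can be sketched as a small helper. This is only an illustration: the function name maven_opts_for is hypothetical, and it assumes version strings of the plain numeric form "major.minor[.patch]".

```python
def maven_opts_for(atlas_version: str) -> str:
    """Return the MAVEN_OPTS value for a given Atlas version string.

    ASSUMPTION: atlas_version starts with a numeric "major.minor" prefix.
    """
    # Compare only the numeric major.minor prefix, e.g. "0.8.2" -> (0, 8).
    major, minor = (int(part) for part in atlas_version.split(".")[:2])
    if (major, minor) <= (0, 8):
        # 0.8.x and below still use the MaxPermSize option.
        return "-Xmx1024m -XX:MaxPermSize=256m"
    # Later levels build on Java 8, where MaxPermSize is no longer valid.
    return "-Xmx1024m"
```

For example, maven_opts_for("0.8.1") yields the value with -XX:MaxPermSize, while maven_opts_for("1.0.0") yields only the heap setting.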
Once the build completes successfully, the artifacts can be packaged for deployment.
mvn clean package -DskipTests -DskipCheck=true
The tar file can be found at atlas/distro/target/apache-atlas-${project.version}-bin.tar.gz
The tar file is structured as follows:
|- bin
   |- atlas_start.py
   |- atlas_stop.py
   |- atlas_config.py
   |- quick_start.py
   |- cputil.py
|- conf
   |- application.properties
   |- client.properties
   |- atlas-env.sh
   |- log4j.xml
   |- solr
      |- currency.xml
      |- lang
         |- stopwords_en.txt
      |- protowords.txt
      |- schema.xml
      |- solrconfig.xml
      |- stopwords.txt
      |- synonyms.txt
|- docs
|- server
   |- webapp
      |- atlas.war
|- README
|- NOTICE.txt
|- LICENSE.txt
|- DISCLAIMER.txt
|- CHANGES.txt
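After extracting the tar file, a quick sanity check of the layout can be sketched as follows. The helper name missing_entries and the short list of paths are illustrative assumptions; the paths assume the nesting implied by the listing above (scripts under bin, configuration under conf, the war under server/webapp).

```python
from pathlib import Path

# A few entries from the listing above, relative to the package directory.
# ASSUMPTION: the nesting (bin/, conf/, server/webapp/) follows that listing.
EXPECTED_ENTRIES = [
    "bin/atlas_start.py",
    "bin/atlas_stop.py",
    "conf/application.properties",
    "conf/atlas-env.sh",
    "server/webapp/atlas.war",
]


def missing_entries(package_dir, expected=EXPECTED_ENTRIES):
    """Return the expected entries that do not exist under package_dir."""
    root = Path(package_dir)
    return [entry for entry in expected if not (root / entry).exists()]
```

An empty result means every checked entry is present; anything else lists what is missing from the extracted package.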
Installing & Running Atlas
Installing Atlas
tar -xzvf apache-atlas-${project.version}-bin.tar.gz
cd atlas-${project.version}
Configuring Atlas
By default, the config directory used by Atlas is {package dir}/conf. To override this, set the environment variable METADATA_CONF to the path of the conf dir.
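The lookup order described above (METADATA_CONF first, then {package dir}/conf) can be sketched like this. The function name resolve_conf_dir is hypothetical and this is not the logic of the shipped scripts, only an illustration of the rule.

```python
import os
from pathlib import Path


def resolve_conf_dir(package_dir, environ=None):
    """Return METADATA_CONF if it is set, else the conf dir under package_dir."""
    env = os.environ if environ is None else environ
    override = env.get("METADATA_CONF")
    if override:
        # The environment variable takes precedence over the default location.
        return Path(override)
    return Path(package_dir) / "conf"
```

Passing an explicit environ mapping makes the rule easy to try out without changing the real environment.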
atlas-env.sh has been added to the Atlas conf directory. This file can be used to set the environment variables that you need for your services, along with any other environment variables you might need. The file is sourced by the Atlas scripts before any commands are executed. The following environment variables are available to set.
# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
#export JAVA_HOME=
# any additional java opts you want to set. This will apply to both client and server operations
#export METADATA_OPTS=
# any additional java opts that you want to set for client only
#export METADATA_CLIENT_OPTS=
# java heap size we want to set for the client. Default is 1024MB
#export METADATA_CLIENT_HEAP=
# any additional opts you want to set for atlas service.
#export METADATA_SERVER_OPTS=
# java heap size we want to set for the atlas server. Default is 1024MB
#export METADATA_SERVER_HEAP=
# What is considered as atlas home dir. Default is the base location of the installed software
#export METADATA_HOME_DIR=
# Where log files are stored. Default is logs directory under the base install location
#export METADATA_LOG_DIR=
# Where pid files are stored. Default is logs directory under the base install location
#export METADATA_PID_DIR=
# Where the atlas titan db data is stored. Default is logs/data directory under the base install location
#export METADATA_DATA_DIR=
# Where do you want to expand the war file. By default it is in /server/webapp dir under the base install dir.
#export METADATA_EXPANDED_WEBAPP_DIR=
*NOTE for Mac OS users*
If you are using Mac OS, you will need to configure METADATA_SERVER_OPTS (explained above).
In {package dir}/conf/atlas-env.sh, uncomment the following line
#export METADATA_SERVER_OPTS=
and change it to look as below
export METADATA_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
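If the env file is generated by a script, the Mac OS case can be detected as sketched below. The helper name server_opts_for is hypothetical; the only fact relied on beyond the text is that Python's platform.system() reports "Darwin" on Mac OS.

```python
import platform

# The option string given in the note above for Mac OS users.
MAC_SERVER_OPTS = (
    "-Djava.awt.headless=true "
    "-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
)


def server_opts_for(system_name=None):
    """Return the METADATA_SERVER_OPTS value suggested for the given OS name."""
    name = platform.system() if system_name is None else system_name
    # Other systems get an empty value here; that default is an assumption.
    return MAC_SERVER_OPTS if name == "Darwin" else ""
```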
Configuring ATLAS application properties
All configuration in Atlas uses Java properties-style configuration.
The main configuration file is application.properties, which is in the conf dir at the deployed location. It consists of the following sections:
Refer to the link for more details. The example below uses BerkeleyDBJE.
atlas.graph.storage.backend=berkeleyje
atlas.graph.storage.directory=data/berkley
Basic configuration
atlas.graph.storage.backend=hbase
#For standalone mode, specify localhost
#For distributed mode, specify the zookeeper quorum here. For more information refer to http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
atlas.graph.storage.hostname=<ZooKeeper Quorum>
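The standalone versus distributed choice above can be sketched as a helper that produces the two storage properties. The function name is hypothetical, and writing the quorum as a comma-separated host list is an assumption; check the Titan HBase documentation linked above for the exact format.

```python
def hbase_storage_properties(zookeeper_hosts=None):
    """Build the HBase storage properties for standalone or distributed mode."""
    # Standalone mode: no quorum given, so point at localhost.
    # Distributed mode: ASSUMPTION that the quorum is a comma-separated host list.
    hostname = ",".join(zookeeper_hosts) if zookeeper_hosts else "localhost"
    return {
        "atlas.graph.storage.backend": "hbase",
        "atlas.graph.storage.hostname": hostname,
    }
```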
Advanced configuration
Refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_hbase
This section sets up the graph db, Titan, to use a search indexing system. The example configuration below sets it up to use an embedded Elasticsearch indexing system.
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.directory=data/es
atlas.graph.index.search.elasticsearch.client-only=false
atlas.graph.index.search.elasticsearch.local-mode=true
atlas.graph.index.search.elasticsearch.create.sleep=2000
For Solr, please refer to the "Configuring SOLR as the Indexing Backend for the Graph Repository" section below.
The higher layer services like hive lineage, schema, etc. are driven by the type system, and this section encodes the specific types for the hive data model.
# This model reflects the base super types for Data and Process
atlas.lineage.hive.table.type.name=DataSet
atlas.lineage.hive.process.type.name=Process
atlas.lineage.hive.process.inputs.name=inputs
atlas.lineage.hive.process.outputs.name=outputs
## Schema
atlas.lineage.hive.table.schema.query=hive_table where name=?, columns
The following property is used to toggle the SSL feature.
atlas.enableTLS=false
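To inspect settings such as the ones above from a script, a minimal reader for simple key=value lines can be sketched as follows. It deliberately handles only the plain form used in these examples, not the full Java properties syntax (escapes, line continuations, ":" separators), and both function names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" (e.g. "name=?").
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def tls_enabled(props):
    """Read atlas.enableTLS as a boolean; a missing key is treated as false here."""
    return props.get("atlas.enableTLS", "false").lower() == "true"
```

Treating a missing atlas.enableTLS as false is an assumption of this sketch, not documented Atlas behavior.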
Configuring SOLR as the Indexing Backend for the Graph Repository
By default, Atlas uses Titan as the graph repository; it is currently the only graph repository implementation available.
To configure Titan to work with Solr, please follow the instructions below.
* Install Solr if it is not already running. The supported Solr versions are 4.8.1 and 5.2.1.
* Start Solr in cloud mode.
SolrCloud mode uses a ZooKeeper service as a highly available, central location for cluster management.
For a small cluster, running with an existing ZooKeeper quorum should be fine. For larger clusters, you would want to run a separate ZooKeeper quorum with at least 3 servers.
Note: Atlas currently supports Solr in "cloud" mode only. "http" mode is not supported. For more information, refer to the Solr documentation - https://cwiki.apache.org/confluence/display/solr/SolrCloud
* Run the following commands from the SOLR_HOME directory to create collections in Solr corresponding to the indexes that Atlas uses
bin/solr create -c vertex_index -d ATLAS_HOME/conf/solr -shards #numShards -replicationFactor #replicationFactor
bin/solr create -c edge_index -d ATLAS_HOME/conf/solr -shards #numShards -replicationFactor #replicationFactor
bin/solr create -c fulltext_index -d ATLAS_HOME/conf/solr -shards #numShards -replicationFactor #replicationFactor
Note: If numShards and replicationFactor are not specified, they default to 1, which suffices if you are trying out Solr with Atlas on a single-node instance.
Otherwise, specify numShards according to the number of hosts in the Solr cluster and the maxShardsPerNode configuration.
The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster.
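The three create commands and the shard limit above can be sketched as a generator of argument lists. The function name and the solr_nodes parameter are hypothetical; the flags are the ones shown in the commands above, and the sketch only builds the commands without running them.

```python
# Collection names taken from the commands above.
ATLAS_COLLECTIONS = ("vertex_index", "edge_index", "fulltext_index")


def solr_create_commands(atlas_home, solr_nodes, num_shards=1, replication_factor=1):
    """Return one `bin/solr create` argument list per Atlas index collection."""
    # The text above states that shards cannot exceed the number of Solr nodes.
    if num_shards > solr_nodes:
        raise ValueError("num_shards cannot exceed the number of Solr nodes")
    conf_dir = f"{atlas_home}/conf/solr"
    return [
        ["bin/solr", "create", "-c", name, "-d", conf_dir,
         "-shards", str(num_shards),
         "-replicationFactor", str(replication_factor)]
        for name in ATLAS_COLLECTIONS
    ]
```

Each returned list could be passed to subprocess.run from the SOLR_HOME directory; the defaults of 1 mirror the single-node case described above.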
* Change the Atlas configuration to point to the Solr instance setup.
Please make sure the following configurations are set to the values below in ATLAS_HOME/conf/application.properties
atlas.graph.index.search.backend=<'solr' for solr 4.8.1>/<'solr5' for solr 5.2.1>
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=<the ZK quorum setup for solr as comma separated value> eg: 10.1.6.4:2181,10.1.6.5:2181
For more information on Titan Solr configuration, please refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.html
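The three Solr settings above can be sketched as a small builder that picks the backend name from the Solr version and joins the ZooKeeper addresses with commas. The function name is hypothetical, and only the two versions named above are accepted.

```python
# Backend names from the text: 'solr' for Solr 4.8.1, 'solr5' for Solr 5.2.1.
BACKEND_BY_SOLR_VERSION = {"4.8.1": "solr", "5.2.1": "solr5"}


def solr_index_properties(solr_version, zookeeper_hosts):
    """Build the index properties that point Atlas at a SolrCloud setup."""
    if solr_version not in BACKEND_BY_SOLR_VERSION:
        raise ValueError(f"unsupported Solr version: {solr_version}")
    return {
        "atlas.graph.index.search.backend": BACKEND_BY_SOLR_VERSION[solr_version],
        # Only cloud mode is supported, per the note above.
        "atlas.graph.index.search.solr.mode": "cloud",
        "atlas.graph.index.search.solr.zookeeper-url": ",".join(zookeeper_hosts),
    }
```

With the example addresses above, solr_index_properties("5.2.1", ["10.1.6.4:2181", "10.1.6.5:2181"]) gives the backend solr5 and the zookeeper-url 10.1.6.4:2181,10.1.6.5:2181.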
Starting Atlas Server
bin/atlas_start.py [-port <port>]
By default,
* The server listens on port 21000, the port used in the examples below. To change the port, use the -port option.
* The Atlas server starts with the conf from {package dir}/conf. To override this (for example, to use the same conf across multiple Atlas upgrades), set the environment variable METADATA_CONF to the path of the conf dir.
Stopping Atlas Server
bin/atlas_stop.py
Using Atlas
* Verify if the server is up and running
curl -v http://localhost:21000/api/atlas/admin/version
{"Version":"v0.1"}
* List the types in the repository
curl -v http://localhost:21000/api/atlas/types
{"results":["Process","Infrastructure","DataSet"],"count":3,"requestId":"1867493731@qtp-262860041-0 - 82d43a27-7c34-4573-85d1-a01525705091"}
* List the instances for a given type
curl -v http://localhost:21000/api/atlas/entities?type=hive_table
{"requestId":"788558007@qtp-44808654-5","list":["cb9b5513-c672-42cb-8477-b8f3e537a162","ec985719-a794-4c98-b98f-0509bd23aac0","48998f81-f1d3-45a2-989a-223af5c1ed6e","a54b386e-c759-4651-8779-a099294244c4"]}
curl -v http://localhost:21000/api/atlas/entities/list/hive_db
* Search for entities (instances) in the repository
curl -v http://localhost:21000/api/atlas/discovery/search/dsl?query="from hive_table"
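The same requests can be issued from a script. The sketch below builds the URLs used in the curl examples above and fetches them with Python's standard library; the function names are hypothetical, the base URL assumes a server on localhost:21000 as in those examples, and fetch_json only works against a running Atlas server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: the server runs locally on the port used in the curl examples.
BASE_URL = "http://localhost:21000/api/atlas"


def version_url(base=BASE_URL):
    return f"{base}/admin/version"


def types_url(base=BASE_URL):
    return f"{base}/types"


def entities_url(type_name, base=BASE_URL):
    # urlencode escapes the value so it is safe inside a query string.
    return f"{base}/entities?" + urlencode({"type": type_name})


def dsl_search_url(query, base=BASE_URL):
    # Spaces in the DSL query are encoded as "+" by urlencode.
    return f"{base}/discovery/search/dsl?" + urlencode({"query": query})


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body; requires a running Atlas server."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For instance, fetch_json(version_url()) is the scripted counterpart of the first curl call, and dsl_search_url("from hive_table") encodes the search query so that no shell quoting is needed.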
Dashboard
Once Atlas is started, you can view the status of Atlas entities using the web-based dashboard. You can open your browser at the corresponding port to use the web UI.